mirror of https://github.com/gsi-upm/sitc synced 2024-12-22 03:38:13 +00:00

Fix SPARQL regex exercise

J. Fernando Sánchez 2018-03-20 16:46:52 +01:00
parent e5fa77a128
commit bb2e3c2fe4

@@ -1381,6 +1381,56 @@
"### Regular expressions"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The last SPARQL concept we will cover is [regular expressions](https://www.w3.org/TR/rdf-sparql-query/#funcex-regex) (`regex`).\n",
"Regular expressions are a very powerful tool, but we will only cover the basics in this exercise.\n",
"\n",
"In essence, regular expressions match strings against patterns.\n",
"In their simplest form, they can be used to find substrings within a variable.\n",
"For instance, `regex(?label, \"substring\")` matches if and only if the value of `?label` contains `substring`.\n",
"But regular expressions can be more complex than that.\n",
"For instance, we can match patterns such as a 10-digit number, a 5-character string, or a value without whitespace.\n",
"\n",
"The syntax of the regex function is the following:\n",
"\n",
"```\n",
"regex(?variable, \"pattern\", \"flags\")\n",
"```\n",
"\n",
"Flags are optional and change how the pattern is applied. For example, the `i` flag makes the match case-insensitive.\n",
"\n",
"As an example, let us find the localities in the Community of Madrid whose Spanish name contains \"de\", ignoring case."
]
},
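The pattern and flag behaviour described in the cell above can be tried outside SPARQL. The following is a minimal sketch that uses Python's `re` module as a stand-in. SPARQL's `regex` follows the XPath/XQuery regular expression syntax, which overlaps with Python's for these simple patterns but is not identical. The helper name `matches` and the sample strings are assumptions made for illustration; they are not part of the notebook.

```python
import re

def matches(text, pattern, flags=""):
    """Rough Python analogue of SPARQL regex(?var, pattern, flags).

    Like SPARQL's regex, re.search succeeds if the pattern matches
    anywhere in the text, unless the pattern is anchored with ^ or $.
    Only the "i" (case-insensitive) flag is handled in this sketch.
    """
    re_flags = re.IGNORECASE if "i" in flags else 0
    return re.search(pattern, text, re_flags) is not None

# Substring match: "de" appears inside the string.
print(matches("Alcalá de Henares", "de"))   # True

# Matching is case-sensitive unless the "i" flag is given.
print(matches("DE LA VEGA", "de"))          # False
print(matches("DE LA VEGA", "de", "i"))     # True

# The patterns mentioned in the text, anchored to the whole string.
print(matches("0123456789", r"^\d{10}$"))   # a 10-digit number: True
print(matches("abcde", r"^.{5}$"))          # a 5-character string: True
print(matches("no spaces", r"^\S+$"))       # no whitespace allowed: False
```

Anchoring with `^` and `$` is what turns "contains" into "is exactly"; without the anchors, each of the last three patterns would also accept longer strings that merely contain a match.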
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"%%sparql\n",
"\n",
"SELECT ?localidad\n",
"WHERE {\n",
" ?localidad <http://dbpedia.org/ontology/isPartOf> <http://dbpedia.org/resource/Community_of_Madrid> .\n",
" ?localidad rdfs:label ?nombre .\n",
" FILTER (lang(?nombre) = \"es\" ).\n",
" FILTER regex(?nombre, \"de\", \"i\")\n",
"}\n",
"LIMIT 10"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Now, use regular expressions to find Spanish novelists whose **first name** is Juan.\n",
"In other words, their name **starts with** \"Juan\"."
]
},
{
"cell_type": "code",
"execution_count": null,
@@ -1421,7 +1471,7 @@
"deletable": false,
"editable": false,
"nbgrader": {
"checksum": "71b5b187bb147c0e7444b29a4f413720",
"checksum": "6632242d1d5055e12c3df37941b9e434",
"grade": true,
"grade_id": "cell-c149fe65008f39a9",
"locked": true,
@@ -1434,7 +1484,8 @@
"source": [
"assert len(LAST_QUERY['columns']['nombre']) > 15\n",
"for i in LAST_QUERY['columns']['nombre']:\n",
" assert 'Juan' in i"
" assert 'Juan' in i\n",
"assert \"Robert Juan-Cantavella\" not in LAST_QUERY['columns']['nombre']"
]
},
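The assertion added in this hunk rejects "Robert Juan-Cantavella", a label that contains "Juan" without starting with it. A minimal sketch of the difference between an unanchored and an anchored pattern, again using Python's `re` as a stand-in for SPARQL's `regex`; the `labels` list is illustrative and is not the actual result of the DBpedia query.

```python
import re

# Illustrative labels only; this is NOT the actual output of the query.
labels = ["Juan Benet", "Robert Juan-Cantavella"]

# Unanchored: "Juan" may appear anywhere in the label.
contains_juan = [label for label in labels if re.search("Juan", label)]

# Anchored with ^: the label must begin with "Juan".
starts_with_juan = [label for label in labels if re.search("^Juan", label)]

print(contains_juan)      # both labels
print(starts_with_juan)   # only the label that begins with "Juan"
```

In SPARQL, the same anchor goes inside the pattern string, for example `regex(?nombre, "^Juan")`.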
{
@@ -1507,6 +1558,10 @@
"Querying the manually annotated dataset is slightly different from querying DBpedia.\n",
"The main difference is that this dataset uses different graphs to separate the annotations from different students.\n",
"\n",
"**Each graph is a separate set of triples**.\n",
"For this exercise, you can think of each graph as a separate endpoint.\n",
"\n",
"\n",
"First, let us get a list of graphs available:"
]
},