mirror of
https://github.com/gsi-upm/sitc
synced 2024-12-22 03:38:13 +00:00
Fix SPARQL regex exercise
parent e5fa77a128
commit bb2e3c2fe4
@@ -1381,6 +1381,56 @@
 "### Regular expressions"
 ]
 },
+{
+"cell_type": "markdown",
+"metadata": {},
+"source": [
+"The last SPARQL concept we will cover is [regular expressions](https://www.w3.org/TR/rdf-sparql-query/#funcex-regex) (`regex`).\n",
+"Regular expressions are a very powerful tool, but we will only cover the basics in this exercise.\n",
+"\n",
+"In essence, regular expressions match strings against patterns.\n",
+"In their simplest form, they can be used to find substrings within a variable.\n",
+"For instance, `regex(?label, \"substring\")` matches if and only if the value of `?label` contains `substring`.\n",
+"But regular expressions can be more complex than that.\n",
+"For instance, we can match patterns such as a 10-digit number, a 5-character string, or values without whitespace.\n",
+"\n",
+"The syntax of the regex function is the following:\n",
+"\n",
+"```\n",
+"regex(?variable, \"pattern\", \"flags\")\n",
+"```\n",
+"\n",
+"Flags are optional settings for the regular expression, such as ignoring case (`i` flag).\n",
+"\n",
+"As an example, let us find the cities in the Community of Madrid that contain \"de\" in their name."
+]
+},
+{
+"cell_type": "code",
+"execution_count": null,
+"metadata": {},
+"outputs": [],
+"source": [
+"%%sparql\n",
+"\n",
+"SELECT ?localidad\n",
+"WHERE {\n",
+" ?localidad <http://dbpedia.org/ontology/isPartOf> <http://dbpedia.org/resource/Community_of_Madrid> .\n",
+" ?localidad rdfs:label ?nombre .\n",
+" FILTER (lang(?nombre) = \"es\" ).\n",
+" FILTER regex(?nombre, \"de\", \"i\")\n",
+"}\n",
+"LIMIT 10"
+]
+},
+{
+"cell_type": "markdown",
+"metadata": {},
+"source": [
+"Now, use regular expressions to find Spanish novelists whose **first name** is Juan.\n",
+"In other words, their name **starts with** \"Juan\"."
+]
+},
 {
 "cell_type": "code",
 "execution_count": null,
@@ -1421,7 +1471,7 @@
 "deletable": false,
 "editable": false,
 "nbgrader": {
-"checksum": "71b5b187bb147c0e7444b29a4f413720",
+"checksum": "6632242d1d5055e12c3df37941b9e434",
 "grade": true,
 "grade_id": "cell-c149fe65008f39a9",
 "locked": true,
@@ -1434,7 +1484,8 @@
 "source": [
 "assert len(LAST_QUERY['columns']['nombre']) > 15\n",
 "for i in LAST_QUERY['columns']['nombre']:\n",
-" assert 'Juan' in i"
+" assert 'Juan' in i\n",
+"assert \"Robert Juan-Cantavella\" not in LAST_QUERY['columns']['nombre']"
 ]
 },
 {
@@ -1507,6 +1558,10 @@
 "Querying the manually annotated dataset is slightly different from querying DBpedia.\n",
 "The main difference is that this dataset uses different graphs to separate the annotations from different students.\n",
 "\n",
+"**Each graph is a separate set of triples**.\n",
+"For this exercise, you could think of graphs as individual endpoints.\n",
+"\n",
+"\n",
 "First, let us get a list of graphs available:"
 ]
 },