diff --git a/lod/SPARQL.ipynb b/lod/SPARQL.ipynb
index b4e9222..aad8f51 100644
--- a/lod/SPARQL.ipynb
+++ b/lod/SPARQL.ipynb
@@ -1381,6 +1381,56 @@
     "### Regular expressions"
    ]
   },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "The last SPARQL concept we will cover is [regular expressions](https://www.w3.org/TR/rdf-sparql-query/#funcex-regex) (`regex`).\n",
+    "Regular expressions are a very powerful tool, but we will only cover the basics in this exercise.\n",
+    "\n",
+    "In essence, regular expressions match strings against patterns.\n",
+    "In their simplest form, they can be used to find substrings within a variable.\n",
+    "For instance, `regex(?label, \"substring\")` matches if and only if the `?label` variable contains `substring`.\n",
+    "But regular expressions can be more complex than that.\n",
+    "For instance, we can find patterns such as: a 10-digit number, a 5-character string, or values without whitespace.\n",
+    "\n",
+    "The syntax of the regex function is the following:\n",
+    "\n",
+    "```\n",
+    "regex(?variable, \"pattern\", \"flags\")\n",
+    "```\n",
+    "\n",
+    "Flags are optional configuration options for the regular expression, such as *do not care about case* (`i` flag).\n",
+    "\n",
+    "As an example, let us find the cities in Madrid that contain \"de\" in their name."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "%%sparql\n",
+    "\n",
+    "SELECT ?localidad\n",
+    "WHERE {\n",
+    " ?localidad .\n",
+    " ?localidad rdfs:label ?nombre .\n",
+    " FILTER (lang(?nombre) = \"es\" ).\n",
+    " FILTER regex(?nombre, \"de\", \"i\")\n",
+    "}\n",
+    "LIMIT 10"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Now, use regular expressions to find Spanish novelists whose **first name** is Juan.\n",
+    "In other words, their name **starts with** \"Juan\"."
+   ]
+  },
   {
    "cell_type": "code",
    "execution_count": null,
@@ -1421,7 +1471,7 @@
     "deletable": false,
     "editable": false,
     "nbgrader": {
-     "checksum": "71b5b187bb147c0e7444b29a4f413720",
+     "checksum": "6632242d1d5055e12c3df37941b9e434",
      "grade": true,
      "grade_id": "cell-c149fe65008f39a9",
      "locked": true,
@@ -1434,7 +1484,8 @@
    "source": [
     "assert len(LAST_QUERY['columns']['nombre']) > 15\n",
     "for i in LAST_QUERY['columns']['nombre']:\n",
-    " assert 'Juan' in i"
+    " assert 'Juan' in i\n",
+    "assert \"Robert Juan-Cantavella\" not in LAST_QUERY['columns']['nombre']"
    ]
   },
   {
@@ -1507,6 +1558,10 @@
     "Querying the manually annotated dataset is slightly different from querying DBpedia.\n",
     "The main difference is that this dataset uses different graphs to separate the annotations from different students.\n",
     "\n",
+    "**Each graph is a separate set of triples**.\n",
+    "For this exercise, you could think of graphs as individual endpoints.\n",
+    "\n",
+    "\n",
     "First, let us get a list of graphs available:"
    ]
   },
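Note on the new grading assertion: rejecting "Robert Juan-Cantavella" suggests the expected pattern is anchored at the start of the label, since a plain substring match on "Juan" would accept that name. The sketch below illustrates the difference using Python's `re` module rather than SPARQL itself, so treat it only as an analogy. The sample list and variable names are made up for illustration; only "Robert Juan-Cantavella" comes from the test above.

```python
import re

# Illustrative sample labels (assumption); only the third one appears in the diff.
names = ["Juan Benet", "Juan Marsé", "Robert Juan-Cantavella", "juan lopez"]

# Unanchored pattern: matches "Juan" anywhere in the string,
# roughly what regex(?nombre, "Juan") would do.
contains_juan = [n for n in names if re.search(r"Juan", n)]

# Anchored pattern: "^" only matches at the start of the string,
# roughly what regex(?nombre, "^Juan") would do.
starts_with_juan = [n for n in names if re.search(r"^Juan", n)]

# Case-insensitive variant, analogous to passing the "i" flag.
starts_with_juan_ci = [n for n in names if re.search(r"^juan", n, re.IGNORECASE)]

print(contains_juan)
print(starts_with_juan)
print(starts_with_juan_ci)
```

Only the anchored variants exclude "Robert Juan-Cantavella", which is the behaviour the added `assert ... not in ...` line checks for.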