Fix SPARQL regex exercise

2026-06-18 21:01:59 +00:00 · 2018-03-20 16:46:52 +01:00
parent e5fa77a128
commit bb2e3c2fe4
1 changed files with 57 additions and 2 deletions
--- a/lod/SPARQL.ipynb
+++ b/lod/SPARQL.ipynb
@@ -1381,6 +1381,56 @@
    "### Regular expressions"
   ]
  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "The last SPARQL concept we will cover is [regular expressions](https://www.w3.org/TR/rdf-sparql-query/#funcex-regex) (`regex`).\n",
+    "Regular expressions are a very powerful tool, but we will only cover the basics in this exercise.\n",
+    "\n",
+    "In essence, regular expressions match strings against patterns.\n",
+    "In their simplest form, they can be used to find substrings within a variable.\n",
+    "For instance, `regex(?label, \"substring\")` matches if and only if the `?label` variable contains `substring`.\n",
+    "But regular expressions can be more complex than that.\n",
+    "For instance, we can match patterns such as a 10-digit number, a 5-character string, or a value without whitespace.\n",
+    "\n",
+    "The syntax of the regex function is the following:\n",
+    "\n",
+    "```\n",
+    "regex(?variable, \"pattern\", \"flags\")\n",
+    "```\n",
+    "\n",
+    "Flags are optional and configure how the pattern is matched, for example by ignoring case (`i` flag).\n",
+    "\n",
+    "As an example, let us find the localities in the Community of Madrid whose Spanish name contains \"de\"."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "%%sparql\n",
+    "\n",
+    "SELECT ?localidad\n",
+    "WHERE {\n",
+    "    ?localidad <http://dbpedia.org/ontology/isPartOf> <http://dbpedia.org/resource/Community_of_Madrid> .\n",
+    "    ?localidad rdfs:label ?nombre .\n",
+    "    FILTER (lang(?nombre) = \"es\" ).\n",
+    "    FILTER regex(?nombre, \"de\", \"i\")\n",
+    "}\n",
+    "LIMIT 10"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Now, use regular expressions to find Spanish novelists whose **first name** is Juan.\n",
+    "In other words, their name **starts with** \"Juan\"."
+   ]
+  },
  {
   "cell_type": "code",
   "execution_count": null,
@@ -1421,7 +1471,7 @@
    "deletable": false,
    "editable": false,
    "nbgrader": {
-     "checksum": "71b5b187bb147c0e7444b29a4f413720",
+     "checksum": "6632242d1d5055e12c3df37941b9e434",
     "grade": true,
     "grade_id": "cell-c149fe65008f39a9",
     "locked": true,
@@ -1434,7 +1484,8 @@
   "source": [
    "assert len(LAST_QUERY['columns']['nombre']) > 15\n",
    "for i in LAST_QUERY['columns']['nombre']:\n",
-    "    assert 'Juan' in i"
+    "    assert 'Juan' in i\n",
+    "assert \"Robert Juan-Cantavella\" not in LAST_QUERY['columns']['nombre']"
   ]
  },
  {
@@ -1507,6 +1558,10 @@
    "Querying the manually annotated dataset is slightly different from querying DBpedia.\n",
    "The main difference is that this dataset uses different graphs to separate the annotations from different students.\n",
    "\n",
+    "**Each graph is a separate set of triples**.\n",
+    "For this exercise, you could think of graphs as individual endpoints.\n",
+    "\n",
+    "\n",
    "First, let us get a list of graphs available:"
   ]
  },
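
Note on the new assertion: the added check for "Robert Juan-Cantavella" rejects answers that only test whether the name contains "Juan". The sketch below illustrates the difference between a substring pattern and an anchored one. It uses Python's `re.search` as a stand-in for SPARQL's `regex`, since both match anywhere in the string unless the pattern is anchored with `^`. The sample list and the helper `matches` are hypothetical and exist only for illustration; they are not part of the notebook or of DBpedia's results.

```python
import re

# Hypothetical sample labels; only "Robert Juan-Cantavella" comes from the
# notebook's assertion, the others are illustrative.
names = ["Juan Goytisolo", "Robert Juan-Cantavella", "Juan Benet"]


def matches(pattern, text):
    """Return True if the pattern is found anywhere in text (unanchored search)."""
    return re.search(pattern, text) is not None


# Substring-style pattern: matches "Juan" anywhere in the label.
substring_hits = [n for n in names if matches("Juan", n)]

# Anchored pattern: "^" requires the label to start with "Juan".
anchored_hits = [n for n in names if matches("^Juan", n)]

print(substring_hits)
print(anchored_hits)
```

Only the anchored pattern leaves out "Robert Juan-Cantavella", which is what the updated test cell requires.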