mirror of
https://github.com/gsi-upm/sitc
synced 2024-11-25 15:52:29 +00:00
1418 lines
38 KiB
Plaintext
|
{
|
||
|
"cells": [
|
||
|
{
|
||
|
"cell_type": "markdown",
|
||
|
"metadata": {
|
||
|
"deletable": false,
|
||
|
"editable": false,
|
||
|
"nbgrader": {
|
||
|
"cell_type": "markdown",
|
||
|
"checksum": "7276f055a8c504d3c80098c62ed41a4f",
|
||
|
"grade": false,
|
||
|
"grade_id": "cell-0bfe38f97f6ab2d2",
|
||
|
"locked": true,
|
||
|
"schema_version": 3,
|
||
|
"solution": false
|
||
|
}
|
||
|
},
|
||
|
"source": [
|
||
|
"<header style=\"width:100%;position:relative\">\n",
|
||
|
" <div style=\"width:80%;float:right;\">\n",
|
||
|
" <h1>Course Notes for Learning Intelligent Systems</h1>\n",
|
||
|
" <h3>Department of Telematic Engineering Systems</h3>\n",
|
||
|
" <h5>Universidad Politécnica de Madrid</h5>\n",
|
||
|
" </div>\n",
|
||
|
" <img style=\"width:15%;\" src=\"../logo.jpg\" alt=\"UPM\" />\n",
|
||
|
"</header>"
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "markdown",
|
||
|
"metadata": {
|
||
|
"deletable": false,
|
||
|
"editable": false,
|
||
|
"nbgrader": {
|
||
|
"cell_type": "markdown",
|
||
|
"checksum": "2387a77db61721ecc375a8275111ecaf",
|
||
|
"grade": false,
|
||
|
"grade_id": "cell-0cd673883ee592d1",
|
||
|
"locked": true,
|
||
|
"schema_version": 3,
|
||
|
"solution": false
|
||
|
}
|
||
|
},
|
||
|
"source": [
|
||
|
"## Introduction to Linked Open Data\n",
|
||
|
"\n",
|
||
|
"In this lecture, we will apply the same SPARQL concepts as in previous notebooks.\n",
|
||
|
"This time, instead of using a database specifically built for the exercise, we will be using DBpedia.\n",
|
||
|
"DBpedia is a semantic version of Wikipedia.\n",
|
||
|
"\n",
|
||
|
"The language we will use to query DBpedia is SPARQL, a semantic query language inspired by SQL.\n",
|
||
|
"For convenience, the examples in the notebook are executable, and they are accompanied by some code to test the results.\n",
|
||
|
"If the tests pass, you probably got the answer right.\n",
|
||
|
"\n",
|
||
|
"However, you can also use any other method to write and send your queries.\n",
|
||
|
"You may find online query editors particularly useful.\n",
|
||
|
"In addition to running queries from your browser, they provide useful features such as syntax highlighting and autocompletion.\n",
|
||
|
"Some examples are:\n",
|
||
|
"\n"
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "markdown",
|
||
|
"metadata": {
|
||
|
"deletable": false,
|
||
|
"editable": false,
|
||
|
"nbgrader": {
|
||
|
"cell_type": "markdown",
|
||
|
"checksum": "bc0ca2e21254707344c60f895cb204b4",
|
||
|
"grade": false,
|
||
|
"grade_id": "cell-10264483046abcc4",
|
||
|
"locked": true,
|
||
|
"schema_version": 3,
|
||
|
"solution": false
|
||
|
}
|
||
|
},
|
||
|
"source": [
|
||
|
"## Objectives\n",
|
||
|
"\n",
|
||
|
"* Learning SPARQL and the Linked Data principles by defining queries to answer a set of problems of increasing difficulty\n",
|
||
|
"* Learning how to use integrated SPARQL editors and programming interfaces to SPARQL."
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "markdown",
|
||
|
"metadata": {
|
||
|
"deletable": false,
|
||
|
"editable": false,
|
||
|
"nbgrader": {
|
||
|
"cell_type": "markdown",
|
||
|
"checksum": "2fedf0d73fc90104d1ab72c3413dfc83",
|
||
|
"grade": false,
|
||
|
"grade_id": "cell-4f8492996e74bf20",
|
||
|
"locked": true,
|
||
|
"schema_version": 3,
|
||
|
"solution": false
|
||
|
}
|
||
|
},
|
||
|
"source": [
|
||
|
"## Tools\n",
|
||
|
"\n",
|
||
|
"See [the SPARQL notebook](./01_SPARQL_Introduction.ipynb#Tools)"
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "markdown",
|
||
|
"metadata": {},
|
||
|
"source": [
|
||
|
"## Instructions\n",
|
||
|
"\n",
|
||
|
"As in previous notebooks, the exercises can be done in the notebook, using the `%%sparql` magic, and the set of tests.\n",
|
||
|
"\n",
|
||
|
"\n",
|
||
|
"After every query, you will find some python code to test the results of the query.\n",
|
||
|
"**Make sure you've run the tests before moving to the next exercise**.\n",
|
||
|
"If the test gives you an error, you've probably done something wrong.\n",
|
||
|
"You **do not need to understand or modify the test code**.\n",
|
||
|
"\n",
|
||
|
"If you prefer to edit your queries in a different editor, here are some options:\n",
|
||
|
"\n",
|
||
|
"* DBpedia's virtuoso query editor https://dbpedia.org/sparql\n",
|
||
|
"* A javascript based client hosted at GSI: http://yasgui.gsi.upm.es/\n",
|
||
|
"\n",
|
||
|
"If you use an editor, make sure to copy it to the notebook and run the tests, once you are getting the expected results."
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "markdown",
|
||
|
"metadata": {
|
||
|
"deletable": false,
|
||
|
"editable": false,
|
||
|
"nbgrader": {
|
||
|
"cell_type": "markdown",
|
||
|
"checksum": "c5f8646518bd832a47d71f9d3218237a",
|
||
|
"grade": false,
|
||
|
"grade_id": "cell-eb13908482825e42",
|
||
|
"locked": true,
|
||
|
"schema_version": 3,
|
||
|
"solution": false
|
||
|
}
|
||
|
},
|
||
|
"source": [
|
||
|
"Run this line to enable the `%%sparql` magic command."
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "code",
|
||
|
"execution_count": null,
|
||
|
"metadata": {},
|
||
|
"outputs": [],
|
||
|
"source": [
|
||
|
"from helpers import sparql, solution, show_photos"
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "markdown",
|
||
|
"metadata": {},
|
||
|
"source": [
|
||
|
"The `%%sparql` magic command will allow us to use SPARQL inside normal jupyter cells.\n",
|
||
|
"\n",
|
||
|
"For instance, the following code:\n",
|
||
|
"\n",
|
||
|
"```python\n",
|
||
|
"%%sparql\n",
|
||
|
"\n",
|
||
|
"<MY QUERY>\n",
|
||
|
"``` \n",
|
||
|
"\n",
|
||
|
"Is the same as `run_query('<MY QUERY>', endpoint='http://dbpedia.org/sparql')` plus some additional steps, such as saving the results in a nice table format so that they can be used later and storing the results in a variable (`solution()`), which we will use in our tests.\n",
|
||
|
"\n",
|
||
|
"You do not need to worry about it, and **you can always use one of the suggested online editors if you wish**."
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "markdown",
|
||
|
"metadata": {},
|
||
|
"source": [
|
||
|
"## Exercises"
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "markdown",
|
||
|
"metadata": {},
|
||
|
"source": [
|
||
|
"#### First Select"
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "markdown",
|
||
|
"metadata": {},
|
||
|
"source": [
|
||
|
"Let's start with a simple query. We will get a list of cities and towns in Madrid.\n",
|
||
|
"If we take a look at the DBpedia ontology or the page of any town we already know, we discover that the property that links towns to their community is [`isPartOf`](http://dbpedia.org/ontology/isPartOf), and [the Community of Madrid is also a resource in DBpedia](http://dbpedia.org/resource/Community_of_Madrid)\n",
|
||
|
"\n",
|
||
|
"Since there are potentially many cities to get, we will limit our results to the first 10 results:"
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "code",
|
||
|
"execution_count": null,
|
||
|
"metadata": {},
|
||
|
"outputs": [],
|
||
|
"source": [
|
||
|
"%%sparql http://dbpedia.org/sparql\n",
|
||
|
"\n",
|
||
|
"SELECT ?localidad\n",
|
||
|
"WHERE {\n",
|
||
|
" ?localidad <http://dbpedia.org/ontology/isPartOf> <http://dbpedia.org/resource/Community_of_Madrid>\n",
|
||
|
"}\n",
|
||
|
"LIMIT 10"
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "markdown",
|
||
|
"metadata": {},
|
||
|
"source": [
|
||
|
"However, that query is very verbose because we are using full URIs.\n",
|
||
|
"To simplify it, we will make use of SPARQL prefixes:"
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "code",
|
||
|
"execution_count": null,
|
||
|
"metadata": {},
|
||
|
"outputs": [],
|
||
|
"source": [
|
||
|
"%%sparql http://dbpedia.org/sparql\n",
|
||
|
"\n",
|
||
|
"PREFIX dbo: <http://dbpedia.org/ontology/>\n",
|
||
|
"PREFIX dbr: <http://dbpedia.org/resource/>\n",
|
||
|
" \n",
|
||
|
"SELECT ?localidad\n",
|
||
|
"WHERE {\n",
|
||
|
" ?localidad dbo:isPartOf dbr:Community_of_Madrid.\n",
|
||
|
"}\n",
|
||
|
"LIMIT 10"
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "markdown",
|
||
|
"metadata": {},
|
||
|
"source": [
|
||
|
"To make sure that the query returned something sensible, we can test it with some python code:"
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "code",
|
||
|
"execution_count": null,
|
||
|
"metadata": {},
|
||
|
"outputs": [],
|
||
|
"source": [
|
||
|
"assert 'localidad' in solution()['columns']\n",
|
||
|
"assert len(solution()['tuples']) == 10"
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "markdown",
|
||
|
"metadata": {},
|
||
|
"source": [
|
||
|
"Now that you have some experience under your belt, it is time to design your own query.\n",
|
||
|
"\n",
|
||
|
"Your first task it to get a list of Spanish Novelits, using the skeleton below and the previous query to guide you.\n",
|
||
|
"\n",
|
||
|
"Pages for Spanish novelists are grouped in the *Spanish novelists* DBpedia category. You can use that fact to get your list.\n",
|
||
|
"In other words, the difference from the previous query will be using `dct:subject` instead of `dbo:isPartOf`, and `dbc:Spanish_novelists` instead of `dbr:Community_of_Madrid`."
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "code",
|
||
|
"execution_count": null,
|
||
|
"metadata": {
|
||
|
"deletable": false,
|
||
|
"nbgrader": {
|
||
|
"cell_type": "code",
|
||
|
"checksum": "eef1c62e2797bd3ef01f2061da6f83c4",
|
||
|
"grade": false,
|
||
|
"grade_id": "cell-7a9509ff3c34127e",
|
||
|
"locked": false,
|
||
|
"schema_version": 3,
|
||
|
"solution": true
|
||
|
}
|
||
|
},
|
||
|
"outputs": [],
|
||
|
"source": [
|
||
|
"%%sparql http://dbpedia.org/sparql\n",
|
||
|
"\n",
|
||
|
"PREFIX dct:<http://purl.org/dc/terms/>\n",
|
||
|
"PREFIX dbc:<http://dbpedia.org/resource/Category:>\n",
|
||
|
"\n",
|
||
|
"SELECT ?escritor\n",
|
||
|
"\n",
|
||
|
"WHERE {\n",
|
||
|
"# YOUR ANSWER HERE\n",
|
||
|
"}\n",
|
||
|
"LIMIT 10"
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "code",
|
||
|
"execution_count": null,
|
||
|
"metadata": {
|
||
|
"deletable": false,
|
||
|
"editable": false,
|
||
|
"nbgrader": {
|
||
|
"cell_type": "code",
|
||
|
"checksum": "5a57d16cb2b53925f6e39fba429b7ef2",
|
||
|
"grade": true,
|
||
|
"grade_id": "cell-91240ded2cac7b6d",
|
||
|
"locked": true,
|
||
|
"points": 0,
|
||
|
"schema_version": 3,
|
||
|
"solution": false
|
||
|
}
|
||
|
},
|
||
|
"outputs": [],
|
||
|
"source": [
|
||
|
"assert len(solution()['columns']) == 1 # We only use one variable, ?escritor\n",
|
||
|
"assert len(solution()['tuples']) == 10 # There should be 10 results"
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "markdown",
|
||
|
"metadata": {},
|
||
|
"source": [
|
||
|
"### Using more criteria\n",
|
||
|
"\n",
|
||
|
"We can get more than one property in the same query. Let us modify our query to get the population of the cities as well."
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "code",
|
||
|
"execution_count": null,
|
||
|
"metadata": {},
|
||
|
"outputs": [],
|
||
|
"source": [
|
||
|
"%%sparql http://dbpedia.org/sparql\n",
|
||
|
"\n",
|
||
|
"PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>\n",
|
||
|
"PREFIX dbo: <http://dbpedia.org/ontology/>\n",
|
||
|
"PREFIX dbr: <http://dbpedia.org/resource/>\n",
|
||
|
"PREFIX dbp: <http://dbpedia.org/property/>\n",
|
||
|
" \n",
|
||
|
"SELECT ?localidad ?pop ?when\n",
|
||
|
"\n",
|
||
|
"WHERE {\n",
|
||
|
" ?localidad dbo:populationTotal ?pop .\n",
|
||
|
" ?localidad dbo:isPartOf dbr:Community_of_Madrid.\n",
|
||
|
" ?localidad dbp:populationAsOf ?when .\n",
|
||
|
"}\n",
|
||
|
"\n",
|
||
|
"LIMIT 100"
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "code",
|
||
|
"execution_count": null,
|
||
|
"metadata": {},
|
||
|
"outputs": [],
|
||
|
"source": [
|
||
|
"assert 'localidad' in solution()['columns']\n",
|
||
|
"assert 'http://dbpedia.org/resource/Parla' in solution()['columns']['localidad']\n",
|
||
|
"assert ('http://dbpedia.org/resource/San_Sebastián_de_los_Reyes', '75912', '2009') in solution()['tuples']"
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "markdown",
|
||
|
"metadata": {},
|
||
|
"source": [
|
||
|
"Time to try it yourself.\n",
|
||
|
"\n",
|
||
|
"Get the list of Spanish novelists AND their name (using rdfs:label)."
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "code",
|
||
|
"execution_count": null,
|
||
|
"metadata": {
|
||
|
"deletable": false,
|
||
|
"nbgrader": {
|
||
|
"cell_type": "code",
|
||
|
"checksum": "9d4193612dea95da2d91762b638ad5e6",
|
||
|
"grade": false,
|
||
|
"grade_id": "cell-83dcaae0d09657b5",
|
||
|
"locked": false,
|
||
|
"schema_version": 3,
|
||
|
"solution": true
|
||
|
}
|
||
|
},
|
||
|
"outputs": [],
|
||
|
"source": [
|
||
|
"%%sparql http://dbpedia.org/sparql\n",
|
||
|
"\n",
|
||
|
"PREFIX rdfs:<http://www.w3.org/2000/01/rdf-schema#>\n",
|
||
|
"PREFIX dct:<http://purl.org/dc/terms/>\n",
|
||
|
"PREFIX dbc:<http://dbpedia.org/resource/Category:>\n",
|
||
|
"\n",
|
||
|
"SELECT ?escritor ?name\n",
|
||
|
"\n",
|
||
|
"WHERE {\n",
|
||
|
"# YOUR ANSWER HERE\n",
|
||
|
"}\n",
|
||
|
"LIMIT 10"
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "code",
|
||
|
"execution_count": null,
|
||
|
"metadata": {
|
||
|
"deletable": false,
|
||
|
"editable": false,
|
||
|
"nbgrader": {
|
||
|
"cell_type": "code",
|
||
|
"checksum": "86115c2a8982ad12b7250cf4341ae9c3",
|
||
|
"grade": true,
|
||
|
"grade_id": "cell-8afd28aada7a896c",
|
||
|
"locked": true,
|
||
|
"points": 0,
|
||
|
"schema_version": 3,
|
||
|
"solution": false
|
||
|
}
|
||
|
},
|
||
|
"outputs": [],
|
||
|
"source": [
|
||
|
"assert 'escritor' in solution()['columns']\n",
|
||
|
"assert 'http://dbpedia.org/resource/Eduardo_Mendoza_Garriga' in solution()['columns']['escritor']\n",
|
||
|
"assert ('http://dbpedia.org/resource/Eduardo_Mendoza_Garriga', 'Eduardo Mendoza') in solution()['tuples']"
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "markdown",
|
||
|
"metadata": {},
|
||
|
"source": [
|
||
|
"### Filtering and ordering"
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "markdown",
|
||
|
"metadata": {},
|
||
|
"source": [
|
||
|
"In the previous example, we saw that we got what seemed to be duplicated answers.\n",
|
||
|
"\n",
|
||
|
"This happens because entities can have labels in different languages (e.g. English, Spanish).\n",
|
||
|
"To restrict the search to only those results we're interested in, we can use filtering.\n",
|
||
|
"\n",
|
||
|
"We can also decide the order in which our results are shown.\n",
|
||
|
"\n",
|
||
|
"For instance, this is how we could use filtering to get only large cities in our example, ordered by population:"
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "code",
|
||
|
"execution_count": null,
|
||
|
"metadata": {},
|
||
|
"outputs": [],
|
||
|
"source": [
|
||
|
"%%sparql http://dbpedia.org/sparql\n",
|
||
|
"\n",
|
||
|
"PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>\n",
|
||
|
"PREFIX dbo: <http://dbpedia.org/ontology/>\n",
|
||
|
"PREFIX dbr: <http://dbpedia.org/resource/>\n",
|
||
|
" \n",
|
||
|
"SELECT ?localidad ?pop ?when\n",
|
||
|
"\n",
|
||
|
"WHERE {\n",
|
||
|
" ?localidad dbo:populationTotal ?pop .\n",
|
||
|
" ?localidad dbo:isPartOf dbr:Community_of_Madrid.\n",
|
||
|
" ?localidad dbp:populationAsOf ?when .\n",
|
||
|
" FILTER(?pop > 100000)\n",
|
||
|
"}\n",
|
||
|
"ORDER BY ?pop\n",
|
||
|
"LIMIT 100"
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "markdown",
|
||
|
"metadata": {},
|
||
|
"source": [
|
||
|
"Note that ordering happens before limits."
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "code",
|
||
|
"execution_count": null,
|
||
|
"metadata": {
|
||
|
"deletable": false,
|
||
|
"editable": false,
|
||
|
"nbgrader": {
|
||
|
"cell_type": "code",
|
||
|
"checksum": "a38cb1aea7b1f01f6b37c088384e0a3d",
|
||
|
"grade": true,
|
||
|
"grade_id": "cell-cb7b8283568cd349",
|
||
|
"locked": true,
|
||
|
"points": 0,
|
||
|
"schema_version": 3,
|
||
|
"solution": false
|
||
|
}
|
||
|
},
|
||
|
"outputs": [],
|
||
|
"source": [
|
||
|
"# We still have the biggest city\n",
|
||
|
"assert ('http://dbpedia.org/resource/Madrid', '3141991', '2014') in solution()['tuples']\n",
|
||
|
"# But the smaller ones are gone\n",
|
||
|
"assert 'http://dbpedia.org/resource/Tres_Cantos' not in solution()['columns']['localidad']\n",
|
||
|
"assert 'http://dbpedia.org/resource/San_Sebastián_de_los_Reyes' not in solution()['columns']['localidad']"
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "markdown",
|
||
|
"metadata": {},
|
||
|
"source": [
|
||
|
"Now, try filtering to get a list of novelists and their name in Spanish, ordered by name `(FILTER (LANG(?nombre) = \"es\") y ORDER BY`"
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "code",
|
||
|
"execution_count": null,
|
||
|
"metadata": {
|
||
|
"deletable": false,
|
||
|
"nbgrader": {
|
||
|
"cell_type": "code",
|
||
|
"checksum": "b6aaac8ab30d52a042c1efefbbff7550",
|
||
|
"grade": false,
|
||
|
"grade_id": "cell-ff3d611cb0304b01",
|
||
|
"locked": false,
|
||
|
"schema_version": 3,
|
||
|
"solution": true
|
||
|
}
|
||
|
},
|
||
|
"outputs": [],
|
||
|
"source": [
|
||
|
"%%sparql http://dbpedia.org/sparql\n",
|
||
|
"\n",
|
||
|
"PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>\n",
|
||
|
"PREFIX dct:<http://purl.org/dc/terms/>\n",
|
||
|
"PREFIX dbc:<http://dbpedia.org/resource/Category:>\n",
|
||
|
"\n",
|
||
|
"SELECT ?escritor ?nombre\n",
|
||
|
"\n",
|
||
|
"WHERE {\n",
|
||
|
"# YOUR ANSWER HERE\n",
|
||
|
"}\n",
|
||
|
"# YOUR ANSWER HERE\n",
|
||
|
"LIMIT 1000"
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "code",
|
||
|
"execution_count": null,
|
||
|
"metadata": {
|
||
|
"deletable": false,
|
||
|
"editable": false,
|
||
|
"nbgrader": {
|
||
|
"cell_type": "code",
|
||
|
"checksum": "3441fbd2267002acbb0d46d9ce94ba97",
|
||
|
"grade": true,
|
||
|
"grade_id": "cell-d70cc6ea394741bc",
|
||
|
"locked": true,
|
||
|
"points": 0,
|
||
|
"schema_version": 3,
|
||
|
"solution": false
|
||
|
}
|
||
|
},
|
||
|
"outputs": [],
|
||
|
"source": [
|
||
|
"assert len(solution()['tuples']) >= 50\n",
|
||
|
"assert 'Adelaida García Morales' in solution()['columns']['nombre']\n",
|
||
|
"assert sum(1 for k in solution()['columns']['escritor'] if k == 'http://dbpedia.org/resource/Adelaida_García_Morales') == 1"
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "markdown",
|
||
|
"metadata": {},
|
||
|
"source": [
|
||
|
"### Optional"
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "markdown",
|
||
|
"metadata": {},
|
||
|
"source": [
|
||
|
"From now on, we will focus on our Writers example.\n",
|
||
|
"\n",
|
||
|
"First, we will search for writers born in the XX century, using the [20th-century Spanish novelists](http://dbpedia.org/page/Category:20th-century_Spanish_novelists) category."
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "code",
|
||
|
"execution_count": null,
|
||
|
"metadata": {
|
||
|
"deletable": false,
|
||
|
"nbgrader": {
|
||
|
"cell_type": "code",
|
||
|
"checksum": "e3ff089c983be1ae937f254b8d9d229a",
|
||
|
"grade": false,
|
||
|
"grade_id": "cell-ab7755944d46f9ca",
|
||
|
"locked": false,
|
||
|
"schema_version": 3,
|
||
|
"solution": true
|
||
|
}
|
||
|
},
|
||
|
"outputs": [],
|
||
|
"source": [
|
||
|
"# YOUR ANSWER HERE"
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "code",
|
||
|
"execution_count": null,
|
||
|
"metadata": {
|
||
|
"deletable": false,
|
||
|
"editable": false,
|
||
|
"nbgrader": {
|
||
|
"cell_type": "code",
|
||
|
"checksum": "cacdd08a8a267c1173304e319ffff563",
|
||
|
"grade": true,
|
||
|
"grade_id": "cell-cf3821f2d33fb0f6",
|
||
|
"locked": true,
|
||
|
"points": 0,
|
||
|
"schema_version": 3,
|
||
|
"solution": false
|
||
|
}
|
||
|
},
|
||
|
"outputs": [],
|
||
|
"source": [
|
||
|
"assert 'Camilo José Cela' in solution()['columns']['nombre']\n",
|
||
|
"assert 'Javier Marías' in solution()['columns']['nombre']\n",
|
||
|
"assert all(x > '1850-12-31' and x < '2001-01-01' for x in solution()['columns']['nac'])"
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "markdown",
|
||
|
"metadata": {},
|
||
|
"source": [
|
||
|
"In our last example, we were missing many novelists that are do not have birth information in DBpedia.\n",
|
||
|
"\n",
|
||
|
"We can specify optional values in a query using the `OPTIONAL` keyword.\n",
|
||
|
"When a set of clauses are inside an OPTIONAL group, the SPARQL endpoint will try to use them in the query.\n",
|
||
|
"If there are no results for that part of the query, the variables it specifies will not be bound (i.e. they will be empty).\n",
|
||
|
"\n",
|
||
|
"Using that, let us retrieve all the novelists, their birth and death date (if they are available)."
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "code",
|
||
|
"execution_count": null,
|
||
|
"metadata": {
|
||
|
"deletable": false,
|
||
|
"nbgrader": {
|
||
|
"cell_type": "code",
|
||
|
"checksum": "f4170cbbf042644e394d1eb9acf12ce3",
|
||
|
"grade": false,
|
||
|
"grade_id": "cell-254a18dd973e82ed",
|
||
|
"locked": false,
|
||
|
"schema_version": 3,
|
||
|
"solution": true
|
||
|
}
|
||
|
},
|
||
|
"outputs": [],
|
||
|
"source": [
|
||
|
"%%sparql http://dbpedia.org/sparql\n",
|
||
|
"\n",
|
||
|
"PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>\n",
|
||
|
"PREFIX dct:<http://purl.org/dc/terms/>\n",
|
||
|
"PREFIX dbc:<http://dbpedia.org/resource/Category:>\n",
|
||
|
"PREFIX dbo:<http://dbpedia.org/ontology/>\n",
|
||
|
"\n",
|
||
|
"SELECT ?escritor ?nombre ?fechaNac ?fechaDef\n",
|
||
|
"\n",
|
||
|
"WHERE {\n",
|
||
|
"# YOUR ANSWER HERE\n",
|
||
|
"}\n",
|
||
|
"# YOUR ANSWER HERE\n",
|
||
|
"LIMIT 200"
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "code",
|
||
|
"execution_count": null,
|
||
|
"metadata": {
|
||
|
"deletable": false,
|
||
|
"editable": false,
|
||
|
"nbgrader": {
|
||
|
"cell_type": "code",
|
||
|
"checksum": "29c6362adbdb5606e158f696594e1052",
|
||
|
"grade": true,
|
||
|
"grade_id": "cell-4d6a64dde67f0e11",
|
||
|
"locked": true,
|
||
|
"points": 0,
|
||
|
"schema_version": 3,
|
||
|
"solution": false
|
||
|
}
|
||
|
},
|
||
|
"outputs": [],
|
||
|
"source": [
|
||
|
"assert 'Wenceslao Fernández Flórez' in solution()['columns']['nombre']\n",
|
||
|
"assert '1879-2-11' in solution()['columns']['fechaNac']\n",
|
||
|
"assert '' in solution()['columns']['fechaNac'] # Not all birthdates are defined\n",
|
||
|
"assert '' in solution()['columns']['fechaDef'] # Some deathdates are not defined"
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "markdown",
|
||
|
"metadata": {},
|
||
|
"source": [
|
||
|
"### Bound"
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "markdown",
|
||
|
"metadata": {},
|
||
|
"source": [
|
||
|
"We can check whether the optional value for a key was bound in a SPARQL query using `BOUND(?key)`.\n",
|
||
|
"\n",
|
||
|
"This is very useful for two purposes.\n",
|
||
|
"First, it allows us to look for patterns that **do not occur** in the graph, such as missing properties.\n",
|
||
|
"For instance, we could search for the authors with missing birth information so we can add it.\n",
|
||
|
"Secondly, we can use bound in filters to get conditional filters.\n",
|
||
|
"We will explore both uses in this exercise."
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "markdown",
|
||
|
"metadata": {},
|
||
|
"source": [
|
||
|
"Get the list of Spanish novelists that are still alive.\n",
|
||
|
"A person is alive if their death date is not defined and the were born less than 100 years ago"
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "code",
|
||
|
"execution_count": null,
|
||
|
"metadata": {
|
||
|
"deletable": false,
|
||
|
"nbgrader": {
|
||
|
"cell_type": "code",
|
||
|
"checksum": "f3c11121eb0d1328d2f5da3580f8d648",
|
||
|
"grade": false,
|
||
|
"grade_id": "cell-474b1a72dec6827c",
|
||
|
"locked": false,
|
||
|
"schema_version": 3,
|
||
|
"solution": true
|
||
|
}
|
||
|
},
|
||
|
"outputs": [],
|
||
|
"source": [
|
||
|
"%%sparql http://dbpedia.org/sparql\n",
|
||
|
"\n",
|
||
|
"PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>\n",
|
||
|
"PREFIX dct:<http://purl.org/dc/terms/>\n",
|
||
|
"PREFIX dbc:<http://dbpedia.org/resource/Category:>\n",
|
||
|
"PREFIX dbo:<http://dbpedia.org/ontology/>\n",
|
||
|
"\n",
|
||
|
"SELECT ?escritor, ?nombre, year(?fechaNac) as ?nac\n",
|
||
|
"\n",
|
||
|
"WHERE {\n",
|
||
|
" \n",
|
||
|
"# YOUR ANSWER HERE\n",
|
||
|
"}\n",
|
||
|
"\n",
|
||
|
"# YOUR ANSWER HERE\n",
|
||
|
"LIMIT 1000"
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "code",
|
||
|
"execution_count": null,
|
||
|
"metadata": {
|
||
|
"deletable": false,
|
||
|
"editable": false,
|
||
|
"nbgrader": {
|
||
|
"cell_type": "code",
|
||
|
"checksum": "770bbddef5210c28486a1929e4513ada",
|
||
|
"grade": true,
|
||
|
"grade_id": "cell-46b62dd2856bc919",
|
||
|
"locked": true,
|
||
|
"points": 0,
|
||
|
"schema_version": 3,
|
||
|
"solution": false
|
||
|
}
|
||
|
},
|
||
|
"outputs": [],
|
||
|
"source": [
|
||
|
"assert 'Fernando Arrabal' in solution()['columns']['nombre']\n",
|
||
|
"assert 'Albert Espinosa' in solution()['columns']['nombre']\n",
|
||
|
"for year in solution()['columns']['nac']:\n",
|
||
|
" assert int(year) >= 1918"
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "markdown",
|
||
|
"metadata": {},
|
||
|
"source": [
|
||
|
"Now, get the list of Spanish novelists that died before their fifties (i.e. younger than 50 years old), or that aren't 50 years old yet.\n",
|
||
|
"\n",
|
||
|
"Hint: you can use boolean logic in your filters (e.g. `&&` and `||`).\n",
|
||
|
"\n",
|
||
|
"Hint 2: Some dates are not formatted properly, which makes some queries fail when they shouldn't. You might need to convert between different types as a workaround. For instance, you could get the year from a date like this: `year(xsd:dateTime(str(?date)))`."
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "code",
|
||
|
"execution_count": null,
|
||
|
"metadata": {
|
||
|
"deletable": false,
|
||
|
"nbgrader": {
|
||
|
"cell_type": "code",
|
||
|
"checksum": "ed34857649c9a6926eb0a3a0e1d8198d",
|
||
|
"grade": false,
|
||
|
"grade_id": "cell-ceefd3c8fbd39d79",
|
||
|
"locked": false,
|
||
|
"schema_version": 3,
|
||
|
"solution": true
|
||
|
}
|
||
|
},
|
||
|
"outputs": [],
|
||
|
"source": [
|
||
|
"%%sparql http://dbpedia.org/sparql\n",
|
||
|
"\n",
|
||
|
"PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>\n",
|
||
|
"PREFIX dct:<http://purl.org/dc/terms/>\n",
|
||
|
"PREFIX dbc:<http://dbpedia.org/resource/Category:>\n",
|
||
|
"PREFIX dbo:<http://dbpedia.org/ontology/>\n",
|
||
|
"\n",
|
||
|
"SELECT ?escritor, ?nombre, YEAR(?fechaNac) as ?nac, ?fechaDef\n",
|
||
|
"\n",
|
||
|
"WHERE {\n",
|
||
|
"# YOUR ANSWER HERE\n",
|
||
|
"}\n",
|
||
|
"# YOUR ANSWER HERE\n",
|
||
|
"LIMIT 100"
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "code",
|
||
|
"execution_count": null,
|
||
|
"metadata": {
|
||
|
"deletable": false,
|
||
|
"editable": false,
|
||
|
"nbgrader": {
|
||
|
"cell_type": "code",
|
||
|
"checksum": "18bb2d8d586bf4a5231973e69958ab75",
|
||
|
"grade": true,
|
||
|
"grade_id": "cell-461cd6ccc6c2dc79",
|
||
|
"locked": true,
|
||
|
"points": 0,
|
||
|
"schema_version": 3,
|
||
|
"solution": false
|
||
|
}
|
||
|
},
|
||
|
"outputs": [],
|
||
|
"source": [
|
||
|
"assert 'Javier Sierra' in solution()['columns']['nombre']\n",
|
||
|
"assert 'http://dbpedia.org/resource/Sanmao_(author)' in solution()['columns']['escritor']"
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "markdown",
|
||
|
"metadata": {},
|
||
|
"source": [
|
||
|
"### Finding unique elements"
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "markdown",
|
||
|
"metadata": {},
|
||
|
"source": [
|
||
|
"In our last example, our results show some authors more than once.\n",
|
||
|
"This is because some properties are defined more than once.\n",
|
||
|
"For instance, birth date is giving using different formats.\n",
|
||
|
"Even if we exclude that property from our results by not adding it in our `SELECT`, we will get duplicated lines.\n",
|
||
|
"\n",
|
||
|
"To solve this, we can use the `DISTINCT` keyword."
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "markdown",
|
||
|
"metadata": {},
|
||
|
"source": [
|
||
|
"Modify your last query to remove duplicated lines.\n",
|
||
|
"In other words, authors should only appear once."
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "code",
|
||
|
"execution_count": null,
|
||
|
"metadata": {
|
||
|
"deletable": false,
|
||
|
"nbgrader": {
|
||
|
"cell_type": "code",
|
||
|
"checksum": "34163ddb0400cd8ddd2c2e2cdf29c20b",
|
||
|
"grade": false,
|
||
|
"grade_id": "cell-2a39adc71d26ae73",
|
||
|
"locked": false,
|
||
|
"schema_version": 3,
|
||
|
"solution": true
|
||
|
}
|
||
|
},
|
||
|
"outputs": [],
|
||
|
"source": [
|
||
|
"%%sparql http://dbpedia.org/sparql\n",
|
||
|
"\n",
|
||
|
"PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>\n",
|
||
|
"PREFIX dct:<http://purl.org/dc/terms/>\n",
|
||
|
"PREFIX dbc:<http://dbpedia.org/resource/Category:>\n",
|
||
|
"PREFIX dbo:<http://dbpedia.org/ontology/>\n",
|
||
|
"\n",
|
||
|
"SELECT DISTINCT ?escritor, ?nombre, year(?fechaNac) as ?nac\n",
|
||
|
"\n",
|
||
|
"WHERE {\n",
|
||
|
"# YOUR ANSWER HERE\n",
|
||
|
"}\n",
|
||
|
"# YOUR ANSWER HERE\n",
|
||
|
"LIMIT 100"
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "code",
|
||
|
"execution_count": null,
|
||
|
"metadata": {
|
||
|
"deletable": false,
|
||
|
"editable": false,
|
||
|
"nbgrader": {
|
||
|
"cell_type": "code",
|
||
|
"checksum": "84ab7d64a45e03e6dd902216a2aad030",
|
||
|
"grade": true,
|
||
|
"grade_id": "cell-542e0e36347fd5d1",
|
||
|
"locked": true,
|
||
|
"points": 0,
|
||
|
"schema_version": 3,
|
||
|
"solution": false
|
||
|
}
|
||
|
},
|
||
|
"outputs": [],
|
||
|
"source": [
|
||
|
"assert 'Javier Sierra' in solution()['columns']['nombre']\n",
|
||
|
"assert 'http://dbpedia.org/resource/Albert_Espinosa' in solution()['columns']['escritor']\n",
|
||
|
"\n",
|
||
|
"from collections import Counter\n",
|
||
|
"c = Counter(solution()['columns']['nombre'])\n",
|
||
|
"for count in c.values():\n",
|
||
|
" assert count == 1\n",
|
||
|
" \n",
|
||
|
"c1 = Counter(solution()['columns']['escritor'])\n",
|
||
|
"assert all(count==1 for count in c1.values())\n",
|
||
|
"# c = Counter(solution()['columns']['nombre'])"
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "markdown",
|
||
|
"metadata": {},
|
||
|
"source": [
|
||
|
"### Using other resources"
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "markdown",
|
||
|
"metadata": {},
|
||
|
"source": [
|
||
|
"Get the list of living Spanish novelists born in Madrid.\n",
|
||
|
"\n",
|
||
|
"Hint: use `dbr:Madrid` and `dbo:birthPlace`"
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "code",
|
||
|
"execution_count": null,
|
||
|
"metadata": {
|
||
|
"deletable": false,
|
||
|
"nbgrader": {
|
||
|
"cell_type": "code",
|
||
|
"checksum": "25c8edcee216d536aac98fc9aa2b6422",
|
||
|
"grade": false,
|
||
|
"grade_id": "cell-d175e41da57c889b",
|
||
|
"locked": false,
|
||
|
"schema_version": 3,
|
||
|
"solution": true
|
||
|
}
|
||
|
},
|
||
|
"outputs": [],
|
||
|
"source": [
|
||
|
"%%sparql http://dbpedia.org/sparql\n",
|
||
|
"\n",
|
||
|
"PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>\n",
|
||
|
"PREFIX dct:<http://purl.org/dc/terms/>\n",
|
||
|
"PREFIX dbc:<http://dbpedia.org/resource/Category:>\n",
|
||
|
"PREFIX dbr:<http://dbpedia.org/resource/>\n",
|
||
|
"PREFIX dbo:<http://dbpedia.org/ontology/>\n",
|
||
|
"\n",
|
||
|
"SELECT DISTINCT ?escritor, ?nombre, ?lugarNac, year(?fechaNac) as ?nac\n",
|
||
|
"\n",
|
||
|
"WHERE {\n",
|
||
|
"# YOUR ANSWER HERE\n",
|
||
|
"}\n",
|
||
|
"# YOUR ANSWER HERE\n",
|
||
|
"LIMIT 100"
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "code",
|
||
|
"execution_count": null,
|
||
|
"metadata": {
|
||
|
"deletable": false,
|
||
|
"editable": false,
|
||
|
"nbgrader": {
|
||
|
"cell_type": "code",
|
||
|
"checksum": "443608e177f514f2cddafa6c1d1e3cc7",
|
||
|
"grade": true,
|
||
|
"grade_id": "cell-fadd095862db6bc8",
|
||
|
"locked": true,
|
||
|
"points": 0,
|
||
|
"schema_version": 3,
|
||
|
"solution": false
|
||
|
}
|
||
|
},
|
||
|
"outputs": [],
|
||
|
"source": [
|
||
|
"assert 'José Ángel Mañas' in solution()['columns']['nombre']\n",
|
||
|
"assert 'http://dbpedia.org/resource/Madrid' in solution()['columns']['lugarNac']\n",
|
||
|
"MADRID_QUERY = solution()['columns'].copy()"
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "markdown",
|
||
|
"metadata": {},
|
||
|
"source": [
|
||
|
"### Traversing the graph"
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "markdown",
|
||
|
"metadata": {},
|
||
|
"source": [
|
||
|
"Get the list of works of the authors in the previous query (i.e. authors born in Madrid), if they have any.\n",
|
||
|
"\n",
|
||
|
"Hint: use `dbo:author`, which is a **property of a literary work** that points to the author."
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "code",
|
||
|
"execution_count": null,
|
||
|
"metadata": {
|
||
|
"deletable": false,
|
||
|
"nbgrader": {
|
||
|
"cell_type": "code",
|
||
|
"checksum": "c1f22b82c4d0bd4102a6c38f7f933dc6",
|
||
|
"grade": false,
|
||
|
"grade_id": "cell-e4b99af9ef91ff6f",
|
||
|
"locked": false,
|
||
|
"schema_version": 3,
|
||
|
"solution": true
|
||
|
}
|
||
|
},
|
||
|
"outputs": [],
|
||
|
"source": [
|
||
|
"%%sparql http://dbpedia.org/sparql\n",
|
||
|
"\n",
|
||
|
"PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>\n",
|
||
|
"PREFIX dct:<http://purl.org/dc/terms/>\n",
|
||
|
"PREFIX dbc:<http://dbpedia.org/resource/Category:>\n",
|
||
|
"PREFIX dbr:<http://dbpedia.org/resource/>\n",
|
||
|
"PREFIX dbo:<http://dbpedia.org/ontology/>\n",
|
||
|
"\n",
|
||
"SELECT DISTINCT ?escritor ?nombre ?obra\n",
|
||
|
"\n",
|
||
|
"WHERE {\n",
|
||
|
"# YOUR ANSWER HERE\n",
|
||
|
"}\n",
|
||
|
"# YOUR ANSWER HERE\n",
|
||
|
"LIMIT 10000"
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "code",
|
||
|
"execution_count": null,
|
||
|
"metadata": {
|
||
|
"deletable": false,
|
||
|
"editable": false,
|
||
|
"nbgrader": {
|
||
|
"cell_type": "code",
|
||
|
"checksum": "51acaeb26379c6bd2f8c767001ef79ec",
|
||
|
"grade": true,
|
||
|
"grade_id": "cell-68661b73c2140e4f",
|
||
|
"locked": true,
|
||
|
"points": 0,
|
||
|
"schema_version": 3,
|
||
|
"solution": false
|
||
|
}
|
||
|
},
|
||
|
"outputs": [],
|
||
|
"source": [
|
||
|
"assert 'http://dbpedia.org/resource/A_Heart_So_White' in solution()['columns']['obra']\n",
|
||
|
"assert 'http://dbpedia.org/resource/Tomorrow_in_the_Battle_Think_on_Me' in solution()['columns']['obra']\n",
|
||
|
"assert '' in solution()['columns']['obra'] # Some authors don't have works in dbpedia"
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "markdown",
|
||
|
"metadata": {},
|
||
|
"source": [
|
||
|
"### Traversing the graph"
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "markdown",
|
||
|
"metadata": {},
|
||
|
"source": [
|
||
"Get a list of living Spanish novelists born in Madrid, with their name in Spanish, a link to their photo, and their website (if they have one).\n",
|
||
|
"\n",
|
||
|
"If the query is right, you should see a list of writers after running the test code.\n",
|
||
|
"\n",
|
||
"Hint: use `foaf:depiction` and `foaf:homepage`."
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "code",
|
||
|
"execution_count": null,
|
||
|
"metadata": {
|
||
|
"deletable": false,
|
||
|
"nbgrader": {
|
||
|
"cell_type": "code",
|
||
|
"checksum": "e3f8e18a006a763f5cdbe49c97b73f5f",
|
||
|
"grade": false,
|
||
|
"grade_id": "cell-b1f71c67dd71dad4",
|
||
|
"locked": false,
|
||
|
"schema_version": 3,
|
||
|
"solution": true
|
||
|
}
|
||
|
},
|
||
|
"outputs": [],
|
||
|
"source": [
|
||
|
"%%sparql http://dbpedia.org/sparql\n",
|
||
|
"\n",
|
||
|
"PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>\n",
|
||
|
"PREFIX dct:<http://purl.org/dc/terms/>\n",
|
||
|
"PREFIX dbc:<http://dbpedia.org/resource/Category:>\n",
|
||
|
"PREFIX dbr:<http://dbpedia.org/resource/>\n",
|
||
|
"PREFIX dbo:<http://dbpedia.org/ontology/>\n",
|
||
|
"\n",
|
||
"SELECT ?escritor ?nombre ?web ?foto\n",
|
||
|
"\n",
|
||
|
"WHERE {\n",
|
||
|
"# YOUR ANSWER HERE\n",
|
||
|
"}\n",
|
||
|
"ORDER BY ?nombre\n",
|
||
|
"LIMIT 100"
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "code",
|
||
|
"execution_count": null,
|
||
|
"metadata": {
|
||
|
"deletable": false,
|
||
|
"editable": false,
|
||
|
"nbgrader": {
|
||
|
"cell_type": "code",
|
||
|
"checksum": "2d40e7ceb7774b29a709092ee8dfa9f5",
|
||
|
"grade": true,
|
||
|
"grade_id": "cell-8b8ba7cca701c652",
|
||
|
"locked": true,
|
||
|
"points": 0,
|
||
|
"schema_version": 3,
|
||
|
"solution": false
|
||
|
}
|
||
|
},
|
||
|
"outputs": [],
|
||
|
"source": [
|
||
|
"fotos = set(filter(lambda x: x != '', solution()['columns']['foto']))\n",
|
||
|
"assert len(fotos) > 2\n",
|
||
|
"show_photos(fotos) #show the pictures of the writers!"
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "markdown",
|
||
|
"metadata": {},
|
||
|
"source": [
|
||
|
"### Union"
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "markdown",
|
||
|
"metadata": {},
|
||
|
"source": [
|
||
"We can combine the results of several alternative graph patterns, much like `UNION` (not `JOIN`) in SQL.\n",
"The keyword in SPARQL is also `UNION`: each alternative is matched separately and the solutions are merged.\n",
|
||
|
"\n",
|
||
|
"`UNION` is useful in many situations.\n",
|
||
|
"For instance, when there are equivalent properties, or when you want to use two search terms and FILTER would be too inefficient.\n",
|
||
|
"\n",
|
||
|
"The syntax is as follows:\n",
|
||
|
"\n",
|
||
|
"```sparql\n",
|
||
|
"SELECT ?title\n",
|
||
|
"WHERE {\n",
|
||
|
" { ?book dc10:title ?title }\n",
|
||
|
" UNION\n",
|
||
|
" { ?book dc11:title ?title }\n",
|
||
|
" \n",
|
||
|
" ... REST OF YOUR QUERY ...\n",
|
||
|
"\n",
|
||
|
"}\n",
|
||
|
"```\n",
|
||
|
"\n"
|
||
|
]
|
||
|
},
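{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a rough illustration of the syntax above, the following sketch uses `UNION` to collect names from two properties that often hold equivalent information.\n",
"The choice of `dbr:Madrid`, `rdfs:label` and `foaf:name` is an assumption made for this example only; other resources may use different properties.\n",
"\n",
"```sparql\n",
"PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>\n",
"PREFIX foaf: <http://xmlns.com/foaf/0.1/>\n",
"PREFIX dbr: <http://dbpedia.org/resource/>\n",
"\n",
"SELECT DISTINCT ?name\n",
"WHERE {\n",
"    # Either pattern may bind ?name; the solutions of both are merged\n",
"    { dbr:Madrid rdfs:label ?name }\n",
"    UNION\n",
"    { dbr:Madrid foaf:name ?name }\n",
"\n",
"    # Keep only the Spanish names\n",
"    FILTER (lang(?name) = 'es')\n",
"}\n",
"LIMIT 10\n",
"```\n",
"\n",
"`DISTINCT` matters here: a name found through both properties would otherwise appear twice."
]
},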
|
||
|
{
|
||
|
"cell_type": "markdown",
|
||
|
"metadata": {},
|
||
|
"source": [
|
||
"Using `UNION`, get a list of distinct Spanish novelists and poets (i.e. writers that belong to either category).\n",
"\n",
"Hint: use the category `dbc:Spanish_poets`."
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "code",
|
||
|
"execution_count": null,
|
||
|
"metadata": {
|
||
|
"deletable": false,
|
||
|
"nbgrader": {
|
||
|
"cell_type": "code",
|
||
|
"checksum": "9c0da379841474601397f5623abc6a9c",
|
||
|
"grade": false,
|
||
|
"grade_id": "cell-21eb6323b6d0011d",
|
||
|
"locked": false,
|
||
|
"schema_version": 3,
|
||
|
"solution": true
|
||
|
}
|
||
|
},
|
||
|
"outputs": [],
|
||
|
"source": [
|
||
|
"%%sparql http://dbpedia.org/sparql\n",
|
||
|
"\n",
|
||
|
"PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>\n",
|
||
|
"PREFIX dct:<http://purl.org/dc/terms/>\n",
|
||
|
"PREFIX dbc:<http://dbpedia.org/resource/Category:>\n",
|
||
|
"PREFIX dbr:<http://dbpedia.org/resource/>\n",
|
||
|
"PREFIX dbo:<http://dbpedia.org/ontology/>\n",
|
||
|
"\n",
|
||
|
"SELECT DISTINCT ?escritor, ?nombre\n",
|
||
|
"\n",
|
||
|
"WHERE {\n",
|
||
|
"# YOUR ANSWER HERE\n",
|
||
|
"}\n",
|
||
|
"# YOUR ANSWER HERE\n",
|
||
|
"LIMIT 10000"
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "code",
|
||
|
"execution_count": null,
|
||
|
"metadata": {
|
||
|
"deletable": false,
|
||
|
"editable": false,
|
||
|
"nbgrader": {
|
||
|
"cell_type": "code",
|
||
|
"checksum": "f22c7db423410fcf3e8fce4ec0a8e9f9",
|
||
|
"grade": true,
|
||
|
"grade_id": "cell-004e021e877c6ace",
|
||
|
"locked": true,
|
||
|
"points": 0,
|
||
|
"schema_version": 3,
|
||
|
"solution": false
|
||
|
}
|
||
|
},
|
||
|
"outputs": [],
|
||
|
"source": [
|
||
|
"assert 'Garcilaso de la Vega' in solution()['columns']['nombre']"
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "markdown",
|
||
|
"metadata": {},
|
||
|
"source": [
|
||
|
"You can also get the count of results either by inspecting the result (we will not cover this) or by aggregating the results using the `COUNT` operation.\n",
|
||
|
"\n",
|
||
|
"The syntax is:\n",
|
||
|
" \n",
|
||
|
"```sparql\n",
|
||
"SELECT (COUNT(?variable) AS ?count_name)\n",
|
||
|
"```\n",
|
||
|
"\n",
|
||
|
"Try it yourself with our previous example:"
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "code",
|
||
|
"execution_count": null,
|
||
|
"metadata": {
|
||
|
"deletable": false,
|
||
|
"nbgrader": {
|
||
|
"cell_type": "code",
|
||
|
"checksum": "cd7ce9212f587afe311c7631b3908de2",
|
||
|
"grade": false,
|
||
|
"grade_id": "cell-e35414e191c5bf16",
|
||
|
"locked": false,
|
||
|
"schema_version": 3,
|
||
|
"solution": true
|
||
|
}
|
||
|
},
|
||
|
"outputs": [],
|
||
|
"source": [
|
||
|
"%%sparql http://dbpedia.org/sparql\n",
|
||
|
"\n",
|
||
|
"PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>\n",
|
||
|
"PREFIX dct:<http://purl.org/dc/terms/>\n",
|
||
|
"PREFIX dbc:<http://dbpedia.org/resource/Category:>\n",
|
||
|
"PREFIX dbr:<http://dbpedia.org/resource/>\n",
|
||
|
"PREFIX dbo:<http://dbpedia.org/ontology/>\n",
|
||
|
"\n",
|
||
|
"# YOUR ANSWER HERE\n",
|
||
|
"\n",
|
||
|
"WHERE {\n",
|
||
|
"# YOUR ANSWER HERE\n",
|
||
|
"}\n",
|
||
|
"LIMIT 10000"
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "code",
|
||
|
"execution_count": null,
|
||
|
"metadata": {
|
||
|
"deletable": false,
|
||
|
"editable": false,
|
||
|
"nbgrader": {
|
||
|
"cell_type": "code",
|
||
|
"checksum": "68609fa02dcf7480e16f0e5eb7849e65",
|
||
|
"grade": true,
|
||
|
"grade_id": "cell-7a7ef8255a5662e2",
|
||
|
"locked": true,
|
||
|
"points": 0,
|
||
|
"schema_version": 3,
|
||
|
"solution": false
|
||
|
}
|
||
|
},
|
||
|
"outputs": [],
|
||
|
"source": [
|
||
|
"assert len(solution()['columns']) == 1\n",
|
||
|
"column_name = list(solution()['columns'].keys())[0]\n",
|
||
|
"assert int(solution()['columns'][column_name][0]) > 200"
|
||
|
]
|
||
|
},
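{
"cell_type": "markdown",
"metadata": {},
"source": [
"For reference, a complete counting query might look like the sketch below, written in the standard SPARQL 1.1 form `(COUNT(...) AS ?name)`.\n",
"It counts the resources whose `dbo:birthPlace` is `dbr:Madrid`; that pattern and the variable names `?person` and `?total` are assumptions chosen for the example, not part of the exercise above.\n",
"\n",
"```sparql\n",
"PREFIX dbr: <http://dbpedia.org/resource/>\n",
"PREFIX dbo: <http://dbpedia.org/ontology/>\n",
"\n",
"# DISTINCT avoids counting the same resource more than once\n",
"SELECT (COUNT(DISTINCT ?person) AS ?total)\n",
"WHERE {\n",
"    ?person dbo:birthPlace dbr:Madrid .\n",
"}\n",
"```\n",
"\n",
"Since there is no `GROUP BY`, the result has a single column (`total`) with a single row."
]
},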
|
||
|
{
|
||
|
"cell_type": "markdown",
|
||
|
"metadata": {},
|
||
|
"source": [
|
||
|
"## Additional exercises"
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "markdown",
|
||
|
"metadata": {},
|
||
|
"source": [
|
||
"Find out whether there are more DBpedia entries for writers (`dbo:Writer`) than for football players (`dbo:SoccerPlayer`)."
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "markdown",
|
||
|
"metadata": {},
|
||
|
"source": [
|
||
|
"Get a list of European countries with a population higher than 20 million, in decreasing order of population, including their URI, name in English and population."
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "markdown",
|
||
|
"metadata": {},
|
||
|
"source": [
|
||
|
"Find the country in the world that speaks the most languages. Show its name in Spanish, if available."
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "markdown",
|
||
|
"metadata": {},
|
||
|
"source": [
|
||
|
"## References"
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "markdown",
|
||
|
"metadata": {},
|
||
|
"source": [
|
||
|
"* [RDFLib documentation](https://rdflib.readthedocs.io/en/stable/).\n",
|
||
|
"* [Wikidata Query Service query examples](https://www.wikidata.org/wiki/Wikidata:SPARQL_query_service/queries/examples)"
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "markdown",
|
||
|
"metadata": {},
|
||
|
"source": [
|
||
|
"## Licence\n",
|
||
"The notebook is freely licensed under the [Creative Commons Attribution Share-Alike license](https://creativecommons.org/licenses/by/2.0/).  \n",
|
||
|
"\n",
|
||
|
"© 2018 Universidad Politécnica de Madrid."
|
||
|
]
|
||
|
}
|
||
|
],
|
||
|
"metadata": {
|
||
|
"kernelspec": {
|
||
|
"display_name": "Python 3",
|
||
|
"language": "python",
|
||
|
"name": "python3"
|
||
|
},
|
||
|
"language_info": {
|
||
|
"codemirror_mode": {
|
||
|
"name": "ipython",
|
||
|
"version": 3
|
||
|
},
|
||
|
"file_extension": ".py",
|
||
|
"mimetype": "text/x-python",
|
||
|
"name": "python",
|
||
|
"nbconvert_exporter": "python",
|
||
|
"pygments_lexer": "ipython3",
|
||
|
"version": "3.8.1"
|
||
|
}
|
||
|
},
|
||
|
"nbformat": 4,
|
||
|
"nbformat_minor": 2
|
||
|
}
|