LOD: minor changes

pull/6/merge
J. Fernando Sánchez 3 years ago
parent 5144b7f228
commit b43125ca59

@ -27,13 +27,15 @@
"source": [
"## Introduction\n",
"\n",
"This lecture provides an introduction to RDF and the query language SPARQL.\n",
"This lecture provides an introduction to RDF and the SPARQL query language.\n",
"\n",
"This is the first in a series of notebooks about SPARQL, which consists of:\n",
"\n",
"* This notebook, which basic concepts of RDF and SPARQL\n",
"* [A notebook](01_SPARQL_Introduction.ipynb) that provides an introduction of SPARQL through a collection of progressively more difficult exercises]\n",
"* [A notebook](02_SPARQL_Custom_Endpoint.ipynb) with queries to a custom dataset, which links to the RDF exercises and it is out of the scope of this course. You can consult it if you are interested."
"* This notebook, which explains basic concepts of RDF and SPARQL\n",
"* [A notebook](01_SPARQL_Introduction.ipynb) that provides an introduction to SPARQL through a collection of exercises of increasing difficulty.\n",
"* [An optional notebook](02_SPARQL_Custom_Endpoint.ipynb) with queries to a custom dataset.\n",
"The dataset is meant to be done after the [RDF exercises](../rdf/RDF.ipynb) and it is out of the scope of this course.\n",
"You can consult it if you are interested."
]
},
{
@ -47,14 +49,17 @@
"\n",
" <subject> <predicate> <object>\n",
" \n",
"An RDF statement expresses a relationship between two resources. The **subject** and the **object** represent the two resources being related; the **predicate** represents the nature of their relationship. The relationship is phrased in a directional way (from subject to object) and is called in RDF a **property**. Because RDF statements consist of three elements they are called **triples**.\n",
"An RDF statement expresses a relationship between two resources. The **subject** and the **object** represent the two resources being related; the **predicate** represents the nature of their relationship.\n",
"The relationship is phrased in a directional way (from subject to object).\n",
"In RDF this relationship is known as a **property**.\n",
"Because RDF statements consist of three elements they are called **triples**.\n",
"\n",
"Here are examples of RDF triples (informally expressed in pseudocode):\n",
"Here are some examples of RDF triples (informally expressed in pseudocode):\n",
"\n",
" <Bob> <is a> <person>.\n",
" <Bob> <is a friend of> <Alice>.\n",
" \n",
"Resources are identified by IRIs, which can appear in all three positions of a triple. For example, the IRI for Leonardo da Vinci in DBpedia is:\n",
"Resources are identified by [IRIs](https://en.wikipedia.org/wiki/Internationalized_Resource_Identifier), which can appear in all three positions of a triple. For example, the IRI for Leonardo da Vinci in DBpedia is:\n",
"\n",
" <http://dbpedia.org/resource/Leonardo_da_Vinci>\n",
"\n",
@ -69,7 +74,9 @@
" * booleans\n",
" * etc.\n",
" \n",
"RDF data is stored in RDF repositories that expose SPARQL endpoints. Let's query one of the most famous RDF repositories: dbpedia. First, we should learn how to execute SPARQL in a notebook."
"RDF data is stored in RDF repositories that expose SPARQL endpoints.\n",
"Let's query one of the most famous RDF repositories: [dbpedia](https://wiki.dbpedia.org/).\n",
"First, we should learn how to execute SPARQL in a notebook."
]
},
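The triple model described in this cell can be sketched in plain Python. This is only an illustration under stated assumptions: the `match` helper and the third triple about Alice are hypothetical additions, and real RDF stores identify resources with IRIs rather than short strings.

```python
from typing import Iterator, List, Optional, Tuple

# A triple is (subject, predicate, object), as in the pseudocode examples.
Triple = Tuple[str, str, str]

TRIPLES: List[Triple] = [
    ("Bob", "is a", "person"),
    ("Bob", "is a friend of", "Alice"),
    ("Alice", "is a", "person"),  # assumed extra triple, for illustration only
]


def match(
    triples: List[Triple],
    s: Optional[str] = None,
    p: Optional[str] = None,
    o: Optional[str] = None,
) -> Iterator[Triple]:
    """Yield the triples that fit a pattern; None plays the role of a variable."""
    for triple in triples:
        if all(want is None or want == got for want, got in zip((s, p, o), triple)):
            yield triple


# "Who is a person?" leaves the subject open, like ?s in a SPARQL pattern.
people = [subject for subject, _, _ in match(TRIPLES, p="is a", o="person")]
print(people)
```

Leaving a position as `None` mirrors what a SPARQL variable does in a triple pattern: it matches anything in that position.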
{
@ -77,30 +84,34 @@
"metadata": {},
"source": [
"# Executing SPARQL in a notebook\n",
"There are several ways to execute SPARQL in a notebook. The most popular ones are:\n",
"* using a SPARQL kernel [sparql kernel](https://github.com/paulovn/sparql-kernel) instead of the Python3 kernel\n",
"* using the [graph notebook package](https://pypi.org/project/graph-notebook/)\n",
"There are several ways to execute SPARQL in a notebook.\n",
"Some of the most popular are:\n",
"\n",
"* using libraries such as [sparql-client](https://pypi.org/project/sparql-client/) or [rdflib](https://rdflib.dev/sparqlwrapper/) that enable executing SPARQL within a Python3 kernel\n",
"* using other libraries. In our case, a light library has been developed (the file helpers.py) for accessing SPARQL endpoints using an HTTP connection.\n",
"* using the [graph notebook package](https://pypi.org/project/graph-notebook/)\n",
"* using a SPARQL kernel [sparql kernel](https://github.com/paulovn/sparql-kernel) instead of the Python3 kernel\n",
"\n",
"We are going to use the last option to avoid installing new packages.\n",
"\n",
"For using the library, you need:\n",
"1. Import sparql from helpers (the file helpers.py available in the github repository)\n",
"2. Use the magic command '%%sparql' to indicate the SPARQL endpoint and then the SPARQL code.\n",
"We are going to use the second option to avoid installing new packages.\n",
"\n",
"To use the library, you need to:\n",
"\n",
"1. Import `sparql` from helpers (i.e., `helpers.py`, a file that is available in the github repository)\n",
"2. Use the `%%sparql` magic command to indicate the SPARQL endpoint and then the SPARQL code.\n",
"\n",
"Let's try it!\n",
"\n",
"# Queries agains DBPedia\n",
"\n",
"We are going to execute an SPARQL query agains DBPedia. This section is based on [[8](#8)].\n",
"We are going to execute a SPARQL query against DBPedia. This section is based on [[8](#8)].\n",
"\n",
"First, we just create a query to retrieve arbitrary triples (subject, predicate, object) without any restriction (only that we want to limit to 10 results)."
"First, we just create a query to retrieve arbitrary triples (subject, predicate, object) without any restriction (besides limiting the result to 10 triples)."
]
},
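For readers curious about what "accessing SPARQL endpoints using an HTTP connection" involves, here is a minimal sketch that uses only the Python standard library. It is not the code in `helpers.py`: the function name `build_sparql_request` is hypothetical, and the query string is an assumed form of the "arbitrary triples, limited to 10" query described above. It relies on the SPARQL 1.1 Protocol convention of passing the query in a `query` URL parameter.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_sparql_request(endpoint: str, query: str) -> Request:
    """Prepare (but do not send) a GET request for a SPARQL endpoint."""
    url = endpoint + "?" + urlencode({"query": query})
    # Ask the endpoint for the standard JSON results format.
    return Request(url, headers={"Accept": "application/sparql-results+json"})


# Assumed query: any ten triples, with no other restriction.
QUERY = "SELECT ?s ?p ?o WHERE { ?s ?p ?o } LIMIT 10"

request = build_sparql_request("https://dbpedia.org/sparql", QUERY)
print(request.full_url)
```

Sending the request with `urllib.request.urlopen(request)` and decoding the body as JSON should give one entry per result row under `results` / `bindings`, although the exact behaviour depends on the endpoint.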
{
"cell_type": "code",
"execution_count": 44,
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
@ -109,23 +120,9 @@
},
{
"cell_type": "code",
"execution_count": 45,
"execution_count": null,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<table><thead><tr><th>s</th><th>p</th><th>o</th><tr></thead><tbody><tr><td>http://www.openlinksw.com/virtrdf-data-formats#default-iid</td><td>http://www.w3.org/1999/02/22-rdf-syntax-ns#type</td><td>http://www.openlinksw.com/schemas/virtrdf#QuadMapFormat</td></tr><tr><td>http://www.openlinksw.com/virtrdf-data-formats#default-iid-nullable</td><td>http://www.w3.org/1999/02/22-rdf-syntax-ns#type</td><td>http://www.openlinksw.com/schemas/virtrdf#QuadMapFormat</td></tr><tr><td>http://www.openlinksw.com/virtrdf-data-formats#default-iid-nonblank</td><td>http://www.w3.org/1999/02/22-rdf-syntax-ns#type</td><td>http://www.openlinksw.com/schemas/virtrdf#QuadMapFormat</td></tr><tr><td>http://www.openlinksw.com/virtrdf-data-formats#default-iid-nonblank-nullable</td><td>http://www.w3.org/1999/02/22-rdf-syntax-ns#type</td><td>http://www.openlinksw.com/schemas/virtrdf#QuadMapFormat</td></tr><tr><td>http://www.openlinksw.com/virtrdf-data-formats#default</td><td>http://www.w3.org/1999/02/22-rdf-syntax-ns#type</td><td>http://www.openlinksw.com/schemas/virtrdf#QuadMapFormat</td></tr><tr><td>http://www.openlinksw.com/virtrdf-data-formats#default-nullable</td><td>http://www.w3.org/1999/02/22-rdf-syntax-ns#type</td><td>http://www.openlinksw.com/schemas/virtrdf#QuadMapFormat</td></tr><tr><td>http://www.openlinksw.com/virtrdf-data-formats#sql-varchar</td><td>http://www.w3.org/1999/02/22-rdf-syntax-ns#type</td><td>http://www.openlinksw.com/schemas/virtrdf#QuadMapFormat</td></tr><tr><td>http://www.openlinksw.com/virtrdf-data-formats#sql-varchar-nullable</td><td>http://www.w3.org/1999/02/22-rdf-syntax-ns#type</td><td>http://www.openlinksw.com/schemas/virtrdf#QuadMapFormat</td></tr><tr><td>http://www.openlinksw.com/virtrdf-data-formats#sql-varchar-dt</td><td>http://www.w3.org/1999/02/22-rdf-syntax-ns#type</td><td>http://www.openlinksw.com/schemas/virtrdf#QuadMapFormat</td></tr><tr><td>http://www.openlinksw.com/virtrdf-data-formats#sql-varchar-dt-nullable</td><td>http://www.w3.org/1999/02/22-rdf
-syntax-ns#type</td><td>http://www.openlinksw.com/schemas/virtrdf#QuadMapFormat</td></tr></tbody></table>"
],
"text/plain": [
"<IPython.core.display.HTML object>"
]
},
"execution_count": 45,
"metadata": {},
"output_type": "execute_result"
}
],
"outputs": [],
"source": [
"%%sparql https://live.dbpedia.org/sparql\n",
"\n",
@ -141,28 +138,16 @@
"metadata": {},
"source": [
"Well, it worked, but the results are not particulary interesting. \n",
"Let's search for a famous football player, Fernando Torres."
"Let's search for a famous football player, Fernando Torres.\n",
"\n",
"To do so, we will search for entities whose English \"human-readable representation\" (i.e., label) matches \"Fernando Torres\":"
]
},
{
"cell_type": "code",
"execution_count": 46,
"execution_count": null,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<table><thead><tr><th>athlete</th><tr></thead><tbody><tr><td>http://dbpedia.org/resource/Fernando_Torres</td></tr></tbody></table>"
],
"text/plain": [
"<IPython.core.display.HTML object>"
]
},
"execution_count": 46,
"metadata": {},
"output_type": "execute_result"
}
],
"outputs": [],
"source": [
"%%sparql https://live.dbpedia.org/sparql\n",
"\n",
@ -177,30 +162,20 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"Amazing. Go to http://dbpedia.org/resource/Fernando_Torres and you will see all the information available about Fernando Torres. Pay attention to the names of predicates to be able to create new queries. For example, we are interesting in knowing where Fernando Torres was born.\n",
"Great, we found the IRI of the node: `http://dbpedia.org/resource/Fernando_Torres`\n",
"\n",
"Now we can start asking for more properties.\n",
"\n",
"To do so, go to http://dbpedia.org/resource/Fernando_Torres and you will see all the information available about Fernando Torres. Pay attention to the names of predicates to be able to create new queries. For example, we are interesting in knowing where Fernando Torres was born (`dbo:birthPlace`).\n",
"\n",
"Let's go!"
]
},
{
"cell_type": "code",
"execution_count": 47,
"execution_count": null,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<table><thead><tr><th>athlete</th><th>birthPlace</th><tr></thead><tbody><tr><td>http://dbpedia.org/resource/Fernando_Torres</td><td>http://dbpedia.org/resource/Spain_national_football_team</td></tr><tr><td>http://dbpedia.org/resource/Fernando_Torres</td><td>http://dbpedia.org/resource/Fuenlabrada</td></tr></tbody></table>"
],
"text/plain": [
"<IPython.core.display.HTML object>"
]
},
"execution_count": 47,
"metadata": {},
"output_type": "execute_result"
}
],
"outputs": [],
"source": [
"%%sparql https://dbpedia.org/sparql\n",
"\n",
@ -218,37 +193,31 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"Observe the SPARQL query:\n",
"* PREFIX section URIs of vocabularies and the prefix used below, to avoid long IRIs\n",
"* SELECT section: variables we want to return (* is an abbreviation that selects all of the variables in a query)\n",
"* WHERE triple pattern: triples where some elements are variables. These variables are bound during the query processing process and bounded variables are returned.\n",
"If we examine the SPARQL query, we find three blocks:\n",
"\n",
"* **PREFIX** section: IRIs of vocabularies and the prefix used below, to avoid long IRIs. e.g., by defining the `dbo` prefix in our example, the `dbo:birthPlace` below expands to `http://dbpedia.org/ontology/birthPlace`.\n",
"* **SELECT** section: variables we want to return (`*` is an abbreviation that selects all of the variables in a query)\n",
"* **WHERE** clause: triples where some elements are variables. These variables are bound during the query processing process and bounded variables are returned.\n",
"\n",
"Pay attention to the WHERE section. Since both triple patterns share the same subject, we omit it in the second one, and link both with \" ;\". Each triple pattern should finish with a \" .\" (the last pattern can omit this). Don't forget the space before \";\" and \".\".\n",
"Now take a closer look at the **WHERE** section.\n",
"We said earlier that triples are made out of three elements and each triple pattern should finish with a period (`.`) (although the last pattern can omit this).\n",
"However, when two or more triple patterns share the same subject, we omit it all but the first one, and use ` ;` as separator.\n",
"If if both the subject and predicate are the same, we could use a coma `,` instead.\n",
"This allows us to avoid repetition and make queries more readable.\n",
"But don't forget the space before your separators (`;` and `.`).\n",
"\n",
"The result is interesting, we know he was born in Fuenlabrada, but we see an additional (wrong) value, the Spanish national football team. The conversion process from Wikipedia to DBPedia should still be tuned :).\n",
"\n",
"We can 'fix' it, by adding some information. We want only municipalities as a result. Let's see!\n"
"We can *fix* it, by adding some more constaints.\n",
"In our case, only want a birth place that is also a municipality (i.e., its type is `http://dbpedia.org/resource/Municipalities_of_Spain`).\n",
"Let's see!"
]
},
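The effect of the **PREFIX** section can be sketched in a few lines of Python. This is an illustration of the expansion rule, not of how a SPARQL engine is implemented: the `expand` function is hypothetical, and the `dbr` namespace is assumed from the resource IRIs used elsewhere in this notebook.

```python
# Prefix table: "dbo" is the one discussed in the text; "dbr" is an
# assumption based on IRIs such as http://dbpedia.org/resource/Fuenlabrada.
PREFIXES = {
    "dbo": "http://dbpedia.org/ontology/",
    "dbr": "http://dbpedia.org/resource/",
}


def expand(prefixed_name: str, prefixes: dict = PREFIXES) -> str:
    """Turn a prefixed name such as 'dbo:birthPlace' into a full IRI."""
    prefix, separator, local_name = prefixed_name.partition(":")
    if not separator or prefix not in prefixes:
        raise ValueError(f"unknown or missing prefix in {prefixed_name!r}")
    return prefixes[prefix] + local_name


print(expand("dbo:birthPlace"))
print(expand("dbr:Fuenlabrada"))
```

A prefixed name is therefore just shorthand: the query means the same thing whether the full IRI is written out or abbreviated.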
{
"cell_type": "code",
"execution_count": 48,
"execution_count": null,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<table><thead><tr><th>athlete</th><th>birthPlace</th><tr></thead><tbody><tr><td>http://dbpedia.org/resource/Fernando_Torres</td><td>http://dbpedia.org/resource/Fuenlabrada</td></tr></tbody></table>"
],
"text/plain": [
"<IPython.core.display.HTML object>"
]
},
"execution_count": 48,
"metadata": {},
"output_type": "execute_result"
}
],
"outputs": [],
"source": [
"%%sparql https://dbpedia.org/sparql\n",
"\n",
@ -269,30 +238,20 @@
"metadata": {},
"source": [
"Great. Now it looks better.\n",
"Do you know of Fuenlabrada is a big city? Let's query!\n",
"Notice that we added a new prefix.\n",
"\n",
"Hint: search (as previously) the subject / object / predicate nodes in the RDF graph (http://dbpedia.org/resource/Fuenlabrada).\n"
"Now, is Fuenlabrada is a big city?\n",
"Let's find out.\n",
"\n",
"**Hint**: you can find more subject / object / predicate nodes related to [Fuenlabrada])http://dbpedia.org/resource/Fuenlabrada) in the RDF graph just as we did before.\n",
"That is how we found the `dbo:areaTotal` property."
]
},
{
"cell_type": "code",
"execution_count": 49,
"execution_count": null,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<table><thead><tr><th>area</th><tr></thead><tbody><tr><td>3.941e+07</td></tr></tbody></table>"
],
"text/plain": [
"<IPython.core.display.HTML object>"
]
},
"execution_count": 49,
"metadata": {},
"output_type": "execute_result"
}
],
"outputs": [],
"source": [
"%%sparql https://dbpedia.org/sparql\n",
"\n",
@ -310,28 +269,18 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"Well, it shows 39.1 km$^2$. Let's go back to know more about Fernando Torres. We would want to retrieve the name of the city where he was born instead of the IRI. Let's try!"
"Well, it shows 39.1 km$^2$.\n",
"\n",
"Let's go back to our Fernando Torres.\n",
"What we are really insterested in is the name of the city he was born in, not its IRI.\n",
"As we saw before, the human-readable name is provided by the `rdfs:label` property:"
]
},
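The raw value returned by the area query can be converted with a quick calculation. This sketch assumes that `dbo:areaTotal` is expressed in square metres, and it uses the `3.941e+07` figure from the recorded output of the query cell.

```python
# Value returned by the query (assumed to be in square metres).
area_m2 = 3.941e7

# 1 km^2 = 1,000,000 m^2
area_km2 = area_m2 / 1_000_000

print(f"{area_km2:.2f} km^2")
```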
{
"cell_type": "code",
"execution_count": 51,
"execution_count": null,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<table><thead><tr><th>player</th><th>birthPlace</th><th>placeName</th><tr></thead><tbody><tr><td>http://dbpedia.org/resource/Fernando_Torres</td><td>http://dbpedia.org/resource/Fuenlabrada</td><td>Fuenlabrada</td></tr><tr><td>http://dbpedia.org/resource/Fernando_Torres</td><td>http://dbpedia.org/resource/Fuenlabrada</td><td>فوينلابرادا</td></tr><tr><td>http://dbpedia.org/resource/Fernando_Torres</td><td>http://dbpedia.org/resource/Fuenlabrada</td><td>Fuenlabrada</td></tr><tr><td>http://dbpedia.org/resource/Fernando_Torres</td><td>http://dbpedia.org/resource/Fuenlabrada</td><td>Fuenlabrada</td></tr><tr><td>http://dbpedia.org/resource/Fernando_Torres</td><td>http://dbpedia.org/resource/Fuenlabrada</td><td>Fuenlabrada</td></tr><tr><td>http://dbpedia.org/resource/Fernando_Torres</td><td>http://dbpedia.org/resource/Fuenlabrada</td><td>Fuenlabrada</td></tr><tr><td>http://dbpedia.org/resource/Fernando_Torres</td><td>http://dbpedia.org/resource/Fuenlabrada</td><td>Fuenlabrada</td></tr><tr><td>http://dbpedia.org/resource/Fernando_Torres</td><td>http://dbpedia.org/resource/Fuenlabrada</td><td>Fuenlabrada</td></tr><tr><td>http://dbpedia.org/resource/Fernando_Torres</td><td>http://dbpedia.org/resource/Fuenlabrada</td><td>Fuenlabrada</td></tr><tr><td>http://dbpedia.org/resource/Fernando_Torres</td><td>http://dbpedia.org/resource/Fuenlabrada</td><td>Fuenlabrada</td></tr><tr><td>http://dbpedia.org/resource/Fernando_Torres</td><td>http://dbpedia.org/resource/Fuenlabrada</td><td>フエンラブラダ</td></tr><tr><td>http://dbpedia.org/resource/Fernando_Torres</td><td>http://dbpedia.org/resource/Fuenlabrada</td><td>Fuenlabrada</td></tr><tr><td>http://dbpedia.org/resource/Fernando_Torres</td><td>http://dbpedia.org/resource/Fuenlabrada</td><td>푸엔라브라다</td></tr><tr><td>http://dbpedia.org/resource/Fernando_Torres</td><td>http://dbpedia.org/resource/Fuenlabrada</td><td>Fuenlabrada</td></tr><tr><td>http://dbpedia.org/resource/Fernando_Torres</td><td>http://dbpedia.org/resource/Fuenlabrada</td><td>Fuenl
abrada</td></tr><tr><td>http://dbpedia.org/resource/Fernando_Torres</td><td>http://dbpedia.org/resource/Fuenlabrada</td><td>Фуэнлабрада</td></tr><tr><td>http://dbpedia.org/resource/Fernando_Torres</td><td>http://dbpedia.org/resource/Fuenlabrada</td><td>Fuenlabrada</td></tr><tr><td>http://dbpedia.org/resource/Fernando_Torres</td><td>http://dbpedia.org/resource/Fuenlabrada</td><td>Fuenlabrada</td></tr><tr><td>http://dbpedia.org/resource/Fernando_Torres</td><td>http://dbpedia.org/resource/Fuenlabrada</td><td>Фуенлабрада</td></tr><tr><td>http://dbpedia.org/resource/Fernando_Torres</td><td>http://dbpedia.org/resource/Fuenlabrada</td><td>富恩拉夫拉达</td></tr></tbody></table>"
],
"text/plain": [
"<IPython.core.display.HTML object>"
]
},
"execution_count": 51,
"metadata": {},
"output_type": "execute_result"
}
],
"outputs": [],
"source": [
"%%sparql https://dbpedia.org/sparql\n",
"\n",
@ -358,23 +307,9 @@
},
{
"cell_type": "code",
"execution_count": 53,
"execution_count": null,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<table><thead><tr><th>player</th><th>birthPlace</th><th>placeName</th><tr></thead><tbody><tr><td>http://dbpedia.org/resource/Fernando_Torres</td><td>http://dbpedia.org/resource/Fuenlabrada</td><td>Fuenlabrada</td></tr></tbody></table>"
],
"text/plain": [
"<IPython.core.display.HTML object>"
]
},
"execution_count": 53,
"metadata": {},
"output_type": "execute_result"
}
],
"outputs": [],
"source": [
"%%sparql https://dbpedia.org/sparql\n",
"\n",
@ -397,28 +332,18 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"Awesome. Let's tune a bit more. We only want two results: Fernando's birth date and birth place (name). Let's go!"
"Awesome!\n",
"\n",
"But we said we don't care about the IRI of the place. We only want two pieces of data: Fernando's birth date and the name of his birthplace.\n",
"\n",
"Let's tune our query a bit more."
]
},
{
"cell_type": "code",
"execution_count": 54,
"execution_count": null,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<table><thead><tr><th>birthDate</th><th>placeName</th><tr></thead><tbody><tr><td>1984-03-20</td><td>Fuenlabrada</td></tr></tbody></table>"
],
"text/plain": [
"<IPython.core.display.HTML object>"
]
},
"execution_count": 54,
"metadata": {},
"output_type": "execute_result"
}
],
"outputs": [],
"source": [
"%%sparql https://dbpedia.org/sparql\n",
"\n",
@ -439,33 +364,19 @@
]
},
{
"cell_type": "code",
"execution_count": null,
"cell_type": "markdown",
"metadata": {},
"outputs": [],
"source": [
"Great :). Are there many football players born in Fuenlabrada? Let's query!"
"Great 😃\n",
"\n",
"Are there many football players born in Fuenlabrada? Let's find out!"
]
},
{
"cell_type": "code",
"execution_count": 56,
"execution_count": null,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<table><thead><tr><th>player</th><tr></thead><tbody><tr><td>http://dbpedia.org/resource/Luismi_(footballer,_born_1979)</td></tr><tr><td>http://dbpedia.org/resource/Óscar_Miñambres</td></tr><tr><td>http://dbpedia.org/resource/Tachi_(footballer)</td></tr><tr><td>http://dbpedia.org/resource/Fernando_Torres</td></tr></tbody></table>"
],
"text/plain": [
"<IPython.core.display.HTML object>"
]
},
"execution_count": 56,
"metadata": {},
"output_type": "execute_result"
}
],
"outputs": [],
"source": [
"%%sparql https://dbpedia.org/sparql\n",
"\n",
@ -476,8 +387,7 @@
"WHERE\n",
" {\n",
" ?player a dbo:SoccerPlayer ; \n",
" dbo:birthPlace dbr:Fuenlabrada .\n",
" \n",
" dbo:birthPlace dbr:Fuenlabrada . \n",
" }"
]
},
@ -485,7 +395,8 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"Well, not that many. Observe we have used 'a' (it is an abbreviation for rdf:type, both can be used).\n",
"Well, not that many. Observe we have used `a`.\n",
"It is just an abbreviation for `rdf:type`, both can be used interchangeably.\n",
"\n",
"If you want additional examples, you can follow the notebook by [Shawn Graham](https://github.com/o-date/sparql-and-lod/blob/master/sparql-intro.ipynb), which is based on the SPARQL tutorial by Matthew Lincoln, available [here in English](https://programminghistorian.org/en/lessons/retired/graph-databases-and-SPARQL) and [here in Spanish](https://programminghistorian.org/es/lecciones/retirada/sparql-datos-abiertos-enlazados]). You have also a local copy of these tutorials together with this notebook [here in English](https://htmlpreview.github.io/?https://github.com/gsi-upm/sitc/blob/master/lod/tutorial/graph-databases-and-SPARQL.html) and [here in Spanish](https://htmlpreview.github.io/?https://github.com/gsi-upm/sitc/blob/master/lod/tutorial/sparql-datos-abiertos-enlazados.html). \n"
]
@ -498,7 +409,6 @@
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
@ -549,7 +459,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.7.9"
"version": "3.9.1"
},
"latex_envs": {
"LaTeX_envs_menu_present": true,
