mirror of https://github.com/gsi-upm/sitc synced 2026-02-19 04:38:16 +00:00

Updated server install

This commit is contained in:
cif
2026-02-18 17:20:14 +01:00
parent f10e6650c9
commit d66bb25245


@@ -1,12 +1,5 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"![](files/images/EscUpmPolit_p.gif \"UPM\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
@@ -14,10 +7,10 @@
"<header style=\"width:100%;position:relative\">\n",
" <div style=\"width:80%;float:right;\">\n",
" <h1>Course Notes for Learning Intelligent Systems</h1>\n",
" <h3>Department of Telematic Engineering Systems</h3>\n",
" <h5>Universidad Politécnica de Madrid. © Carlos A. Iglesias </h5>\n",
" <h2>Department of Telematic Engineering Systems</h2>\n",
" <h3>Universidad Politécnica de Madrid. © Carlos A. Iglesias </h3>\n",
" </div>\n",
" <img style=\"width:15%;\" src=\"../logo.jpg\" alt=\"UPM\" />\n",
" <img style=\"width:15%;\" src=\"https://github.com/gsi-upm/sitc/blob/9844820e6653b0e169113a06538f8e54554c4fbc/images/EscUpmPolit_p.gif?raw=true\" alt=\"UPM\" />\n",
"</header>"
]
},
@@ -27,378 +20,65 @@
"source": [
"## Introduction\n",
"\n",
"This lecture provides an introduction to RDF and the SPARQL query language.\n",
"\n",
"This is the first in a series of notebooks about SPARQL, which consists of:\n",
"\n",
"* This notebook, which explains basic concepts of RDF and SPARQL\n",
"* [A notebook](01_SPARQL_Introduction.ipynb) that provides an introduction to SPARQL through a collection of exercises of increasing difficulty.\n",
"* [An optional notebook](02_SPARQL_Custom_Endpoint.ipynb) with queries to a custom dataset.\n",
"The dataset is meant to be done after the [RDF exercises](../rdf/RDF.ipynb), and it is out of the scope of this course.\n",
"You can consult it if you are interested."
"This lecture explains how to run the SPARQL Server Fuseki using Docker."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# RDF basics\n",
"# Installing Fuseki with Docker\n",
"This section is taken from [[1](#1), [2](#2)].\n",
"\n",
"RDF allows us to make statements about resources. The format of these statements is simple. A statement always has the following structure:\n",
"## Install Docker if not installed\n",
"You should have installed **Docker**. If not, refer to the [docker install guide](https://docs.docker.com/engine/install/).\n",
"\n",
" <subject> <predicate> <object>\n",
" \n",
"An RDF statement expresses a relationship between two resources. The **subject** and the **object** represent the two resources being related; the **predicate** represents the nature of their relationship.\n",
"The relationship is phrased in a directional way (from subject to object).\n",
"In RDF, this relationship is known as a **property**.\n",
"Because RDF statements consist of three elements, they are called **triples**.\n",
"## Install fuseki image\n",
"In a terminal, run\n",
"```\n",
"docker run -p 3030:3030 --name fuseki -e ADMIN_PASSWORD=fuseki -it stain/jena-fuseki\n",
"```\n",
"You can change the admin password to anyone that you want, or the ports.\n",
"\n",
"Here are some examples of RDF triples (informally expressed in pseudocode):\n",
"You should see the logs in the terminal.\n",
"\n",
" <Bob> <is a> <person>.\n",
" <Bob> <is a friend of> <Alice>.\n",
" \n",
"Resources are identified by [IRIs](https://en.wikipedia.org/wiki/Internationalized_Resource_Identifier), which can appear in all three positions of a triple. For example, the IRI for Leonardo da Vinci in DBpedia is:\n",
"![alt text](docker-run-jena-fuseki.jpg \"Fuseki running in Docker\")\n",
"\n",
" <http://dbpedia.org/resource/Leonardo_da_Vinci>\n",
"## Access admin UI\n",
"Now open a browser to [http://localhost:3030](http://localhost:3030) and log in as user *admin* with password *fuseki* (or the password you set earlier).\n",
"\n",
"IRIs can be abbreviated as *prefixed names*. For example, \n",
" PREFIX dbr: <http://dbpedia.org/resource/>\n",
" <dbr:Leonardo_da_Vinci>\n",
" \n",
"Objects can be literals: \n",
" * strings (e.g., \"plain string\" or \"string with language\"@en)\n",
" * numbers (e.g., \"13.4\"^^xsd:float)\n",
" * dates (e.g., )\n",
" * booleans\n",
" * etc.\n",
" \n",
"RDF data is stored in RDF repositories that expose SPARQL endpoints.\n",
"Let's query one of the most famous RDF repositories: [dbpedia](https://wiki.dbpedia.org/).\n",
"First, we should learn how to execute SPARQL in a notebook."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Executing SPARQL in a notebook\n",
"There are several ways to execute SPARQL in a notebook.\n",
"Some of the most popular are:\n",
"![alt text](fuseki-running.jpg \"UI Admin Fuseki\")\n",
"\n",
"* using libraries such as [sparql-client](https://pypi.org/project/sparql-client/) or [rdflib](https://rdflib.dev/sparqlwrapper/) that enable executing SPARQL within a Python3 kernel\n",
"* using other libraries. In our case, a light library has been developed (the file helpers.py) for accessing SPARQL endpoints using an HTTP connection.\n",
"* using the [graph notebook package](https://pypi.org/project/graph-notebook/)\n",
"* using a SPARQL kernel [sparql kernel](https://github.com/paulovn/sparql-kernel) instead of the Python3 kernel\n",
"## Load data\n",
"Download this dataset to your computer.\n",
"\n",
"[Beatles dataset](https://github.com/gsi-upm/sitc/blob/master/lod/BeatlesMusicians.ttl).\n",
"\n",
"At the bottom of the UI, you can see 'No datasets created - add one'. Click on *add one*. Set *beatles* as the dataset name and in-memory as the dataset type.\n",
"\n",
"![alt text](new-dataset.jpg \"create dataset\")\n",
"\n",
"\n",
"We are going to use the second option to avoid installing new packages.\n",
"Click the *create dataset* button.\n",
"\n",
"To use the library, you need to:\n",
"![alt text](created-dataset.jpg \"created dataset\")\n",
"\n",
"1. Import `sparql` from helpers (i.e., `helpers.py`, a file that is available in the GitHub repository)\n",
"2. Use the `%%sparql` magic command to indicate the SPARQL endpoint and then the SPARQL code.\n",
"Click the *add data* button. In the new screen, select the button *select files* and click on the file you have previously downloaded, and click on the *upload now* button.\n",
"\n",
"Let's try it!\n",
"![alt text](upload-data.jpg \"upload data\")\n",
"\n",
"# Queries agains DBPedia\n",
"Now the *query* tab.\n",
"\n",
"We are going to execute a SPARQL query against DBPedia. This section is based on [[8](#8)].\n",
"![alt text](query-ui.jpg \"query ui\")\n",
"\n",
"First, we just create a query to retrieve arbitrary triples (subject, predicate, object) without any restriction (besides limiting the result to 10 triples)."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from helpers import sparql"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"%%sparql https://dbpedia.org/sparql\n",
"You see a generic query to list triples, click on the *Play* button (a triangle on the SPARQL query's right).\n",
"\n",
"SELECT ?s ?p ?o\n",
"WHERE {\n",
" ?s ?p ?o\n",
"}\n",
"LIMIT 10"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Well, it worked, but the results are not particularly interesting. \n",
"Let's search for a famous football player, Fernando Torres.\n",
"Scroll down to see the query results.\n",
"\n",
"To do so, we will search for entities whose English \"human-readable representation\" (i.e., label) matches \"Fernando Torres\":"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"%%sparql https://dbpedia.org/sparql\n",
"![alt text](query-results.jpg \"query results\")\n",
"\n",
"SELECT *\n",
"WHERE\n",
" {\n",
" ?athlete rdfs:label \"Fernando Torres\"@en \n",
" }"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Great, we found the IRI of the node: `http://dbpedia.org/resource/Fernando_Torres`\n",
"\n",
"Now we can start asking for more properties.\n",
"\n",
"To do so, go to http://dbpedia.org/resource/Fernando_Torres, and you will see all the information available about Fernando Torres. Pay attention to the names of predicates to be able to create new queries. For example, we are interesting in knowing where Fernando Torres was born (`dbo:birthPlace`).\n",
"\n",
"Let's go!"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"%%sparql https://dbpedia.org/sparql\n",
"\n",
"PREFIX dbo: <http://dbpedia.org/ontology/>\n",
"\n",
"SELECT *\n",
"WHERE\n",
" {\n",
" ?athlete rdfs:label \"Fernando Torres\"@en ;\n",
" dbo:birthPlace ?birthPlace . \n",
" }"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"If we examine the SPARQL query, we find three blocks:\n",
"\n",
"* **PREFIX** section: IRIs of vocabularies and the prefix used below, to avoid long IRIs. e.g., by defining the `dbo` prefix in our example, the `dbo:birthPlace` below expands to `http://dbpedia.org/ontology/birthPlace`.\n",
"* **SELECT** section: variables we want to return (`*` is an abbreviation that selects all of the variables in a query)\n",
"* **WHERE** clause: triples where some elements are variables. These variables are bound during the query processing process and bounded variables are returned.\n",
"\n",
"Now take a closer look at the **WHERE** section.\n",
"We said earlier that triples are made out of three elements and each triple pattern should finish with a period (`.`) (although the last pattern can omit this).\n",
"However, when two or more triple patterns share the same subject, we omit it all but the first one, and use ` ;` as separator.\n",
"If both the subject and predicate are the same, we could use a coma `,` instead.\n",
"This allows us to avoid repetition and make queries more readable.\n",
"But don't forget the space before your separators (`;` and `.`).\n",
"\n",
"The result is interesting, we know he was born in Fuenlabrada, but we see an additional (wrong) value, the Spanish national football team. The conversion process from Wikipedia to DBPedia should still be tuned :).\n",
"\n",
"We can *fix* it by adding some more constraints.\n",
"In our case, only want a birthplace that is also a municipality (i.e., its type is `http://dbpedia.org/resource/Municipalities_of_Spain`).\n",
"Let's see!"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"%%sparql https://dbpedia.org/sparql\n",
"\n",
"PREFIX dbo: <http://dbpedia.org/ontology/>\n",
"PREFIX dbr: <http://dbpedia.org/resource/>\n",
"\n",
"SELECT *\n",
"WHERE\n",
" {\n",
" ?athlete rdfs:label \"Fernando Torres\"@en ;\n",
" dbo:birthPlace ?birthPlace .\n",
" ?birthPlace dbo:type dbr:Municipalities_of_Spain \n",
" }"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Great. Now it looks better.\n",
"Notice that we added a new prefix.\n",
"\n",
"Now, is Fuenlabrada a big city?\n",
"Let's find out.\n",
"\n",
"**Hint**: you can find more subject / object / predicate nodes related to [Fuenlabrada])http://dbpedia.org/resource/Fuenlabrada) in the RDF graph just as we did before.\n",
"That is how we found the `dbo:areaTotal` property."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"%%sparql https://dbpedia.org/sparql\n",
"\n",
"PREFIX dbo: <http://dbpedia.org/ontology/>\n",
"PREFIX dbr: <http://dbpedia.org/resource/>\n",
"\n",
"SELECT *\n",
"WHERE\n",
" {\n",
" dbr:Fuenlabrada dbo:areaTotal ?area \n",
" }"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Well, it shows 39.41 km$^2$.\n",
"\n",
"Let's go back to our Fernando Torres.\n",
"What we are really interested in is the name of the city he was born in, not its IRI.\n",
"As we saw before, the human-readable name is provided by the `rdfs:label` property:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"%%sparql https://dbpedia.org/sparql\n",
"\n",
"PREFIX dbo: <http://dbpedia.org/ontology/>\n",
"PREFIX dbp: <http://dbpedia.org/property/>\n",
"\n",
"SELECT *\n",
"WHERE\n",
" {\n",
" ?player rdfs:label \"Fernando Torres\"@en ;\n",
" dbo:birthPlace ?birthPlace .\n",
" ?birthPlace dbo:type dbr:Municipalities_of_Spain ;\n",
" rdfs:label ?placeName \n",
" \n",
" }"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Well, we are almost there. We see that we receive the city name in many languages. We want just the English name. Let's filter!"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"%%sparql https://dbpedia.org/sparql\n",
"\n",
"PREFIX dbo: <http://dbpedia.org/ontology/>\n",
"PREFIX dbp: <http://dbpedia.org/property/>\n",
"\n",
"SELECT *\n",
"WHERE\n",
" {\n",
" ?player rdfs:label \"Fernando Torres\"@en ;\n",
" dbo:birthPlace ?birthPlace .\n",
" ?birthPlace dbo:type dbr:Municipalities_of_Spain ;\n",
" rdfs:label ?placeName .\n",
" FILTER ( LANG ( ?placeName ) = 'en' )\n",
" \n",
" }"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Awesome!\n",
"\n",
"But we said we don't care about the IRI of the place. We only want two pieces of data: Fernando's birth date and the name of his birthplace.\n",
"\n",
"Let's tune our query a bit more."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"%%sparql https://dbpedia.org/sparql\n",
"\n",
"PREFIX dbo: <http://dbpedia.org/ontology/>\n",
"PREFIX dbp: <http://dbpedia.org/property/>\n",
"\n",
"SELECT ?birthDate, ?placeName\n",
"WHERE\n",
" {\n",
" ?player rdfs:label \"Fernando Torres\"@en ;\n",
" dbo:birthDate ?birthDate ;\n",
" dbo:birthPlace ?birthPlace .\n",
" ?birthPlace dbo:type dbr:Municipalities_of_Spain ;\n",
" rdfs:label ?placeName .\n",
" FILTER ( LANG ( ?placeName ) = 'en' )\n",
" \n",
" }"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Great 😃\n",
"\n",
"Are there many football players born in Fuenlabrada? Let's find out!"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"%%sparql https://dbpedia.org/sparql\n",
"\n",
"PREFIX dbo: <http://dbpedia.org/ontology/>\n",
"PREFIX dbp: <http://dbpedia.org/property/>\n",
"\n",
"SELECT *\n",
"WHERE\n",
" {\n",
" ?player a dbo:SoccerPlayer ; \n",
" dbo:birthPlace dbr:Fuenlabrada . \n",
" }"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Well, not that many. Observe we have used `a`.\n",
"It is just an abbreviation for `rdf:type`, both can be used interchangeably.\n",
"\n",
"If you want additional examples, you can follow the notebook by [Shawn Graham](https://github.com/o-date/sparql-and-lod/blob/master/sparql-intro.ipynb), which is based on the SPARQL tutorial by Matthew Lincoln, available [here in English](https://programminghistorian.org/en/lessons/retired/graph-databases-and-SPARQL) and [here in Spanish](https://programminghistorian.org/es/lecciones/retirada/sparql-datos-abiertos-enlazados]). You have also a local copy of these tutorials together with this notebook [here in English](https://htmlpreview.github.io/?https://github.com/gsi-upm/sitc/blob/master/lod/tutorial/graph-databases-and-SPARQL.html) and [here in Spanish](https://htmlpreview.github.io/?https://github.com/gsi-upm/sitc/blob/master/lod/tutorial/sparql-datos-abiertos-enlazados.html). \n"
"Congratulations! You have a SPARQL server running, serving a dataset!!"
]
},
{
@@ -412,15 +92,8 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"* <a id=\"1\">[1]</a> [SPARQL by Example. A Tutorial. Lee Feigenbaum. W3C, 2009](https://www.w3.org/2009/Talks/0615-qbe/#q1)\n",
"* <a id=\"2\">[2]</a> [RDF Primer W3C](https://www.w3.org/TR/rdf11-primer/)\n",
"* <a id=\"3\">[3]</a> [SPARQL queries of Beatles recording sessions](http://www.snee.com/bobdc.blog/2017/11/sparql-queries-of-beatles-reco.html)\n",
"* <a id=\"4\">[4]</a> [RDFLib documentation](https://rdflib.readthedocs.io/en/stable/).\n",
"* <a id=\"5\">[5]</a> [Wikidata Query Service query examples](https://www.wikidata.org/wiki/Wikidata:SPARQL_query_service/queries/examples)\n",
"* <a id=\"6\">[6]</a> [RDF Graph Data Model. Learn about the RDF graph model used by Stardog.](https://www.stardog.com/tutorials/data-model)\n",
"* <a id=\"7\">[7]</a> [Learn SPARQL Write Knowledge Graph queries using SPARQL with step-by-step examples.](https://www.stardog.com/tutorials/sparql/)\n",
"* <a id=\"8\">[8]</a> [Running Basic SPARQL Queries Against DBpedia.](https://medium.com/virtuoso-blog/dbpedia-basic-queries-bc1ac172cc09)\n",
"* <a id=\"8\">[9]</a> [Intro SPARQL based on painters.](https://github.com/o-date/sparql-and-lod/blob/master/sparql-intro.ipynb)."
"* <a id=\"1\">[1]</a> [SPARQL by Example. A Tutorial. Lee Feigenbaum. W3C, 2009](https://hub.docker.com/r/stain/jena-fuseki)\n",
"* <a id=\"2\">[2]</a> [SPARQL queries of Beatles recording sessions](http://www.snee.com/bobdc.blog/2017/11/sparql-queries-of-beatles-reco.html)"
]
},
{
@@ -428,7 +101,7 @@
"metadata": {},
"source": [
"## Licence\n",
"The notebook is freely licensed under under the [Creative Commons Attribution Share-Alike license](https://creativecommons.org/licenses/by/2.0/). \n",
"The notebook is freely licensed under the [Creative Commons Attribution Share-Alike license](https://creativecommons.org/licenses/by/2.0/). \n",
"\n",
"© Carlos A. Iglesias, Universidad Politécnica de Madrid."
]
@@ -459,7 +132,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.8.10"
"version": "3.12.2"
},
"latex_envs": {
"LaTeX_envs_menu_present": true,
@@ -480,5 +153,5 @@
}
},
"nbformat": 4,
"nbformat_minor": 1
"nbformat_minor": 4
}