![](files/images/EscUpmPolit_p.gif "UPM")

<header style="width:100%;position:relative">
  <div style="width:80%;float:right;">
    <h1>Course Notes for Learning Intelligent Systems</h1>
    <h3>Department of Telematic Engineering Systems</h3>
    <h5>Universidad Politécnica de Madrid. © Carlos A. Iglesias </h5>
  </div>
        <img style="width:15%;" src="../logo.jpg" alt="UPM" />
</header>

## Introduction

This lecture provides an introduction to RDF and the query language SPARQL.

This is the first in a series of notebooks about SPARQL, which consists of:

* This notebook, which covers the basic concepts of RDF and SPARQL
* [A notebook](01_SPARQL_Introduction.ipynb) that provides an introduction to SPARQL through a collection of progressively more difficult exercises
* [A notebook](02_SPARQL_Custom_Endpoint.ipynb) with queries against a custom dataset, which links to the RDF exercises. It is outside the scope of this course, but you can consult it if you are interested.

# RDF basics
This section is taken from [[1](#1), [2](#2)].

RDF allows us to make statements about resources. The format of these statements is simple. A statement always has the following structure:

      <subject> <predicate> <object>
    
An RDF statement expresses a relationship between two resources. The **subject** and the **object** represent the two resources being related; the **predicate** represents the nature of their relationship. The relationship is phrased in a directional way (from subject to object) and is called a **property** in RDF. Because RDF statements consist of three elements, they are called **triples**.

Here are examples of RDF triples (informally expressed in pseudocode):

      <Bob> <is a> <person>.
      <Bob> <is a friend of> <Alice>.
      
Resources are identified by IRIs, which can appear in all three positions of a triple. For example, the IRI for Leonardo da Vinci in DBpedia is:

      <http://dbpedia.org/resource/Leonardo_da_Vinci>

IRIs can be abbreviated as *prefixed names*. For example:

      PREFIX dbr: <http://dbpedia.org/resource/>
      dbr:Leonardo_da_Vinci
     
Objects can be literals: 
 * strings (e.g., "plain string" or "string with language"@en)
 * numbers (e.g., "13.4"^^xsd:float)
 * dates (e.g., "1984-03-20"^^xsd:date)
 * booleans
 * etc.
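
To make prefixed names and typed literals more concrete, here is a minimal Python sketch using only the standard library. It is an illustration, not part of the course material: the helper names `expand` and `to_python` and the `CONVERTERS` table are hypothetical, and only a few XML Schema datatypes are handled.

```python
from datetime import date

# Prefix table: dbr is the DBpedia resource namespace, xsd the XML Schema namespace.
PREFIXES = {
    "dbr": "http://dbpedia.org/resource/",
    "xsd": "http://www.w3.org/2001/XMLSchema#",
}


def expand(prefixed_name, prefixes=PREFIXES):
    """Expand a prefixed name such as dbr:Leonardo_da_Vinci into a full IRI."""
    prefix, _, local = prefixed_name.partition(":")
    return prefixes[prefix] + local


# Hypothetical mapping from a datatype IRI to a Python conversion function.
CONVERTERS = {
    expand("xsd:float"): float,
    expand("xsd:date"): date.fromisoformat,
    expand("xsd:boolean"): lambda text: text in ("true", "1"),
}


def to_python(lexical_form, datatype_iri):
    """Convert a typed literal to a Python value; unknown datatypes stay as strings."""
    return CONVERTERS.get(datatype_iri, str)(lexical_form)


print(expand("dbr:Leonardo_da_Vinci"))
print(to_python("13.4", expand("xsd:float")))
print(to_python("1984-03-20", expand("xsd:date")))
```

Libraries such as RDFLib provide this kind of conversion out of the box; the sketch only shows the idea behind it.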
 
RDF data is stored in RDF repositories that expose SPARQL endpoints. Let's query one of the most famous RDF repositories: DBpedia. First, we should learn how to execute SPARQL in a notebook.

# Executing SPARQL in a notebook
There are several ways to execute SPARQL in a notebook. The most popular ones are:
* using a [SPARQL kernel](https://github.com/paulovn/sparql-kernel) instead of the Python3 kernel
* using the [graph notebook package](https://pypi.org/project/graph-notebook/)
* using libraries such as [sparql-client](https://pypi.org/project/sparql-client/) or [SPARQLWrapper](https://rdflib.dev/sparqlwrapper/) that enable executing SPARQL within a Python3 kernel
* using other libraries. In our case, a lightweight library (the file helpers.py) has been developed for accessing SPARQL endpoints over an HTTP connection.

We are going to use the last option to avoid installing new packages.

To use the library, you need to:
1. Import sparql from helpers (the file helpers.py is available in the GitHub repository).
2. Use the magic command '%%sparql', followed by the SPARQL endpoint, and then write the SPARQL code.
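
For readers curious about what accessing a SPARQL endpoint over HTTP involves, the following sketch uses only the Python standard library. It is not the code of helpers.py, which is not reproduced here; the function names are hypothetical. It assumes the endpoint follows the SPARQL 1.1 Protocol (the query is sent as the `query` parameter) and can return the standard SPARQL JSON results format.

```python
import json
import urllib.parse
import urllib.request


def build_request(endpoint, query):
    """Build a GET request that sends the query as the `query` URL parameter."""
    url = endpoint + "?" + urllib.parse.urlencode({"query": query})
    # Ask the endpoint for the SPARQL JSON results format.
    return urllib.request.Request(url, headers={"Accept": "application/sparql-results+json"})


def rows_from_json(text):
    """Turn a SPARQL JSON results document into a list of {variable: value} dicts."""
    data = json.loads(text)
    return [
        {var: binding[var]["value"] for var in binding}
        for binding in data["results"]["bindings"]
    ]


def run_query(endpoint, query, timeout=30):
    """Send the query and return the result rows (requires network access)."""
    with urllib.request.urlopen(build_request(endpoint, query), timeout=timeout) as response:
        return rows_from_json(response.read().decode("utf-8"))


# Building the request does not contact the server; run_query would.
request = build_request("https://dbpedia.org/sparql", "SELECT * WHERE { ?s ?p ?o } LIMIT 1")
print(request.full_url)
```

Calling `run_query("https://dbpedia.org/sparql", ...)` would return one dictionary per result row, assuming the endpoint is reachable.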

Let's try it!

# Queries against DBpedia

We are going to execute a SPARQL query against DBpedia. This section is based on [[8](#8)].

First, we create a query that retrieves arbitrary triples (subject, predicate, object) without any restriction, except that we limit the output to 10 results.

In [44]:
from helpers import sparql

In [45]:
%%sparql https://live.dbpedia.org/sparql

SELECT ?s ?p ?o
WHERE {
    ?s ?p ?o
}
LIMIT 10

s,p,o
,,
http://www.openlinksw.com/virtrdf-data-formats#default-iid,http://www.w3.org/1999/02/22-rdf-syntax-ns#type,http://www.openlinksw.com/schemas/virtrdf#QuadMapFormat
http://www.openlinksw.com/virtrdf-data-formats#default-iid-nullable,http://www.w3.org/1999/02/22-rdf-syntax-ns#type,http://www.openlinksw.com/schemas/virtrdf#QuadMapFormat
http://www.openlinksw.com/virtrdf-data-formats#default-iid-nonblank,http://www.w3.org/1999/02/22-rdf-syntax-ns#type,http://www.openlinksw.com/schemas/virtrdf#QuadMapFormat
http://www.openlinksw.com/virtrdf-data-formats#default-iid-nonblank-nullable,http://www.w3.org/1999/02/22-rdf-syntax-ns#type,http://www.openlinksw.com/schemas/virtrdf#QuadMapFormat
http://www.openlinksw.com/virtrdf-data-formats#default,http://www.w3.org/1999/02/22-rdf-syntax-ns#type,http://www.openlinksw.com/schemas/virtrdf#QuadMapFormat
http://www.openlinksw.com/virtrdf-data-formats#default-nullable,http://www.w3.org/1999/02/22-rdf-syntax-ns#type,http://www.openlinksw.com/schemas/virtrdf#QuadMapFormat
http://www.openlinksw.com/virtrdf-data-formats#sql-varchar,http://www.w3.org/1999/02/22-rdf-syntax-ns#type,http://www.openlinksw.com/schemas/virtrdf#QuadMapFormat
http://www.openlinksw.com/virtrdf-data-formats#sql-varchar-nullable,http://www.w3.org/1999/02/22-rdf-syntax-ns#type,http://www.openlinksw.com/schemas/virtrdf#QuadMapFormat
http://www.openlinksw.com/virtrdf-data-formats#sql-varchar-dt,http://www.w3.org/1999/02/22-rdf-syntax-ns#type,http://www.openlinksw.com/schemas/virtrdf#QuadMapFormat


Well, it worked, but the results are not particularly interesting. 
Let's search for a famous football player, Fernando Torres.

In [46]:
%%sparql https://live.dbpedia.org/sparql

SELECT *
WHERE
     {
        ?athlete rdfs:label "Fernando Torres"@en 
     }

athlete
http://dbpedia.org/resource/Fernando_Torres


Amazing. Go to http://dbpedia.org/resource/Fernando_Torres and you will see all the information available about Fernando Torres. Pay attention to the names of the predicates so that you can create new queries. For example, we are interested in knowing where Fernando Torres was born.

Let's go!

In [47]:
%%sparql https://dbpedia.org/sparql

PREFIX dbo: <http://dbpedia.org/ontology/>

SELECT *
WHERE
     {
        ?athlete rdfs:label "Fernando Torres"@en ;
                 dbo:birthPlace ?birthPlace .       
     }

athlete,birthPlace
,
http://dbpedia.org/resource/Fernando_Torres,http://dbpedia.org/resource/Spain_national_football_team
http://dbpedia.org/resource/Fernando_Torres,http://dbpedia.org/resource/Fuenlabrada


Observe the SPARQL query:
* PREFIX section: IRIs of the vocabularies and the prefixes used to abbreviate them in the query, to avoid long IRIs
* SELECT section: variables we want to return (* is an abbreviation that selects all of the variables in a query)
* WHERE section: triple patterns, that is, triples where some elements are variables. These variables are bound during query processing, and the bound variables are returned.

Pay attention to the WHERE section. Since both triple patterns share the same subject, we omit it in the second one, and link both with " ;". Each triple pattern should finish with a " ." (the last pattern can omit this). Don't forget the space before ";" and ".".
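
To make the idea of variable binding more concrete, here is a toy triple-pattern matcher in Python. It is only a sketch for illustration: a real SPARQL engine works very differently, the function names are hypothetical, and the data is a simplified copy of the results above (labels are plain strings without language tags).

```python
# Terms starting with "?" are variables; every other term must match exactly.

def match_pattern(pattern, triple, bindings):
    """Try to extend `bindings` so that `pattern` matches `triple`; return None on failure."""
    result = dict(bindings)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):
            # A variable: bind it, or check it against an earlier binding.
            if result.setdefault(term, value) != value:
                return None
        elif term != value:
            return None
    return result


def query(patterns, triples):
    """Return every set of variable bindings that satisfies all the patterns."""
    solutions = [{}]
    for pattern in patterns:
        solutions = [
            extended
            for bindings in solutions
            for triple in triples
            if (extended := match_pattern(pattern, triple, bindings)) is not None
        ]
    return solutions


# Toy data modelled on the DBpedia results shown above.
triples = [
    ("dbr:Fernando_Torres", "rdfs:label", "Fernando Torres"),
    ("dbr:Fernando_Torres", "dbo:birthPlace", "dbr:Fuenlabrada"),
    ("dbr:Fernando_Torres", "dbo:birthPlace", "dbr:Spain_national_football_team"),
]

# Two patterns sharing the ?athlete variable, as in the query above.
patterns = [
    ("?athlete", "rdfs:label", "Fernando Torres"),
    ("?athlete", "dbo:birthPlace", "?birthPlace"),
]

solutions = query(patterns, triples)
for solution in solutions:
    print(solution)
```

Because both patterns share `?athlete`, a solution must bind it to the same resource in each of them, which is what the " ;" shorthand expresses in the query.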

The result is interesting: we know he was born in Fuenlabrada, but we also see an additional (wrong) value, the Spanish national football team. The conversion process from Wikipedia to DBpedia still needs some tuning :).

We can 'fix' it by adding some information. We want only municipalities as a result. Let's see!


In [48]:
%%sparql https://dbpedia.org/sparql

PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX dbr: <http://dbpedia.org/resource/>

SELECT *
WHERE
     {
        ?athlete rdfs:label "Fernando Torres"@en ;
                 dbo:birthPlace ?birthPlace .
        ?birthPlace dbo:type dbr:Municipalities_of_Spain 
     }

athlete,birthPlace
,
http://dbpedia.org/resource/Fernando_Torres,http://dbpedia.org/resource/Fuenlabrada


Great. Now it looks better.
Do you know if Fuenlabrada is a big city? Let's query!

Hint: search (as previously) the subject / object / predicate nodes in the RDF graph (http://dbpedia.org/resource/Fuenlabrada).


In [49]:
%%sparql https://dbpedia.org/sparql

PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX dbr: <http://dbpedia.org/resource/>

SELECT *
WHERE
     {
        dbr:Fuenlabrada dbo:areaTotal ?area 
     }

area
39410000.0


Well, it shows 39.41 km$^2$ (DBpedia reports the area in square metres). Let's go back to Fernando Torres. We would like to retrieve the name of the city where he was born instead of its IRI. Let's try!

In [51]:
%%sparql https://dbpedia.org/sparql

PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX dbr: <http://dbpedia.org/resource/>

SELECT *
WHERE
     {
        ?player rdfs:label "Fernando Torres"@en ;
                 dbo:birthPlace ?birthPlace .
        ?birthPlace dbo:type dbr:Municipalities_of_Spain ;
                    rdfs:label ?placeName        
                 
     }

player,birthPlace,placeName
,,
http://dbpedia.org/resource/Fernando_Torres,http://dbpedia.org/resource/Fuenlabrada,Fuenlabrada
http://dbpedia.org/resource/Fernando_Torres,http://dbpedia.org/resource/Fuenlabrada,فوينلابرادا
http://dbpedia.org/resource/Fernando_Torres,http://dbpedia.org/resource/Fuenlabrada,Fuenlabrada
http://dbpedia.org/resource/Fernando_Torres,http://dbpedia.org/resource/Fuenlabrada,Fuenlabrada
http://dbpedia.org/resource/Fernando_Torres,http://dbpedia.org/resource/Fuenlabrada,Fuenlabrada
http://dbpedia.org/resource/Fernando_Torres,http://dbpedia.org/resource/Fuenlabrada,Fuenlabrada
http://dbpedia.org/resource/Fernando_Torres,http://dbpedia.org/resource/Fuenlabrada,Fuenlabrada
http://dbpedia.org/resource/Fernando_Torres,http://dbpedia.org/resource/Fuenlabrada,Fuenlabrada
http://dbpedia.org/resource/Fernando_Torres,http://dbpedia.org/resource/Fuenlabrada,Fuenlabrada


Well, we are almost there. We see that we receive the city name in many languages. We want just the English name. Let's filter!

In [53]:
%%sparql https://dbpedia.org/sparql

PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX dbr: <http://dbpedia.org/resource/>

SELECT *
WHERE
     {
        ?player rdfs:label "Fernando Torres"@en ;
                 dbo:birthPlace ?birthPlace .
        ?birthPlace dbo:type dbr:Municipalities_of_Spain ;
                    rdfs:label ?placeName .
         FILTER ( LANG ( ?placeName ) = 'en' )
                 
     }

player,birthPlace,placeName
,,
http://dbpedia.org/resource/Fernando_Torres,http://dbpedia.org/resource/Fuenlabrada,Fuenlabrada
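
As a rough analogy for what `FILTER ( LANG ( ?placeName ) = 'en' )` does, the following Python sketch keeps only the labels with a given language tag. The language tags in the sample data are assumed for illustration, since the result table above does not show them, and the function name is hypothetical.

```python
# Each label is a (text, language tag) pair; the tags are assumed for illustration.
labels = [
    ("Fuenlabrada", "en"),
    ("فوينلابرادا", "ar"),
    ("Fuenlabrada", "es"),
]


def filter_by_language(tagged_labels, language):
    """Keep only the labels whose language tag equals `language`."""
    return [text for text, tag in tagged_labels if tag == language]


english_labels = filter_by_language(labels, "en")
print(english_labels)
```

Labels that look identical in the result table can still carry different language tags, which is why the unfiltered query returned several rows with the same text.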


Awesome. Let's tune a bit more. We only want two results: Fernando's birth date and birth place (name). Let's go!

In [54]:
%%sparql https://dbpedia.org/sparql

PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX dbr: <http://dbpedia.org/resource/>

SELECT ?birthDate ?placeName
WHERE
     {
        ?player rdfs:label "Fernando Torres"@en ;
                 dbo:birthDate ?birthDate ;
                 dbo:birthPlace ?birthPlace .
        ?birthPlace dbo:type dbr:Municipalities_of_Spain ;
                    rdfs:label ?placeName .
         FILTER ( LANG ( ?placeName ) = 'en' )
                 
     }

birthDate,placeName
,
1984-03-20,Fuenlabrada


Great :). Are there many football players born in Fuenlabrada? Let's query!

In [56]:
%%sparql https://dbpedia.org/sparql

PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX dbr: <http://dbpedia.org/resource/>

SELECT *
WHERE
     {
        ?player a dbo:SoccerPlayer ;  
                  dbo:birthPlace dbr:Fuenlabrada .
                 
     }

player
"http://dbpedia.org/resource/Luismi_(footballer,_born_1979)"
http://dbpedia.org/resource/Óscar_Miñambres
http://dbpedia.org/resource/Tachi_(footballer)
http://dbpedia.org/resource/Fernando_Torres


Well, not that many. Note that we have used 'a', which is an abbreviation for rdf:type; both can be used.

If you want additional examples, you can follow the notebook by [Shawn Graham](https://github.com/o-date/sparql-and-lod/blob/master/sparql-intro.ipynb), which is based on the SPARQL tutorial by Matthew Lincoln, available [here in English](https://programminghistorian.org/en/lessons/retired/graph-databases-and-SPARQL) and [here in Spanish](https://programminghistorian.org/es/lecciones/retirada/sparql-datos-abiertos-enlazados). You also have a local copy of these tutorials together with this notebook [here in English](https://htmlpreview.github.io/?https://github.com/gsi-upm/sitc/blob/master/lod/tutorial/graph-databases-and-SPARQL.html) and [here in Spanish](https://htmlpreview.github.io/?https://github.com/gsi-upm/sitc/blob/master/lod/tutorial/sparql-datos-abiertos-enlazados.html). 


## References

* <a id="1">[1]</a> [SPARQL by Example. A Tutorial. Lee Feigenbaum. W3C, 2009](https://www.w3.org/2009/Talks/0615-qbe/#q1)
* <a id="2">[2]</a> [RDF Primer W3C](https://www.w3.org/TR/rdf11-primer/)
* <a id="3">[3]</a> [SPARQL queries of Beatles recording sessions](http://www.snee.com/bobdc.blog/2017/11/sparql-queries-of-beatles-reco.html)
* <a id="4">[4]</a> [RDFLib documentation](https://rdflib.readthedocs.io/en/stable/).
* <a id="5">[5]</a> [Wikidata Query Service query examples](https://www.wikidata.org/wiki/Wikidata:SPARQL_query_service/queries/examples)
* <a id="6">[6]</a> [RDF Graph Data Model. Learn about the RDF graph model used by Stardog.](https://www.stardog.com/tutorials/data-model)
* <a id="7">[7]</a> [Learn SPARQL Write Knowledge Graph queries using SPARQL with step-by-step examples.](https://www.stardog.com/tutorials/sparql/)
* <a id="8">[8]</a> [Running Basic SPARQL Queries Against DBpedia.](https://medium.com/virtuoso-blog/dbpedia-basic-queries-bc1ac172cc09)
* <a id="9">[9]</a> [Intro SPARQL based on painters.](https://github.com/o-date/sparql-and-lod/blob/master/sparql-intro.ipynb)

## Licence
The notebook is freely licensed under the [Creative Commons Attribution Share-Alike license](https://creativecommons.org/licenses/by/2.0/).  

©  Carlos A. Iglesias, Universidad Politécnica de Madrid.