<header style="width:100%;position:relative">
  <div style="width:80%;float:right;">
    <h1>Course Notes for Learning Intelligent Systems</h1>
    <h3>Department of Telematic Engineering Systems</h3>
    <h5>Universidad Politécnica de Madrid</h5>
  </div>
        <img style="width:15%;" src="../logo.jpg" alt="UPM" />
</header>

## Introduction to Linked Data

This lecture provides a quick introduction to semantic queries in Python.
We will be using DBpedia, a semantic version of Wikipedia.

The language we will use to query DBpedia is SPARQL, a semantic query language inspired by SQL.
For convenience, the examples in the notebook are executable, and they are accompanied by some code to test the results.
If the tests pass, you probably got the answer right.

However, you can also use any other method to write and send your queries.
You may find online query editors particularly useful.
In addition to running queries from your browser, they provide useful features such as syntax highlighting and autocompletion.
Some examples are:

* DBpedia's Virtuoso query editor: https://dbpedia.org/sparql
* A JavaScript-based client hosted at GSI: http://yasgui.cluster.gsi.dit.upm.es/

## Objectives

* Learning SPARQL and the Linked Data principles by defining queries to answer a set of problems of increasing difficulty
* Verifying the usefulness of the Linked Open Data initiative by querying data from different RDF graphs and endpoints
* Learning how to use integrated SPARQL editors and programming interfaces to SPARQL.

## Tools

* This notebook
* SPARQL editors (optional)
    * YASGUI-GSI http://yasgui.cluster.gsi.dit.upm.es
    * DBpedia virtuoso http://dbpedia.org/sparql

Using the YASGUI-GSI editor has several advantages over other options.
It features:

* Selection of data source, either by specifying the URL or by selecting from a dropdown menu
* Interactive query editing
    * A set of pre-defined queries
    * Syntax error detection
    * Auto-complete
* Data visualization
    * Total number of results
    * Different formats (table, pivot table, raw response, etc.)
    * Pagination of results
    * Search and filter results

Run this line to enable the `%%sparql` magic command.

In [None]:
from helpers import *

The `%%sparql` magic command will allow us to use SPARQL inside normal Jupyter cells.

For instance, the following code:

```
%%sparql

MY QUERY
```    

is equivalent to `run_query('MY QUERY', endpoint='http://dbpedia.org/sparql')` plus some additional steps: saving the results in a table format so that they can be used later, and storing them in a variable (`LAST_QUERY`), which we will use in our tests.

You do not need to worry about it, and **you can always use one of the suggested online editors if you wish**.
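
For readers curious about what happens under the hood, the sketch below shows one way such a helper could prepare its HTTP request. It is not the code in `helpers`: `build_sparql_request` is a hypothetical name, and the sketch only builds the request without sending it. It relies on the SPARQL protocol convention of passing the query text in a `query` URL parameter and asking for JSON results through the `Accept` header.

```python
from urllib.parse import urlencode
from urllib.request import Request

DBPEDIA = 'http://dbpedia.org/sparql'  # default endpoint used in this notebook


def build_sparql_request(query, endpoint=DBPEDIA):
    """Hypothetical helper: prepare (but do not send) a SPARQL GET request."""
    # The SPARQL protocol passes the query text in the `query` URL parameter.
    url = endpoint + '?' + urlencode({'query': query})
    # Ask the endpoint for results in the standard SPARQL JSON format.
    return Request(url, headers={'Accept': 'application/sparql-results+json'})


request = build_sparql_request('SELECT * WHERE { ?s ?p ?o } LIMIT 1')
print(request.full_url)
```

Sending the request (for example with `urllib.request.urlopen`) and parsing the JSON body would be the next steps.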

## Exercises

The following exercises cover the basics of SPARQL with simple use cases.
We will provide you with some example code to get you started, the *question* you will have to answer using SPARQL, and the skeleton for the answer.

After every query, you will find some Python code to test the results of the query.
Make sure you've run the tests before moving to the next exercise.
If the test gives you an error, you've probably done something wrong.
You **do not need to understand or modify the test code**.


In case you're interested, the tests rely on the `LAST_QUERY` variable, which is updated by the `%%sparql` magic after every query.
This variable contains the full query used (`LAST_QUERY["query"]`), the endpoint it was sent to (`LAST_QUERY["endpoint"]`), and a dictionary with the response of the endpoint (`LAST_QUERY["results"]`).
For convenience, the results are also given as tuples (`LAST_QUERY["tuples"]`), and as a dictionary of `{column:[values]}` (`LAST_QUERY["columns"]`).
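
To make those shapes concrete, here is a minimal sketch of how a response in the standard SPARQL JSON results format could be flattened into tuples and columns. The sample response is made up, and `to_tuples` and `to_columns` are hypothetical names; the actual `helpers` module may do this differently. Unbound variables are represented as empty strings, which is what the tests in this notebook appear to expect.

```python
# A made-up response in the SPARQL 1.1 JSON results format.
response = {
    'head': {'vars': ['localidad', 'pop']},
    'results': {'bindings': [
        {'localidad': {'type': 'uri', 'value': 'http://example.org/TownA'},
         'pop': {'type': 'literal', 'value': '1000'}},
        # In this row, ?pop is unbound, so the key is simply absent.
        {'localidad': {'type': 'uri', 'value': 'http://example.org/TownB'}},
    ]},
}


def to_tuples(response):
    """One tuple per result row, following the order of the variables."""
    names = response['head']['vars']
    return [tuple(row.get(name, {}).get('value', '') for name in names)
            for row in response['results']['bindings']]


def to_columns(response):
    """One list of values per variable: {column: [values]}."""
    names = response['head']['vars']
    rows = to_tuples(response)
    return {name: [row[i] for row in rows] for i, name in enumerate(names)}


tuples = to_tuples(response)
columns = to_columns(response)
print(tuples)
print(columns)
```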

### First Select



Let's start with a simple query. We will get a list of cities and towns in Madrid.
If we take a look at the DBpedia ontology or the page of any town we already know, we discover that the property that links towns to their community is [`isPartOf`](http://dbpedia.org/ontology/isPartOf), and [the Community of Madrid is also a resource in DBpedia](http://dbpedia.org/resource/Community_of_Madrid).

Since there are potentially many cities to get, we will limit our results to the first 10 results:

In [None]:
%%sparql

SELECT ?localidad
WHERE {
    ?localidad <http://dbpedia.org/ontology/isPartOf> <http://dbpedia.org/resource/Community_of_Madrid>
}
LIMIT 10

However, that query is very verbose because we are using full URIs.
To simplify it, we will make use of SPARQL prefixes:

In [None]:
%%sparql

PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX dbr: <http://dbpedia.org/resource/>
        
SELECT ?localidad
WHERE {
    ?localidad dbo:isPartOf dbr:Community_of_Madrid.
}
LIMIT 10

To make sure that the query returned something sensible, we can test it with some Python code:

In [None]:
assert 'localidad' in LAST_QUERY['columns']
assert len(LAST_QUERY['tuples']) == 10
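
Prefixes are pure shorthand: before matching, each prefixed name is expanded back to a full URI. The following sketch illustrates that expansion in plain Python; `expand` is a hypothetical helper written for this note, not part of any SPARQL library.

```python
PREFIXES = {
    'dbo': 'http://dbpedia.org/ontology/',
    'dbr': 'http://dbpedia.org/resource/',
}


def expand(name, prefixes=PREFIXES):
    """Turn a prefixed name such as dbo:isPartOf into a full URI."""
    prefix, _, local = name.partition(':')
    return prefixes[prefix] + local


print(expand('dbo:isPartOf'))
print(expand('dbr:Community_of_Madrid'))
```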

Now that you have some experience under your belt, it is time to design your own query.

Your first task is to get a list of Spanish novelists, using the skeleton below and the previous query to guide you.

Pages for Spanish novelists are grouped in the *Spanish novelists* DBpedia category. You can use that fact to get your list.
In other words, the difference from the previous query will be using `dct:subject` instead of `dbo:isPartOf`, and `dbc:Spanish_novelists` instead of `dbr:Community_of_Madrid`.

In [None]:
%%sparql

PREFIX dct:<http://purl.org/dc/terms/>
PREFIX dbc:<http://dbpedia.org/resource/Category:>

SELECT ?escritor

WHERE {
# YOUR CODE HERE
}
LIMIT 10

In [None]:
assert len(LAST_QUERY['columns']) == 1 # We only use one variable, ?escritor
assert len(LAST_QUERY['tuples']) == 10 # There should be 10 results

### Using more criteria

We can get more than one property in the same query. Let us modify our query to get the population of the cities as well.

In [None]:
%%sparql

PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX dbr: <http://dbpedia.org/resource/>
PREFIX dbp: <http://dbpedia.org/property/>
        
SELECT ?localidad ?pop ?when

WHERE {
    ?localidad dbo:populationTotal ?pop .
    ?localidad dbo:isPartOf dbr:Community_of_Madrid.
    ?localidad dbp:populationAsOf ?when .
}

LIMIT 100

In [None]:
assert 'localidad' in LAST_QUERY['columns']
assert 'http://dbpedia.org/resource/Parla' in LAST_QUERY['columns']['localidad']
assert ('http://dbpedia.org/resource/San_Sebastián_de_los_Reyes', '75912', '2009') in LAST_QUERY['tuples']
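
Each line inside `WHERE` is a triple pattern, and patterns that share a variable must agree on its value. The toy matcher below illustrates that idea over a handful of made-up triples. It is a simplified sketch for intuition, not how a real SPARQL engine is implemented, and the names `match` and `solve` are invented for this note.

```python
TRIPLES = [
    ('ex:TownA', 'ex:isPartOf', 'ex:Region'),
    ('ex:TownA', 'ex:population', '1000'),
    ('ex:TownB', 'ex:isPartOf', 'ex:Region'),   # no population recorded
    ('ex:TownC', 'ex:population', '500'),       # not part of ex:Region
]


def match(pattern, triple, binding):
    """Extend `binding` so that `pattern` equals `triple`, or return None."""
    binding = dict(binding)
    for term, value in zip(pattern, triple):
        if term.startswith('?'):                 # variable: bind or compare
            if binding.setdefault(term, value) != value:
                return None
        elif term != value:                      # constant: must be equal
            return None
    return binding


def solve(patterns, triples):
    """Return every binding that satisfies all patterns at once."""
    solutions = [{}]
    for pattern in patterns:
        solutions = [extended
                     for binding in solutions
                     for triple in triples
                     if (extended := match(pattern, triple, binding)) is not None]
    return solutions


results = solve([('?town', 'ex:isPartOf', 'ex:Region'),
                 ('?town', 'ex:population', '?pop')], TRIPLES)
print(results)
```

Only `ex:TownA` is returned, because it is the only subject that matches both patterns.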

Time to try it yourself.

Get the list of Spanish novelists AND their name (using rdfs:label).

In [None]:
%%sparql

PREFIX rdfs:<http://www.w3.org/2000/01/rdf-schema#>
PREFIX dct:<http://purl.org/dc/terms/>
PREFIX dbc:<http://dbpedia.org/resource/Category:>

SELECT ?escritor ?name

WHERE {
# YOUR CODE HERE
}
LIMIT 10

In [None]:
assert 'escritor' in LAST_QUERY['columns']
assert 'http://dbpedia.org/resource/Eduardo_Mendoza_Garriga' in LAST_QUERY['columns']['escritor']
assert ('http://dbpedia.org/resource/Eduardo_Mendoza_Garriga', 'Eduardo Mendoza') in LAST_QUERY['tuples']

### Filtering and ordering

In the previous example, we saw that we got what seemed to be duplicated answers.

This happens because entities can have labels in different languages (e.g. English, Spanish).
To restrict the search to only those results we're interested in, we can use filtering.

We can also decide the order in which our results are shown.

For instance, this is how we could use filtering to get only large cities in our example, ordered by population:

In [None]:
%%sparql

PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX dbr: <http://dbpedia.org/resource/>
PREFIX dbp: <http://dbpedia.org/property/>
        
SELECT ?localidad ?pop ?when

WHERE {
    ?localidad dbo:populationTotal ?pop .
    ?localidad dbo:isPartOf dbr:Community_of_Madrid.
    ?localidad dbp:populationAsOf ?when .
    FILTER(?pop > 100000)
}
ORDER BY ?pop
LIMIT 100

Note that ordering happens before limits.

In [None]:
# We still have the biggest city
assert ('http://dbpedia.org/resource/Madrid', '3141991', '2014') in LAST_QUERY['tuples']
# But the smaller ones are gone
assert 'http://dbpedia.org/resource/Tres_Cantos' not in LAST_QUERY['columns']['localidad']
assert 'http://dbpedia.org/resource/San_Sebastián_de_los_Reyes' not in LAST_QUERY['columns']['localidad']
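
The order of operations matters: filters apply first, then the results are sorted, and only then is the limit applied. The following sketch mimics that pipeline on made-up population figures with ordinary Python lists.

```python
rows = [('TownA', 250000), ('TownB', 90000), ('TownC', 3000000), ('TownD', 120000)]

# FILTER(?pop > 100000): keep only the rows that satisfy the condition.
filtered = [row for row in rows if row[1] > 100000]

# ORDER BY ?pop: sort the remaining rows (ascending by default).
ordered = sorted(filtered, key=lambda row: row[1])

# LIMIT 2: cut the list only after it has been sorted.
limited = ordered[:2]
print(limited)
```

Applying the limit before sorting would instead return whichever two rows happened to come first.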

Now, try filtering to get a list of novelists and their name in Spanish, ordered by name (hint: use `FILTER (LANG(?nombre) = "es")` and `ORDER BY`).

In [None]:
%%sparql

PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX dct:<http://purl.org/dc/terms/>
PREFIX dbc:<http://dbpedia.org/resource/Category:>

SELECT ?escritor ?nombre

WHERE {
# YOUR CODE HERE
}
# YOUR CODE HERE
LIMIT 1000

In [None]:
assert len(LAST_QUERY['tuples']) >= 50
assert 'Adelaida García Morales' in LAST_QUERY['columns']['nombre']
assert sum(1 for k in LAST_QUERY['columns']['escritor'] if k == 'http://dbpedia.org/resource/Adelaida_García_Morales') == 1

### Dates

From now on, we will focus on our Writers example.

First, search for writers born in the 20th century.
You can use a special filter, knowing that `"2000"^^xsd:date` is the first date of year 2000.

In [None]:
%%sparql

PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX dct:<http://purl.org/dc/terms/>
PREFIX dbc:<http://dbpedia.org/resource/Category:>
PREFIX dbo:<http://dbpedia.org/ontology/>

SELECT ?escritor ?nombre (year(?fechaNac) AS ?nac)

WHERE {
# YOUR CODE HERE
}
# YOUR CODE HERE
LIMIT 1000

In [None]:
assert 'Camilo José Cela' in LAST_QUERY['columns']['nombre']
assert 'Javier Marías' in LAST_QUERY['columns']['nombre']
assert all(int(x) > 1899 and int(x) < 2001 for x in LAST_QUERY['columns']['nac'])

### Optional

In our last example, we left out all the novelists whose birth information is missing from DBpedia.

We can specify optional values in a query using the `OPTIONAL` keyword.
When a set of clauses is inside an `OPTIONAL` group, the SPARQL endpoint will try to match them as part of the query.
If there are no results for that part of the query, the variables it specifies will not be bound (i.e. they will be empty).
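
In relational terms, `OPTIONAL` behaves much like a left outer join: every solution of the mandatory part is kept, and the optional part only adds values when it can. A minimal sketch with made-up data follows. The empty string stands in for an unbound variable, mirroring the convention used by the tests in this notebook.

```python
births = {'ex:AuthorA': '1916-05-11', 'ex:AuthorB': '1951-09-20'}
deaths = {'ex:AuthorA': '2002-01-17'}   # ex:AuthorB has no recorded death date

rows = []
for author, born in births.items():       # mandatory part of the query
    died = deaths.get(author, '')         # optional part: '' when unbound
    rows.append((author, born, died))

print(rows)
```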

Using that, let us retrieve all the novelists born between 1900 and 2000, and the date they died (if it is available).

In [None]:
%%sparql

PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX dct:<http://purl.org/dc/terms/>
PREFIX dbc:<http://dbpedia.org/resource/Category:>
PREFIX dbo:<http://dbpedia.org/ontology/>

SELECT ?escritor ?nombre ?fechaNac ?fechaDef

WHERE {
# YOUR CODE HERE
}
# YOUR CODE HERE
LIMIT 100

In [None]:
assert 'Camilo José Cela' in LAST_QUERY['columns']['nombre']
assert '1916-05-11' in LAST_QUERY['columns']['fechaNac']
assert '' not in LAST_QUERY['columns']['fechaNac'] # All birthdates are defined
assert '' in LAST_QUERY['columns']['fechaDef'] # Some deathdates are not defined

### Bound

We can check whether the optional value for a key was bound in a SPARQL query using `BOUND(?key)`.

This is very useful for two purposes.
First, it allows us to look for patterns that **do not occur** in the graph, such as missing properties.
For instance, we could search for the authors with missing birth information so we can add it.
Second, we can use `BOUND` in filters to build conditional filters.
We will explore both uses in this exercise.
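
Both uses can be sketched on plain Python rows, again using the empty string for an unbound variable. The data and the `bound` helper are made up for illustration.

```python
rows = [
    ('ex:AuthorA', '1916', '2002'),
    ('ex:AuthorB', '1951', ''),      # death year unbound
    ('ex:AuthorC', '1850', '1870'),
]


def bound(value):
    """Rough stand-in for BOUND(?x): was the variable given a value?"""
    return value != ''


# Use 1: look for a missing property, as in FILTER(!BOUND(?died)).
missing = [row[0] for row in rows if not bound(row[2])]

# Use 2: a conditional filter, where the second test only applies to bound values.
unbound_or_recent = [row[0] for row in rows
                     if not bound(row[2]) or int(row[2]) > 1900]

print(missing)
print(unbound_or_recent)
```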

Get the list of Spanish novelists that are still alive.
A person is alive if their death date is not defined and they were born less than 100 years ago.

In [None]:
%%sparql

PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX dct:<http://purl.org/dc/terms/>
PREFIX dbc:<http://dbpedia.org/resource/Category:>
PREFIX dbo:<http://dbpedia.org/ontology/>

SELECT ?escritor ?nombre (year(?fechaNac) AS ?nac)

WHERE {
# YOUR CODE HERE
}
# YOUR CODE HERE
LIMIT 1000

In [None]:
assert 'Fernando Arrabal' in LAST_QUERY['columns']['nombre']
assert 'Albert Espinosa' in LAST_QUERY['columns']['nombre']
for year in LAST_QUERY['columns']['nac']:
    assert int(year) >= 1918

Now, get the list of Spanish novelists that died before their fifties (i.e. younger than 50 years old), or that aren't 50 years old yet.

Hint: you can use boolean logic in your filters (e.g. `&&` and `||`).

Hint 2: Some dates are not formatted properly, which makes some queries fail when they shouldn't. You might need to convert between different types as a workaround. For instance, you could get the year from a date like this: `year(xsd:dateTime(str(?date)))`.
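
The same defensive idea can be sketched in Python: instead of trusting that every value is a well-formed date, fall back to something more forgiving. The sample values are made up, and `year_of` is a hypothetical helper for illustration only.

```python
import re
from datetime import date


def year_of(value):
    """Best-effort extraction of the year from a date-like string."""
    try:
        # Well-formed ISO dates (YYYY-MM-DD) parse directly.
        return date.fromisoformat(value[:10]).year
    except ValueError:
        # Otherwise, fall back to the first four-digit number, if any.
        found = re.search(r'\d{4}', value)
        return int(found.group()) if found else None


well_formed = year_of('1916-05-11')
with_time = year_of('1916-05-11T00:00:00')   # the time part is ignored
year_only = year_of('1916')                  # not a full date
unusable = year_of('unknown')                # no year to recover
print(well_formed, with_time, year_only, unusable)
```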

In [None]:
%%sparql

PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX dct:<http://purl.org/dc/terms/>
PREFIX dbc:<http://dbpedia.org/resource/Category:>
PREFIX dbo:<http://dbpedia.org/ontology/>

SELECT ?escritor ?nombre (year(?fechaNac) AS ?nac) ?fechaDef

WHERE {
# YOUR CODE HERE
}
# YOUR CODE HERE
LIMIT 100

In [None]:
assert 'Javier Sierra' in LAST_QUERY['columns']['nombre']
assert 'http://dbpedia.org/resource/Sanmao_(author)' in LAST_QUERY['columns']['escritor']

### Finding unique elements

In our last example, our results show some authors more than once.
This is because some properties are defined more than once.
For instance, birth date is given using different formats.
Even if we exclude that property from our results by not adding it in our `SELECT`, we will get duplicated lines.

To solve this, we can use the `DISTINCT` keyword.
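
The effect is easy to reproduce in Python: once a multi-valued column is projected away, the remaining rows may repeat, and removing duplicates restores one line per author. The rows below are made up.

```python
rows = [
    ('ex:AuthorA', 'Author A', '1916-05-11'),
    ('ex:AuthorA', 'Author A', '1916-5-11'),   # same date, different format
    ('ex:AuthorB', 'Author B', '1951-09-20'),
]

# Leaving the birth date out of SELECT does not remove the repeated rows.
projected = [(uri, name) for uri, name, _ in rows]

# DISTINCT keeps one copy of each row (dict.fromkeys preserves order).
distinct = list(dict.fromkeys(projected))

print(projected)
print(distinct)
```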

Modify your last query to remove duplicated lines.
In other words, authors should only appear once.

In [None]:
%%sparql

PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX dct:<http://purl.org/dc/terms/>
PREFIX dbc:<http://dbpedia.org/resource/Category:>
PREFIX dbo:<http://dbpedia.org/ontology/>

SELECT DISTINCT ?escritor ?nombre (year(?fechaNac) AS ?nac)

WHERE {
# YOUR CODE HERE
}
# YOUR CODE HERE
LIMIT 100

In [None]:
assert 'Javier Sierra' in LAST_QUERY['columns']['nombre']
assert 'http://dbpedia.org/resource/Albert_Espinosa' in LAST_QUERY['columns']['escritor']

from collections import Counter
c = Counter(LAST_QUERY['columns']['nombre'])
for count in c.values():
    assert count == 1
    
c1 = Counter(LAST_QUERY['columns']['escritor'])
assert all(count==1 for count in c1.values())
# c = Counter(LAST_QUERY['columns']['nombre'])

### Using other resources

Get the list of living Spanish novelists born in Madrid.

Hint: use `dbr:Madrid` and `dbo:birthPlace`

In [None]:
%%sparql

PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX dct:<http://purl.org/dc/terms/>
PREFIX dbc:<http://dbpedia.org/resource/Category:>
PREFIX dbr:<http://dbpedia.org/resource/>
PREFIX dbo:<http://dbpedia.org/ontology/>

SELECT DISTINCT ?escritor ?nombre ?lugarNac (year(?fechaNac) AS ?nac)

WHERE {
# YOUR CODE HERE
}
# YOUR CODE HERE
LIMIT 100

In [None]:
assert 'José Ángel Mañas' in LAST_QUERY['columns']['nombre']
assert 'http://dbpedia.org/resource/Madrid' in LAST_QUERY['columns']['lugarNac']
MADRID_QUERY = LAST_QUERY['columns'].copy()

### Traversing the graph

Get the list of works of the authors in the previous query (i.e. authors born in Madrid), if they have any.

Hint: use `dbo:author`, which is a **property of a literary work** that points to the author.

In [None]:
%%sparql

PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX dct:<http://purl.org/dc/terms/>
PREFIX dbc:<http://dbpedia.org/resource/Category:>
PREFIX dbr:<http://dbpedia.org/resource/>
PREFIX dbo:<http://dbpedia.org/ontology/>

SELECT DISTINCT ?escritor ?nombre ?obra

WHERE {
# YOUR CODE HERE
}
# YOUR CODE HERE
LIMIT 10000

In [None]:
assert 'http://dbpedia.org/resource/A_Heart_So_White' in LAST_QUERY['columns']['obra']
assert 'http://dbpedia.org/resource/Tomorrow_in_the_Battle_Think_on_Me' in LAST_QUERY['columns']['obra']
assert '' in LAST_QUERY['columns']['obra'] # Some authors don't have works in dbpedia

We can also get a list of the works in string format using GROUP_CONCAT.
For instance, `GROUP_CONCAT(?obra; separator=",")` separates works with a comma (DBpedia's Virtuoso endpoint also accepts the shorter `GROUP_CONCAT(?obra, ",")`).
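
Conceptually, this aggregation groups the rows by author and joins the values of each group into a single string. A rough Python equivalent, using made-up rows, could look like this:

```python
from collections import defaultdict

rows = [
    ('ex:AuthorA', 'ex:Work1'),
    ('ex:AuthorA', 'ex:Work2'),
    ('ex:AuthorB', 'ex:Work3'),
]

# GROUP BY ?escritor: collect the works of each author.
works_by_author = defaultdict(list)
for author, work in rows:
    works_by_author[author].append(work)

# GROUP_CONCAT: join each group into one comma-separated string.
concatenated = {author: ','.join(works)
                for author, works in works_by_author.items()}
print(concatenated)
```

As in SQL, the variables that are not aggregated are expected to appear in a `GROUP BY` clause.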

Try it yourself:

In [None]:
%%sparql

PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX dct:<http://purl.org/dc/terms/>
PREFIX dbc:<http://dbpedia.org/resource/Category:>
PREFIX dbr:<http://dbpedia.org/resource/>
PREFIX dbo:<http://dbpedia.org/ontology/>

# YOUR CODE HERE

WHERE {
# YOUR CODE HERE
}
# YOUR CODE HERE
LIMIT 10000

### Traversing the graph

Get a list of living Spanish novelists born in Madrid, their name in Spanish, a link to their photo and a website (if they have one).

If the query is right, you should see a list of writers after running the test code.

Hint: use `foaf:depiction` and `foaf:homepage`.

In [None]:
%%sparql

PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX dct:<http://purl.org/dc/terms/>
PREFIX dbc:<http://dbpedia.org/resource/Category:>
PREFIX dbr:<http://dbpedia.org/resource/>
PREFIX dbo:<http://dbpedia.org/ontology/>
PREFIX foaf:<http://xmlns.com/foaf/0.1/>

SELECT ?escritor ?web ?foto

WHERE {
# YOUR CODE HERE
}
ORDER BY ?nombre
LIMIT 100

In [None]:
fotos = set(filter(lambda x: x != '', LAST_QUERY['columns']['foto']))
assert len(fotos) > 2
show_photos(fotos) #show the pictures of the writers!

### Union

We can merge the results of several alternative patterns, much like `UNION` in SQL.
The keyword in SPARQL is also `UNION`: a result is returned if it matches any of the alternatives.

`UNION` is useful in many situations.
For instance, when there are equivalent properties, or when you want to use two search terms and FILTER would be too inefficient.

The syntax is as follows:

```sparql
SELECT ?title
WHERE  {
  { ?book dc10:title  ?title }
  UNION
  { ?book dc11:title  ?title }
  
  ... REST OF YOUR QUERY ...

}
```
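
As a rough analogy in Python, each branch of a `UNION` produces its own list of solutions, and the result is simply both lists put together. The titles below are made up. Note that duplicates are kept unless `DISTINCT` is also used.

```python
# Solutions of each alternative pattern (made-up data).
titles_dc10 = ['Book One', 'Book Two']
titles_dc11 = ['Book Two', 'Book Three']

# UNION: a solution is returned if it matches either branch.
union = titles_dc10 + titles_dc11

# SELECT DISTINCT on top of the union removes the repeated title.
distinct_union = list(dict.fromkeys(union))

print(union)
print(distinct_union)
```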



Using `UNION`, get a list of distinct Spanish novelists AND poets.

Hint: use the `dbc:Spanish_poets` category.

In [None]:
%%sparql

PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX dct:<http://purl.org/dc/terms/>
PREFIX dbc:<http://dbpedia.org/resource/Category:>
PREFIX dbr:<http://dbpedia.org/resource/>
PREFIX dbo:<http://dbpedia.org/ontology/>

SELECT DISTINCT ?escritor ?nombre

WHERE {
# YOUR CODE HERE
}
# YOUR CODE HERE
LIMIT 10000

In [None]:
assert 'Garcilaso de la Vega' in LAST_QUERY['columns']['nombre']

You can also get the count of results either by inspecting the result (we will not cover this) or by aggregating the results using the `COUNT` operation.

The syntax is:
    
```sparql
SELECT (COUNT(?variable) AS ?count_name)
```

Try it yourself with our previous example:

In [None]:
%%sparql

PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX dct:<http://purl.org/dc/terms/>
PREFIX dbc:<http://dbpedia.org/resource/Category:>
PREFIX dbr:<http://dbpedia.org/resource/>
PREFIX dbo:<http://dbpedia.org/ontology/>

# YOUR CODE HERE

WHERE {
# YOUR CODE HERE
}
LIMIT 10000

In [None]:
assert len(LAST_QUERY['columns']) == 1
column_name = list(LAST_QUERY['columns'].keys())[0]
assert int(LAST_QUERY['columns'][column_name][0]) > 200

### Regular expressions

The last SPARQL concept we will cover is [regular expressions](https://www.w3.org/TR/rdf-sparql-query/#funcex-regex) (`regex`).
Regular expressions are a very powerful tool, but we will only cover the basics in this exercise.

In essence, regular expressions match strings against patterns.
In their simplest form, they can be used to find substrings within a variable.
For instance, `regex(?label, "substring")` matches if and only if the `?label` variable contains `substring`.
But regular expressions can be more complex than that.
For instance, we can find patterns such as a 10-digit number, a 5-character string, or values without whitespace.

The syntax of the regex function is the following:

```
regex(?variable, "pattern", "flags")
```

Flags are optional configuration options for the regular expression, such as *do not care about case* (`i` flag).
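
Python's `re` module follows very similar conventions, so it is a convenient place to experiment with patterns before using them in a query. In this sketch, `re.search` plays the role of SPARQL's `regex`, since both look for the pattern anywhere in the string unless it is anchored. The names are made up, and small differences between the two regular expression dialects may exist.

```python
import re

names = ['Villa del Prado', 'Delta Town', 'San Lorenzo', 'Pradera']

# Like regex(?name, "de", "i"): substring match, ignoring case.
contains_de = [n for n in names if re.search('de', n, re.IGNORECASE)]

# Anchors restrict where the match may happen: ^ is the start, $ is the end.
starts_with_san = [n for n in names if re.search('^San', n)]
ends_with_o = [n for n in names if re.search('o$', n)]

print(contains_de)
print(starts_with_san)
print(ends_with_o)
```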

As an example, let us find the cities in Madrid that contain "de" in their name.

In [None]:
%%sparql

PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

SELECT ?localidad
WHERE {
    ?localidad <http://dbpedia.org/ontology/isPartOf> <http://dbpedia.org/resource/Community_of_Madrid> .
    ?localidad rdfs:label ?nombre .
    FILTER (lang(?nombre) = "es" ).
    FILTER regex(?nombre, "de", "i")
}
LIMIT 10

Now, use regular expressions to find Spanish novelists whose **first name** is Juan.
In other words, their name **starts with** "Juan".

In [None]:
%%sparql

PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX dct:<http://purl.org/dc/terms/>
PREFIX dbc:<http://dbpedia.org/resource/Category:>
PREFIX dbr:<http://dbpedia.org/resource/>
PREFIX dbo:<http://dbpedia.org/ontology/>

# YOUR CODE HERE

WHERE {
# YOUR CODE HERE
}
# YOUR CODE HERE
LIMIT 1000

In [None]:
assert len(LAST_QUERY['columns']['nombre']) > 15
for i in LAST_QUERY['columns']['nombre']:
    assert 'Juan' in i
assert "Robert Juan-Cantavella" not in LAST_QUERY['columns']['nombre']

## Additional exercises

Find out if there are more DBpedia entries for writers (`dbo:Writer`) than for football players (`dbo:SoccerPlayer`).

Get a list of European countries with a population higher than 20 million, in decreasing order of population, including their URI, name in English and population.

Find the country in the world that speaks the most languages. Show its name in Spanish, if available.

## Querying custom data

In the last part of this course, we will query the data annotated in the previous course on RDF.

The goal is to try SPARQL with data annotated by users with limited knowledge of vocabularies and semantics, and to compare the experience with similar queries to a more structured dataset.

Hence, there are two parts.
First, you will query a set of graphs annotated by students of this course.
Then, you will query a synthetic dataset that contains similar information.

In particular, you need to run five queries, each of which answers one of the following questions:

* Number of hotels (or entities) with reviews
* Number of reviews
* The hotel with the lowest average score
* The hotel with the highest average score
* A list of hotels with their addresses and telephone numbers

### Manually annotated

Querying the manually annotated dataset is slightly different from querying DBpedia.
The main difference is that this dataset uses different graphs to separate the annotations from different students.

**Each graph is a separate set of triples**.
For this exercise, you could think of graphs as individual endpoints.


First, let us get a list of graphs available:

In [None]:
%%sparql http://fuseki.cluster.gsi.dit.upm.es/ejerciciohoteles
    
SELECT DISTINCT ?g WHERE {
    GRAPH ?g {
    ?s ?p ?o .
    }
}

Once you have this list, you can query specific graphs like so:

In [None]:
%%sparql http://fuseki.cluster.gsi.dit.upm.es/ejerciciohoteles
    
SELECT *
WHERE {
    GRAPH <http://fuseki.cluster.gsi.dit.upm.es/36de86e6754934381d935f10618fe985>{
    ?s ?p ?o .
    }
}

Now, design five queries to answer the questions in the description, and run each of them in at least five of these graphs.

You can manually run the queries or use the code below, where you only need to specify your queries and the graphs you have identified.

If you need additional prefixes, feel free to modify the TEMPLATE variable.

In [None]:
from IPython.display import display

QUERIES = {
    'highest score': '''
    ?s ?p ?o
''',
    'lowest score': '''
        ?s ?p ?o
    ''',
    'number of hotels': '''
        ?s ?p ?o
    ''',
    'number of reviews': '''
        ?s ?p ?o
    ''',
    'telephones and addresses': '''
        ?s ?p ?o
    ''',
    
}

TEMPLATE = '''
SELECT * WHERE {{
    GRAPH <{graph}>{{
        {query}
        }}
    }}
'''

GRAPHS = ['http://fuseki.cluster.gsi.dit.upm.es/36de86e6754934381d935f10618fe985',
         ]

for name, query in QUERIES.items():
    for graph in GRAPHS:
        print(name, '@', graph)
        display(sparql('http://fuseki.cluster.gsi.dit.upm.es/ejerciciohoteles', TEMPLATE.format(graph=graph,
                                                                                               query=query)
                      ))
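
The doubled braces in `TEMPLATE` are there because `str.format` treats single braces as placeholders; `{{` and `}}` produce the literal braces that SPARQL needs. The short sketch below shows the effect with a made-up graph URI and pattern.

```python
TEMPLATE = '''
SELECT * WHERE {{
    GRAPH <{graph}> {{
        {query}
    }}
}}
'''

# {graph} and {query} are replaced; {{ and }} become literal braces.
text = TEMPLATE.format(graph='http://example.org/graph1', query='?s ?p ?o .')
print(text)
```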

### Synthetic dataset

Now, run the same queries in the synthetic dataset.

The query below should get you started:

In [None]:
%%sparql http://fuseki.cluster.gsi.dit.upm.es/hotelessintetico 

SELECT *
WHERE {
    ?s ?p ?o .
}
LIMIT 10

### Discussion

Compare the results of the synthetic and the manual dataset, and answer these questions:

Both datasets should use the same schema. Are there any differences when it comes to using them?

In [None]:
# YOUR CODE HERE

Are data correctly annotated in both datasets?

In [None]:
# YOUR CODE HERE

Has any of the datasets been harder to query? Why?

In [None]:
# YOUR CODE HERE

## References

* [RDFLib documentation](https://rdflib.readthedocs.io/en/stable/).
* [Wikidata Query Service query examples](https://www.wikidata.org/wiki/Wikidata:SPARQL_query_service/queries/examples)

## Licence
The notebook is freely licensed under the [Creative Commons Attribution Share-Alike license](https://creativecommons.org/licenses/by/2.0/).

© 2018 Universidad Politécnica de Madrid.