<header style="width:100%;position:relative">
  <div style="width:80%;float:right;">
    <h1>Course Notes for Learning Intelligent Systems</h1>
    <h3>Department of Telematic Engineering Systems</h3>
    <h5>Universidad Politécnica de Madrid</h5>
  </div>
        <img style="width:15%;" src="../logo.jpg" alt="UPM" />
</header>

## Advanced SPARQL

This notebook complements [the SPARQL notebook](./01_SPARQL.ipynb) with some advanced commands.

If you have not completed the exercises in the previous notebook, please do so before continuing.


## Objectives

* To cover some SPARQL concepts that are less frequently used 

## Tools

See [the SPARQL notebook](./01_SPARQL_Introduction.ipynb#Tools).

Run this line to enable the `%%sparql` magic command.

In [None]:
from helpers import *

## Exercises

### Working with dates

To explore dates, we will focus on our Writers example.

First, search for writers born in the 20th century.
You can use a filter that compares dates, knowing that `"2000"^^xsd:date` is the first day of the year 2000.

In [None]:
%%sparql https://dbpedia.org/sparql

PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX dct: <http://purl.org/dc/terms/>
PREFIX dbc: <http://dbpedia.org/resource/Category:>
PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>

SELECT ?escritor ?nombre (year(?fechaNac) as ?nac)
WHERE {
    ?escritor dct:subject dbc:Spanish_novelists ;
              rdfs:label ?nombre ;
              dbo:birthDate ?fechaNac .
    FILTER(lang(?nombre) = "es") .
    # YOUR ANSWER HERE
}
# YOUR ANSWER HERE
LIMIT 1000

In [None]:
assert 'Ramiro Ledesma' in solution()['columns']['nombre']
assert 'Ray Loriga' in solution()['columns']['nombre']
assert all(int(x) > 1899 and int(x) < 2001 for x in solution()['columns']['nac'])

Now, get the list of Spanish novelists that are still alive.

A person is considered alive if their death date is not defined and they were born less than 100 years ago.

Remember, we can check whether the optional value for a key was bound in a SPARQL query using `BOUND(?key)`.
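As a rough illustration outside SPARQL, the rule above can be sketched in Python for a single record. The function name `is_presumed_alive` and the sample dates are hypothetical and only serve the example; `None` stands in for an optional value that was not bound.

```python
from datetime import date
from typing import Optional


def is_presumed_alive(birth: date, death: Optional[date], today: date) -> bool:
    """Apply the rule from the text: no death date, and born less than 100 years ago."""
    if death is not None:
        # Analogous to BOUND(?death) being true in SPARQL
        return False
    # The person is under 100 if their 100th birthday is still in the future
    hundredth_birthday = (birth.year + 100, birth.month, birth.day)
    return hundredth_birthday > (today.year, today.month, today.day)


# Hypothetical reference date and records, chosen only for illustration
today = date(2018, 6, 1)
print(is_presumed_alive(date(1950, 3, 15), None, today))
print(is_presumed_alive(date(1900, 1, 1), None, today))
print(is_presumed_alive(date(1950, 3, 15), date(2010, 1, 1), today))
```

Comparing `(year, month, day)` tuples avoids building a date that may not exist, such as 29 February in a non-leap year.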

In [None]:
%%sparql https://dbpedia.org/sparql

PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX dct:<http://purl.org/dc/terms/>
PREFIX dbc:<http://dbpedia.org/resource/Category:>
PREFIX dbo:<http://dbpedia.org/ontology/>

SELECT ?escritor ?nombre (year(?fechaNac) AS ?nac)

WHERE {
    ?escritor dct:subject dbc:Spanish_novelists .
    ?escritor rdfs:label ?nombre .
    ?escritor dbo:birthDate ?fechaNac .
    # YOUR ANSWER HERE
    FILTER(lang(?nombre) = "es") .
}
# YOUR ANSWER HERE
LIMIT 1000

In [None]:
assert 'Fernando Arrabal' in solution()['columns']['nombre']
assert 'Albert Espinosa' in solution()['columns']['nombre']
for year in solution()['columns']['nac']:
    assert int(year) >= 1918

### Working with badly formatted dates (OPTIONAL!)

Now, get the list of Spanish novelists that died before turning 50, or that are still alive and not yet 50 years old.

For the sake of simplicity, you can use the `year(<date>)` function.

Hint: you can use boolean logic in your filters (e.g. `&&` and `||`).

Hint 2: Some dates are not formatted properly, which makes some queries fail when they shouldn't. As a workaround, you could convert the date to string, and back to date again: `xsd:dateTime(str(?date))`.

In [None]:
%%sparql https://dbpedia.org/sparql

PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX dct:<http://purl.org/dc/terms/>
PREFIX dbc:<http://dbpedia.org/resource/Category:>
PREFIX dbo:<http://dbpedia.org/ontology/>

SELECT ?escritor ?nombre (year(?fechaNac) AS ?nac) ?fechaDef

WHERE {
    ?escritor dct:subject dbc:Spanish_novelists .
    ?escritor rdfs:label ?nombre .
    ?escritor dbo:birthDate ?fechaNac .
    # YOUR ANSWER HERE
}
# YOUR ANSWER HERE
LIMIT 100

In [None]:
assert 'Javier Sierra' in solution()['columns']['nombre']
assert 'http://dbpedia.org/resource/José_Ángel_Mañas' in solution()['columns']['escritor']

### Regular expressions

[Regular expressions](https://www.w3.org/TR/rdf-sparql-query/#funcex-regex) are a very powerful tool, but we will only cover the basics in this exercise.

In essence, regular expressions match strings against patterns.
In their simplest form, they can be used to find substrings within a variable.
For instance, `regex(?label, "substring")` matches if and only if the value of `?label` contains `substring`.
But regular expressions can be more complex than that.
For instance, we can find patterns such as a 10-digit number, a string that is exactly 5 characters long, or a string without any whitespace.

The syntax of the regex function is the following:

```
regex(?variable, "pattern", "flags")
```

Flags are optional configuration options for the regular expression, such as *do not care about case* (`i` flag).
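To get a feel for these patterns without querying an endpoint, here is a minimal Python sketch. It is only an approximation: SPARQL's `regex` follows the XPath/XQuery regular expression syntax, while Python's `re` module differs in some details, although the basic patterns used here behave the same way in both. The helper name `sparql_like_regex` and the sample strings are hypothetical.

```python
import re


def sparql_like_regex(text: str, pattern: str, flags: str = "") -> bool:
    """Approximate regex(?var, pattern, flags): unanchored search, optional "i" flag."""
    re_flags = re.IGNORECASE if "i" in flags else 0
    return re.search(pattern, text, re_flags) is not None


# Substring match, with and without the case-insensitive flag
print(sparql_like_regex("Alcalá de Henares", "de"))       # True
print(sparql_like_regex("Alcalá DE Henares", "de"))       # False
print(sparql_like_regex("Alcalá DE Henares", "de", "i"))  # True

# ^ and $ anchor the pattern to the start and end of the string
print(sparql_like_regex("0123456789", r"^\d{10}$"))  # True: a 10-digit number
print(sparql_like_regex("abcde", r"^.{5}$"))         # True: exactly 5 characters
print(sparql_like_regex("two words", r"^\S+$"))      # False: contains whitespace
```

Without the anchors, a pattern matches anywhere in the string, which is why `"de"` alone behaves like a substring search.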

As an example, let us find the localities in the Community of Madrid that contain "de" in their name.

In [None]:
%%sparql https://dbpedia.org/sparql

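PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

SELECT ?localidad
WHERE {
    ?localidad <http://dbpedia.org/ontology/isPartOf> <http://dbpedia.org/resource/Community_of_Madrid> .
    ?localidad rdfs:label ?nombre .
    FILTER (lang(?nombre) = "es") .
    # Case-insensitive match: keeps names that contain "de", "De" or "DE"
    FILTER regex(?nombre, "de", "i")
}
LIMIT 10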

Now, use regular expressions to find Spanish poets and novelists whose **first name** is Juan.
In other words, their name **starts with** "Juan".

In [None]:
%%sparql https://dbpedia.org/sparql

PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX dct:<http://purl.org/dc/terms/>
PREFIX dbc:<http://dbpedia.org/resource/Category:>
PREFIX dbr:<http://dbpedia.org/resource/>
PREFIX dbo:<http://dbpedia.org/ontology/>

# YOUR ANSWER HERE

WHERE {
    {
        ?escritor dct:subject dbc:Spanish_poets .
    }
    UNION {
        ?escritor dct:subject dbc:Spanish_novelists .
    }
    ?escritor rdfs:label ?nombre .
    FILTER(lang(?nombre) = "es") .
    # YOUR ANSWER HERE
}
ORDER BY ?nombre
LIMIT 1000

In [None]:
assert len(solution()['columns']['nombre']) > 15
for i in solution()['columns']['nombre']:
    assert 'Juan' in i
assert "Robert Juan-Cantavella" not in solution()['columns']['nombre']

### Group concat

Sometimes, it is useful to aggregate results from different rows.
For instance, we might want to get a comma-separated list of the names of the places in each autonomous community in Spain.

In those cases, we can use the `GROUP_CONCAT` function.

In [None]:
%%sparql https://dbpedia.org/sparql

PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX dct: <http://purl.org/dc/terms/>
PREFIX dbc: <http://dbpedia.org/resource/Category:>
PREFIX dbo: <http://dbpedia.org/ontology/>

# GROUP_CONCAT joins the ?name values of each group; notice how we rename the result
SELECT ?com (GROUP_CONCAT(?name; separator=",") AS ?places)
WHERE {
    ?com dct:subject dbc:Autonomous_communities_of_Spain .
    ?localidad dbo:subdivision ?com ;
               rdfs:label ?name .
    FILTER (lang(?name) = "es")
}
GROUP BY ?com
ORDER BY ?com
LIMIT 100
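To make the aggregation more concrete, the following Python sketch mimics what `GROUP_CONCAT` does with a handful of rows: it groups the rows by their first column and joins the values of each group with a separator. The function name `group_concat` and the rows are hypothetical; they are illustrative data, not actual results from DBpedia.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def group_concat(rows: Iterable[Tuple[str, str]], separator: str = ",") -> Dict[str, str]:
    """Group (key, value) rows by key and join the values of each group."""
    groups = defaultdict(list)
    for key, value in rows:
        groups[key].append(value)
    # Sort by key, as ORDER BY would do for the grouping variable
    return {key: separator.join(values) for key, values in sorted(groups.items())}


# Illustrative rows (community, place); not real query output
rows = [
    ("Galicia", "Lugo"),
    ("Andalucía", "Sevilla"),
    ("Andalucía", "Granada"),
]
print(group_concat(rows))
# {'Andalucía': 'Sevilla,Granada', 'Galicia': 'Lugo'}
```

Each key ends up in exactly one output entry, in the same way that each value of the grouping variable produces one row in the aggregated query result.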

Try it yourself, to get a list of works by each of the authors in this query:

In [None]:
%%sparql https://dbpedia.org/sparql

PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX dct:<http://purl.org/dc/terms/>
PREFIX dbc:<http://dbpedia.org/resource/Category:>
PREFIX dbr:<http://dbpedia.org/resource/>
PREFIX dbo:<http://dbpedia.org/ontology/>

# YOUR ANSWER HERE

WHERE {
    ?escritor a dbo:Writer .
    ?escritor rdfs:label ?nombre .
    ?escritor dbo:birthDate ?fechaNac .
    ?escritor dbo:birthPlace dbr:Madrid .
    # YOUR ANSWER HERE
    FILTER(lang(?nombre) = "es") .
    FILTER(!bound(?titulo) || lang(?titulo) = "en") .

}
ORDER BY ?nombre
LIMIT 100

## Licence
The notebook is freely licensed under the [Creative Commons Attribution Share-Alike license](https://creativecommons.org/licenses/by/2.0/).

© 2018 Universidad Politécnica de Madrid.