mirror of
https://github.com/gsi-upm/sitc
synced 2025-01-08 04:01:27 +00:00
1920 lines
51 KiB
Plaintext
Executable File
{
|
|
"cells": [
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {
|
|
"deletable": false,
|
|
"editable": false,
|
|
"nbgrader": {
|
|
"cell_type": "markdown",
|
|
"checksum": "7276f055a8c504d3c80098c62ed41a4f",
|
|
"grade": false,
|
|
"grade_id": "cell-0bfe38f97f6ab2d2",
|
|
"locked": true,
|
|
"schema_version": 3,
|
|
"solution": false
|
|
}
|
|
},
|
|
"source": [
|
|
"<header style=\"width:100%;position:relative\">\n",
|
|
" <div style=\"width:80%;float:right;\">\n",
|
|
" <h1>Course Notes for Learning Intelligent Systems</h1>\n",
|
|
" <h3>Department of Telematic Engineering Systems</h3>\n",
|
|
" <h5>Universidad Politécnica de Madrid</h5>\n",
|
|
" </div>\n",
|
|
" <img style=\"width:15%;\" src=\"../logo.jpg\" alt=\"UPM\" />\n",
|
|
"</header>"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"## Introduction to Linked Open Data\n",
|
|
"\n",
|
|
"This lecture provides a quick introduction to semantic queries in Python using SPARQL.\n",
|
|
"SPARQL is a semantic query language inspired by SQL.\n",
|
|
"\n",
|
|
"This is the first in a series of notebooks about SPARQL, which consists of:\n",
|
|
"\n",
|
|
"* This notebook, which introduces basic concepts using a small public dataset.\n",
|
|
"* [A notebook with queries to a custom dataset](02_SPARQL_Custom_Endpoint.ipynb), which links to the RDF exercises and it is out of the scope of this course. You can consult it if you are interested."
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"## Objectives\n",
|
|
"\n",
|
|
"* Learning SPARQL and the Linked Data principles by defining queries to answer a set of problems of increasing difficulty\n",
|
|
"* Learning how to use integrated SPARQL editors and programming interfaces to SPARQL."
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {
|
|
"deletable": false,
|
|
"editable": false,
|
|
"nbgrader": {
|
|
"cell_type": "markdown",
|
|
"checksum": "40ccd05ad0704781327031a84dfb9939",
|
|
"grade": false,
|
|
"grade_id": "cell-4f8492996e74bf20",
|
|
"locked": true,
|
|
"schema_version": 3,
|
|
"solution": false
|
|
}
|
|
},
|
|
"source": [
|
|
"## Tools\n",
|
|
"\n",
|
|
"* This notebook\n",
|
|
"* External SPARQL editors (optional)\n",
|
|
" * YASGUI-GSI http://yasgui.gsi.upm.es\n",
|
|
" * DBpedia virtuoso http://dbpedia.org/sparql\n",
|
|
"\n",
|
|
"Using the YASGUI-GSI editor has several advantages over other options.\n",
|
|
"It features:\n",
|
|
"\n",
|
|
"* Selection of data source, either by specifying the URL or by selecting from a dropdown menu\n",
|
|
"* Interactive query editing\n",
|
|
" * A set of pre-defined queries\n",
|
|
" * Syntax errors\n",
|
|
" * Auto-complete\n",
|
|
"* Data visualization\n",
|
|
" * Total number of results\n",
|
|
" * Different formats (table, pivot table, raw response, etc.)\n",
|
|
" * Pagination of results\n",
|
|
" * Search and filter results"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {
|
|
"deletable": false,
|
|
"editable": false,
|
|
"nbgrader": {
|
|
"cell_type": "markdown",
|
|
"checksum": "81894e9d65e5dd9f3b6e1c5f66804bf6",
|
|
"grade": false,
|
|
"grade_id": "cell-70ac24910356c3cf",
|
|
"locked": true,
|
|
"schema_version": 3,
|
|
"solution": false
|
|
}
|
|
},
|
|
"source": [
|
|
"## Instructions\n",
|
|
"\n",
|
|
"We will be using a semantic server, available at: http://fuseki.gsi.upm.es/sitc.\n",
|
|
"\n",
|
|
"This server contains a dataset about [Beatles songs](http://www.snee.com/bobdc.blog/2017/11/sparql-queries-of-beatles-reco.html), which we will query with SPARQL.\n",
|
|
"\n",
|
|
"We will provide you some example code to get you started, the *question* you will have to answer using SPARQL, a template for the answer.\n",
|
|
"\n",
|
|
"After every query, you will find some python code to test the results of the query.\n",
|
|
"**Make sure you've run the tests before moving to the next exercise**.\n",
|
|
"If the test gives you an error, you've probably done something wrong.\n",
|
|
"You do not need to understand or modify the test code."
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {
|
|
"deletable": false,
|
|
"editable": false,
|
|
"nbgrader": {
|
|
"cell_type": "markdown",
|
|
"checksum": "1d332d3d11fd6b57f0ec0ac3c358c6cb",
|
|
"grade": false,
|
|
"grade_id": "cell-eb13908482825e42",
|
|
"locked": true,
|
|
"schema_version": 3,
|
|
"solution": false
|
|
}
|
|
},
|
|
"source": [
|
|
"For convenience, the examples in the notebook are executable (using the `%%sparql` magic command), and they are accompanied by some code to test the results.\n",
|
|
"If the tests pass, you probably got the answer right.\n",
|
|
"\n",
|
|
"**Run this line to enable the `%%sparql` magic command.**"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"metadata": {
|
|
"deletable": false,
|
|
"editable": false,
|
|
"nbgrader": {
|
|
"cell_type": "code",
|
|
"checksum": "aca7c5538b8fc53e99c92e94e6818c83",
|
|
"grade": false,
|
|
"grade_id": "cell-b3f3d92fa2100c3d",
|
|
"locked": true,
|
|
"schema_version": 3,
|
|
"solution": false
|
|
}
|
|
},
|
|
"outputs": [],
|
|
"source": [
|
|
"from helpers import sparql, solution, show_photos"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {
|
|
"deletable": false,
|
|
"editable": false,
|
|
"nbgrader": {
|
|
"cell_type": "markdown",
|
|
"checksum": "e896b6560e45d5c385a43aa85e3523c7",
|
|
"grade": false,
|
|
"grade_id": "cell-04410e75828c388d",
|
|
"locked": true,
|
|
"schema_version": 3,
|
|
"solution": false
|
|
}
|
|
},
|
|
"source": [
|
|
"The `%%sparql` magic command will allow us to use SPARQL inside normal jupyter cells.\n",
|
|
"\n",
|
|
"For instance, the following code:\n",
|
|
"\n",
|
|
"```python \n",
|
|
"%%sparql http://dbpedia.org/sparql\n",
|
|
"\n",
|
|
"<MY QUERY>\n",
|
|
"``` \n",
|
|
"\n",
|
|
"Is the same as `run_query('<MY QUERY>', endpoint='http://dbpedia.org/sparql')` plus some additional steps, such as saving the results in a nice table format so that they can be used later and storing the results in a variable (`solution()`), which we will use in our tests.\n",
|
|
"\n",
|
|
"You do not need to worry about it, and **you can always use one of the suggested online editors if you wish**."
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {
|
|
"deletable": false,
|
|
"editable": false,
|
|
"nbgrader": {
|
|
"cell_type": "markdown",
|
|
"checksum": "34710d3bb8e2cf826833a43adb7fb448",
|
|
"grade": false,
|
|
"grade_id": "cell-2a44c0da2c206d01",
|
|
"locked": true,
|
|
"schema_version": 3,
|
|
"solution": false
|
|
}
|
|
},
|
|
"source": [
|
|
"You can also use any other method to write your queries.\n",
|
|
"Just make sure to copy the working query back into the notebook so you can test it.\n",
|
|
"\n",
|
|
"You may find online query editors particularly useful.\n",
|
|
"In addition to running queries from your browser, they provide useful features such as syntax highlighting and autocompletion.\n",
|
|
"Some examples are:\n",
|
|
"\n",
|
|
"* DBpedia's virtuoso query editor https://dbpedia.org/sparql\n",
|
|
"* A javascript based client hosted at GSI: http://yasgui.gsi.upm.es/\n",
|
|
"\n",
|
|
"[^1]: http://www.snee.com/bobdc.blog/2017/11/sparql-queries-of-beatles-reco.html"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {
|
|
"deletable": false,
|
|
"editable": false,
|
|
"nbgrader": {
|
|
"cell_type": "markdown",
|
|
"checksum": "79c60bd3d4c13f380aae5778c5ce7245",
|
|
"grade": false,
|
|
"grade_id": "cell-d645128d3af18117",
|
|
"locked": true,
|
|
"schema_version": 3,
|
|
"solution": false
|
|
}
|
|
},
|
|
"source": [
|
|
"## Exercises\n",
|
|
"\n",
|
|
"The following exercises cover the basics of SPARQL with simple use cases."
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {
|
|
"deletable": false,
|
|
"editable": false,
|
|
"nbgrader": {
|
|
"cell_type": "markdown",
|
|
"checksum": "f7428fe79cd33383dfd3b09a0d951b6e",
|
|
"grade": false,
|
|
"grade_id": "cell-8391a5322a9ad4a7",
|
|
"locked": true,
|
|
"schema_version": 3,
|
|
"solution": false
|
|
}
|
|
},
|
|
"source": [
|
|
"#### First select - Exploring the dataset\n",
|
|
"\n"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {
|
|
"deletable": false,
|
|
"editable": false,
|
|
"nbgrader": {
|
|
"cell_type": "markdown",
|
|
"checksum": "f6b5da583694dd5cc9326c670830875d",
|
|
"grade": false,
|
|
"grade_id": "cell-4f56a152e4d70c02",
|
|
"locked": true,
|
|
"schema_version": 3,
|
|
"solution": false
|
|
}
|
|
},
|
|
"source": [
|
|
"Let's start with a simple query to explore the dataset using SPARQL.\n",
|
|
"We will get a list of the types of entities in the dataset.\n",
|
|
"\n",
|
|
"SPARQL syntax is similar to SQL, mixed with turtle.\n",
|
|
"A SPARQL query has two main parts: the `SELECT` block, which specifies what variables we want to get; and the `WHERE` block which, loosely speaking, defines how the variables will be obtained from the graph.\n",
|
|
"\n",
|
|
"In order to construct the `WHERE` block, we have to know the data we want to extract would be represented in Turtle.\n",
|
|
"\n",
|
|
"In particular, to write an entity and its type, we would write this triple:\n",
|
|
"\n",
|
|
"```turtle\n",
|
|
"<my_entity> a <type> .\n",
|
|
"```\n",
|
|
"\n",
|
|
"For example:\n",
|
|
"\n",
|
|
"```turtle\n",
|
|
"example:Timmy a example:Boy\n",
|
|
"```\n",
|
|
"\n",
|
|
"In SPARQL, the parts that we wish to extract are replaced with a variable (e.g. `?name`, `?type`).\n",
|
|
"Hence, we would have something like this:\n",
|
|
"\n",
|
|
"```turtle\n",
|
|
"?entity a ?type\n",
|
|
"```\n",
|
|
"\n",
|
|
"The name of the variable has no effect on the query, but you should use a sensible name.\n",
|
|
"In these notebooks, try to use the names provided in the templates, because they might be used in the tests.\n",
|
|
"\n",
|
|
"There are additional parts in the query.\n",
|
|
"For now, we will only cover the `LIMIT` statement, which limits the number of results we will get.\n",
|
|
"Using `LIMIT` is usually a good idea, especially when trying new queries, because the dataset may be too big. \n",
|
|
"\n",
|
|
"Using all these concepts, we will run our first query, to get the list of entities and their type:"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"metadata": {
|
|
"deletable": false,
|
|
"editable": false,
|
|
"nbgrader": {
|
|
"cell_type": "code",
|
|
"checksum": "3bc71f851a33fa401d18ea3ab02cf61f",
|
|
"grade": false,
|
|
"grade_id": "cell-8ce8c954513f17e7",
|
|
"locked": true,
|
|
"schema_version": 3,
|
|
"solution": false
|
|
}
|
|
},
|
|
"outputs": [],
|
|
"source": [
|
|
"%%sparql http://fuseki.gsi.upm.es/sitc/\n",
|
|
"\n",
|
|
"SELECT ?entity ?type\n",
|
|
"WHERE {\n",
|
|
" ?entity a ?type\n",
|
|
"}\n",
|
|
"LIMIT 10"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {
|
|
"deletable": false,
|
|
"editable": false,
|
|
"nbgrader": {
|
|
"cell_type": "markdown",
|
|
"checksum": "d6a79c2f5fd005a9e15a8f67dcfd4784",
|
|
"grade": false,
|
|
"grade_id": "cell-3d6d622c717c3950",
|
|
"locked": true,
|
|
"schema_version": 3,
|
|
"solution": false
|
|
}
|
|
},
|
|
"source": [
|
|
"You can check that the results you got match our expectations:"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"assert len(solution()['tuples']) == 10 # Make sure we got 10 results \n",
|
|
"assert len(solution()['columns']) >= 1 # In 2 columns (?entity and ?type)"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"Now, use the same concepts to write a query that gets the **list of entities (subjects) and their properties (predicates)**.\n",
|
|
"\n",
|
|
"**Hint**: review the previous query. In there, we fixed a property (`a`, i.e. `rdfs:type`) and used a variable for the objects. Now we are insterested properties, regardless of the value (object)."
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"metadata": {
|
|
"deletable": false,
|
|
"nbgrader": {
|
|
"cell_type": "code",
|
|
"checksum": "65be7168bedb4f6dc2f19e2138bab232",
|
|
"grade": false,
|
|
"grade_id": "cell-6e904d692b5facad",
|
|
"locked": false,
|
|
"schema_version": 3,
|
|
"solution": true
|
|
}
|
|
},
|
|
"outputs": [],
|
|
"source": [
|
|
"%%sparql http://fuseki.gsi.upm.es/sitc/\n",
|
|
"\n",
|
|
"SELECT ?entity ?prop\n",
|
|
"WHERE {\n",
|
|
"# YOUR ANSWER HERE\n",
|
|
"}\n",
|
|
"LIMIT 100"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"metadata": {
|
|
"deletable": false,
|
|
"editable": false,
|
|
"nbgrader": {
|
|
"cell_type": "code",
|
|
"checksum": "e78b57fa9baab578f5a4bd22dc499fca",
|
|
"grade": true,
|
|
"grade_id": "cell-3fc0d3c43dfd04a3",
|
|
"locked": true,
|
|
"points": 1,
|
|
"schema_version": 3,
|
|
"solution": false
|
|
}
|
|
},
|
|
"outputs": [],
|
|
"source": [
|
|
"s = solution()\n",
|
|
"assert len(s['tuples']) >= 100 # There are at least 100 results\n",
|
|
"assert 'entity' in s['columns'] # A column named entity exists\n",
|
|
"assert 'http://learningsparql.com/ns/musician/RaymondBrown' in s['columns']['entity'] # RaymondBrown is an entity"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"### Getting a list of DISTINCT types"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"To get a better grip of the dataset, we will get a list of types.\n",
|
|
"\n",
|
|
"We may try to do so with a simple query: "
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"%%sparql http://fuseki.gsi.upm.es/sitc/\n",
|
|
"\n",
|
|
"SELECT ?type\n",
|
|
"WHERE {\n",
|
|
" ?entity a ?type\n",
|
|
"}\n",
|
|
"LIMIT 10"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"However, this list has many duplicates.\n",
|
|
"In fact, we only get one type (`Musician`).\n",
|
|
"\n",
|
|
"To remove duplicates, we will need the `DISTINCT` statement, which only shows unique (distinct) rows:"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"%%sparql http://fuseki.gsi.upm.es/sitc/\n",
|
|
"\n",
|
|
"SELECT DISTINCT ?type\n",
|
|
"WHERE {\n",
|
|
" ?entity a ?type\n",
|
|
"}\n",
|
|
"LIMIT 100"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"We should see only three types now (`Musician`, `Song`, and `Instrument`)."
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"assert 'type' in solution()['columns']\n",
|
|
"assert len(solution()['tuples']) == 3\n",
|
|
"assert 'http://learningsparql.com/ns/schema/Musician' in solution()['columns']['type']\n",
|
|
"assert 'http://learningsparql.com/ns/schema/Song' in solution()['columns']['type']\n",
|
|
"assert 'http://learningsparql.com/ns/schema/Instrument' in solution()['columns']['type']"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"Now, **build a query to get the list of unique properties**:"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"metadata": {
|
|
"deletable": false,
|
|
"nbgrader": {
|
|
"cell_type": "code",
|
|
"checksum": "35563ff455c7e8b1c91f61db97b2011b",
|
|
"grade": false,
|
|
"grade_id": "cell-e615f9a77c4bc9a5",
|
|
"locked": false,
|
|
"schema_version": 3,
|
|
"solution": true
|
|
}
|
|
},
|
|
"outputs": [],
|
|
"source": [
|
|
"%%sparql http://fuseki.gsi.upm.es/sitc/\n",
|
|
"\n",
|
|
"SELECT DISTINCT ?property\n",
|
|
"WHERE {\n",
|
|
"# YOUR ANSWER HERE\n",
|
|
"}"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"metadata": {
|
|
"deletable": false,
|
|
"editable": false,
|
|
"nbgrader": {
|
|
"cell_type": "code",
|
|
"checksum": "7603c90d8c177e2e6678baa2f1b6af36",
|
|
"grade": true,
|
|
"grade_id": "cell-9168718938ab7347",
|
|
"locked": true,
|
|
"points": 1,
|
|
"schema_version": 3,
|
|
"solution": false
|
|
}
|
|
},
|
|
"outputs": [],
|
|
"source": [
|
|
"assert len(solution()['tuples']) == 182\n",
|
|
"assert 'http://learningsparql.com/ns/instrument/bass' in solution()['columns']['property']"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"### Geting all properties for songs"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"The `WHERE` statement can contain more than one line.\n",
|
|
"\n",
|
|
"For example, we can restrict the list of properties from the previous exercise, to only get properties of musicians:"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"%%sparql http://fuseki.gsi.upm.es/sitc/\n",
|
|
"\n",
|
|
"PREFIX s: <http://learningsparql.com/ns/schema/>\n",
|
|
"PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>\n",
|
|
"\n",
|
|
"SELECT DISTINCT ?prop\n",
|
|
"WHERE {\n",
|
|
" ?song a s:Musician .\n",
|
|
" ?song ?prop ?value .\n",
|
|
"}\n",
|
|
"LIMIT 20"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"There should be two results:"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"assert len(solution()['tuples']) == 2 # There are exactly two results"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"Notice the use of prefixes, just like in turtle.\n",
|
|
"Also, these two options are equivalent:\n",
|
|
"\n",
|
|
"```turtle\n",
|
|
"?song a s:Musician ;\n",
|
|
" ?prop ?value .\n",
|
|
"\n",
|
|
"# And\n",
|
|
"\n",
|
|
"?song a s:Musician ;\n",
|
|
"?song ?prop ?value .\n",
|
|
"```\n",
|
|
"\n",
|
|
"The first one is just shorter to write.\n",
|
|
"\n",
|
|
"Alternatively, in this example we can also replace the properties we are not using with square brackets `[]`:\n",
|
|
"\n",
|
|
"```turtle\n",
|
|
"[] a s:Musician ;\n",
|
|
" ?prop [] .\n",
|
|
"```"
|
|
]
|
|
},
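{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a small illustration of what a prefix does, the sketch below writes the type as a full IRI instead of the prefixed name `s:Musician`. It assumes the `s:` prefix is declared exactly as in the query above, in which case both forms should match the same triples.\n",
"\n",
"```sparql\n",
"# s:Musician written out as the full IRI it abbreviates\n",
"SELECT DISTINCT ?prop\n",
"WHERE {\n",
"  ?song a <http://learningsparql.com/ns/schema/Musician> ;\n",
"        ?prop ?value .\n",
"}\n",
"LIMIT 20\n",
"```\n",
"\n",
"Prefixes do not change what a query matches. They only make it shorter to read and write."
]
},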
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"Now, use the same concepts to get a list of **songs and properties**, without duplicates:"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"metadata": {
|
|
"deletable": false,
|
|
"nbgrader": {
|
|
"cell_type": "code",
|
|
"checksum": "069811507dbac4b86dc5d3adc82ba4ec",
|
|
"grade": false,
|
|
"grade_id": "cell-0223a51f609edcf9",
|
|
"locked": false,
|
|
"schema_version": 3,
|
|
"solution": true
|
|
}
|
|
},
|
|
"outputs": [],
|
|
"source": [
|
|
"%%sparql http://fuseki.gsi.upm.es/sitc/\n",
|
|
"\n",
|
|
"PREFIX s: <http://learningsparql.com/ns/schema/>\n",
|
|
"PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>\n",
|
|
"\n",
|
|
"# YOUR ANSWER HERE\n",
|
|
"WHERE {\n",
|
|
"# YOUR ANSWER HERE\n",
|
|
"}\n",
|
|
"LIMIT 20"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"metadata": {
|
|
"deletable": false,
|
|
"editable": false,
|
|
"nbgrader": {
|
|
"cell_type": "code",
|
|
"checksum": "9833a3efa75c7e2784ef5d60aae2a13e",
|
|
"grade": true,
|
|
"grade_id": "cell-3c7943c6382c62f5",
|
|
"locked": true,
|
|
"points": 1,
|
|
"schema_version": 3,
|
|
"solution": false
|
|
}
|
|
},
|
|
"outputs": [],
|
|
"source": [
|
|
"s = solution()\n",
|
|
"assert len(set(s['tuples'])) == len(s['tuples']) # There are no duplicates\n",
|
|
"assert len(s['tuples']) >= 20"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"### Getting a list of song names"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"In the previous exercise, we saw the properties for Songs.\n",
|
|
"One of them is `rdfs:label`, which gives a human readable name for the entity.\n",
|
|
"\n",
|
|
"Using `rdfs:label`, get a list of song names:"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"metadata": {
|
|
"deletable": false,
|
|
"nbgrader": {
|
|
"cell_type": "code",
|
|
"checksum": "b68a279085a1ed087f5e474a6602299e",
|
|
"grade": false,
|
|
"grade_id": "cell-8f43547dd788bb33",
|
|
"locked": false,
|
|
"schema_version": 3,
|
|
"solution": true
|
|
}
|
|
},
|
|
"outputs": [],
|
|
"source": [
|
|
"%%sparql http://fuseki.gsi.upm.es/sitc/\n",
|
|
"\n",
|
|
"PREFIX s: <http://learningsparql.com/ns/schema/>\n",
|
|
"PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>\n",
|
|
"\n",
|
|
"SELECT ?name\n",
|
|
"WHERE {\n",
|
|
"# YOUR ANSWER HERE\n",
|
|
"}\n",
|
|
"LIMIT 20"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"metadata": {
|
|
"deletable": false,
|
|
"editable": false,
|
|
"nbgrader": {
|
|
"cell_type": "code",
|
|
"checksum": "b4461d243cc058b1828769cc906d4947",
|
|
"grade": true,
|
|
"grade_id": "cell-e13a1c921af2f6eb",
|
|
"locked": true,
|
|
"points": 1,
|
|
"schema_version": 3,
|
|
"solution": false
|
|
}
|
|
},
|
|
"outputs": [],
|
|
"source": [
|
|
"s = solution()\n",
|
|
"assert 'Besame Mucho' in s['columns']['name']"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"### Getting an ordered list of songs (ORDER BY)"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"The `ORDER BY` statement allows us to determine the way results will be sorted.\n",
|
|
"This makes it easier to find errors, or missing data.\n",
|
|
"\n",
|
|
"The syntax is the following:\n",
|
|
"\n",
|
|
"```sparql\n",
|
|
"\n",
|
|
"SELECT *\n",
|
|
"WHERE { ... }\n",
|
|
"ORDER BY <variable> <variable> ... \n",
|
|
"... other statements like LIMIT ...\n",
|
|
"```\n",
|
|
"\n",
|
|
"The results can be sorted in ascending or descending order, and using several variables.\n",
|
|
"By default the results are ordered in ascending order, but you can indicate the order using an optional modifier (`ASC(<variable>)`, or `DESC(<variable>)`). \n"
|
|
]
|
|
},
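{
"cell_type": "markdown",
"metadata": {},
"source": [
"As an illustration, a query that sorts by several variables might look like the sketch below. The `ex:` prefix, the `ex:Book` class and the `ex:title` and `ex:year` properties are hypothetical: they are not part of the Beatles dataset and only serve to show the syntax.\n",
"\n",
"```sparql\n",
"PREFIX ex: <http://example.org/ns/>\n",
"\n",
"SELECT ?title ?year\n",
"WHERE {\n",
"  ?book a ex:Book ;\n",
"        ex:title ?title ;\n",
"        ex:year ?year .\n",
"}\n",
"# Newest books first; books from the same year are sorted by title\n",
"ORDER BY DESC(?year) ASC(?title)\n",
"LIMIT 10\n",
"```\n",
"\n",
"The sort keys are applied from left to right, so `?title` only decides the order among rows with the same `?year`."
]
},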
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"Use `ORDER BY` to get a list of songs in **descending order**:"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"metadata": {
|
|
"deletable": false,
|
|
"nbgrader": {
|
|
"cell_type": "code",
|
|
"checksum": "335403f01e484ce5563ff059e9764ff4",
|
|
"grade": false,
|
|
"grade_id": "cell-a0f0b9d9b05c9631",
|
|
"locked": false,
|
|
"schema_version": 3,
|
|
"solution": true
|
|
}
|
|
},
|
|
"outputs": [],
|
|
"source": [
|
|
"%%sparql http://fuseki.gsi.upm.es/sitc/\n",
|
|
"\n",
|
|
"PREFIX s: <http://learningsparql.com/ns/schema/>\n",
|
|
"PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>\n",
|
|
"\n",
|
|
"SELECT ?name\n",
|
|
"WHERE {\n",
|
|
"# YOUR ANSWER HERE\n",
|
|
"}\n",
|
|
"# YOUR ANSWER HERE\n",
|
|
"LIMIT 50"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"metadata": {
|
|
"deletable": false,
|
|
"editable": false,
|
|
"nbgrader": {
|
|
"cell_type": "code",
|
|
"checksum": "45530eb91cbc5b3fddcc93d96f07e579",
|
|
"grade": true,
|
|
"grade_id": "cell-bc012ca9d7ad2867",
|
|
"locked": true,
|
|
"points": 1,
|
|
"schema_version": 3,
|
|
"solution": false
|
|
}
|
|
},
|
|
"outputs": [],
|
|
"source": [
|
|
"s = solution()\n",
|
|
"assert len(s['tuples']) >= 20\n",
|
|
"assert s['columns']['name'][0][0] > s['columns']['name'][-1]"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"### Get a list of musicians who collaborated in at least one song (Traversing the graph)"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"From our inspection of the properties in previous exercises, we know that each song has a list of properties that link to musicians, and each musician has a name. For example:\n",
|
|
"\n",
|
|
"\n",
|
|
"```turtle\n",
|
|
"song:HeyJude a schema:Song ;\n",
|
|
" instrument:guitar musician:RingoStarr .\n",
|
|
"\n",
|
|
"musician:RingoStarr a schema:Musician ;\n",
|
|
" rdfs:label \"Ringo Starr\" .\n",
|
|
"```\n",
|
|
"\n",
|
|
"Using this structure, and the SPARQL statements you already know, get the **names** of all musicians that collaborated in at least one song.\n"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"metadata": {
|
|
"deletable": false,
|
|
"nbgrader": {
|
|
"cell_type": "code",
|
|
"checksum": "8fb253675d2e8510e2c6780b960721e5",
|
|
"grade": false,
|
|
"grade_id": "cell-523b963fa4e288d0",
|
|
"locked": false,
|
|
"schema_version": 3,
|
|
"solution": true
|
|
}
|
|
},
|
|
"outputs": [],
|
|
"source": [
|
|
"%%sparql http://fuseki.gsi.upm.es/sitc/\n",
|
|
"\n",
|
|
"PREFIX s: <http://learningsparql.com/ns/schema/>\n",
|
|
"PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>\n",
|
|
"\n",
|
|
"SELECT DISTINCT ?musician\n",
|
|
"WHERE {\n",
|
|
" ?song a s:Song .\n",
|
|
"# YOUR ANSWER HERE\n",
|
|
" \n",
|
|
"}\n",
|
|
"ORDER BY ?name"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"metadata": {
|
|
"deletable": false,
|
|
"editable": false,
|
|
"nbgrader": {
|
|
"cell_type": "code",
|
|
"checksum": "f4474b302bc2f634b3b2ee6e1c7e7257",
|
|
"grade": true,
|
|
"grade_id": "cell-aa9a4e18d6fda225",
|
|
"locked": true,
|
|
"points": 1,
|
|
"schema_version": 3,
|
|
"solution": false
|
|
}
|
|
},
|
|
"outputs": [],
|
|
"source": [
|
|
"s = solution()\n",
|
|
"assert 'musician' in s['columns']\n",
|
|
"assert 'Paul McCartney' in s['columns']['musician']\n",
|
|
"assert 'Peter Coe' in s['columns']['musician']\n",
|
|
"assert len(solution()['tuples']) >= 200"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"### In how many songs did Ringo collaborate? (COUNT)"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"\n",
|
|
"Results can be aggregated using different functions.\n",
|
|
"One of the simplest functions is `COUNT`.\n",
|
|
"The syntax for `COUNT` is:\n",
|
|
" \n",
|
|
"```sparql\n",
|
|
"SELECT (COUNT(?variable) as ?count_name)\n",
|
|
"```\n",
|
|
"\n",
|
|
"Use `COUNT` to get the number of songs in which Ringo collaborated. Your query should return a column named `number`."
|
|
]
|
|
},
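{
"cell_type": "markdown",
"metadata": {},
"source": [
"A minimal sketch of `COUNT` on made-up data follows. The `ex:` prefix, `ex:Book`, `ex:author` and `ex:JaneDoe` are hypothetical names used only for illustration; they do not exist in the Beatles dataset.\n",
"\n",
"```sparql\n",
"PREFIX ex: <http://example.org/ns/>\n",
"\n",
"# Count how many books are linked to one (hypothetical) author\n",
"SELECT (COUNT(?book) AS ?total)\n",
"WHERE {\n",
"  ?book a ex:Book ;\n",
"        ex:author ex:JaneDoe .\n",
"}\n",
"```\n",
"\n",
"The parentheses around the whole expression are required, and the name after `AS` becomes the name of the result column."
]
},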
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"metadata": {
|
|
"deletable": false,
|
|
"nbgrader": {
|
|
"cell_type": "code",
|
|
"checksum": "c7b6620f5ba28b482197ab693cb7142a",
|
|
"grade": false,
|
|
"grade_id": "cell-e89d08031e30b299",
|
|
"locked": false,
|
|
"schema_version": 3,
|
|
"solution": true
|
|
}
|
|
},
|
|
"outputs": [],
|
|
"source": [
|
|
"%%sparql http://fuseki.gsi.upm.es/sitc/\n",
|
|
"\n",
|
|
"PREFIX s: <http://learningsparql.com/ns/schema/>\n",
|
|
"PREFIX m: <http://learningsparql.com/ns/musician/>\n",
|
|
"PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>\n",
|
|
"\n",
|
|
"# YOUR ANSWER HERE\n",
|
|
"WHERE {\n",
|
|
" ?song a s:Song .\n",
|
|
" ?song ?instrument m:RingoStarr .\n",
|
|
"}"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"metadata": {
|
|
"deletable": false,
|
|
"editable": false,
|
|
"nbgrader": {
|
|
"cell_type": "code",
|
|
"checksum": "c90e1427d7e48d9ae8abab40ff92e3b0",
|
|
"grade": true,
|
|
"grade_id": "cell-903d2be00885e1d2",
|
|
"locked": true,
|
|
"points": 1,
|
|
"schema_version": 3,
|
|
"solution": false
|
|
}
|
|
},
|
|
"outputs": [],
|
|
"source": [
|
|
"assert solution()['columns']['number'][0] == '412'"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"### Getting the frequency of each instrument (GROUP BY)"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"Results can be grouped by one or more of the variables.\n",
|
|
"\n",
|
|
"Grouping is achieved with the `GROUP BY` statement. \n",
|
|
"The syntax for `GROUP BY` is:\n",
|
|
"\n",
|
|
" \n",
|
|
"```sparql\n",
|
|
"SELECT GROUP BY ?variable1 ?variable2 ...\n",
|
|
"```\n",
|
|
"\n",
|
|
"Once results are grouped, they can be aggregated using any aggregation function, such as `COUNT`.\n",
|
|
"\n",
|
|
"Using `GROUP BY` and `COUNT`, get the count of songs in which Ringo Starr has played each of the instruments:"
|
|
]
|
|
},
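{
"cell_type": "markdown",
"metadata": {},
"source": [
"To show where `GROUP BY` sits in a complete query, here is a sketch on made-up data. The `ex:` prefix, `ex:Book` and `ex:author` are hypothetical and not part of the Beatles dataset.\n",
"\n",
"```sparql\n",
"PREFIX ex: <http://example.org/ns/>\n",
"\n",
"# One row per author, with the number of books linked to that author\n",
"SELECT ?author (COUNT(?book) AS ?total)\n",
"WHERE {\n",
"  ?book a ex:Book ;\n",
"        ex:author ?author .\n",
"}\n",
"GROUP BY ?author\n",
"ORDER BY DESC(?total)\n",
"```\n",
"\n",
"Note that `GROUP BY` comes after the `WHERE` block, and that every variable in `SELECT` that is not inside an aggregation function has to be listed in `GROUP BY`."
]
},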
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"metadata": {
|
|
"deletable": false,
|
|
"nbgrader": {
|
|
"cell_type": "code",
|
|
"checksum": "7556bacb20c1fbd059dec165c982908d",
|
|
"grade": false,
|
|
"grade_id": "cell-1429e4eb5400dbc7",
|
|
"locked": false,
|
|
"schema_version": 3,
|
|
"solution": true
|
|
}
|
|
},
|
|
"outputs": [],
|
|
"source": [
|
|
"%%sparql http://fuseki.gsi.upm.es/sitc/\n",
|
|
"\n",
|
|
"PREFIX s: <http://learningsparql.com/ns/schema/>\n",
|
|
"PREFIX m: <http://learningsparql.com/ns/musician/>\n",
|
|
"PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>\n",
|
|
"\n",
|
|
"SELECT ?instrument (COUNT(?song) as ?number)\n",
|
|
"WHERE {\n",
|
|
" ?song a s:Song .\n",
|
|
" ?song ?instrument m:RingoStarr .\n",
|
|
"}\n",
|
|
"# YOUR ANSWER HERE\n",
|
|
"ORDER BY DESC(?number)"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"metadata": {
|
|
"deletable": false,
|
|
"editable": false,
|
|
"nbgrader": {
|
|
"cell_type": "code",
|
|
"checksum": "34a8432e8d4cea70994c8214ed0e5eb6",
|
|
"grade": true,
|
|
"grade_id": "cell-907aaf6001e27e50",
|
|
"locked": true,
|
|
"points": 1,
|
|
"schema_version": 3,
|
|
"solution": false
|
|
}
|
|
},
|
|
"outputs": [],
|
|
"source": [
|
|
"s = solution()\n",
|
|
"assert len(s['tuples']) == 37\n",
|
|
"assert s['columns']['number'][-1] == '1'\n",
|
|
"assert s['columns']['number'][0] == '233'"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"### How many different instruments are there in every song?"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"We can use other keywords inside our aggregation.\n",
|
|
"For example, we could use `DISTINCT` to remove duplicates before aggregating.\n",
|
|
"\n",
|
|
"Here is an example, which shows the number of songs each musician collaborated in.\n",
|
|
"It has to use `DISTINCT` because some artists play multiple instruments in a song."
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"%%sparql http://fuseki.gsi.upm.es/sitc/\n",
|
|
"\n",
|
|
"PREFIX s: <http://learningsparql.com/ns/schema/>\n",
|
|
"PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>\n",
|
|
"\n",
|
|
"SELECT ?artist (COUNT(DISTINCT ?song) as ?number)\n",
|
|
"WHERE {\n",
|
|
" ?artist a s:Musician .\n",
|
|
" ?song ?instrument ?artist .\n",
|
|
"}\n",
|
|
"GROUP BY ?artist\n",
|
|
"ORDER BY DESC(?number)"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"Now, use the same principle to get the count of **different** instruments in each song.\n",
|
|
"Some songs have several musicians playing the same instrument, but we only care about *different* instruments in each song.\n",
|
|
"\n",
|
|
"Use `?song` for the song and `?number` for the count.\n",
|
|
"\n",
|
|
"Take into consideration that instruments are entities of type `i:Instrument`."
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"metadata": {
|
|
"deletable": false,
|
|
"nbgrader": {
|
|
"cell_type": "code",
|
|
"checksum": "3139d9b7e620266946ffe1ae0cf67581",
|
|
"grade": false,
|
|
"grade_id": "cell-ee208c762d00da9c",
|
|
"locked": false,
|
|
"schema_version": 3,
|
|
"solution": true
|
|
}
|
|
},
|
|
"outputs": [],
|
|
"source": [
|
|
"%%sparql http://fuseki.gsi.upm.es/sitc/\n",
|
|
"\n",
|
|
"PREFIX s: <http://learningsparql.com/ns/schema/>\n",
|
|
"PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>\n",
|
|
"\n",
|
|
"# YOUR ANSWER HERE\n",
|
|
"WHERE {\n",
|
|
" [] a s:Song ;\n",
|
|
" rdfs:label ?song ;\n",
|
|
" ?instrument ?musician .\n",
|
|
" \n",
|
|
"?instrument a s:Instrument .\n",
|
|
"}\n",
|
|
"# YOUR ANSWER HERE\n",
|
|
"ORDER BY DESC(?number)"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"metadata": {
|
|
"deletable": false,
|
|
"editable": false,
|
|
"nbgrader": {
|
|
"cell_type": "code",
|
|
"checksum": "5abf6eb7a67ebc9f7612b876105c1960",
|
|
"grade": true,
|
|
"grade_id": "cell-ddeec32b8ac3d894",
|
|
"locked": true,
|
|
"points": 1,
|
|
"schema_version": 3,
|
|
"solution": false
|
|
}
|
|
},
|
|
"outputs": [],
|
|
"source": [
|
|
"s = solution()\n",
|
|
"assert s['columns']['number'][0] == '25'"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"### Who is the vocalist in every song? (using OPTIONAL)"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"In this exercise, we will get a list of songs and their vocalists.\n",
|
|
"\n",
|
|
"We could start with this query:"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"%%sparql http://fuseki.gsi.upm.es/sitc/\n",
|
|
"\n",
|
|
"PREFIX s: <http://learningsparql.com/ns/schema/>\n",
|
|
"PREFIX i: <http://learningsparql.com/ns/instrument/>\n",
|
|
"PREFIX m: <http://learningsparql.com/ns/musician/>\n",
|
|
"PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>\n",
|
|
"\n",
|
|
"SELECT ?song ?vocalist\n",
|
|
"WHERE {\n",
|
|
" ?song a s:Song .\n",
|
|
" ?song i:vocals ?vocalist\n",
|
|
"}\n",
|
|
"LIMIT 100"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"However, there are some songs that do not have a vocalist (at least, in the dataset).\n",
|
|
"Those songs will not appear in the list above, because they do not match part of the `WHERE` clause.\n",
|
|
"\n",
|
|
"In these cases, we can specify optional values in a query using the `OPTIONAL` keyword.\n",
|
|
"When a set of clauses is inside an `OPTIONAL` group, the SPARQL endpoint will try to match them as well.\n",
|
|
"If there are no results for that part of the query, the variables it specifies will not be bound (i.e. they will be empty).\n",
|
|
"\n",
|
|
"To exemplify this, we can use a property that **does not exist in the dataset**:"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"%%sparql http://fuseki.gsi.upm.es/sitc/\n",
|
|
"\n",
|
|
"PREFIX s: <http://learningsparql.com/ns/schema/>\n",
|
|
"PREFIX i: <http://learningsparql.com/ns/instrument/>\n",
|
|
"PREFIX m: <http://learningsparql.com/ns/musician/>\n",
|
|
"PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>\n",
|
|
"\n",
|
|
"SELECT ?song ?musician\n",
|
|
"WHERE {\n",
|
|
" ?song a s:Song .\n",
|
|
" OPTIONAL {\n",
|
|
" ?song i:a_made_up_instrument ?musician\n",
|
|
" }\n",
|
|
"}\n",
|
|
"LIMIT 100"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"Although the property does not exist, the query will still return all the songs.\n",
|
|
"In the column for our instrument, it returns an empty value.\n",
|
|
"\n",
|
|
"Now, use the same concept to get a list of the **names** of the vocalists (if any) in each song."
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"metadata": {
|
|
"deletable": false,
|
|
"nbgrader": {
|
|
"cell_type": "code",
|
|
"checksum": "3bc508872193750d57d07efbf334c212",
|
|
"grade": false,
|
|
"grade_id": "cell-dcd68c45c1608a28",
|
|
"locked": false,
|
|
"schema_version": 3,
|
|
"solution": true
|
|
}
|
|
},
|
|
"outputs": [],
|
|
"source": [
|
|
"%%sparql http://fuseki.gsi.upm.es/sitc/\n",
|
|
"\n",
|
|
"PREFIX s: <http://learningsparql.com/ns/schema/>\n",
|
|
"PREFIX i: <http://learningsparql.com/ns/instrument/>\n",
|
|
"PREFIX m: <http://learningsparql.com/ns/musician/>\n",
|
|
"PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>\n",
|
|
"\n",
|
|
"SELECT ?song ?vocalist\n",
|
|
"WHERE {\n",
|
|
" ?s a s:Song .\n",
|
|
" ?s rdfs:label ?song .\n",
|
|
"# YOUR ANSWER HERE\n",
|
|
"}\n",
|
|
"LIMIT 100"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"metadata": {
|
|
"deletable": false,
|
|
"editable": false,
|
|
"nbgrader": {
|
|
"cell_type": "code",
|
|
"checksum": "69edef3121b8dfab385a00cd181c956f",
|
|
"grade": true,
|
|
"grade_id": "cell-1e706b9c1c1331bc",
|
|
"locked": true,
|
|
"points": 1,
|
|
"schema_version": 3,
|
|
"solution": false
|
|
}
|
|
},
|
|
"outputs": [],
|
|
"source": [
|
|
"s = solution()\n",
|
|
"assert 'Paul McCartney' in s['columns']['vocalist']\n",
|
|
"assert 'Paul McCartney' in s['columns']['vocalist']\n",
|
|
"assert ('Besame Mucho', 'Paul McCartney') in s['tuples']\n",
|
|
"assert '' in s['columns']['vocalist'] # Some songs do not have a vocalist"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"### What songs do not have a vocalist? (Bound)"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"Now we only want to list those songs that **do not** have a vocalist.\n",
|
|
"\n",
|
|
"To do so, we can copy the query from the previous exercise, and filter the results with the `BOUND` function.\n",
|
|
"\n",
|
|
"`BOUND` will return `true` if the variable has a value, and `false` otherwise.\n",
|
|
"\n",
|
|
"This is very useful for two purposes.\n",
|
|
"Firstly, it allows us to look for patterns that **do not occur** in the graph, such as missing properties.\n",
|
|
"For instance, we could search for the authors with missing birth information so we can add it.\n",
|
|
"Secondly, we can use `BOUND` inside a `FILTER` to apply a condition only when a variable has a value.\n",
|
|
"\n",
|
|
"Add a filter below to only get songs without a vocalist:"
|
|
]
|
|
},
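{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a rough sketch of both uses of `BOUND`, consider the authors example mentioned above.\n",
"The `ex:` namespace, the `ex:Author` class and the `ex:birthYear` property are hypothetical: they are assumptions for illustration and are not part of the dataset used in this notebook.\n",
"\n",
"The first query would list the authors whose birth year is missing:\n",
"\n",
"```sparql\n",
"PREFIX ex: <http://example.org/>\n",
"\n",
"SELECT ?author\n",
"WHERE {\n",
"  ?author a ex:Author .\n",
"  # The birth year is optional, so authors without one still match\n",
"  OPTIONAL { ?author ex:birthYear ?year }\n",
"  # Keep only the rows where the optional part found nothing\n",
"  FILTER (!BOUND(?year))\n",
"}\n",
"```\n",
"\n",
"The second query is a conditional filter: the year is only checked when it is known, so authors with no birth year are kept as well:\n",
"\n",
"```sparql\n",
"PREFIX ex: <http://example.org/>\n",
"\n",
"SELECT ?author ?year\n",
"WHERE {\n",
"  ?author a ex:Author .\n",
"  OPTIONAL { ?author ex:birthYear ?year }\n",
"  # Keep the row if the year is unknown or if it is before 1900\n",
"  FILTER (!BOUND(?year) || ?year < 1900)\n",
"}\n",
"```"
]
},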
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"metadata": {
|
|
"deletable": false,
|
|
"nbgrader": {
|
|
"cell_type": "code",
|
|
"checksum": "300df0a3cf9729dd4814b3153b2fedb4",
|
|
"grade": false,
|
|
"grade_id": "cell-0c7cc924a13d792a",
|
|
"locked": false,
|
|
"schema_version": 3,
|
|
"solution": true
|
|
}
|
|
},
|
|
"outputs": [],
|
|
"source": [
|
|
"%%sparql http://fuseki.gsi.upm.es/sitc/\n",
|
|
"\n",
|
|
"PREFIX s: <http://learningsparql.com/ns/schema/>\n",
|
|
"PREFIX i: <http://learningsparql.com/ns/instrument/>\n",
|
|
"PREFIX m: <http://learningsparql.com/ns/musician/>\n",
|
|
"PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>\n",
|
|
"\n",
|
|
"SELECT ?song\n",
|
|
"WHERE {\n",
|
|
" ?s a s:Song .\n",
|
|
" ?s rdfs:label ?song .\n",
|
|
" OPTIONAL {\n",
|
|
" ?s i:vocals ?vocalist\n",
|
|
" }\n",
|
|
"# YOUR ANSWER HERE\n",
|
|
"}\n",
|
|
"LIMIT 100"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"metadata": {
|
|
"deletable": false,
|
|
"editable": false,
|
|
"nbgrader": {
|
|
"cell_type": "code",
|
|
"checksum": "22d6fcdb72a8b2c5ab496cdbb5e2740a",
|
|
"grade": true,
|
|
"grade_id": "cell-2541abc93ab4d506",
|
|
"locked": true,
|
|
"points": 1,
|
|
"schema_version": 3,
|
|
"solution": false
|
|
}
|
|
},
|
|
"outputs": [],
|
|
"source": [
|
|
"s = solution()\n",
|
|
"assert len(s['tuples']) == 23"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"### Who played guitar OR bass in the most songs? (Advanced FILTER with GROUP)"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"In this exercise, we want a table with the names of the musicians who played either the guitar (`i:guitar`) or the bass (`i:bass`), the instrument they played, and the number of songs in which they played it.\n",
|
|
"\n",
|
|
"If a musician played both instruments, they should appear twice, once per instrument."
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"metadata": {
|
|
"deletable": false,
|
|
"nbgrader": {
|
|
"cell_type": "code",
|
|
"checksum": "e4e898c8a16b8aa5865dfde2f6e68ec6",
|
|
"grade": false,
|
|
"grade_id": "cell-d750b6d64c6aa0a7",
|
|
"locked": false,
|
|
"schema_version": 3,
|
|
"solution": true
|
|
}
|
|
},
|
|
"outputs": [],
|
|
"source": [
|
|
"%%sparql http://fuseki.gsi.upm.es/sitc/\n",
|
|
"\n",
|
|
"PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#> \n",
|
|
"PREFIX s: <http://learningsparql.com/ns/schema/>\n",
|
|
"PREFIX i: <http://learningsparql.com/ns/instrument/>\n",
|
|
"PREFIX m: <http://learningsparql.com/ns/musician/>\n",
|
|
"\n",
|
|
"SELECT ?musician ?instrument (COUNT(DISTINCT ?song) AS ?number)\n",
|
|
"WHERE {\n",
|
|
" ?song ?ins ?player .\n",
|
|
" ?ins rdfs:label ?instrument .\n",
|
|
" ?player rdfs:label ?musician .\n",
|
|
"# YOUR ANSWER HERE\n",
|
|
"}\n",
|
|
"# YOUR ANSWER HERE\n",
|
|
"\n",
|
|
"ORDER BY DESC(?instrument) DESC(?number)"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"s = solution()\n",
|
|
"assert ('George Harrison', 'guitar', '27') in s['tuples']\n",
|
|
"assert ('Stuart Sutcliffe', 'bass', '3') in s['tuples']"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"### Who played the most instruments? (Advanced FILTER II)"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"Now, count how many instruments each musician has played in a song.\n",
|
|
"\n",
|
|
"**Do not count lead (`i:vocals`) or backing vocals (`i:backingvocals`) as instruments**.\n",
|
|
"\n",
|
|
"Use `?musician` for the musician and `?number` for the count."
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"metadata": {
|
|
"deletable": false,
|
|
"nbgrader": {
|
|
"cell_type": "code",
|
|
"checksum": "fade6ab714376e0eabfa595dd6bd6a8b",
|
|
"grade": false,
|
|
"grade_id": "cell-2f5aa516f8191787",
|
|
"locked": false,
|
|
"schema_version": 3,
|
|
"solution": true
|
|
}
|
|
},
|
|
"outputs": [],
|
|
"source": [
|
|
"%%sparql http://fuseki.gsi.upm.es/sitc/\n",
|
|
"\n",
|
|
"PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#> \n",
|
|
"PREFIX s: <http://learningsparql.com/ns/schema/>\n",
|
|
"PREFIX i: <http://learningsparql.com/ns/instrument/>\n",
|
|
"PREFIX m: <http://learningsparql.com/ns/musician/>\n",
|
|
"\n",
|
|
"# YOUR ANSWER HERE\n",
|
|
"WHERE {\n",
|
|
" ?song ?ins ?player .\n",
|
|
" ?ins rdfs:label ?instrument .\n",
|
|
" ?player rdfs:label ?musician .\n",
|
|
"# YOUR ANSWER HERE\n",
|
|
"}\n",
|
|
"GROUP BY ?musician\n",
|
|
"ORDER BY DESC(?instrument) DESC(?number)"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"metadata": {
|
|
"deletable": false,
|
|
"editable": false,
|
|
"nbgrader": {
|
|
"cell_type": "code",
|
|
"checksum": "33e93ec2a3d1f9eb4b0310d4651b74c2",
|
|
"grade": true,
|
|
"grade_id": "cell-bcd0f7e26b6c11c2",
|
|
"locked": true,
|
|
"points": 1,
|
|
"schema_version": 3,
|
|
"solution": false
|
|
}
|
|
},
|
|
"outputs": [],
|
|
"source": [
|
|
"s = solution()\n",
|
|
"assert ('John Lennon', '52') in s['tuples']\n",
|
|
"assert ('Andy White', '2') in s['tuples']"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"### Which songs had Ringo in drums OR Lennon in lead vocals? (UNION)"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"We can merge the results of several graph patterns, much like `UNION` (rather than `JOIN`) in SQL.\n",
|
|
"The keyword in SPARQL is also `UNION`: a result is returned if it matches either of the alternative patterns.\n",
|
|
"\n",
|
|
"`UNION` is useful in many situations.\n",
|
|
"For instance, when there are equivalent properties, or when you want to use two search terms and FILTER would be too inefficient.\n",
|
|
"\n",
|
|
"The syntax is as follows:\n",
|
|
"\n",
|
|
"```sparql\n",
|
|
"SELECT ?title\n",
|
|
"WHERE {\n",
|
|
" { ?book dc10:title ?title }\n",
|
|
" UNION\n",
|
|
" { ?book dc11:title ?title }\n",
|
|
" \n",
|
|
" ... REST OF YOUR QUERY ...\n",
|
|
"\n",
|
|
"}\n",
|
|
"```"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"metadata": {
|
|
"deletable": false,
|
|
"nbgrader": {
|
|
"cell_type": "code",
|
|
"checksum": "09262d81449c498c37e4b9d9b1dcdfed",
|
|
"grade": false,
|
|
"grade_id": "cell-d3a742bd87d9c793",
|
|
"locked": false,
|
|
"schema_version": 3,
|
|
"solution": true
|
|
}
|
|
},
|
|
"outputs": [],
|
|
"source": [
|
|
"%%sparql http://fuseki.gsi.upm.es/sitc/\n",
|
|
"\n",
|
|
"PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#> \n",
|
|
"PREFIX s: <http://learningsparql.com/ns/schema/>\n",
|
|
"PREFIX i: <http://learningsparql.com/ns/instrument/>\n",
|
|
"PREFIX m: <http://learningsparql.com/ns/musician/>\n",
|
|
"\n",
|
|
"SELECT DISTINCT ?song\n",
|
|
"WHERE {\n",
|
|
"# YOUR ANSWER HERE\n",
|
|
"}"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"metadata": {
|
|
"deletable": false,
|
|
"editable": false,
|
|
"nbgrader": {
|
|
"cell_type": "code",
|
|
"checksum": "11061e79ec06ccb3a9c496319a528366",
|
|
"grade": true,
|
|
"grade_id": "cell-409402df0e801d09",
|
|
"locked": true,
|
|
"points": 1,
|
|
"schema_version": 3,
|
|
"solution": false
|
|
}
|
|
},
|
|
"outputs": [],
|
|
"source": [
|
|
"assert len(solution()['tuples']) == 209"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"### Which musicians collaborated in at least 10 songs? (HAVING)"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"You can filter results after an aggregation, using the `HAVING` statement.\n",
|
|
"Its syntax is:\n",
|
|
" \n",
|
|
"\n",
|
|
"```sparql\n",
|
|
"SELECT ...\n",
|
|
"WHERE ...\n",
|
|
"GROUP BY ...\n",
|
|
"HAVING (<statement>)\n",
|
|
"```\n",
|
|
"\n",
|
|
"e.g.\n",
|
|
"\n",
|
|
"```sparql\n",
|
|
"HAVING (?count > 10)\n",
|
|
"```\n",
|
|
"\n",
|
|
"Use this new statement to get the list of artists that played at least 10 times with the Beatles, and the number of times they did:"
|
|
]
|
|
},
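{
"cell_type": "markdown",
"metadata": {},
"source": [
"To see `HAVING` in a complete query, here is a minimal sketch on a made-up book catalogue.\n",
"The `ex:` namespace and the `ex:author` property are hypothetical and do not exist in the dataset used in this notebook.\n",
"It would list the authors with at least two books, together with the number of books:\n",
"\n",
"```sparql\n",
"PREFIX ex: <http://example.org/>\n",
"\n",
"SELECT ?author (COUNT(DISTINCT ?book) AS ?books)\n",
"WHERE {\n",
"  ?book ex:author ?author .\n",
"}\n",
"GROUP BY ?author\n",
"# At least two books: note the use of >= rather than >\n",
"HAVING (COUNT(DISTINCT ?book) >= 2)\n",
"ORDER BY DESC(?books)\n",
"```\n",
"\n",
"The aggregate is repeated inside `HAVING` here because that form should work on any SPARQL 1.1 endpoint; some engines may also accept the alias defined in the `SELECT` clause."
]
},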
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"metadata": {
|
|
"deletable": false,
|
|
"nbgrader": {
|
|
"cell_type": "code",
|
|
"checksum": "9ddd2d1f50f841b889bfd29b175d06da",
|
|
"grade": false,
|
|
"grade_id": "cell-9d1ec854eb530235",
|
|
"locked": false,
|
|
"schema_version": 3,
|
|
"solution": true
|
|
}
|
|
},
|
|
"outputs": [],
|
|
"source": [
|
|
"%%sparql http://fuseki.gsi.upm.es/sitc/\n",
|
|
"\n",
|
|
"PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#> \n",
|
|
"\n",
|
|
"SELECT ?musician (COUNT(DISTINCT ?song) AS ?number) \n",
|
|
"WHERE {\n",
|
|
" ?song ?instrument [\n",
|
|
" rdfs:label ?musician \n",
|
|
" ]\n",
|
|
"}\n",
|
|
"GROUP BY ?musician\n",
|
|
"# YOUR ANSWER HERE\n",
|
|
"ORDER BY DESC(?number)"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"metadata": {
|
|
"deletable": false,
|
|
"editable": false,
|
|
"nbgrader": {
|
|
"cell_type": "code",
|
|
"checksum": "0ea5496acd1c3edd9e188b351690a533",
|
|
"grade": true,
|
|
"grade_id": "cell-a79c688b4566dbe8",
|
|
"locked": true,
|
|
"points": 1,
|
|
"schema_version": 3,
|
|
"solution": false
|
|
}
|
|
},
|
|
"outputs": [],
|
|
"source": [
|
|
"s = solution()\n",
|
|
"assert len(s['tuples']) == 7\n",
|
|
"assert s['columns']['musician'][0] == 'Paul McCartney'\n",
|
|
"assert s['columns']['musician'][-1] == 'Mal Evans'"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"## **Optional** exercises"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"These are additional exercises that can be solved with more advanced concepts.\n",
|
|
"\n",
|
|
"If you are curious, you could also check the notebook on Advanced SPARQL concepts."
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"### What instruments could each musician play? (GROUP_CONCAT)\n",
|
|
"\n",
|
|
"\n",
|
|
"Another option to aggregate results is to concatenate them.\n",
|
|
"You can do so with:\n",
|
|
"\n",
|
|
"```sparql\n",
|
|
"GROUP_CONCAT(?name; separator=\",\")\n",
|
|
"```\n",
|
|
"\n",
|
|
"Using `GROUP_CONCAT`, get a list of the instruments that each musician could play.\n",
|
|
"\n",
|
|
"You can consult how to use GROUP_CONCAT [here](https://www.w3.org/TR/sparql11-query/).\n",
|
|
"\n",
|
|
"Use `?musician` for the musician and `?instruments` for the list of instruments."
|
|
]
|
|
},
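{
"cell_type": "markdown",
"metadata": {},
"source": [
"As an illustration of where `GROUP_CONCAT` goes in a full query, here is a sketch on a made-up book catalogue.\n",
"The `ex:` namespace and the `ex:author` and `ex:title` properties are hypothetical and are not part of the dataset used in this notebook.\n",
"It would return one row per author, with all of their titles in a single string:\n",
"\n",
"```sparql\n",
"PREFIX ex: <http://example.org/>\n",
"\n",
"SELECT ?author (GROUP_CONCAT(DISTINCT ?title; separator=\", \") AS ?titles)\n",
"WHERE {\n",
"  ?book ex:author ?author ;\n",
"        ex:title ?title .\n",
"}\n",
"# One group (and one output row) per author\n",
"GROUP BY ?author\n",
"```\n",
"\n",
"`DISTINCT` is optional; it only removes repeated values before they are concatenated."
]
},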
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"metadata": {
|
|
"deletable": false,
|
|
"nbgrader": {
|
|
"cell_type": "code",
|
|
"checksum": "d18e8b6e1d32aed395a533febb29fcb5",
|
|
"grade": false,
|
|
"grade_id": "cell-7ea1f5154cdd8324",
|
|
"locked": false,
|
|
"schema_version": 3,
|
|
"solution": true
|
|
}
|
|
},
|
|
"outputs": [],
|
|
"source": [
|
|
"%%sparql http://fuseki.gsi.upm.es/sitc/\n",
|
|
"PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#> \n",
|
|
"PREFIX s: <http://learningsparql.com/ns/schema/>\n",
|
|
"PREFIX i: <http://learningsparql.com/ns/instrument/>\n",
|
|
"PREFIX m: <http://learningsparql.com/ns/musician/>\n",
|
|
"\n",
|
|
"# YOUR ANSWER HERE"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"### What types of vocals are there? (REGEX)"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"In one of the exercises, we excluded lead and backing vocals from the list of instruments.\n",
|
|
"However, are those the only types of vocals?\n",
|
|
"\n",
|
|
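"You can check if a string matches a regular expression with `regex(?variable, \"<regex>\", \"i\")`, where the `\"i\"` flag makes the match case-insensitive; to match a URI, convert it to a string first with `STR(?variable)`.\n",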
|
|
"\n",
|
|
"The documentation for regular expressions in SPARQL is [here](https://www.w3.org/TR/rdf-sparql-query/).\n",
|
|
"\n",
|
|
"Use `?instrument` for the instrument and `?ins` for the URI of the type."
|
|
]
|
|
},
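{
"cell_type": "markdown",
"metadata": {},
"source": [
"A minimal sketch of `regex` inside a `FILTER`, on a made-up book catalogue.\n",
"The `ex:` namespace and the `ex:title` property are hypothetical and are not part of the dataset used in this notebook.\n",
"It would keep the books whose title contains the word *sparql* in any letter case and whose URI starts with the example namespace:\n",
"\n",
"```sparql\n",
"PREFIX ex: <http://example.org/>\n",
"\n",
"SELECT ?book ?title\n",
"WHERE {\n",
"  ?book ex:title ?title .\n",
"  # Case-insensitive match on a literal value\n",
"  FILTER regex(?title, \"sparql\", \"i\")\n",
"  # A URI has to be converted to a string before it can be matched\n",
"  FILTER regex(STR(?book), \"^http://example.org/\")\n",
"}\n",
"```"
]
},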
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"metadata": {
|
|
"deletable": false,
|
|
"nbgrader": {
|
|
"cell_type": "code",
|
|
"checksum": "f926fa3a3568d122454a12312859cda1",
|
|
"grade": false,
|
|
"grade_id": "cell-b6bee887a1b1fc60",
|
|
"locked": false,
|
|
"schema_version": 3,
|
|
"solution": true
|
|
}
|
|
},
|
|
"outputs": [],
|
|
"source": [
|
|
"%%sparql http://fuseki.gsi.upm.es/sitc/\n",
|
|
"PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#> \n",
|
|
"PREFIX s: <http://learningsparql.com/ns/schema/>\n",
|
|
"PREFIX i: <http://learningsparql.com/ns/instrument/>\n",
|
|
"PREFIX m: <http://learningsparql.com/ns/musician/>\n",
|
|
"\n",
|
|
"# YOUR ANSWER HERE"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"## References"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"* [SPARQL queries of Beatles recording sessions](http://www.snee.com/bobdc.blog/2017/11/sparql-queries-of-beatles-reco.html)\n",
|
|
"* [RDFLib documentation](https://rdflib.readthedocs.io/en/stable/)\n",
|
|
"* [Wikidata Query Service query examples](https://www.wikidata.org/wiki/Wikidata:SPARQL_query_service/queries/examples)"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"## Licence\n",
|
|
"The notebook is freely licensed under the [Creative Commons Attribution Share-Alike license](https://creativecommons.org/licenses/by/2.0/). \n",
|
|
"\n",
|
|
"© Universidad Politécnica de Madrid."
|
|
]
|
|
}
|
|
],
|
|
"metadata": {
|
|
"kernelspec": {
|
|
"display_name": "Python 3 (ipykernel)",
|
|
"language": "python",
|
|
"name": "python3"
|
|
},
|
|
"language_info": {
|
|
"codemirror_mode": {
|
|
"name": "ipython",
|
|
"version": 3
|
|
},
|
|
"file_extension": ".py",
|
|
"mimetype": "text/x-python",
|
|
"name": "python",
|
|
"nbconvert_exporter": "python",
|
|
"pygments_lexer": "ipython3",
|
|
"version": "3.8.10"
|
|
},
|
|
"toc": {
|
|
"base_numbering": 1,
|
|
"nav_menu": {},
|
|
"number_sections": true,
|
|
"sideBar": true,
|
|
"skip_h1_title": false,
|
|
"title_cell": "Table of Contents",
|
|
"title_sidebar": "Contents",
|
|
"toc_cell": false,
|
|
"toc_position": {},
|
|
"toc_section_display": true,
|
|
"toc_window_display": false
|
|
}
|
|
},
|
|
"nbformat": 4,
|
|
"nbformat_minor": 4
|
|
}
|