mirror of
https://github.com/gsi-upm/sitc
synced 2024-12-22 03:38:13 +00:00
Add SPARQL custom endpoint
This commit is contained in:
parent
fc07718ae8
commit
a4f8f69b19
459
lod/02_SPARQL_Custom_Endpoint.ipynb
Normal file
459
lod/02_SPARQL_Custom_Endpoint.ipynb
Normal file
@ -0,0 +1,459 @@
|
||||
{
|
||||
"cells": [
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"deletable": false,
|
||||
"editable": false,
|
||||
"nbgrader": {
|
||||
"checksum": "7276f055a8c504d3c80098c62ed41a4f",
|
||||
"grade": false,
|
||||
"grade_id": "cell-0bfe38f97f6ab2d2",
|
||||
"locked": true,
|
||||
"schema_version": 1,
|
||||
"solution": false
|
||||
}
|
||||
},
|
||||
"source": [
|
||||
"<header style=\"width:100%;position:relative\">\n",
|
||||
" <div style=\"width:80%;float:right;\">\n",
|
||||
" <h1>Course Notes for Learning Intelligent Systems</h1>\n",
|
||||
" <h3>Department of Telematic Engineering Systems</h3>\n",
|
||||
" <h5>Universidad Politécnica de Madrid</h5>\n",
|
||||
" </div>\n",
|
||||
" <img style=\"width:15%;\" src=\"../logo.jpg\" alt=\"UPM\" />\n",
|
||||
"</header>"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"deletable": false,
|
||||
"editable": false,
|
||||
"nbgrader": {
|
||||
"checksum": "42642609861283bc33914d16750b7efa",
|
||||
"grade": false,
|
||||
"grade_id": "cell-0cd673883ee592d1",
|
||||
"locked": true,
|
||||
"schema_version": 1,
|
||||
"solution": false
|
||||
}
|
||||
},
|
||||
"source": [
|
||||
"## Introduction\n",
|
||||
"\n",
|
||||
"In the previous notebook, we learnt how to use SPARQL by querying DBpedia.\n",
|
||||
"\n",
|
||||
"In this notebook, we will use SPARQL on manually annotated data. The data was collected as part of a [previous exercise](../lod/).\n",
|
||||
"\n",
|
||||
"The goal is to try SPARQL with data annotated by users with limited knowledge of vocabularies and semantics, and to compare the experience with similar queries to a more structured dataset.\n",
|
||||
"\n",
|
||||
"Hence, there are two parts.\n",
|
||||
"First, you will query a set of graphs annotated by students of this course.\n",
|
||||
"Then, you will query a synthetic dataset that contains similar information."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"deletable": false,
|
||||
"editable": false,
|
||||
"nbgrader": {
|
||||
"checksum": "a3ecb4b300a5ab82376a4a8cb01f7e6b",
|
||||
"grade": false,
|
||||
"grade_id": "cell-10264483046abcc4",
|
||||
"locked": true,
|
||||
"schema_version": 1,
|
||||
"solution": false
|
||||
}
|
||||
},
|
||||
"source": [
|
||||
"## Objectives\n",
|
||||
"\n",
|
||||
"* Experiencing the usefulness of the Linked Open Data initiative by querying data from different RDF graphs and endpoints\n",
|
||||
"* Understanding the challenges in querying multiple sources, with different annotators.\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"deletable": false,
|
||||
"editable": false,
|
||||
"nbgrader": {
|
||||
"checksum": "2fedf0d73fc90104d1ab72c3413dfc83",
|
||||
"grade": false,
|
||||
"grade_id": "cell-4f8492996e74bf20",
|
||||
"locked": true,
|
||||
"schema_version": 1,
|
||||
"solution": false
|
||||
}
|
||||
},
|
||||
"source": [
|
||||
"## Tools\n",
|
||||
"\n",
|
||||
"See [the SPARQL notebook](./01_SPARQL_Introduction.ipynb#Tools)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"deletable": false,
|
||||
"editable": false,
|
||||
"nbgrader": {
|
||||
"checksum": "c5f8646518bd832a47d71f9d3218237a",
|
||||
"grade": false,
|
||||
"grade_id": "cell-eb13908482825e42",
|
||||
"locked": true,
|
||||
"schema_version": 1,
|
||||
"solution": false
|
||||
}
|
||||
},
|
||||
"source": [
|
||||
"Run this line to enable the `%%sparql` magic command."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from helpers import *"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Exercises\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Querying the manually annotated dataset will be slightly different from querying DBpedia.\n",
|
||||
"The main difference is that this dataset uses different graphs to separate the annotations from different students.\n",
|
||||
"\n",
|
||||
"**Each graph is a separate set of triples**.\n",
|
||||
"For this exercise, you could think of graphs as individual endpoints.\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"First, let us get a list of graphs available:"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"%%sparql http://fuseki.cluster.gsi.dit.upm.es/hotels\n",
|
||||
" \n",
|
||||
"SELECT ?g (COUNT(?s) as ?count) WHERE {\n",
|
||||
" GRAPH ?g {\n",
|
||||
" ?s ?p ?o\n",
|
||||
" }\n",
|
||||
"}\n",
|
||||
"GROUP BY ?g\n",
|
||||
"ORDER BY desc(?count)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"You should see many graphs, with different triple counts.\n",
|
||||
"\n",
|
||||
"The biggest one should be http://fuseki.cluster.gsi.dit.upm.es/synthetic"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Once you have this list, you can query specific graphs like so:"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"%%sparql http://fuseki.cluster.gsi.dit.upm.es/hotels\n",
|
||||
" \n",
|
||||
"SELECT *\n",
|
||||
"WHERE {\n",
|
||||
" GRAPH <http://fuseki.cluster.gsi.dit.upm.es/synthetic>{\n",
|
||||
" ?s ?p ?o .\n",
|
||||
" }\n",
|
||||
"}\n",
|
||||
"LIMIT 10"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"There are two exercises in this notebook.\n",
|
||||
"\n",
|
||||
"In each of them, you are asked to run five queries, to answer the following questions:\n",
|
||||
"\n",
|
||||
"* Number of hotels (or entities) with reviews\n",
|
||||
"* Number of reviews\n",
|
||||
"* The hotel with the lowest average score\n",
|
||||
"* The hotel with the highest average score\n",
|
||||
"* A list of hotels with their addresses and telephone numbers"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Manually annotated data"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Your task is to design five queries to answer the questions in the description, and run each of them in at least three graphs, other than the `synthetic` graph.\n",
|
||||
"\n",
|
||||
"To design the queries, you can either use what you know about the schema.org vocabularies, or explore subjects, predicates and objects in each of the graphs.\n",
|
||||
"\n",
|
||||
"You will get a better understanding if you follow the exploratory path.\n",
|
||||
"\n",
|
||||
"Here's an example to get the entities and their types in a graph:"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"%%sparql http://fuseki.cluster.gsi.dit.upm.es/hotels\n",
|
||||
"\n",
|
||||
"PREFIX schema: <http://schema.org/>\n",
|
||||
" \n",
|
||||
"SELECT ?s ?o\n",
|
||||
"WHERE {\n",
|
||||
" GRAPH <http://fuseki.cluster.gsi.dit.upm.es/35c20a49f8c6581be1cf7bd56d12d131>{\n",
|
||||
" ?s a ?o .\n",
|
||||
" }\n",
|
||||
"\n",
|
||||
"}\n",
|
||||
"LIMIT 40"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Synthetic dataset\n",
|
||||
"\n",
|
||||
"Now, run the same queries in the synthetic dataset.\n",
|
||||
"\n",
|
||||
"The query below should get you started:"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"%%sparql http://fuseki.cluster.gsi.dit.upm.es/hotels\n",
|
||||
" \n",
|
||||
"SELECT *\n",
|
||||
"WHERE {\n",
|
||||
" GRAPH <http://fuseki.cluster.gsi.dit.upm.es/synthetic>{\n",
|
||||
" ?s ?p ?o .\n",
|
||||
" }\n",
|
||||
"}\n",
|
||||
"LIMIT 10"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Optional exercise\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"Explore the graphs and find the most typical mistakes (e.g. using `http://schema.org/Hotel/Hotel`).\n",
|
||||
"\n",
|
||||
"Tip: You can use normal SPARQL queries with `BOUND` and `REGEX` to check if the annotations are correct.\n",
|
||||
"\n",
|
||||
"You can also query all the graphs at the same time. e.g. to get all types used:"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"%%sparql http://fuseki.cluster.gsi.dit.upm.es/hotels\n",
|
||||
"\n",
|
||||
"PREFIX schema: <http://schema.org/>\n",
|
||||
" \n",
|
||||
"SELECT DISTINCT ?o\n",
|
||||
"WHERE {\n",
|
||||
" GRAPH ?g {\n",
|
||||
" ?s a ?o .\n",
|
||||
" }\n",
|
||||
" {\n",
|
||||
" SELECT ?g\n",
|
||||
" WHERE {\n",
|
||||
" GRAPH ?g {}\n",
|
||||
" FILTER (str(?g) != 'http://fuseki.cluster.gsi.dit.upm.es/synthetic')\n",
|
||||
" }\n",
|
||||
" }\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"}\n",
|
||||
"LIMIT 50"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Discussion\n",
|
||||
"\n",
|
||||
"Compare the results of the synthetic and the manual dataset, and answer these questions:"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Both datasets should use the same schema. Are there any differences when it comes to using them?"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {
|
||||
"deletable": false,
|
||||
"nbgrader": {
|
||||
"checksum": "860c3977cd06736f1342d535944dbb63",
|
||||
"grade": true,
|
||||
"grade_id": "cell-9bd08e4f5842cb89",
|
||||
"locked": false,
|
||||
"points": 0,
|
||||
"schema_version": 1,
|
||||
"solution": true
|
||||
}
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# YOUR ANSWER HERE"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Are the annotations used correctly in every graph?"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {
|
||||
"deletable": false,
|
||||
"nbgrader": {
|
||||
"checksum": "1946a7ed4aba8d168bb3fad898c05651",
|
||||
"grade": true,
|
||||
"grade_id": "cell-9dc1c9033198bb18",
|
||||
"locked": false,
|
||||
"points": 0,
|
||||
"schema_version": 1,
|
||||
"solution": true
|
||||
}
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# YOUR ANSWER HERE"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Has any of the datasets been harder to query? If so, why?"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {
|
||||
"deletable": false,
|
||||
"nbgrader": {
|
||||
"checksum": "6714abc5226618b76dc4c1aaed6d1a49",
|
||||
"grade": true,
|
||||
"grade_id": "cell-6c18003ced54be23",
|
||||
"locked": false,
|
||||
"points": 0,
|
||||
"schema_version": 1,
|
||||
"solution": true
|
||||
}
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# YOUR ANSWER HERE"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## References"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"* [RDFLib documentation](https://rdflib.readthedocs.io/en/stable/).\n",
|
||||
"* [Wikidata Query Service query examples](https://www.wikidata.org/wiki/Wikidata:SPARQL_query_service/queries/examples)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Licence\n",
|
||||
"The notebook is freely licensed under under the [Creative Commons Attribution Share-Alike license](https://creativecommons.org/licenses/by/2.0/). \n",
|
||||
"\n",
|
||||
"© 2018 Universidad Politécnica de Madrid."
|
||||
]
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
"kernelspec": {
|
||||
"display_name": "Python 3",
|
||||
"language": "python",
|
||||
"name": "python3"
|
||||
},
|
||||
"language_info": {
|
||||
"codemirror_mode": {
|
||||
"name": "ipython",
|
||||
"version": 3
|
||||
},
|
||||
"file_extension": ".py",
|
||||
"mimetype": "text/x-python",
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.7.2"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
"nbformat_minor": 2
|
||||
}
|
Loading…
Reference in New Issue
Block a user