1
0
mirror of https://github.com/gsi-upm/sitc synced 2024-11-05 07:31:41 +00:00
sitc/rdf/RDF.ipynb

1036 lines
30 KiB
Plaintext
Raw Normal View History

2019-02-13 16:51:18 +00:00
{
"cells": [
{
"cell_type": "markdown",
"metadata": {
"deletable": false,
"editable": false,
"nbgrader": {
"checksum": "1fba29f718bbaa14890b305223712474",
"grade": false,
"grade_id": "cell-2bd9e19ffed99f81",
"locked": true,
"schema_version": 1,
"solution": false
}
},
"source": [
"<header style=\"width:100%;position:relative\">\n",
" <div style=\"width:80%;float:right;\">\n",
" <h1>Course Notes for Learning Intelligent Systems</h1>\n",
" <h3>Department of Telematic Engineering Systems</h3>\n",
" <h5>Universidad Politécnica de Madrid</h5>\n",
" </div>\n",
" <img style=\"width:15%;\" src=\"../logo.jpg\" alt=\"UPM\" />\n",
"</header>"
]
},
{
"cell_type": "markdown",
"metadata": {
"deletable": false,
"editable": false,
"nbgrader": {
2019-02-14 10:38:48 +00:00
"checksum": "845cf125f1c5eb7aa3653ef461bffc67",
2019-02-13 16:51:18 +00:00
"grade": false,
"grade_id": "cell-51338a0933103db9",
"locked": true,
"schema_version": 1,
"solution": false
}
},
"source": [
"# Introduction\n",
"\n",
"The goal of this exercise is to understand the usefulness of semantic annotation and the Linked Open Data initiative, by solving a practical use case.\n",
"\n",
"The student will achieve the goal through:\n",
"\n",
"* Analyzing the sequence of tasks required to generate and publish semantic data\n",
"* Extending their knowledge using the set of additional documents and specifications\n",
"* Creating a partial semantic definition using the Turtle format\n",
"\n",
"\n",
"# Objectives\n",
"\n",
"The main objective is to learn how annotations can be unified on the web, by following the Linked Data principles.\n",
"\n",
"\n",
"These concepts will be applied in a practical use case: obtaining a Graph of information about hotels and reviews about them.\n",
"\n",
"\n",
"# Tools\n",
"\n",
"This notebook is self-contained, but it requires some python libraries.\n",
2019-02-14 10:38:48 +00:00
"To install them, simply run the following line:"
2019-02-13 16:51:18 +00:00
]
},
{
"cell_type": "code",
2019-02-13 18:15:07 +00:00
"execution_count": null,
2019-02-14 17:10:50 +00:00
"metadata": {},
2019-02-13 18:15:07 +00:00
"outputs": [],
2019-02-13 16:51:18 +00:00
"source": [
2019-02-14 17:10:50 +00:00
"# Install a pip package in the current Jupyter kernel.\n",
2019-02-13 18:15:07 +00:00
"import sys\n",
2019-02-14 17:10:50 +00:00
"import site\n",
"usersite = site.getusersitepackages()\n",
"if usersite not in sys.path:\n",
" sys.path.append(usersite)\n",
2019-02-14 16:18:16 +00:00
"!{sys.executable} -m pip install --user -r requirements.txt"
2019-02-13 16:51:18 +00:00
]
},
{
"cell_type": "markdown",
2019-02-13 18:19:59 +00:00
"metadata": {
"deletable": false,
"editable": false,
"nbgrader": {
"checksum": "5c2cccdf6463262a0f8bc89c64f97fb2",
"grade": false,
"grade_id": "cell-d8db3c16cee92ac1",
"locked": true,
"schema_version": 1,
"solution": false
}
},
2019-02-13 16:51:18 +00:00
"source": [
"# Linked Data, RDF and Turtle\n",
"\n",
"\n",
"The term [Linked Data](https://www.w3.org/wiki/LinkedData) refers to a set of best practices for publishing structured data on the Web.\n",
"These principles have been coined by Tim Berners-Lee in the design issue note Linked Data.\n",
"The principles are:\n",
"\n",
"1. Use URIs as names for things\n",
"2. Use HTTP URIs so that people can look up those names\n",
"3. When someone looks up a URI, provide useful information\n",
"4. Include links to other URIs, so that they can discover more things\n",
"\n",
"The [RDF](https://www.w3.org/RDF/) is a standard model for data interchange on the Web.\n",
"It formalizes some concepts behind Linked Data into a specification, which can be used to develop applications and store information.\n",
"\n",
"Explaining RDF is out of the scope of this notebook.\n",
"The [resources section](#Useful-resources) contains some links if you wish to learn about RDF.\n",
"\n",
"The main idea behind RDF is that information is encoded in the form of triples:\n",
"\n",
"```turtle\n",
"<subject> <predicate> <object>\n",
"```\n",
"\n",
"Each of these, (`<subject>`, `<predicate>` and `<object>`) should be unique identifiers.\n",
"\n",
2019-02-13 17:01:14 +00:00
"For example, to say Timmy is a 7 year-old whose dog is Tobby, we would write:\n",
2019-02-13 16:51:18 +00:00
"\n",
"```turtle\n",
"<http://example.org/Timmy> <http://example.org/hasDog> <http://example.org/Tobby>\n",
"<http://example.org/Timmy> <http://example.org/age> 7\n",
"```\n",
"\n",
"Note that we are not referring to \"any Timmy\", but to a *very specific* Timmy.\n",
"We could learn more about this particular boy using that URI.\n",
"The same goes for the dog, and for the concept of \"having a dog\", which we unambiguously encode as `<http://example.org/hasDog>`.\n",
"This concept may be described as taking care of a dog, for example, whereas a different property `<http://yourwebsite.com/hasDog>` could be described as being the legal owner of the dog.\n",
"\n",
"\n",
"RDF can be used to embed annotation in many places, including HTML document, using any compatible format.\n",
"The options include including RDFa, XML, JSON-LD and [Turtle](https://www.w3.org/TR/turtle/).\n",
"\n",
"\n",
"In the exercises, we will be using turtle notation, because it is very readable.\n",
"\n",
"Here's an example of document in Turtle, taken from the Turtle specification:\n",
"\n",
"```turtle\n",
"@base <http://example.org/> .\n",
"@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .\n",
"@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .\n",
"@prefix foaf: <http://xmlns.com/foaf/0.1/> .\n",
"@prefix rel: <http://www.perceive.net/schemas/relationship/> .\n",
"\n",
"<#green-goblin>\n",
" rel:enemyOf <#spiderman> ;\n",
" a foaf:Person ; # in the context of the Marvel universe\n",
" foaf:name \"Green Goblin\" .\n",
"\n",
"<#spiderman>\n",
" rel:enemyOf <#green-goblin> ;\n",
" a foaf:Person ;\n",
" foaf:name \"Spiderman\", \"Человек-паук\"@ru .\n",
"```\n",
"\n",
"\n",
"The second exercise will show you how to extract this information from any website.\n",
"\n",
"As you can observe in these examples, Turtle defines several ways to specify IRIs in a result. Please, consult the specification for further details. As an overview, IRIs can be:\n",
" * *relative IRIs*: IRIs resolved relative to the current base IRI. Thus, you should define a base IRI (@base <http://example.org>) and then relative IRIs (i.e. <#spiderman>). The resulting IRI is <http://example.org/spiderman>.\n",
" * *prefixed names*: a prefixed name (i.e. foaf:Person) is transformed into an IRI by concatenating the IRI of the prefix (@prefix foaf: <http://xmlns.com/foaf/0.1) and the local part of the prefixed name (i.e. Person). So, the resulting IRI is <http://xmlns.com/foaf/0.1/Person\n",
" * *absolute IRIs*: an already resolved IRI, p.ej. <http://example.com/Auto>."
2019-02-13 16:51:18 +00:00
]
},
{
"cell_type": "markdown",
2019-02-13 18:19:59 +00:00
"metadata": {
"deletable": false,
"editable": false,
"nbgrader": {
"checksum": "d2849a6154d4807b405e6ec84601c231",
"grade": false,
"grade_id": "cell-14e2327285737802",
"locked": true,
"schema_version": 1,
"solution": false
}
},
2019-02-13 16:51:18 +00:00
"source": [
"# Vocabularies and schema.org\n",
"\n",
"Concepts (predicates, types, etc.) can be defined in vocabularies.\n",
"These vocabularies can be reused in several applications.\n",
"In the example above, we used the concept of person from an external vocabulary (`foaf:Person`, i.e. http://xmlns.com/foaf/0.1/Person).\n",
"That way, we do not need to redefine the concept of Person in every application.\n",
"There are several well known vocabularies, such as:\n",
"\n",
"* Dublin core, for metadata: http://dublincore.org/\n",
"* FOAF (Friend-of-a-friend) for social networks: http://www.foaf-project.org/\n",
"* SIOC for online communities: https://www.w3.org/Submission/sioc-spec/\n",
"\n",
"Using the same vocabularies also makes it easier to automatically process and classify information.\n",
"\n",
"\n",
"That was the motivation behind Schema.org, a collaboration between Google, Microsoft, Yahoo and Yandex.\n",
"They aim to provide schemas for structured data annotation of Web sites, e-mails, etc., which can be leveraged by search engines and other automated processes.\n",
"\n",
"They rely on RDF for representation, and provide a set of common vocabularies that can be shared by every web developer.\n",
"\n",
"\n",
"There are thousands of properties in the schema.org vocabulary, and they offer a very comprehensive documentation.\n",
"\n",
"As an example, this is the documentation for hotels:\n",
"\n",
"* List of properties for the Hotel type: https://schema.org/Hotel\n",
"* Documentation for hotels: https://schema.org/docs/hotels.html\n",
"\n",
"\n",
"You can use the documentation to find properties (e.g. `checkinTime`), as well as the type of that property (e.g. `Datetime`)."
]
},
{
"cell_type": "markdown",
"metadata": {
"deletable": false,
"editable": false,
"nbgrader": {
"checksum": "fe9a246ba580c71385e9b83d414a1216",
"grade": false,
"grade_id": "cell-a1b60daabb1a9d00",
"locked": true,
"schema_version": 1,
"solution": false
}
},
"source": [
"# Exercises"
]
},
{
"cell_type": "markdown",
"metadata": {
"deletable": false,
"editable": false,
"nbgrader": {
"checksum": "63879c425ec11742c95c728a578d109e",
"grade": false,
"grade_id": "cell-d9289e96b2b0f265",
"locked": true,
"schema_version": 1,
"solution": false
}
},
"source": [
"## Instructions"
]
},
{
"cell_type": "markdown",
"metadata": {
"deletable": false,
"editable": false,
"nbgrader": {
2019-02-14 10:38:48 +00:00
"checksum": "32f1f607adb584aaea9fb90ae4d805b5",
2019-02-13 16:51:18 +00:00
"grade": false,
"grade_id": "cell-bb418e9bae1fef1a",
"locked": true,
"schema_version": 1,
"solution": false
}
},
"source": [
2019-02-14 10:38:48 +00:00
"First of all, run the line below to import everything you need for the exercises."
2019-02-13 16:51:18 +00:00
]
},
{
"cell_type": "code",
2019-02-14 10:38:48 +00:00
"execution_count": null,
2019-02-13 16:51:18 +00:00
"metadata": {
"deletable": false,
"editable": false,
"nbgrader": {
2019-02-14 10:38:48 +00:00
"checksum": "892f8491591c25defdea5fdcdd289489",
2019-02-13 16:51:18 +00:00
"grade": false,
"grade_id": "cell-4a1b60bd9974bbb1",
"locked": true,
"schema_version": 1,
"solution": false
}
},
2019-02-14 10:38:48 +00:00
"outputs": [],
2019-02-13 16:51:18 +00:00
"source": [
2019-02-14 10:38:48 +00:00
"from helpers import *"
2019-02-13 16:51:18 +00:00
]
},
{
"cell_type": "markdown",
"metadata": {
"deletable": false,
"nbgrader": {
2019-02-14 10:38:48 +00:00
"checksum": "a855d3d63be5ea7f73fd85d645b21bfe",
2019-02-13 16:51:18 +00:00
"grade": true,
2019-02-14 10:38:48 +00:00
"grade_id": "cell-9ac392294d5708a1",
2019-02-13 16:51:18 +00:00
"locked": false,
"points": 0,
"schema_version": 1,
"solution": true
}
},
"source": [
2019-02-13 18:15:07 +00:00
"In these exercises, you have to fill in the parts marked:\n",
2019-02-13 16:51:18 +00:00
"\n",
"```\n",
"# YOUR ANSWER HERE\n",
2019-02-13 18:30:31 +00:00
"```\n",
2019-02-13 18:19:59 +00:00
"\n",
2019-02-14 10:38:48 +00:00
"Depending on the exercise, you might need to fill that part with a Turtle definition (first exercise), some python code (second exercise), or plain text."
2019-02-13 18:41:41 +00:00
]
},
{
"cell_type": "markdown",
"metadata": {
"deletable": false,
"editable": false,
"nbgrader": {
2019-02-14 10:38:48 +00:00
"checksum": "9a73f79f8f282874fb60011e6019e387",
2019-02-13 18:41:41 +00:00
"grade": false,
2019-02-14 10:38:48 +00:00
"grade_id": "cell-57f67d1e662b7f09",
2019-02-13 18:41:41 +00:00
"locked": true,
"schema_version": 1,
"solution": false
}
},
"source": [
2019-02-14 10:38:48 +00:00
"Turtle is usually written in standalone files (e.g. `mydefinition.ttl`).\n",
"To write Turtle definitions inside notebook cells we will use a special magic command: `%%ttl`.\n",
"The command will check the Turtle syntax of your definition, and provide syntax highlighting.\n",
"\n"
2019-02-13 16:51:18 +00:00
]
},
{
2019-02-14 10:38:48 +00:00
"cell_type": "markdown",
2019-02-13 16:51:18 +00:00
"metadata": {
"deletable": false,
2019-02-14 10:38:48 +00:00
"editable": false,
2019-02-13 16:51:18 +00:00
"nbgrader": {
2019-02-14 10:38:48 +00:00
"checksum": "18ad887c2f326ee59139b96860ce8893",
2019-02-13 16:51:18 +00:00
"grade": false,
2019-02-14 10:38:48 +00:00
"grade_id": "cell-16214ea73a9b689e",
"locked": true,
2019-02-13 16:51:18 +00:00
"schema_version": 1,
2019-02-14 10:38:48 +00:00
"solution": false
2019-02-13 16:51:18 +00:00
}
},
"source": [
2019-02-14 10:38:48 +00:00
"## Example"
2019-02-13 16:51:18 +00:00
]
},
2019-02-13 18:15:07 +00:00
{
"cell_type": "markdown",
2019-02-13 16:51:18 +00:00
"metadata": {
"deletable": false,
"editable": false,
"nbgrader": {
2019-02-14 10:38:48 +00:00
"checksum": "ed2466715f57356f22ddeabfb101eb11",
"grade": false,
"grade_id": "cell-da88c2f8170436fe",
2019-02-13 16:51:18 +00:00
"locked": true,
"schema_version": 1,
"solution": false
}
},
"source": [
2019-02-14 10:38:48 +00:00
"\n",
"To make sure everything works, let's try first with an example exercise.\n",
"\n",
"Execute the code below, without modification.\n",
"\n",
"The definitio **is empty but valid**, so the output will be `The turtle syntax is correct.`."
2019-02-13 18:30:31 +00:00
]
},
{
"cell_type": "code",
"execution_count": null,
2019-02-14 10:38:48 +00:00
"metadata": {
"deletable": false,
"nbgrader": {
"checksum": "69182e8fadb9c9751f76786e0fcb8803",
"grade": false,
"grade_id": "cell-808cfcbf3891f39f",
"locked": false,
"schema_version": 1,
"solution": true
}
},
2019-02-13 18:30:31 +00:00
"outputs": [],
"source": [
"%%ttl example\n",
2019-02-14 10:38:48 +00:00
"\n",
2019-02-13 18:30:31 +00:00
"# YOUR ANSWER HERE"
]
},
2019-02-14 10:38:48 +00:00
{
"cell_type": "markdown",
"metadata": {
"deletable": false,
"editable": false,
"nbgrader": {
"checksum": "60a9934c544eee9fc2c3745c36beb049",
"grade": false,
"grade_id": "cell-1c2ca86de107dec3",
"locked": true,
"schema_version": 1,
"solution": false
}
},
"source": [
"However, the definition is empty, so the tests for that definition **should fail**.\n",
"\n",
"Try it yourself by running the following line:"
]
},
2019-02-13 18:30:31 +00:00
{
"cell_type": "code",
"execution_count": null,
2019-02-14 10:38:48 +00:00
"metadata": {
"deletable": false,
"editable": false,
"nbgrader": {
"checksum": "12b5c7170326932ff3c7e1688a5769b2",
"grade": false,
"grade_id": "cell-0154f8481bf393e8",
"locked": true,
"schema_version": 1,
"solution": false
}
},
2019-02-13 18:30:31 +00:00
"outputs": [],
"source": [
2019-02-14 10:38:48 +00:00
"# This will check your definition for the example.\n",
"check('example')"
2019-02-13 18:30:31 +00:00
]
},
{
"cell_type": "markdown",
"metadata": {
"deletable": false,
"editable": false,
"nbgrader": {
2019-02-14 10:38:48 +00:00
"checksum": "b534d998c6d2e9f6bef8c2d88687a96b",
2019-02-13 18:30:31 +00:00
"grade": false,
2019-02-14 10:38:48 +00:00
"grade_id": "cell-adc7e6b7e96e8788",
2019-02-13 18:30:31 +00:00
"locked": true,
"schema_version": 1,
"solution": false
}
},
"source": [
2019-02-14 10:38:48 +00:00
"Now copy/paste the code below into the definition (below the `# YOUR ANSWER HERE` part), execute it, and run the test code again.\n",
2019-02-13 18:30:31 +00:00
"\n",
"```turtle\n",
"@prefix foaf: <http://xmlns.com/foaf/0.1/> .\n",
"@prefix geo: <http://www.w3.org/2003/01/geo/wgs84_pos#> .\n",
"@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .\n",
"\n",
"<http://purl.org/net/bsletten> \n",
" a foaf:Person;\n",
" foaf:interest <http://www.w3.org/2000/01/sw/>;\n",
" foaf:based_near [\n",
" geo:lat \"34.0736111\" ;\n",
" geo:lon \"-118.3994444\"\n",
" ] .\n",
2019-02-14 10:38:48 +00:00
"```\n",
"\n",
"If you copied the file right, the tests should pass."
2019-02-13 18:30:31 +00:00
]
},
2019-02-13 16:51:18 +00:00
{
"cell_type": "markdown",
"metadata": {
"deletable": false,
"editable": false,
"nbgrader": {
2019-02-14 10:38:48 +00:00
"checksum": "67540252804835faea83d96aab87aa29",
2019-02-13 16:51:18 +00:00
"grade": false,
"grade_id": "cell-e73f1933742f7ab3",
"locked": true,
"schema_version": 1,
"solution": false
}
},
"source": [
2019-02-14 10:38:48 +00:00
"## Exercise 1: Definition of hotels and reviews"
2019-02-13 16:51:18 +00:00
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
2019-02-14 10:38:48 +00:00
"We will define some basic information about hotels, and some reviews.\n",
2019-02-13 16:51:18 +00:00
"This should be the same type of information that some aggregators (e.g. TripAdvisor) offer in their websites.\n",
"\n",
"Namely, you need to define at least two hotels (you may add more than one), with the following information:\n",
"* Description\n",
"* Address\n",
"* Contact information\n",
"* City and country (location)\n",
"* Email\n",
"* logo\n",
"* Opening hours\n",
"* Price range\n",
"* Amenities (optional)\n",
"* Geolocation (optional)\n",
"* Images (optional)\n",
"\n",
"You should also add at least three reviews about hotels, with the following information:\n",
"* Name of the user that reviewed the Hotel\n",
"* Rating\n",
"* Date\n",
"* Replies by other users (optional)\n",
"* Aspects rated in each review (cleanliness, staff, etc...) (optional)\n",
"* Information about the user (name, surname, date the account was created) (optional)\n",
"\n",
"\n",
"You can check any hotel website for inspiration, like this [review of a hotel in TripAdvisor](https://www.tripadvisor.es/Hotel_Review-g1437655-d1088667-Reviews-Hotel_Spa_La_Salve-Torrijos_Province_of_Toledo_Castile_La_Mancha.html)"
]
},
{
"cell_type": "markdown",
2019-02-14 11:10:09 +00:00
"metadata": {
"deletable": false,
"editable": false,
"nbgrader": {
"checksum": "d6f1bf2230282256e5fcb85dba0eef45",
"grade": false,
"grade_id": "cell-3241bf07ae153beb",
"locked": true,
"schema_version": 1,
"solution": false
}
},
2019-02-13 16:51:18 +00:00
"source": [
"To make sure we are following Principles 1 and 2, we should use URIs that can be queried.\n",
2019-02-13 18:15:07 +00:00
"For the sake of this exercise, you have use the made-up `http://example/sitc/` as base for our URIs.\n",
2019-02-13 16:51:18 +00:00
"Hence, the URIs of our hotels will look like this: `http://example/sitc/my-fancy-hotel`.\n",
"These URIs can not be queried, **and should not be used in real annotations**, but we will see how to fix that in a future exercise.\n",
"\n",
"We will use the vocabularies defined in https://schema.org e.g.:\n",
" * https://schema.org/Review defines properties about reviews\n",
" * https://schema.org/Hotel defines properties about hotels\n",
" \n",
"\n",
"Your definition has to be included in the following cell.\n",
"\n",
"So, your task is:\n",
"* Search the relevant properties of the vocabulary schema.org to represent the attributes of both reviews and hotels.\n",
"* Write two resources of type Hotel and three resources of type Review.\n",
"* Check that your syntax is correct, by executing your code in the cell below.\n",
"\n",
2019-02-13 16:51:18 +00:00
"**Tip**: Define the schema prefix first, to avoid repeating `<http://schema.org/...>`."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"deletable": false,
"nbgrader": {
"checksum": "44f8be14db3d3e42b5b85f0485206346",
"grade": false,
"grade_id": "definition",
"locked": false,
"schema_version": 1,
"solution": true
}
},
"outputs": [],
"source": [
"%%ttl hotel\n",
"\n",
"@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .\n",
"@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .\n",
"@prefix sitc: <http://example/sitc/> .\n",
"\n",
"\n",
"<http://example/sitc/GSIHOTEL> a <http://schema.org/Hotel> ;\n",
" <http://schema.org/description> \"This is just an example to get you started.\" .\n",
"\n",
"\n",
"# YOUR ANSWER HERE"
]
},
{
"cell_type": "code",
"execution_count": null,
2019-02-14 11:10:09 +00:00
"metadata": {
"deletable": false,
"editable": false,
"nbgrader": {
"checksum": "e8ba71b32e6d4f15aef9dc7fe70387fe",
"grade": true,
"grade_id": "cell-2fb6e144a6691ede",
"locked": true,
"points": 10,
"schema_version": 1,
"solution": false
}
},
2019-02-13 16:51:18 +00:00
"outputs": [],
"source": [
2019-02-14 10:38:48 +00:00
"# This will check that your definition for the first exercise is correct.\n",
"check('hotel')"
2019-02-13 16:51:18 +00:00
]
},
{
"cell_type": "markdown",
2019-02-14 11:10:09 +00:00
"metadata": {
"deletable": false,
"editable": false,
"nbgrader": {
"checksum": "75d90c9a83c694f61e51bd5c47a672d9",
"grade": false,
"grade_id": "cell-63a55e7b8b195d59",
"locked": true,
"schema_version": 1,
"solution": false
}
},
2019-02-13 16:51:18 +00:00
"source": [
"## Exercise 2: Explore existing data"
]
},
{
"cell_type": "markdown",
2019-02-14 11:10:09 +00:00
"metadata": {
"deletable": false,
"editable": false,
"nbgrader": {
"checksum": "23632182da48df109721378408e57f01",
"grade": false,
"grade_id": "cell-3843c3ce98a77c56",
"locked": true,
"schema_version": 1,
"solution": false
}
},
2019-02-13 16:51:18 +00:00
"source": [
"The goal of this exercise is to explore and compare annotations from existing websites.\n",
"\n",
"Semantic annotations are very useful on the web, because they allow `robots` to extract information about resources, and how they relate to other resources.\n",
"\n",
"For example, `schema.org` annotations on a website allow Google to show summaries and useful information (e.g. price and location of a hotel) in their results.\n",
"A similar technology powers their knowledge graph and the \"related search\". i.e. when you look for a famous actor, it will first show you their filmography, and a list of related actors.\n",
"\n",
"The information has to be provided using the official standards (RDF), to comply with the 3rd principle of linked data.\n",
"\n",
"To follow the 4<sup>th</sup> principle of linked data, the annotations should include links to known sources (e.g. DBpedia) whenever possible."
]
},
{
"cell_type": "markdown",
2019-02-14 11:10:09 +00:00
"metadata": {
"deletable": false,
"editable": false,
"nbgrader": {
"checksum": "6c4b25718f493ad5964370f412519543",
"grade": false,
"grade_id": "cell-f42c087c9065bb23",
"locked": true,
"schema_version": 1,
"solution": false
}
},
2019-02-13 16:51:18 +00:00
"source": [
"Let us explore some semantic annotations from popular websites.\n",
"\n",
"First, start with hotel reviews and websites. Here are some examples:\n",
"\n",
"* TripAdvisor hotels\n",
"* Trivago\n",
"* Kayak\n",
"* Specific hotel reviews\n",
"\n",
"\n",
"These are just two examples:"
]
},
{
"cell_type": "code",
2019-02-13 18:15:07 +00:00
"execution_count": null,
2019-02-13 16:51:18 +00:00
"metadata": {},
2019-02-13 18:15:07 +00:00
"outputs": [],
2019-02-13 16:51:18 +00:00
"source": [
"print_data('http://www.hotellasalve.com/')"
]
},
{
"cell_type": "code",
2019-02-13 18:15:07 +00:00
"execution_count": null,
2019-02-13 16:51:18 +00:00
"metadata": {},
2019-02-13 18:15:07 +00:00
"outputs": [],
2019-02-13 16:51:18 +00:00
"source": [
"print_data('https://www.mandarinoriental.com/madrid/hotel-ritz/luxury-hotel')"
]
},
2019-02-13 18:15:07 +00:00
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"deletable": false,
"nbgrader": {
"checksum": "c7af0b9af5a64773785cc890f2431c78",
"grade": true,
"grade_id": "cell-c2e5b58ea74e8276",
"locked": false,
"points": 1,
"schema_version": 1,
"solution": true
}
},
"outputs": [],
"source": [
"# Try new sites here\n",
"\n",
"# YOUR ANSWER HERE"
]
},
2019-02-13 16:51:18 +00:00
{
"cell_type": "markdown",
"metadata": {
"deletable": false,
"editable": false,
"nbgrader": {
"checksum": "a29112f51cc3299c7cae27841feb7410",
"grade": false,
"grade_id": "cell-9bf9c7d7516fae75",
"locked": true,
"schema_version": 1,
"solution": false
}
},
"source": [
"Once you've extracted and analyzed different sources, answer the following questions:\n",
"\n",
"\n",
"### Questions:\n",
"\n",
"What type of data do they offer?"
]
},
{
"cell_type": "markdown",
"metadata": {
"deletable": false,
"nbgrader": {
2019-02-13 18:15:07 +00:00
"checksum": "56a77e133b532997723bf2f8116389e4",
2019-02-13 16:51:18 +00:00
"grade": true,
"grade_id": "cell-17508ecf96884653",
"locked": false,
2019-02-13 18:15:07 +00:00
"points": 1,
2019-02-13 16:51:18 +00:00
"schema_version": 1,
"solution": true
}
},
"source": [
"# YOUR ANSWER HERE"
]
},
{
"cell_type": "markdown",
"metadata": {
"deletable": false,
"editable": false,
"nbgrader": {
2019-02-13 18:15:07 +00:00
"checksum": "9311bca044d7057c86dd753f5343e19b",
2019-02-13 16:51:18 +00:00
"grade": false,
"grade_id": "cell-d36826d6323c96e8",
"locked": true,
"schema_version": 1,
"solution": false
}
},
"source": [
2019-02-13 18:15:07 +00:00
"What vocabularies and ontologies do they use?"
2019-02-13 16:51:18 +00:00
]
},
{
"cell_type": "markdown",
"metadata": {
"deletable": false,
"nbgrader": {
2019-02-13 18:15:07 +00:00
"checksum": "ba7b24b557d627e2665ca31c75c24c23",
2019-02-13 16:51:18 +00:00
"grade": true,
"grade_id": "cell-17508ecf96884655",
"locked": false,
2019-02-13 18:15:07 +00:00
"points": 1,
2019-02-13 16:51:18 +00:00
"schema_version": 1,
"solution": true
}
},
"source": [
"# YOUR ANSWER HERE"
]
},
{
"cell_type": "markdown",
2019-02-13 18:15:07 +00:00
"metadata": {
"deletable": false,
"editable": false,
"nbgrader": {
"checksum": "853651c95cbcd69cd5f495f03d29d19a",
"grade": false,
"grade_id": "cell-e25a0db3fe8a6b4b",
"locked": true,
"schema_version": 1,
"solution": false
}
},
2019-02-13 16:51:18 +00:00
"source": [
2019-02-13 18:15:07 +00:00
"What properties and annotations do they have in common?"
2019-02-13 16:51:18 +00:00
]
},
{
"cell_type": "markdown",
"metadata": {
"deletable": false,
"nbgrader": {
2019-02-13 18:15:07 +00:00
"checksum": "779f0f750508eb52b2d98b92689e426b",
2019-02-13 16:51:18 +00:00
"grade": true,
"grade_id": "cell-30797c9ac87cc7e1",
"locked": false,
2019-02-13 18:15:07 +00:00
"points": 1,
2019-02-13 16:51:18 +00:00
"schema_version": 1,
"solution": true
}
},
"source": [
"# YOUR ANSWER HERE"
]
},
{
"cell_type": "markdown",
"metadata": {
"deletable": false,
2019-02-13 18:15:07 +00:00
"editable": false,
2019-02-13 16:51:18 +00:00
"nbgrader": {
2019-02-13 18:15:07 +00:00
"checksum": "f8fff644855ca50a5219598322aa9b32",
"grade": false,
"grade_id": "cell-33862c8e38173d9c",
"locked": true,
2019-02-13 16:51:18 +00:00
"schema_version": 1,
2019-02-13 18:15:07 +00:00
"solution": false
2019-02-13 16:51:18 +00:00
}
},
"source": [
2019-02-13 18:15:07 +00:00
"What sites provide the most, or the most useful annotations?"
2019-02-13 16:51:18 +00:00
]
},
{
"cell_type": "markdown",
"metadata": {
"deletable": false,
"nbgrader": {
2019-02-13 18:15:07 +00:00
"checksum": "8ef0ebd54eefb44ff7019f17f58be3ec",
2019-02-13 16:51:18 +00:00
"grade": true,
"grade_id": "cell-17508ecf96884657",
"locked": false,
2019-02-13 18:15:07 +00:00
"points": 1,
2019-02-13 16:51:18 +00:00
"schema_version": 1,
"solution": true
}
},
"source": [
"# YOUR ANSWER HERE"
]
},
{
"cell_type": "markdown",
"metadata": {
"deletable": false,
"editable": false,
"nbgrader": {
"checksum": "33e1ec78415c85a795e86211d88316c2",
"grade": false,
"grade_id": "cell-5f922dc14ad3236a",
"locked": true,
"schema_version": 1,
"solution": false
}
},
"source": [
"Are all properties from Exercise 1 given by the websites? What's missing?"
]
},
{
"cell_type": "markdown",
"metadata": {
"deletable": false,
"nbgrader": {
2019-02-13 18:15:07 +00:00
"checksum": "b5f208a95a8803e97f82c5f2cdf319dd",
2019-02-13 16:51:18 +00:00
"grade": true,
"grade_id": "answer-missing",
"locked": false,
2019-02-13 18:15:07 +00:00
"points": 1,
2019-02-13 16:51:18 +00:00
"schema_version": 1,
"solution": true
}
},
"source": [
"# YOUR ANSWER HERE"
]
},
{
"cell_type": "markdown",
"metadata": {
"deletable": false,
"editable": false,
"nbgrader": {
"checksum": "26eb04e562aa6c7d29efa8318982a337",
"grade": false,
"grade_id": "cell-7a3c1553c4d6a9b7",
"locked": true,
"schema_version": 1,
"solution": false
}
},
"source": [
"## Optional\n",
"\n",
"There is nothing special about review sites.\n",
"You can get information about any website.\n",
"\n",
"Verify this running checking:\n",
"\n",
"* News sites: e.g. https://edition.cnn.com/\n",
"* CMS: e.g. http://www.etsit.upm.es\n",
"* Twitter profiles: e.g. https://www.twitter.com/cif\n",
"* Mastodon (a Twitter alternative) profiles: e.g. https://mastodon.social/@Gargron/\n",
"* Twitter status pages: e.g. http://mobile.twitter.com/TBLInternetBot/status/1054438951237312514\n",
"* Mastodon (a Twitter alternative) status pages: e.g. https://mastodon.social/@Gargron/101202440923902326\n",
"* Wikipedia entries: e.g. https://es.wikipedia.org/wiki/Tim_Berners-Lee\n",
"* Facebook groups: e.g. https://www.facebook.com/universidadpolitecnicademadrid/"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"print_data('https://mastodon.social/@Gargron')"
]
},
2019-02-14 10:38:48 +00:00
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Now try some new sites:"
]
},
2019-02-13 18:15:07 +00:00
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"deletable": false,
"nbgrader": {
2019-02-14 10:38:48 +00:00
"checksum": "bf8d215c42321236b783601e7d072a05",
2019-02-13 18:15:07 +00:00
"grade": true,
"grade_id": "cell-ff2413f45311f086",
"locked": false,
"points": 0,
"schema_version": 1,
"solution": true
}
},
"outputs": [],
"source": [
"# YOUR ANSWER HERE"
]
},
2019-02-13 16:51:18 +00:00
{
"cell_type": "markdown",
"metadata": {
"deletable": false,
"editable": false,
"nbgrader": {
"checksum": "cffc12120c51a7d994063f66d788570a",
"grade": false,
"grade_id": "cell-ec8df1a53c3d3f23",
"locked": true,
"schema_version": 1,
"solution": false
}
},
"source": [
"# Useful resources\n",
"\n",
"* TTL validator: http://ttl.summerofcode.be/\n",
"* RDF-turtle specification: https://www.w3.org/TR/turtle/\n",
"* Schema.org documentation: https://schema.org\n",
"* Wikipedia entry on the Turtle syntax: https://en.wikipedia.org/wiki/Turtle_(syntax)\n",
"* RDFLib, the most popular python library for RDF (we use it in the tests): https://rdflib.readthedocs.io/"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Bibliography\n",
"\n",
"* W3C website on Linked Data: https://www.w3.org/wiki/LinkedData\n",
"* W3C website on RDF: https://www.w3.org/RDF/\n",
"* Turtle W3C recommendation: https://www.w3.org/TR/turtle/"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
2019-02-14 10:38:48 +00:00
"version": "3.7.2"
2019-02-13 16:51:18 +00:00
}
},
"nbformat": 4,
"nbformat_minor": 2
}