Add RDF/Turtle exercise
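
Note for reviewers: the graded cell for Exercise 1 essentially counts the subjects typed as `schema:Hotel` and checks that each one carries the required predicates. Below is a minimal sketch of that logic, using plain Python tuples instead of an rdflib graph. The `describe_hotel`, `subjects`, `predicates` and `missing_properties` helpers and the two hotel URIs are illustrative assumptions, not code from this commit; the notebook itself relies on rdflib and `helpers.py`.

```python
# Illustrative stand-in for an RDF graph: a set of (subject, predicate, object)
# tuples. The notebook parses Turtle with rdflib; this only mirrors the checks.
SCHEMA = 'http://schema.org/'
RDF_TYPE = 'http://www.w3.org/1999/02/22-rdf-syntax-ns#type'
REQUIRED = ['description', 'email', 'logo', 'priceRange']


def describe_hotel(uri, **props):
    """Build the triples for one hypothetical hotel (assumed helper)."""
    triples = {(uri, RDF_TYPE, SCHEMA + 'Hotel')}
    triples |= {(uri, SCHEMA + name, value) for name, value in props.items()}
    return triples


# Two made-up hotels; the second one deliberately lacks a logo.
TRIPLES = (
    describe_hotel('http://example/sitc/hotel-a',
                   description='Hotel A', email='a@example.org',
                   logo='http://example/sitc/a.png', priceRange='$$')
    | describe_hotel('http://example/sitc/hotel-b',
                     description='Hotel B', email='b@example.org',
                     priceRange='$')
)


def subjects(triples, predicate, obj):
    """Return every subject that appears with the given predicate and object."""
    return {s for s, p, o in triples if p == predicate and o == obj}


def predicates(triples, subject):
    """Return every predicate used with the given subject."""
    return {p for s, p, _ in triples if s == subject}


def missing_properties(triples, subject, required=REQUIRED):
    """List the required schema.org properties that the subject does not have."""
    found = predicates(triples, subject)
    return [name for name in required if SCHEMA + name not in found]


hotels = subjects(TRIPLES, RDF_TYPE, SCHEMA + 'Hotel')
report = {hotel: missing_properties(TRIPLES, hotel) for hotel in sorted(hotels)}
for hotel, missing in report.items():
    print(hotel, 'missing:', missing or 'nothing')
```

In the notebook the same questions are answered with `g.subjects(RDF.type, schema['Hotel'])` and `g.predicates(hotel)`, so the sketch is only meant to make the intent of the tests easier to review.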

2025-08-23 02:02:20 +00:00 · 2019-02-13 17:51:18 +01:00
parent 8913c5ecde
commit a6670235ba
4 changed files with 1024 additions and 1 deletions
--- a/lod/SPARQL.ipynb
+++ b/lod/SPARQL.ipynb
@@ -1872,7 +1872,7 @@
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
-   "version": "3.6.4"
+   "version": "3.7.2"
  }
 },
 "nbformat": 4,
--- a/rdf/RDF.ipynb
+++ b/rdf/RDF.ipynb
@@ -0,0 +1,875 @@
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "metadata": {
+    "deletable": false,
+    "editable": false,
+    "nbgrader": {
+     "checksum": "1fba29f718bbaa14890b305223712474",
+     "grade": false,
+     "grade_id": "cell-2bd9e19ffed99f81",
+     "locked": true,
+     "schema_version": 1,
+     "solution": false
+    }
+   },
+   "source": [
+    "<header style=\"width:100%;position:relative\">\n",
+    "  <div style=\"width:80%;float:right;\">\n",
+    "    <h1>Course Notes for Learning Intelligent Systems</h1>\n",
+    "    <h3>Department of Telematic Engineering Systems</h3>\n",
+    "    <h5>Universidad Politécnica de Madrid</h5>\n",
+    "  </div>\n",
+    "        <img style=\"width:15%;\" src=\"../logo.jpg\" alt=\"UPM\" />\n",
+    "</header>"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {
+    "deletable": false,
+    "editable": false,
+    "nbgrader": {
+     "checksum": "59c5cb46c9d722f691206e766e5af557",
+     "grade": false,
+     "grade_id": "cell-51338a0933103db9",
+     "locked": true,
+     "schema_version": 1,
+     "solution": false
+    }
+   },
+   "source": [
+    "# Introduction\n",
+    "\n",
+    "The goal of this exercise is to understand the usefulness of semantic annotation and the Linked Open Data initiative, by solving a practical use case.\n",
+    "\n",
+    "The student will achieve the goal through:\n",
+    "\n",
+    "* Analyzing the sequence of tasks required to generate and publish semantic data\n",
+    "* Extending their knowledge using the set of additional documents and specifications\n",
+    "* Creating a partial semantic definition using the Turtle format\n",
+    "\n",
+    "\n",
+    "# Objectives\n",
+    "\n",
+    "The main objective is to learn how annotations can be unified on the web, by following the Linked Data principles.\n",
+    "\n",
+    "\n",
+    "These concepts will be applied in a practical use case: obtaining a Graph of information about hotels and reviews about them.\n",
+    "\n",
+    "\n",
+    "# Tools\n",
+    "\n",
+    "This notebook is self-contained, but it requires some Python libraries.\n",
+    "To install them, simply run the following line:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {
+    "deletable": false,
+    "editable": false,
+    "nbgrader": {
+     "checksum": "387f9c38b548f29b56ae5ef5ae76fd4f",
+     "grade": false,
+     "grade_id": "cell-d7f1ea9c021693b8",
+     "locked": true,
+     "schema_version": 1,
+     "solution": false
+    }
+   },
+   "outputs": [],
+   "source": [
+    "!pip install --user -r requirements.txt"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "# Linked Data, RDF and Turtle\n",
+    "\n",
+    "\n",
+    "The term [Linked Data](https://www.w3.org/wiki/LinkedData) refers to a set of best practices for publishing structured data on the Web.\n",
+    "These principles were coined by Tim Berners-Lee in his design issue note *Linked Data*.\n",
+    "The principles are:\n",
+    "\n",
+    "1. Use URIs as names for things\n",
+    "2. Use HTTP URIs so that people can look up those names\n",
+    "3. When someone looks up a URI, provide useful information\n",
+    "4. Include links to other URIs, so that they can discover more things\n",
+    "\n",
+    "[RDF](https://www.w3.org/RDF/) (Resource Description Framework) is a standard model for data interchange on the Web.\n",
+    "It formalizes some concepts behind Linked Data into a specification, which can be used to develop applications and store information.\n",
+    "\n",
+    "Explaining RDF is out of the scope of this notebook.\n",
+    "The [resources section](#Useful-resources) contains some links if you wish to learn about RDF.\n",
+    "\n",
+    "The main idea behind RDF is that information is encoded in the form of triples:\n",
+    "\n",
+    "```turtle\n",
+    "<subject> <predicate> <object>\n",
+    "```\n",
+    "\n",
+    "Each of these (`<subject>`, `<predicate>` and `<object>`) should be a unique identifier (a URI), although the object may also be a literal value such as a number or a string.\n",
+    "\n",
+    "For example, to say Timmy is a 6-year-old whose dog is Tobby, we would write:\n",
+    "\n",
+    "```turtle\n",
+    "<http://example.org/Timmy>  <http://example.org/hasDog> <http://example.org/Tobby> .\n",
+    "<http://example.org/Timmy>  <http://example.org/age> 6 .\n",
+    "```\n",
+    "\n",
+    "Note that we are not referring to \"any Timmy\", but to a *very specific* Timmy.\n",
+    "We could learn more about this particular boy using that URI.\n",
+    "The same goes for the dog, and for the concept of \"having a dog\", which we unambiguously encode as `<http://example.org/hasDog>`.\n",
+    "This concept may be described as taking care of a dog, for example, whereas a different property `<http://yourwebsite.com/hasDog>` could be described as being the legal owner of the dog.\n",
+    "\n",
+    "\n",
+    "RDF can be used to embed annotations in many places, including HTML documents, using any compatible format.\n",
+    "The options include RDFa, RDF/XML, JSON-LD and [Turtle](https://www.w3.org/TR/turtle/).\n",
+    "\n",
+    "\n",
+    "In the exercises, we will be using Turtle notation, because it is very readable.\n",
+    "\n",
+    "Here's an example of a document in Turtle, taken from the Turtle specification:\n",
+    "\n",
+    "```turtle\n",
+    "@base <http://example.org/> .\n",
+    "@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .\n",
+    "@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .\n",
+    "@prefix foaf: <http://xmlns.com/foaf/0.1/> .\n",
+    "@prefix rel: <http://www.perceive.net/schemas/relationship/> .\n",
+    "\n",
+    "<#green-goblin>\n",
+    "    rel:enemyOf <#spiderman> ;\n",
+    "    a foaf:Person ;    # in the context of the Marvel universe\n",
+    "    foaf:name \"Green Goblin\" .\n",
+    "\n",
+    "<#spiderman>\n",
+    "    rel:enemyOf <#green-goblin> ;\n",
+    "    a foaf:Person ;\n",
+    "    foaf:name \"Spiderman\", \"Человек-паук\"@ru .\n",
+    "```\n",
+    "\n",
+    "\n",
+    "The second exercise will show you how to extract this information from any website."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "# Vocabularies and schema.org\n",
+    "\n",
+    "Concepts (predicates, types, etc.) can be defined in vocabularies.\n",
+    "These vocabularies can be reused in several applications.\n",
+    "In the example above, we used the concept of person from an external vocabulary (`foaf:Person`, i.e. http://xmlns.com/foaf/0.1/Person).\n",
+    "That way, we do not need to redefine the concept of Person in every application.\n",
+    "There are several well known vocabularies, such as:\n",
+    "\n",
+    "* Dublin core, for metadata: http://dublincore.org/\n",
+    "* FOAF (Friend-of-a-friend) for social networks: http://www.foaf-project.org/\n",
+    "* SIOC for online communities: https://www.w3.org/Submission/sioc-spec/\n",
+    "\n",
+    "Using the same vocabularies also makes it easier to automatically process and classify information.\n",
+    "\n",
+    "\n",
+    "That was the motivation behind Schema.org, a collaboration between Google, Microsoft, Yahoo and Yandex.\n",
+    "They aim to provide schemas for structured data annotation of Web sites, e-mails, etc., which can be leveraged by search engines and other automated processes.\n",
+    "\n",
+    "They rely on RDF for representation, and provide a set of common vocabularies that can be shared by every web developer.\n",
+    "\n",
+    "\n",
+    "There are thousands of properties in the schema.org vocabulary, and they are comprehensively documented.\n",
+    "\n",
+    "As an example, this is the documentation for hotels:\n",
+    "\n",
+    "* List of properties for the Hotel type: https://schema.org/Hotel\n",
+    "* Documentation for hotels: https://schema.org/docs/hotels.html\n",
+    "\n",
+    "\n",
+    "You can use the documentation to find properties (e.g. `checkinTime`), as well as the expected type of each property's value (e.g. `DateTime`)."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {
+    "deletable": false,
+    "editable": false,
+    "nbgrader": {
+     "checksum": "fe9a246ba580c71385e9b83d414a1216",
+     "grade": false,
+     "grade_id": "cell-a1b60daabb1a9d00",
+     "locked": true,
+     "schema_version": 1,
+     "solution": false
+    }
+   },
+   "source": [
+    "# Exercises"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {
+    "deletable": false,
+    "editable": false,
+    "nbgrader": {
+     "checksum": "63879c425ec11742c95c728a578d109e",
+     "grade": false,
+     "grade_id": "cell-d9289e96b2b0f265",
+     "locked": true,
+     "schema_version": 1,
+     "solution": false
+    }
+   },
+   "source": [
+    "## Instructions"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {
+    "deletable": false,
+    "editable": false,
+    "nbgrader": {
+     "checksum": "e0b6464bce9263fb35543acf4acb31da",
+     "grade": false,
+     "grade_id": "cell-bb418e9bae1fef1a",
+     "locked": true,
+     "schema_version": 1,
+     "solution": false
+    }
+   },
+   "source": [
+    "First of all, run the line below.\n",
+    "It will import everything you need for the exercises."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {
+    "deletable": false,
+    "editable": false,
+    "nbgrader": {
+     "checksum": "bf98cea45f42e3d0f1ab158693b40da7",
+     "grade": false,
+     "grade_id": "cell-4a1b60bd9974bbb1",
+     "locked": true,
+     "schema_version": 1,
+     "solution": false
+    }
+   },
+   "outputs": [],
+   "source": [
+    "from helpers import *\n",
+    "from rdflib import term, RDF, Namespace"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {
+    "deletable": false,
+    "nbgrader": {
+     "checksum": "3e23398d5277f2db2b3b5fb84f9623d6",
+     "grade": true,
+     "grade_id": "cell-da88c2f8170436fe",
+     "locked": false,
+     "points": 0,
+     "schema_version": 1,
+     "solution": true
+    }
+   },
+   "source": [
+    "You have to fill in the parts marked:\n",
+    "\n",
+    "```\n",
+    "# YOUR ANSWER HERE\n",
+    "```\n",
+    "\n",
+    "To make sure everything is working, try the following example.\n",
+    "The solution is:\n",
+    "\n",
+    "```turtle\n",
+    "@prefix foaf: <http://xmlns.com/foaf/0.1/> .\n",
+    "@prefix geo: <http://www.w3.org/2003/01/geo/wgs84_pos#> .\n",
+    "@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .\n",
+    "\n",
+    "<http://purl.org/net/bsletten> \n",
+    "    a foaf:Person;\n",
+    "    foaf:interest <http://www.w3.org/2000/01/sw/>;\n",
+    "    foaf:based_near [\n",
+    "        geo:lat \"34.0736111\" ;\n",
+    "        geo:lon \"-118.3994444\"\n",
+    "   ] .\n",
+    "```\n",
+    "\n",
+    "Fill in the answer and run the test code."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {
+    "deletable": false,
+    "nbgrader": {
+     "checksum": "69182e8fadb9c9751f76786e0fcb8803",
+     "grade": false,
+     "grade_id": "cell-808cfcbf3891f39f",
+     "locked": false,
+     "schema_version": 1,
+     "solution": true
+    }
+   },
+   "outputs": [],
+   "source": [
+    "%%ttl example\n",
+    "\n",
+    "# YOUR ANSWER HERE"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {
+    "deletable": false,
+    "editable": false,
+    "nbgrader": {
+     "checksum": "5982ca82090e267401af135ca1f371a8",
+     "grade": true,
+     "grade_id": "cell-23e61b9f48d597fc",
+     "locked": true,
+     "points": 1,
+     "schema_version": 1,
+     "solution": false
+    }
+   },
+   "outputs": [],
+   "source": [
+    "g = solution('example')\n",
+    "test('Some triples have been loaded',\n",
+    "     len(g))\n",
+    "test('A person has been defined',\n",
+    "     g.subjects(RDF.type, term.URIRef('http://xmlns.com/foaf/0.1/Person')))\n",
+    "print('All tests passed. Well done!')"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {
+    "deletable": false,
+    "editable": false,
+    "nbgrader": {
+     "checksum": "a64acf02625b48b3c65b6e1bc1ba6c1a",
+     "grade": false,
+     "grade_id": "cell-e73f1933742f7ab3",
+     "locked": true,
+     "schema_version": 1,
+     "solution": false
+    }
+   },
+   "source": [
+    "## Exercise 1: Definition of a Hotel"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "We will define some basic information about some hotels, and some reviews of them.\n",
+    "This should be the same type of information that some aggregators (e.g. TripAdvisor) offer on their websites.\n",
+    "\n",
+    "Namely, you need to define at least two hotels (you may add more), with the following information:\n",
+    "* Description\n",
+    "* Address\n",
+    "* Contact information\n",
+    "* City and country (location)\n",
+    "* Email\n",
+    "* Logo\n",
+    "* Opening hours\n",
+    "* Price range\n",
+    "* Amenities (optional)\n",
+    "* Geolocation (optional)\n",
+    "* Images (optional)\n",
+    "\n",
+    "You should also add at least three reviews about hotels, with the following information:\n",
+    "* Name of the user that reviewed the Hotel\n",
+    "* Rating\n",
+    "* Date\n",
+    "* Replies by other users (optional)\n",
+    "* Aspects rated in each review (cleanliness, staff, etc...) (optional)\n",
+    "* Information about the user (name, surname, date the account was created) (optional)\n",
+    "\n",
+    "\n",
+    "You can check any hotel website for inspiration, like this [review of a hotel in TripAdvisor](https://www.tripadvisor.es/Hotel_Review-g1437655-d1088667-Reviews-Hotel_Spa_La_Salve-Torrijos_Province_of_Toledo_Castile_La_Mancha.html)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "To make sure we are following Principles 1 and 2, we should use URIs that can be queried.\n",
+    "For the sake of this exercise, you have two options:\n",
+    "    \n",
+    "* Use the made-up `http://example/sitc/` as the base for your URIs.\n",
+    "Hence, the URIs of your hotels will look like this: `http://example/sitc/my-fancy-hotel`.\n",
+    "These URIs cannot be queried, **and should not be used in real annotations**, but we will see how to fix that in a future exercise.\n",
+    "* Use [blank nodes](https://en.wikipedia.org/wiki/Blank_node) (e.g. `_:my-fancy-hotel`), which cannot be referenced by other people, but can be reused within your own annotations.\n",
+    "\n",
+    "\n",
+    "We will use the vocabularies defined in https://schema.org, e.g.:\n",
+    "\n",
+    "* https://schema.org/Review defines properties about reviews\n",
+    "* https://schema.org/Hotel defines properties about hotels\n",
+    "\n",
+    "\n",
+    "Your definition has to be included in the following cell.\n",
+    "\n",
+    "**Tip**: Define the schema prefix first, to avoid repeating `<http://schema.org/...>`."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {
+    "deletable": false,
+    "nbgrader": {
+     "checksum": "44f8be14db3d3e42b5b85f0485206346",
+     "grade": false,
+     "grade_id": "definition",
+     "locked": false,
+     "schema_version": 1,
+     "solution": true
+    }
+   },
+   "outputs": [],
+   "source": [
+    "%%ttl hotel\n",
+    "\n",
+    "@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .\n",
+    "@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .\n",
+    "@prefix sitc: <http://example/sitc/> .\n",
+    "\n",
+    "\n",
+    "<http://example/sitc/GSIHOTEL> a <http://schema.org/Hotel> ;\n",
+    "         <http://schema.org/description> \"This is just an example to get you started.\" .\n",
+    "\n",
+    "\n",
+    "# YOUR ANSWER HERE"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {
+    "deletable": false,
+    "editable": false,
+    "nbgrader": {
+     "checksum": "4f54963163a64f46058c86be139e5543",
+     "grade": true,
+     "grade_id": "definition-tests",
+     "locked": true,
+     "points": 10,
+     "schema_version": 1,
+     "solution": false
+    }
+   },
+   "outputs": [],
+   "source": [
+    "g = solution('hotel')\n",
+    "test('Some triples are loaded',\n",
+    "     len(g))\n",
+    "\n",
+    "hotels = set(g.subjects(RDF.type, schema['Hotel']))\n",
+    "test('At least 2 hotels are loaded',\n",
+    "     hotels,\n",
+    "     2,\n",
+    "     atLeast)\n",
+    "\n",
+    "for hotel in hotels:\n",
+    "    if 'GSIHOTEL' in hotel:  # Do not check the example hotel\n",
+    "        continue\n",
+    "    props = g.predicates(hotel)\n",
+    "    test('Each hotel has all required properties',\n",
+    "         props,\n",
+    "         list(schema[i] for i in ['description', 'email', 'logo', 'priceRange']),\n",
+    "         func=containsAll)\n",
+    "\n",
+    "reviews = set(g.subjects(RDF.type, schema['Review']))\n",
+    "test('At least 3 reviews are loaded',\n",
+    "     reviews,\n",
+    "     3,\n",
+    "     atLeast)\n",
+    "\n",
+    "for review in reviews:\n",
+    "    props = g.predicates(review)\n",
+    "    test('Each review has all required properties',\n",
+    "         props,\n",
+    "         list(schema[i] for i in ['itemReviewed', 'reviewBody', 'reviewRating']),\n",
+    "         func=containsAll)\n",
+    "    ratings = list(g.objects(review, schema['reviewRating']))\n",
+    "    for rating in ratings:\n",
+    "        value = g.value(rating, schema['ratingValue'])\n",
+    "        test('The review should have ratings', value)\n",
+    "\n",
+    "# At least one review author must have a name-like property (name, givenName...)\n",
+    "authors = set(g.objects(None, schema['author']))\n",
+    "named = [author for author in authors\n",
+    "         if any('name' in str(prop).lower() for prop in g.predicates(author, None))]\n",
+    "test('At least one reviewer has a name (surname, givenName...)', named)\n",
+    "\n",
+    "print('All tests passed. Congratulations!')\n",
+    "print()\n",
+    "print('Now you can try to add the optional properties')"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Exercise 2: Explore existing data"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "The goal of this exercise is to explore and compare annotations from existing websites.\n",
+    "\n",
+    "Semantic annotations are very useful on the web, because they allow `robots` to extract information about resources, and how they relate to other resources.\n",
+    "\n",
+    "For example, `schema.org` annotations on a website allow Google to show summaries and useful information (e.g. price and location of a hotel) in their results.\n",
+    "A similar technology powers their knowledge graph and the \"related search\" feature, i.e., when you look for a famous actor, it will first show you their filmography and a list of related actors.\n",
+    "\n",
+    "The information has to be provided using the official standards (RDF), to comply with the 3rd principle of linked data.\n",
+    "\n",
+    "To follow the 4<sup>th</sup> principle of linked data, the annotations should include links to known sources (e.g. DBpedia) whenever possible."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Let us explore some semantic annotations from popular websites.\n",
+    "\n",
+    "First, start with hotel reviews and websites. Here are some examples:\n",
+    "\n",
+    "* TripAdvisor hotels\n",
+    "* Trivago\n",
+    "* Kayak\n",
+    "* Specific hotel reviews\n",
+    "\n",
+    "\n",
+    "These are just two examples:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "print_data('http://www.hotellasalve.com/')"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "print_data('https://www.mandarinoriental.com/madrid/hotel-ritz/luxury-hotel')"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {
+    "deletable": false,
+    "editable": false,
+    "nbgrader": {
+     "checksum": "a29112f51cc3299c7cae27841feb7410",
+     "grade": false,
+     "grade_id": "cell-9bf9c7d7516fae75",
+     "locked": true,
+     "schema_version": 1,
+     "solution": false
+    }
+   },
+   "source": [
+    "Once you've extracted and analyzed different sources, answer the following questions:\n",
+    "\n",
+    "\n",
+    "### Questions:\n",
+    "\n",
+    "What type of data do they offer?"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {
+    "deletable": false,
+    "nbgrader": {
+     "checksum": "2a7a6ab7d69f7ca5db64233128260045",
+     "grade": true,
+     "grade_id": "cell-17508ecf96884653",
+     "locked": false,
+     "points": 0,
+     "schema_version": 1,
+     "solution": true
+    }
+   },
+   "source": [
+    "# YOUR ANSWER HERE"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {
+    "deletable": false,
+    "editable": false,
+    "nbgrader": {
+     "checksum": "531928c9e3b8462baddd4d700c240995",
+     "grade": false,
+     "grade_id": "cell-d36826d6323c96e8",
+     "locked": true,
+     "schema_version": 1,
+     "solution": false
+    }
+   },
+   "source": [
+    "What vocabularies and ontologies do they use?"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {
+    "deletable": false,
+    "nbgrader": {
+     "checksum": "f100004eceae0c8159ade9d713af47e7",
+     "grade": true,
+     "grade_id": "cell-17508ecf96884655",
+     "locked": false,
+     "points": 0,
+     "schema_version": 1,
+     "solution": true
+    }
+   },
+   "source": [
+    "# YOUR ANSWER HERE"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "What are the similarities between sites?"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {
+    "deletable": false,
+    "nbgrader": {
+     "checksum": "3d9ad086580ee27d93395dac8c16551d",
+     "grade": true,
+     "grade_id": "cell-30797c9ac87cc7e1",
+     "locked": false,
+     "points": 0,
+     "schema_version": 1,
+     "solution": true
+    }
+   },
+   "source": [
+    "# YOUR ANSWER HERE"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "What are the biggest differences?"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {
+    "deletable": false,
+    "nbgrader": {
+     "checksum": "6ccc2db2be4826a146a6c34bc54f00de",
+     "grade": true,
+     "grade_id": "cell-17508ecf96884657",
+     "locked": false,
+     "points": 0,
+     "schema_version": 1,
+     "solution": true
+    }
+   },
+   "source": [
+    "# YOUR ANSWER HERE"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {
+    "deletable": false,
+    "editable": false,
+    "nbgrader": {
+     "checksum": "33e1ec78415c85a795e86211d88316c2",
+     "grade": false,
+     "grade_id": "cell-5f922dc14ad3236a",
+     "locked": true,
+     "schema_version": 1,
+     "solution": false
+    }
+   },
+   "source": [
+    "Are all properties from Exercise 1 given by the websites? What's missing?"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {
+    "deletable": false,
+    "nbgrader": {
+     "checksum": "e0b4d9f1a2dfe5a7ab835f7349aa3796",
+     "grade": true,
+     "grade_id": "answer-missing",
+     "locked": false,
+     "points": 0,
+     "schema_version": 1,
+     "solution": true
+    }
+   },
+   "source": [
+    "# YOUR ANSWER HERE"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {
+    "deletable": false,
+    "editable": false,
+    "nbgrader": {
+     "checksum": "26eb04e562aa6c7d29efa8318982a337",
+     "grade": false,
+     "grade_id": "cell-7a3c1553c4d6a9b7",
+     "locked": true,
+     "schema_version": 1,
+     "solution": false
+    }
+   },
+   "source": [
+    "## Optional\n",
+    "\n",
+    "There is nothing special about review sites.\n",
+    "You can get information about any website.\n",
+    "\n",
+    "Verify this by checking:\n",
+    "\n",
+    "* News sites: e.g. https://edition.cnn.com/\n",
+    "* CMS: e.g. http://www.etsit.upm.es\n",
+    "* Twitter profiles: e.g. https://www.twitter.com/cif\n",
+    "* Mastodon (a Twitter alternative) profiles: e.g. https://mastodon.social/@Gargron/\n",
+    "* Twitter status pages: e.g. http://mobile.twitter.com/TBLInternetBot/status/1054438951237312514\n",
+    "* Mastodon (a Twitter alternative) status pages: e.g. https://mastodon.social/@Gargron/101202440923902326\n",
+    "* Wikipedia entries: e.g. https://es.wikipedia.org/wiki/Tim_Berners-Lee\n",
+    "* Facebook groups: e.g. https://www.facebook.com/universidadpolitecnicademadrid/"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "print_data('https://mastodon.social/@Gargron')"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {
+    "deletable": false,
+    "editable": false,
+    "nbgrader": {
+     "checksum": "cffc12120c51a7d994063f66d788570a",
+     "grade": false,
+     "grade_id": "cell-ec8df1a53c3d3f23",
+     "locked": true,
+     "schema_version": 1,
+     "solution": false
+    }
+   },
+   "source": [
+    "# Useful resources\n",
+    "\n",
+    "* TTL validator: http://ttl.summerofcode.be/\n",
+    "* RDF-turtle specification: https://www.w3.org/TR/turtle/\n",
+    "* Schema.org documentation: https://schema.org\n",
+    "* Wikipedia entry on the Turtle syntax: https://en.wikipedia.org/wiki/Turtle_(syntax)\n",
+    "* RDFLib, the most popular Python library for RDF (we use it in the tests): https://rdflib.readthedocs.io/"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "# Bibliography\n",
+    "\n",
+    "* W3C website on Linked Data: https://www.w3.org/wiki/LinkedData\n",
+    "* W3C website on RDF: https://www.w3.org/RDF/\n",
+    "* Turtle W3C recommendation: https://www.w3.org/TR/turtle/"
+   ]
+  }
+ ],
+ "metadata": {
+  "kernelspec": {
+   "display_name": "Python 3",
+   "language": "python",
+   "name": "python3"
+  },
+  "language_info": {
+   "codemirror_mode": {
+    "name": "ipython",
+    "version": 3
+   },
+   "file_extension": ".py",
+   "mimetype": "text/x-python",
+   "name": "python",
+   "nbconvert_exporter": "python",
+   "pygments_lexer": "ipython3",
+   "version": "3.7.2"
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 2
+}
--- a/rdf/helpers.py
+++ b/rdf/helpers.py
@@ -0,0 +1,148 @@
+import sys
+import operator
+import types
+from future.standard_library import install_aliases
+install_aliases()
+
+from urllib import request, parse
+from rdflib import Graph, term, Namespace, BNode
+from lxml import etree
+
+import IPython
+js = "IPython.CodeCell.options_default.highlight_modes['magic_turtle'] = {'reg':[/^%%ttl/]};"
+IPython.core.display.display_javascript(js, raw=True)
+
+
+from IPython.core.magic import (register_line_magic, register_cell_magic,
+                                register_line_cell_magic)
+from IPython.display import HTML, display, Image, Markdown
+
+
+schema = Namespace('http://schema.org/')
+
+DEFINITIONS = {}
+
+def solution(exercise='default'):
+    if exercise not in DEFINITIONS:
+        raise Exception('Solution for {} not found'.format(exercise))
+    return DEFINITIONS[exercise]
+
+
+@register_cell_magic
+def ttl(line, cell):
+    '''
+    TTL magic command for ipython. It can be used in a cell like this:
+    
+    ```
+    %%ttl
+    
+    ... Your TTL definition ...
+    
+    ```
+    The definition will be loaded into the DEFINITIONS dictionary, using RDFlib.
+    This definition can then be used for evaluation.
+    '''
+    g = Graph()
+    msg = '''Error on line {line}
+
+Reason: {reason}
+
+If you don\'t know what this error means, try an online validator: http://ttl.summerofcode.be/
+'''
+    global DEFINITIONS
+    key = line or 'default'
+    try:
+        DEFINITIONS[key] = g.parse(data=cell,
+                                  format="ttl")
+    except SyntaxError as ex:
+        return Markdown(msg.format(line=ex.lines, reason=ex._why))
+    except Exception as ex:
+        return Markdown(msg.format(line='?', reason=ex))
+    return Markdown('File loaded!')
+
+
+def extract_data(url):
+    g = Graph()
+    try:
+        g.parse(url, format='rdfa')
+    except Exception:
+        print('Could not get rdfa data', file=sys.stderr)
+    try:
+        g.parse(url, format='microdata')
+    except Exception:
+        print('Could not get microdata', file=sys.stderr)
+
+
+    def sanitize_triple(t):
+        """Function to remove bad URIs from the graph that would otherwise
+        make the serialization fail."""
+        def sanitize_triple_item(item):
+            if isinstance(item, term.URIRef) and ' ' in item:
+                return term.URIRef(parse.quote(str(item)))
+            return item
+
+        return (sanitize_triple_item(t[0]),
+                sanitize_triple_item(t[1]),
+                sanitize_triple_item(t[2]))
+
+
+    with request.urlopen(url) as response:
+        # Get all json-ld objects embedded in the html file
+        html = response.read().decode('utf-8', errors='ignore')
+        parser = etree.XMLParser(recover=True)
+        root = etree.fromstring(html.encode(), parser=parser)
+        if root is not None and len(root):
+            for jsonld in root.findall(".//script[@type='application/ld+json']"):
+                g.parse(data=jsonld.text, publicID=BNode(), format='json-ld')
+
+
+    fixedgraph = Graph()
+    fixedgraph += [sanitize_triple(s) for s in g]
+
+#     print(g.serialize(format='turtle').decode('utf-8', errors='ignore'))
+    return fixedgraph
+
+def turtle(g):
+    return Markdown('''
+Results:
+
+```turtle
+{}
+```
+'''.format(g.serialize(format='turtle').decode('utf-8', errors='ignore')))
+
+def print_data(url):
+    g = extract_data(url)
+    return turtle(g)
+
+    
+
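+def test(description, got, expected=None, func=None):
+    '''Check `got` with `func` (truthiness by default, equality when
+    `expected` is given) and raise an Exception describing the failure.'''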
+    if isinstance(got, types.GeneratorType):
+        got = set(got)
+    try:
+        if expected is None:
+            func = func or operator.truth
+            expected = True
+            assert func(got)
+        else:
+            func = func or operator.eq
+            assert func(got, expected)
+    except AssertionError:
+        print('Test failed: {}'.format(description), file=sys.stderr)
+        print('\tExpected: {}'.format(expected), file=sys.stderr)
+        print('\tGot:      {}'.format(got), file=sys.stderr)
+        raise Exception('Test failed: {}'.format(description))
+
+        
+def atLeast(lst, number):
+    return len(set(lst))>=number
+
+def containsAll(lst, other):
+    for i in other:
+        if i not in lst:
+            print('{} not found'.format(i), file=sys.stderr)
+            return False
+    return True
--- a/rdf/requirements.txt
+++ b/rdf/requirements.txt