sitc/rdf/RDF.ipynb

{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {
    "deletable": false,
    "editable": false,
    "nbgrader": {
     "checksum": "1fba29f718bbaa14890b305223712474",
     "grade": false,
     "grade_id": "cell-2bd9e19ffed99f81",
     "locked": true,
     "schema_version": 1,
     "solution": false
    }
   },
   "source": [
    "<header style=\"width:100%;position:relative\">\n",
    "  <div style=\"width:80%;float:right;\">\n",
    "    <h1>Course Notes for Learning Intelligent Systems</h1>\n",
    "    <h3>Department of Telematic Engineering Systems</h3>\n",
    "    <h5>Universidad Politécnica de Madrid</h5>\n",
    "  </div>\n",
    "        <img style=\"width:15%;\" src=\"../logo.jpg\" alt=\"UPM\" />\n",
    "</header>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "deletable": false,
    "editable": false,
    "nbgrader": {
     "checksum": "59c5cb46c9d722f691206e766e5af557",
     "grade": false,
     "grade_id": "cell-51338a0933103db9",
     "locked": true,
     "schema_version": 1,
     "solution": false
    }
   },
   "source": [
    "# Introduction\n",
    "\n",
    "The goal of this exercise is to understand the usefulness of semantic annotation and the Linked Open Data initiative, by solving a practical use case.\n",
    "\n",
    "The student will achieve the goal through:\n",
    "\n",
    "* Analyzing the sequence of tasks required to generate and publish semantic data\n",
    "* Extending their knowledge using the set of additional documents and specifications\n",
    "* Creating a partial semantic definition using the Turtle format\n",
    "\n",
    "\n",
    "# Objectives\n",
    "\n",
    "The main objective is to learn how annotations can be unified on the web, by following the Linked Data principles.\n",
    "\n",
    "\n",
    "These concepts will be applied in a practical use case: obtaining a Graph of information about hotels and reviews about them.\n",
    "\n",
    "\n",
    "# Tools\n",
    "\n",
    "This notebook is self-contained, but it requires some python libraries.\n",
    "To install them, simply run the following line"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "deletable": false,
    "editable": false,
    "nbgrader": {
     "checksum": "6455a9642f93288f6c74b88d0892c4c7",
     "grade": false,
     "grade_id": "cell-d7f1ea9c021693b8",
     "locked": true,
     "schema_version": 1,
     "solution": false
    }
   },
   "outputs": [],
   "source": [
    "# Install a pip package in the current Jupyter kernel\n",
    "import sys\n",
    "!{sys.executable} -m pip install -r requirements.txt\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "deletable": false,
    "editable": false,
    "nbgrader": {
     "checksum": "5c2cccdf6463262a0f8bc89c64f97fb2",
     "grade": false,
     "grade_id": "cell-d8db3c16cee92ac1",
     "locked": true,
     "schema_version": 1,
     "solution": false
    }
   },
   "source": [
    "# Linked Data, RDF and Turtle\n",
    "\n",
    "\n",
    "The term [Linked Data](https://www.w3.org/wiki/LinkedData) refers to a set of best practices for publishing structured data on the Web.\n",
    "These principles have been coined by Tim Berners-Lee in the design issue note Linked Data.\n",
    "The principles are:\n",
    "\n",
    "1. Use URIs as names for things\n",
    "2. Use HTTP URIs so that people can look up those names\n",
    "3. When someone looks up a URI, provide useful information\n",
    "4. Include links to other URIs, so that they can discover more things\n",
    "\n",
    "The [RDF](https://www.w3.org/RDF/) is a standard model for data interchange on the Web.\n",
    "It formalizes some concepts behind Linked Data into a specification, which can be used to develop applications and store information.\n",
    "\n",
    "Explaining RDF is out of the scope of this notebook.\n",
    "The [resources section](#Useful-resources) contains some links if you wish to learn about RDF.\n",
    "\n",
    "The main idea behind RDF is that information is encoded in the form of triples:\n",
    "\n",
    "```turtle\n",
    "<subject> <predicate> <object>\n",
    "```\n",
    "\n",
    "Each of these, (`<subject>`, `<predicate>` and `<object>`) should be unique identifiers.\n",
    "\n",
    "For example, to say Timmy is a 7 year-old whose dog is Tobby, we would write:\n",
    "\n",
    "```turtle\n",
    "<http://example.org/Timmy>  <http://example.org/hasDog> <http://example.org/Tobby>\n",
    "<http://example.org/Timmy>  <http://example.org/age> 7\n",
    "```\n",
    "\n",
    "Note that we are not referring to \"any Timmy\", but to a *very specific* Timmy.\n",
    "We could learn more about this particular boy using that URI.\n",
    "The same goes for the dog, and for the concept of \"having a dog\", which we unambiguously encode as `<http://example.org/hasDog>`.\n",
    "This concept may be described as taking care of a dog, for example, whereas a different property `<http://yourwebsite.com/hasDog>` could be described as being the legal owner of the dog.\n",
    "\n",
    "\n",
    "RDF can be used to embed annotation in many places, including HTML document, using any compatible format.\n",
    "The options include including RDFa, XML, JSON-LD and [Turtle](https://www.w3.org/TR/turtle/).\n",
    "\n",
    "\n",
    "In the exercises, we will be using turtle notation, because it is very readable.\n",
    "\n",
    "Here's an example of document in Turtle, taken from the Turtle specification:\n",
    "\n",
    "```turtle\n",
    "@base <http://example.org/> .\n",
    "@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .\n",
    "@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .\n",
    "@prefix foaf: <http://xmlns.com/foaf/0.1/> .\n",
    "@prefix rel: <http://www.perceive.net/schemas/relationship/> .\n",
    "\n",
    "<#green-goblin>\n",
    "    rel:enemyOf <#spiderman> ;\n",
    "    a foaf:Person ;    # in the context of the Marvel universe\n",
    "    foaf:name \"Green Goblin\" .\n",
    "\n",
    "<#spiderman>\n",
    "    rel:enemyOf <#green-goblin> ;\n",
    "    a foaf:Person ;\n",
    "    foaf:name \"Spiderman\", \"Человек-паук\"@ru .\n",
    "```\n",
    "\n",
    "\n",
    "The second exercise will show you how to extract this information from any website.\n",
    "\n",
    "As you can observe in these examples, Turtle defines several ways to specify IRIs in a result. Please, consult the specification for further details. As an overview, IRIs can be:\n",
    " * *relative IRIs*: IRIs resolved relative to the current base IRI. Thus, you should define a base IRI (@base <http://example.org>) and then relative IRIs (i.e. <#spiderman>). The resulting IRI is <http://example.org/spiderman>.\n",
    " * *prefixed names*: a prefixed name (i.e. foaf:Person) is transformed into an IRI by concatenating the IRI of the prefix (@prefix foaf: <http://xmlns.com/foaf/0.1) and the local part of the prefixed name (i.e. Person). So, the resulting IRI is <http://xmlns.com/foaf/0.1/Person\n",
    " * *absolute IRIs*: an already resolved IRI, p.ej. <http://example.com/Auto>."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "deletable": false,
    "editable": false,
    "nbgrader": {
     "checksum": "d2849a6154d4807b405e6ec84601c231",
     "grade": false,
     "grade_id": "cell-14e2327285737802",
     "locked": true,
     "schema_version": 1,
     "solution": false
    }
   },
   "source": [
    "# Vocabularies and schema.org\n",
    "\n",
    "Concepts (predicates, types, etc.) can be defined in vocabularies.\n",
    "These vocabularies can be reused in several applications.\n",
    "In the example above, we used the concept of person from an external vocabulary (`foaf:Person`, i.e. http://xmlns.com/foaf/0.1/Person).\n",
    "That way, we do not need to redefine the concept of Person in every application.\n",
    "There are several well known vocabularies, such as:\n",
    "\n",
    "* Dublin core, for metadata: http://dublincore.org/\n",
    "* FOAF (Friend-of-a-friend) for social networks: http://www.foaf-project.org/\n",
    "* SIOC for online communities: https://www.w3.org/Submission/sioc-spec/\n",
    "\n",
    "Using the same vocabularies also makes it easier to automatically process and classify information.\n",
    "\n",
    "\n",
    "That was the motivation behind Schema.org, a collaboration between Google, Microsoft, Yahoo and Yandex.\n",
    "They aim to provide schemas for structured data annotation of Web sites, e-mails, etc., which can be leveraged by search engines and other automated processes.\n",
    "\n",
    "They rely on RDF for representation, and provide a set of common vocabularies that can be shared by every web developer.\n",
    "\n",
    "\n",
    "There are thousands of properties in the schema.org vocabulary, and they offer a very comprehensive documentation.\n",
    "\n",
    "As an example, this is the documentation for hotels:\n",
    "\n",
    "* List of properties for the Hotel type: https://schema.org/Hotel\n",
    "* Documentation for hotels: https://schema.org/docs/hotels.html\n",
    "\n",
    "\n",
    "You can use the documentation to find properties (e.g. `checkinTime`), as well as the type of that property (e.g. `Datetime`)."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "deletable": false,
    "editable": false,
    "nbgrader": {
     "checksum": "fe9a246ba580c71385e9b83d414a1216",
     "grade": false,
     "grade_id": "cell-a1b60daabb1a9d00",
     "locked": true,
     "schema_version": 1,
     "solution": false
    }
   },
   "source": [
    "# Exercises"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "deletable": false,
    "editable": false,
    "nbgrader": {
     "checksum": "63879c425ec11742c95c728a578d109e",
     "grade": false,
     "grade_id": "cell-d9289e96b2b0f265",
     "locked": true,
     "schema_version": 1,
     "solution": false
    }
   },
   "source": [
    "## Instructions"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "deletable": false,
    "editable": false,
    "nbgrader": {
     "checksum": "e0b6464bce9263fb35543acf4acb31da",
     "grade": false,
     "grade_id": "cell-bb418e9bae1fef1a",
     "locked": true,
     "schema_version": 1,
     "solution": false
    }
   },
   "source": [
    "First of all, run the line below.\n",
    "It will import everything you need for the exercises."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "deletable": false,
    "editable": false,
    "nbgrader": {
     "checksum": "bf98cea45f42e3d0f1ab158693b40da7",
     "grade": false,
     "grade_id": "cell-4a1b60bd9974bbb1",
     "locked": true,
     "schema_version": 1,
     "solution": false
    }
   },
   "outputs": [],
   "source": [
    "from helpers import *\n",
    "from rdflib import term, RDF, Namespace"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "deletable": false,
    "nbgrader": {
     "checksum": "76c052d3f1117f07468068a90f969760",
     "grade": true,
     "grade_id": "cell-57f67d1e662b7f09",
     "locked": false,
     "points": 0,
     "schema_version": 1,
     "solution": true
    }
   },
   "source": [
    "In these exercises, you have to fill in the parts marked:\n",
    "\n",
    "```\n",
    "# YOUR ANSWER HERE\n",
    "```\n",
    "\n",
    "Execute the following line. It simple fills 'example' with the code written below. It should produce the output 'File loaded!\"."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "deletable": false,
    "nbgrader": {
     "checksum": "69182e8fadb9c9751f76786e0fcb8803",
     "grade": false,
     "grade_id": "cell-808cfcbf3891f39f",
     "locked": false,
     "schema_version": 1,
     "solution": true
    }
   },
   "outputs": [],
   "source": [
    "%%ttl example\n",
    "\n",
    "# YOUR ANSWER HERE"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "deletable": false,
    "editable": false,
    "nbgrader": {
     "checksum": "f6eb0c6c19c1756d7705f52866b00f82",
     "grade": false,
     "grade_id": "cell-a4ed500079ba36ca",
     "locked": true,
     "schema_version": 1,
     "solution": false
    }
   },
   "source": [
    "Now run the tests:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "deletable": false,
    "editable": false,
    "nbgrader": {
     "checksum": "7a21480d0282aed4f943d0c9d6ecd6e6",
     "grade": true,
     "grade_id": "cell-23e61b9f48d597fc",
     "locked": true,
     "points": 1,
     "schema_version": 1,
     "solution": false
    }
   },
   "outputs": [],
   "source": [
    "# This code tests that the definition above is correct.\n",
    "g = solution('example')\n",
    "test('Some triples have been loaded',\n",
    "     len(g))\n",
    "test('A person has been defined',\n",
    "     g.subjects(RDF.type, term.URIRef('http://xmlns.com/foaf/0.1/Person')))\n",
    "print('All tests passed. Well done!')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now replace #YOUR ANSWER HERE with your answer (the triples) and execute the test."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "%%ttl example\n",
    "# YOUR ANSWER HERE"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# This code tests that the definition above is correct.\n",
    "g = solution('example')\n",
    "test('Some triples have been loaded',\n",
    "     len(g))\n",
    "test('A person has been defined',\n",
    "     g.subjects(RDF.type, term.URIRef('http://xmlns.com/foaf/0.1/Person')))\n",
    "print('All tests passed. Well done!')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "deletable": false,
    "editable": false,
    "nbgrader": {
     "checksum": "61933311e76dd200b63032f42f26b11d",
     "grade": false,
     "grade_id": "cell-da88c2f8170436fe",
     "locked": true,
     "schema_version": 1,
     "solution": false
    }
   },
   "source": [
    "Turtle is usually written in standalone files (e.g. `mydefinition.ttl`).\n",
    "To write Turtle definitions inside our notebook, and to make it easy to test them, we will use a special magic command: `%%ttl`.\n",
    "\n",
    "To make sure everything works, let's try first with an example.\n",
    "\n",
    "\n",
    "Copy/paste this solution below:\n",
    "\n",
    "```turtle\n",
    "@prefix foaf: <http://xmlns.com/foaf/0.1/> .\n",
    "@prefix geo: <http://www.w3.org/2003/01/geo/wgs84_pos#> .\n",
    "@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .\n",
    "\n",
    "<http://purl.org/net/bsletten> \n",
    "    a foaf:Person;\n",
    "    foaf:interest <http://www.w3.org/2000/01/sw/>;\n",
    "    foaf:based_near [\n",
    "        geo:lat \"34.0736111\" ;\n",
    "        geo:lon \"-118.3994444\"\n",
    "   ] .\n",
    "```"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "deletable": false,
    "editable": false,
    "nbgrader": {
     "checksum": "a64acf02625b48b3c65b6e1bc1ba6c1a",
     "grade": false,
     "grade_id": "cell-e73f1933742f7ab3",
     "locked": true,
     "schema_version": 1,
     "solution": false
    }
   },
   "source": [
    "## Exercise 1: Definition of a Hotel"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We will define some basic information about a hotel, and some reviews.\n",
    "This should be the same type of information that some aggregators (e.g. TripAdvisor) offer in their websites.\n",
    "\n",
    "Namely, you need to define at least two hotels (you may add more than one), with the following information:\n",
    "* Description\n",
    "* Address\n",
    "* Contact information\n",
    "* City and country (location)\n",
    "* Email\n",
    "* logo\n",
    "* Opening hours\n",
    "* Price range\n",
    "* Amenities (optional)\n",
    "* Geolocation (optional)\n",
    "* Images (optional)\n",
    "\n",
    "You should also add at least three reviews about hotels, with the following information:\n",
    "* Name of the user that reviewed the Hotel\n",
    "* Rating\n",
    "* Date\n",
    "* Replies by other users (optional)\n",
    "* Aspects rated in each review (cleanliness, staff, etc...) (optional)\n",
    "* Information about the user (name, surname, date the account was created) (optional)\n",
    "\n",
    "\n",
    "You can check any hotel website for inspiration, like this [review of a hotel in TripAdvisor](https://www.tripadvisor.es/Hotel_Review-g1437655-d1088667-Reviews-Hotel_Spa_La_Salve-Torrijos_Province_of_Toledo_Castile_La_Mancha.html)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To make sure we are following Principles 1 and 2, we should use URIs that can be queried.\n",
    "For the sake of this exercise, you have use the made-up `http://example/sitc/` as base for our URIs.\n",
    "Hence, the URIs of our hotels will look like this: `http://example/sitc/my-fancy-hotel`.\n",
    "These URIs can not be queried, **and should not be used in real annotations**, but we will see how to fix that in a future exercise.\n",
    "\n",
    "We will use the vocabularies defined in https://schema.org e.g.:\n",
    "    * https://schema.org/Review defines properties about reviews\n",
    "    * https://schema.org/Hotel defines properties about hotels\n",
    "    \n",
    "\n",
    "Your definition has to be included in the following cell.\n",
    "\n",
    "So, your task is:\n",
    "* Search the relevant properties of the vocabulary schema.org to represent the attributes of both reviews and hotels.\n",
    "* Write two resources of type Hotel and three resources of type Review.\n",
    "* Check that your syntax is correct, by executing your code in the cell below.\n",
    "\n",
    "**Tip**: Define the schema prefix first, to avoid repeating `<http://schema.org/...>`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "deletable": false,
    "nbgrader": {
     "checksum": "44f8be14db3d3e42b5b85f0485206346",
     "grade": false,
     "grade_id": "definition",
     "locked": false,
     "schema_version": 1,
     "solution": true
    }
   },
   "outputs": [],
   "source": [
    "%%ttl hotel\n",
    "\n",
    "@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .\n",
    "@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .\n",
    "@prefix sitc: <http://example/sitc/> .\n",
    "\n",
    "\n",
    "<http://example/sitc/GSIHOTEL> a <http://schema.org/Hotel> ;\n",
    "         <http://schema.org/description> \"This is just an example to get you started.\" .\n",
    "\n",
    "\n",
    "# YOUR ANSWER HERE"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "deletable": false,
    "editable": false,
    "nbgrader": {
     "checksum": "4f54963163a64f46058c86be139e5543",
     "grade": true,
     "grade_id": "definition-tests",
     "locked": true,
     "points": 10,
     "schema_version": 1,
     "solution": false
    }
   },
   "outputs": [],
   "source": [
    "g = solution('hotel')\n",
    "test('Some triples are loaded',\n",
    "     len(g))\n",
    "\n",
    "hotels = set(g.subjects(RDF.type, schema['Hotel']))\n",
    "test('At least 2 hotels are loaded',\n",
    "     hotels,\n",
    "     2,\n",
    "     atLeast)\n",
    "\n",
    "for hotel in hotels:\n",
    "    if 'GSIHOTEL' in hotel:  # Do not check the example hotel\n",
    "        continue\n",
    "    props = g.predicates(hotel)\n",
    "    test('Each hotel has all required properties',\n",
    "         props,\n",
    "         list(schema[i] for i in ['description', 'email', 'logo', 'priceRange']),\n",
    "         func=containsAll)\n",
    "\n",
    "reviews = set(g.subjects(RDF.type, schema['Review']))\n",
    "test('At least 3 reviews are loaded',\n",
    "     reviews,\n",
    "     3,\n",
    "     atLeast)\n",
    "\n",
    "for review in reviews:\n",
    "    props = g.predicates(review)\n",
    "    test('Each review has all required properties',\n",
    "         props,\n",
    "         list(schema[i] for i in ['itemReviewed', 'reviewBody', 'reviewRating']),\n",
    "         func=containsAll)\n",
    "    ratings = list(g.objects(review, schema['reviewRating']))\n",
    "    for rating in ratings:\n",
    "        value = g.value(rating, schema['ratingValue'])\n",
    "        test('The review should have ratings', value)\n",
    "\n",
    "authors = set(g.objects(None, schema['author']))\n",
    "for author in authors:\n",
    "    for prop in g.predicates(author, None):\n",
    "        if 'name' in str(prop).lower():\n",
    "            break\n",
    "else:\n",
    "    assert \"At least a reviewer has a name (surname, givenName...)\"\n",
    "\n",
    "print('All tests passed. Congratulations!')\n",
    "print()\n",
    "print('Now you can try to add the optional properties')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Exercise 2: Explore existing data"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The goal of this exercise is to explore and compare annotations from existing websites.\n",
    "\n",
    "Semantic annotations are very useful on the web, because they allow `robots` to extract information about resources, and how they relate to other resources.\n",
    "\n",
    "For example, `schema.org` annotations on a website allow Google to show summaries and useful information (e.g. price and location of a hotel) in their results.\n",
    "A similar technology powers their knowledge graph and the \"related search\". i.e. when you look for a famous actor, it will first show you their filmography, and a list of related actors.\n",
    "\n",
    "The information has to be provided using the official standards (RDF), to comply with the 3rd principle of linked data.\n",
    "\n",
    "To follow the 4<sup>th</sup> principle of linked data, the annotations should include links to known sources (e.g. DBpedia) whenever possible."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Let us explore some semantic annotations from popular websites.\n",
    "\n",
    "First, start with hotel reviews and websites. Here are some examples:\n",
    "\n",
    "* TripAdvisor hotels\n",
    "* Trivago\n",
    "* Kayak\n",
    "* Specific hotel reviews\n",
    "\n",
    "\n",
    "These are just two examples:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "print_data('http://www.hotellasalve.com/')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "print_data('https://www.mandarinoriental.com/madrid/hotel-ritz/luxury-hotel')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "deletable": false,
    "nbgrader": {
     "checksum": "c7af0b9af5a64773785cc890f2431c78",
     "grade": true,
     "grade_id": "cell-c2e5b58ea74e8276",
     "locked": false,
     "points": 1,
     "schema_version": 1,
     "solution": true
    }
   },
   "outputs": [],
   "source": [
    "# Try new sites here\n",
    "\n",
    "# YOUR ANSWER HERE"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "deletable": false,
    "editable": false,
    "nbgrader": {
     "checksum": "a29112f51cc3299c7cae27841feb7410",
     "grade": false,
     "grade_id": "cell-9bf9c7d7516fae75",
     "locked": true,
     "schema_version": 1,
     "solution": false
    }
   },
   "source": [
    "Once you've extracted and analyzed different sources, answer the following questions:\n",
    "\n",
    "\n",
    "### Questions:\n",
    "\n",
    "What type of data do they offer?"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "deletable": false,
    "nbgrader": {
     "checksum": "56a77e133b532997723bf2f8116389e4",
     "grade": true,
     "grade_id": "cell-17508ecf96884653",
     "locked": false,
     "points": 1,
     "schema_version": 1,
     "solution": true
    }
   },
   "source": [
    "# YOUR ANSWER HERE"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "deletable": false,
    "editable": false,
    "nbgrader": {
     "checksum": "9311bca044d7057c86dd753f5343e19b",
     "grade": false,
     "grade_id": "cell-d36826d6323c96e8",
     "locked": true,
     "schema_version": 1,
     "solution": false
    }
   },
   "source": [
    "What vocabularies and ontologies do they use?"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "deletable": false,
    "nbgrader": {
     "checksum": "ba7b24b557d627e2665ca31c75c24c23",
     "grade": true,
     "grade_id": "cell-17508ecf96884655",
     "locked": false,
     "points": 1,
     "schema_version": 1,
     "solution": true
    }
   },
   "source": [
    "# YOUR ANSWER HERE"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "deletable": false,
    "editable": false,
    "nbgrader": {
     "checksum": "853651c95cbcd69cd5f495f03d29d19a",
     "grade": false,
     "grade_id": "cell-e25a0db3fe8a6b4b",
     "locked": true,
     "schema_version": 1,
     "solution": false
    }
   },
   "source": [
    "What properties and annotations do they have in common?"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "deletable": false,
    "nbgrader": {
     "checksum": "779f0f750508eb52b2d98b92689e426b",
     "grade": true,
     "grade_id": "cell-30797c9ac87cc7e1",
     "locked": false,
     "points": 1,
     "schema_version": 1,
     "solution": true
    }
   },
   "source": [
    "# YOUR ANSWER HERE"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "deletable": false,
    "editable": false,
    "nbgrader": {
     "checksum": "f8fff644855ca50a5219598322aa9b32",
     "grade": false,
     "grade_id": "cell-33862c8e38173d9c",
     "locked": true,
     "schema_version": 1,
     "solution": false
    }
   },
   "source": [
    "What sites provide the most, or the most useful annotations?"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "deletable": false,
    "nbgrader": {
     "checksum": "8ef0ebd54eefb44ff7019f17f58be3ec",
     "grade": true,
     "grade_id": "cell-17508ecf96884657",
     "locked": false,
     "points": 1,
     "schema_version": 1,
     "solution": true
    }
   },
   "source": [
    "# YOUR ANSWER HERE"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "deletable": false,
    "editable": false,
    "nbgrader": {
     "checksum": "33e1ec78415c85a795e86211d88316c2",
     "grade": false,
     "grade_id": "cell-5f922dc14ad3236a",
     "locked": true,
     "schema_version": 1,
     "solution": false
    }
   },
   "source": [
    "Are all properties from Exercise 1 given by the websites? What's missing?"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "deletable": false,
    "nbgrader": {
     "checksum": "b5f208a95a8803e97f82c5f2cdf319dd",
     "grade": true,
     "grade_id": "answer-missing",
     "locked": false,
     "points": 1,
     "schema_version": 1,
     "solution": true
    }
   },
   "source": [
    "# YOUR ANSWER HERE"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "deletable": false,
    "editable": false,
    "nbgrader": {
     "checksum": "26eb04e562aa6c7d29efa8318982a337",
     "grade": false,
     "grade_id": "cell-7a3c1553c4d6a9b7",
     "locked": true,
     "schema_version": 1,
     "solution": false
    }
   },
   "source": [
    "## Optional\n",
    "\n",
    "There is nothing special about review sites.\n",
    "You can get information about any website.\n",
    "\n",
    "Verify this running checking:\n",
    "\n",
    "* News sites: e.g. https://edition.cnn.com/\n",
    "* CMS: e.g. http://www.etsit.upm.es\n",
    "* Twitter profiles: e.g. https://www.twitter.com/cif\n",
    "* Mastodon (a Twitter alternative) profiles: e.g. https://mastodon.social/@Gargron/\n",
    "* Twitter status pages: e.g. http://mobile.twitter.com/TBLInternetBot/status/1054438951237312514\n",
    "* Mastodon (a Twitter alternative) status pages: e.g. https://mastodon.social/@Gargron/101202440923902326\n",
    "* Wikipedia entries: e.g. https://es.wikipedia.org/wiki/Tim_Berners-Lee\n",
    "* Facebook groups: e.g. https://www.facebook.com/universidadpolitecnicademadrid/"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "print_data('https://mastodon.social/@Gargron')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "deletable": false,
    "nbgrader": {
     "checksum": "fb8bd92e931c7836bb3c22dadd3be583",
     "grade": true,
     "grade_id": "cell-ff2413f45311f086",
     "locked": false,
     "points": 0,
     "schema_version": 1,
     "solution": true
    }
   },
   "outputs": [],
   "source": [
    "# Try some new sites here.\n",
    "# YOUR ANSWER HERE"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "deletable": false,
    "editable": false,
    "nbgrader": {
     "checksum": "cffc12120c51a7d994063f66d788570a",
     "grade": false,
     "grade_id": "cell-ec8df1a53c3d3f23",
     "locked": true,
     "schema_version": 1,
     "solution": false
    }
   },
   "source": [
    "# Useful resources\n",
    "\n",
    "* TTL validator: http://ttl.summerofcode.be/\n",
    "* RDF-turtle specification: https://www.w3.org/TR/turtle/\n",
    "* Schema.org documentation: https://schema.org\n",
    "* Wikipedia entry on the Turtle syntax: https://en.wikipedia.org/wiki/Turtle_(syntax)\n",
    "* RDFLib, the most popular python library for RDF (we use it in the tests): https://rdflib.readthedocs.io/"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Bibliography\n",
    "\n",
    "* W3C website on Linked Data: https://www.w3.org/wiki/LinkedData\n",
    "* W3C website on RDF: https://www.w3.org/RDF/\n",
    "* Turtle W3C recommendation: https://www.w3.org/TR/turtle/"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.5.5"
  },
  "latex_envs": {
   "LaTeX_envs_menu_present": true,
   "autocomplete": true,
   "bibliofile": "biblio.bib",
   "cite_by": "apalike",
   "current_citInitial": 1,
   "eqLabelWithNumbers": true,
   "eqNumInitial": 1,
   "hotkeys": {
    "equation": "Ctrl-E",
    "itemize": "Ctrl-I"
   },
   "labels_anchors": false,
   "latex_user_defs": false,
   "report_style_numbering": false,
   "user_envs_cfg": false
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}