"Here we propose several exercises, it is recommended to work only in one of them."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Exercise 1 - Sentiment classification for Twitter"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The purpose of this exercise is:\n",
"* Collect geolocated tweets\n",
"* Analyse their sentiment\n",
"* Represent the result in a map, so that one can understand the sentiment in a geographic region.\n",
"\n",
"The steps (and most of the code) can be found [here](http://pybonacci.org/2015/11/24/como-hacer-analisis-de-sentimiento-en-espanol-2/). \n",
"\n",
"You can select the tweets in any language."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Exercise 2 - Spam classification"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The classification of spam is a classical problem. [Here](http://zacstewart.com/2015/04/28/document-classification-with-scikit-learn.html) you can find a detailed example of how to do it using the datasets Enron-Spama and SpamAssassin. You can try to test yourself the classification."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Exercise 3 - Automatic essay classification"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"As you have seen, we did not got great results in the previous notebook. You can try to improve them."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Licence"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The notebook is freely licensed under under the [Creative Commons Attribution Share-Alike license](https://creativecommons.org/licenses/by/2.0/). \n",