mirror of https://github.com/gsi-upm/sitc synced 2024-11-24 07:22:29 +00:00
sitc/nlp/4_7_Exercises.ipynb
Carlos A. Iglesias 842b6307f1 Updated notebooks
2019-03-06 17:46:12 +01:00


{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"![](images/EscUpmPolit_p.gif \"UPM\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Course Notes for Learning Intelligent Systems"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Department of Telematic Engineering Systems, Universidad Politécnica de Madrid, © Carlos A. Iglesias"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Exercises"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Table of Contents\n",
"\n",
"* [Exercises](#Exercises)\n",
"\t* [Exercise 1 - Sentiment classification for Twitter](#Exercise-1---Sentiment-classification-for-Twitter)\n",
"\t* [Exercise 2 - Spam classification](#Exercise-2---Spam-classification)\n",
"\t* [Exercise 3 - Automatic essay classification](#Exercise-3---Automatic-essay-classification)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Here we propose several exercises; we recommend working on only one of them."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Exercise 1 - Sentiment classification for Twitter"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The goals of this exercise are to:\n",
"* Collect geolocated tweets.\n",
"* Analyse their sentiment.\n",
"* Represent the results on a map, so that one can understand the sentiment in a geographic region.\n",
"\n",
"The steps (and most of the code) can be found [here](http://pybonacci.org/2015/11/24/como-hacer-analisis-de-sentimiento-en-espanol-2/). \n",
"\n",
"You can select tweets in any language."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Exercise 2 - Spam classification"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Spam classification is a classic problem. [Here](http://zacstewart.com/2015/04/28/document-classification-with-scikit-learn.html) you can find a detailed example of how to do it using the Enron-Spam and SpamAssassin datasets. Try to reproduce the classification yourself."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Exercise 3 - Automatic essay classification"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"As you have seen, we did not get great results in the previous notebook. You can try to improve them."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Licence"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The notebook is freely licensed under the [Creative Commons Attribution Share-Alike license](https://creativecommons.org/licenses/by/2.0/). \n",
"\n",
"© Carlos A. Iglesias, Universidad Politécnica de Madrid."
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.7.1"
},
"latex_envs": {
"LaTeX_envs_menu_present": true,
"autocomplete": true,
"bibliofile": "biblio.bib",
"cite_by": "apalike",
"current_citInitial": 1,
"eqLabelWithNumbers": true,
"eqNumInitial": 1,
"hotkeys": {
"equation": "Ctrl-E",
"itemize": "Ctrl-I"
},
"labels_anchors": false,
"latex_user_defs": false,
"report_style_numbering": false,
"user_envs_cfg": false
}
},
"nbformat": 4,
"nbformat_minor": 1
}