mirror of https://github.com/gsi-upm/sitc synced 2025-06-14 04:02:20 +00:00

Compare commits

No commits in common. "6e8448f22f41eca9c614119af7e1532bd4b87ad0" and "36d117e41751e681083c0aa1949c64a80e8d1c5a" have entirely different histories.

2 changed files with 14 additions and 14 deletions

@@ -89,7 +89,7 @@
}
},
"source": [
"In this session, we are going to learn to process text so that we can apply machine learning techniques."
"In this session we are going to learn to process text so that can apply machine learning techniques."
]
},
{
@@ -101,7 +101,7 @@
},
"source": [
"# NLP Basics\n",
"In this notebook, we are going to use two popular NLP libraries:\n",
"In this notebook we are going to use two popular NLP libraries:\n",
"* NLTK (Natural Language Toolkit, https://www.nltk.org/) \n",
"* Spacy (https://spacy.io/)"
]
@@ -116,7 +116,7 @@
"source": [
"Main characteristics:\n",
"* both are open source and very popular\n",
"* NLTK was released in 2001, while Spacy was in 2015\n",
"* NLTK was released in 2001 while Spacy was in 2015\n",
"* Spacy provides very efficient implementations"
]
},
@@ -130,7 +130,7 @@
"source": [
"# Spacy installation\n",
"\n",
"You need to install spacy if not installed:\n",
"You need to install previously spacy if not installed:\n",
"* `pip install spacy`\n",
"* or `conda install -c conda-forge spacy`\n",
"\n",
@@ -148,7 +148,7 @@
"source": [
"# Spacy pipelines\n",
"\n",
"The function **nlp** takes a raw text and performs several operations (tokenization, tagger, NER, ...)\n",
"The function **nlp** takes a raw text and perform several operations (tokenization, tagger, NER, ...)\n",
"![](spacy/spacy-pipeline.svg \"Spacy pipelines\")"
]
},
@@ -160,7 +160,7 @@
}
},
"source": [
"From text to doc through the pipeline"
"From text to doc trough the pipeline"
]
},
{
@@ -205,7 +205,7 @@
"\n",
"* **Tokenizer exception:** Special-case rule to split a string into several tokens or prevent a token from being split when punctuation rules are applied.\n",
"* **Prefix:** Character(s) at the beginning, e.g. $, (, “, ¿.\n",
"* **Suffix:** Character(s) at the end, e.g. km, ”, !.\n",
"* **Suffix:** Character(s) at the end, e.g. km, ), ”, !.\n",
"* **Infix:** Character(s) in between, e.g. -, --, /, …."
]
},

@@ -82,7 +82,7 @@
}
},
"source": [
"### 1. List the first 10 tokens of the doc."
"### 1. List the first 10 tokens of the doc"
]
},
{
@@ -149,7 +149,7 @@
}
},
"source": [
"### 7. Visualize the dependency grammar analysis of the second sentence."
"### 7. Visualize the dependency grammar analysis of the second sentence"
]
},
{
@@ -178,7 +178,7 @@
}
},
"source": [
"### 9. List the frequencies of POS in the document in a table."
"### 9. List frequencies of POS in the document in a table "
]
},
{
@@ -191,7 +191,7 @@
"source": [
"### 10. Preprocessing\n",
"\n",
"Remove from the doc stopwords, digits, and punctuation.\n",
"Remove from the doc stopwords, digits and punctuation.\n",
"\n",
"Hint: check the token api https://spacy.io/api/token\n",
"\n",
@@ -207,7 +207,7 @@
},
"source": [
"### 11. Entities of the document\n",
"Print the entities of the document, the type of the entity, and the explanation of the entity in a table with three columns.\n",
"Print the entities of the document, the type of the entity and what the explanation of the entity in a table with three columns.\n",
"\n",
"Example:\n",
"\n",
@@ -223,7 +223,7 @@
},
"source": [
"### 12. Visualize the entities\n",
"Show the entities highlighted in the text."
"Show the entities in a graph."
]
},
{
@@ -236,7 +236,7 @@
"source": [
"# Movie review\n",
"\n",
"Classify the movie reviews from the following dataset https://data.world/rajeevsharma993/movie-reviews"
"Classify the rmoview reviews from the following dataset https://data.world/rajeevsharma993/movie-reviews"
]
},
{