1
0
mirror of https://github.com/gsi-upm/sitc synced 2025-06-12 11:22:20 +00:00

Compare commits

...

2 Commits

Author SHA1 Message Date
Carlos A. Iglesias
6e8448f22f
Update 0_2_NLP_Assignment.ipynb 2025-04-24 18:31:56 +02:00
Carlos A. Iglesias
8f2a5c17d8
Update 0_1_NLP_Slides.ipynb 2025-04-24 18:30:18 +02:00
2 changed files with 14 additions and 14 deletions

View File

@ -89,7 +89,7 @@
}
},
"source": [
"In this session we are going to learn to process text so that can apply machine learning techniques."
"In this session, we are going to learn to process text so that we can apply machine learning techniques."
]
},
{
@ -101,7 +101,7 @@
},
"source": [
"# NLP Basics\n",
"In this notebook we are going to use two popular NLP libraries:\n",
"In this notebook, we are going to use two popular NLP libraries:\n",
"* NLTK (Natural Language Toolkit, https://www.nltk.org/) \n",
"* Spacy (https://spacy.io/)"
]
@ -116,7 +116,7 @@
"source": [
"Main characteristics:\n",
"* both are open source and very popular\n",
"* NLTK was released in 2001 while Spacy was in 2015\n",
"* NLTK was released in 2001, while Spacy was in 2015\n",
"* Spacy provides very efficient implementations"
]
},
@ -130,7 +130,7 @@
"source": [
"# Spacy installation\n",
"\n",
"You need to install previously spacy if not installed:\n",
"You need to install spacy if not installed:\n",
"* `pip install spacy`\n",
"* or `conda install -c conda-forge spacy`\n",
"\n",
@ -148,7 +148,7 @@
"source": [
"# Spacy pipelines\n",
"\n",
"The function **nlp** takes a raw text and perform several operations (tokenization, tagger, NER, ...)\n",
"The function **nlp** takes a raw text and performs several operations (tokenization, tagger, NER, ...)\n",
"![](spacy/spacy-pipeline.svg \"Spacy pipelines\")"
]
},
@ -160,7 +160,7 @@
}
},
"source": [
"From text to doc trough the pipeline"
"From text to doc through the pipeline"
]
},
{
@ -205,7 +205,7 @@
"\n",
"* **Tokenizer exception:** Special-case rule to split a string into several tokens or prevent a token from being split when punctuation rules are applied.\n",
"* **Prefix:** Character(s) at the beginning, e.g. $, (, “, ¿.\n",
"* **Suffix:** Character(s) at the end, e.g. km, ), ”, !.\n",
"* **Suffix:** Character(s) at the end, e.g. km, ”, !.\n",
"* **Infix:** Character(s) in between, e.g. -, --, /, …."
]
},

View File

@ -82,7 +82,7 @@
}
},
"source": [
"### 1. List the first 10 tokens of the doc"
"### 1. List the first 10 tokens of the doc."
]
},
{
@ -149,7 +149,7 @@
}
},
"source": [
"### 7. Visualize the dependency grammar analysis of the second sentence"
"### 7. Visualize the dependency grammar analysis of the second sentence."
]
},
{
@ -178,7 +178,7 @@
}
},
"source": [
"### 9. List frequencies of POS in the document in a table "
"### 9. List the frequencies of POS in the document in a table."
]
},
{
@ -191,7 +191,7 @@
"source": [
"### 10. Preprocessing\n",
"\n",
"Remove from the doc stopwords, digits and punctuation.\n",
"Remove from the doc stopwords, digits, and punctuation.\n",
"\n",
"Hint: check the token api https://spacy.io/api/token\n",
"\n",
@ -207,7 +207,7 @@
},
"source": [
"### 11. Entities of the document\n",
"Print the entities of the document, the type of the entity and what the explanation of the entity in a table with three columns.\n",
"Print the entities of the document, the type of the entity, and the explanation of the entity in a table with three columns.\n",
"\n",
"Example:\n",
"\n",
@ -223,7 +223,7 @@
},
"source": [
"### 12. Visualize the entities\n",
"Show the entities in a graph."
"Show the entities highlighted in the text."
]
},
{
@ -236,7 +236,7 @@
"source": [
"# Movie review\n",
"\n",
"Classify the rmoview reviews from the following dataset https://data.world/rajeevsharma993/movie-reviews"
"Classify the movie reviews from the following dataset https://data.world/rajeevsharma993/movie-reviews"
]
},
{