mirror of https://github.com/gsi-upm/sitc synced 2025-05-16 06:59:04 +00:00

Update 0_1_NLP_Slides.ipynb

Carlos A. Iglesias 2025-04-24 18:30:18 +02:00 committed by GitHub
parent 36d117e417
commit 8f2a5c17d8
No known key found for this signature in database
GPG Key ID: B5690EEEBB952194

@@ -89,7 +89,7 @@
} }
}, },
"source": [ "source": [
"In this session we are going to learn to process text so that can apply machine learning techniques." "In this session, we are going to learn to process text so that we can apply machine learning techniques."
] ]
}, },
{ {
@@ -101,7 +101,7 @@
}, },
"source": [ "source": [
"# NLP Basics\n", "# NLP Basics\n",
"In this notebook we are going to use two popular NLP libraries:\n", "In this notebook, we are going to use two popular NLP libraries:\n",
"* NLTK (Natural Language Toolkit, https://www.nltk.org/) \n", "* NLTK (Natural Language Toolkit, https://www.nltk.org/) \n",
"* Spacy (https://spacy.io/)" "* Spacy (https://spacy.io/)"
] ]
@@ -116,7 +116,7 @@
"source": [ "source": [
"Main characteristics:\n", "Main characteristics:\n",
"* both are open source and very popular\n", "* both are open source and very popular\n",
"* NLTK was released in 2001 while Spacy was in 2015\n", "* NLTK was released in 2001, while Spacy was in 2015\n",
"* Spacy provides very efficient implementations" "* Spacy provides very efficient implementations"
] ]
}, },
@@ -130,7 +130,7 @@
"source": [ "source": [
"# Spacy installation\n", "# Spacy installation\n",
"\n", "\n",
"You need to install previously spacy if not installed:\n", "You need to install spacy if not installed:\n",
"* `pip install spacy`\n", "* `pip install spacy`\n",
"* or `conda install -c conda-forge spacy`\n", "* or `conda install -c conda-forge spacy`\n",
"\n", "\n",
@@ -148,7 +148,7 @@
"source": [ "source": [
"# Spacy pipelines\n", "# Spacy pipelines\n",
"\n", "\n",
"The function **nlp** takes a raw text and perform several operations (tokenization, tagger, NER, ...)\n", "The function **nlp** takes a raw text and performs several operations (tokenization, tagger, NER, ...)\n",
"![](spacy/spacy-pipeline.svg \"Spacy pipelines\")" "![](spacy/spacy-pipeline.svg \"Spacy pipelines\")"
] ]
}, },
@@ -160,7 +160,7 @@
} }
}, },
"source": [ "source": [
"From text to doc trough the pipeline" "From text to doc through the pipeline"
] ]
}, },
{ {
@@ -205,7 +205,7 @@
"\n", "\n",
"* **Tokenizer exception:** Special-case rule to split a string into several tokens or prevent a token from being split when punctuation rules are applied.\n", "* **Tokenizer exception:** Special-case rule to split a string into several tokens or prevent a token from being split when punctuation rules are applied.\n",
"* **Prefix:** Character(s) at the beginning, e.g. $, (, “, ¿.\n", "* **Prefix:** Character(s) at the beginning, e.g. $, (, “, ¿.\n",
"* **Suffix:** Character(s) at the end, e.g. km, ), ”, !.\n", "* **Suffix:** Character(s) at the end, e.g. km, ”, !.\n",
"* **Infix:** Character(s) in between, e.g. -, --, /, …." "* **Infix:** Character(s) in between, e.g. -, --, /, …."
] ]
}, },