Compare commits

5 Commits

Author              SHA1        Message                                        Date
Carlos A. Iglesias  419ea57824  Transparencias con Spacy (Slides with Spacy)   1 year ago
Carlos A. Iglesias  7d6010114d  Upload data for assignment                     1 year ago
Carlos A. Iglesias  f9d8234e14  Added exercise with Spacy                      1 year ago
Carlos A. Iglesias  d41fa61c65  Delete 0_2_NLP_Assignment.ipynb                1 year ago
Carlos A. Iglesias  05a4588acf  Exercise with Spacy                            1 year ago

@@ -0,0 +1,333 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "skip"
}
},
"source": [
"![](images/EscUpmPolit_p.gif \"UPM\")"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "skip"
}
},
"source": [
"# Course Notes for Learning Intelligent Systems"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "skip"
}
},
"source": [
"Department of Telematic Engineering Systems, Universidad Politécnica de Madrid, © Carlos A. Iglesias"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"# Table of Contents\n",
"* [First steps](#First-steps)\n",
"* [Movie review](#Movie-review)"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"# First steps\n",
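"Consider the following text, taken from [Romania Insider](https://www.romania-insider.com/baneasa-airport-reopening-date-jul-2022). Process it with spaCy to obtain the `doc` used in the exercises below.\n",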
"\n",
"The Aurel Vlaicu Băneasa Airport will reopen on August 1, with scheduled commercial flights resuming after a nine-year hiatus, George Dorobanțu, the director of the Bucharest National Airports Company (CNAB), announced in an interview with the public radio. Three companies are already ready to start scheduled and charter flights on Băneasa, namely Ryanair, Air Connect, and Fly One, the director said.\n",
"\n"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"outputs": [],
"source": [
"text = \"The Aurel Vlaicu Băneasa Airport will reopen on August 1, with scheduled commercial flights resuming after a nine-year hiatus, George Dorobanțu, the director of the Bucharest National Airports Company (CNAB), announced in an interview with the public radio. Three companies are already ready to start scheduled and charter flights on Băneasa, namely Ryanair, Air Connect, and Fly One, the director said.\""
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"### 1. List the first 10 tokens of the doc"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"### 2. Print the number of tokens in the text"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"### 3. List the noun chunks"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"### 4. Print the sentences of the text"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"### 5. Print the number of sentences in the text\n",
"Hint: `doc.sents` is a generator, so build a list first."
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"### 6. Print the second sentence"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"### 7. Visualize the dependency grammar analysis of the second sentence"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"### 8. List lemmas and dependencies\n",
"For every token in the second sentence, print the token text, its part-of-speech tag, its lemma, and its dependency label in four columns.\n",
"\n",
"Example:\n",
"\n",
"you  PRON  you  nsubj\n",
"\n",
"Hint: align the columns; you can use `str.expandtabs`."
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"### 9. List the frequency of each part-of-speech (POS) tag in the document in a table"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"### 10. Preprocessing\n",
"\n",
"Remove stop words, digits, and punctuation from the doc.\n",
"\n",
"Hint: check the [Token API](https://spacy.io/api/token).\n",
"\n",
"Print the number of tokens before and after preprocessing."
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"### 11. Entities of the document\n",
"Print each entity of the document, its type, and an explanation of that type in a table with three columns.\n",
"\n",
"Example:\n",
"\n",
"Ubuntu    ORG    Companies, agencies, institutions, etc."
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"### 12. Visualize the entities\n",
"Show the entities highlighted in the text."
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"# Movie review\n",
"\n",
"Classify the movie reviews from the following dataset: https://data.world/rajeevsharma993/movie-reviews"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "skip"
}
},
"source": [
"## References\n",
"\n"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "skip"
}
},
"source": [
"* [spaCy 101: linguistic annotations](https://spacy.io/usage/spacy-101/#annotations)\n",
"* [NLTK stemmers how-to](https://www.nltk.org/howto/stem.html)\n",
"* [Natural Language Processing with Python (NLTK Book). Steven Bird, Ewan Klein, and Edward Loper. O'Reilly Media, 2009.](http://www.nltk.org/book_1ed/)\n",
"* [NLTK Essentials. Nitin Hardeniya. Packt Publishing, 2015.](http://proquest.safaribooksonline.com/search?q=NLTK%20Essentials)\n",
"* Natural Language Processing with Python. José Portilla, 2019."
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "skip"
}
},
"source": [
"## Licence"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "skip"
}
},
"source": [
"The notebook is freely licensed under the [Creative Commons Attribution Share-Alike license](https://creativecommons.org/licenses/by/2.0/).\n",
"\n",
"© Carlos A. Iglesias, Universidad Politécnica de Madrid."
]
}
],
"metadata": {
"celltoolbar": "Slideshow",
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.10"
},
"latex_envs": {
"LaTeX_envs_menu_present": true,
"autocomplete": true,
"bibliofile": "biblio.bib",
"cite_by": "apalike",
"current_citInitial": 1,
"eqLabelWithNumbers": true,
"eqNumInitial": 1,
"hotkeys": {
"equation": "Ctrl-E",
"itemize": "Ctrl-I"
},
"labels_anchors": false,
"latex_user_defs": false,
"report_style_numbering": false,
"user_envs_cfg": false
}
},
"nbformat": 4,
"nbformat_minor": 1
}
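The "First steps" exercises of the notebook above (1-6 and 8-11) can be approached with a handful of spaCy calls. The following is a minimal sketch, not the course's reference solution. It assumes spaCy v3 and the small English pipeline `en_core_web_sm` (installed with `python -m spacy download en_core_web_sm`). If that pipeline is missing, it falls back to a blank English tokenizer with a rule-based `sentencizer`, and in that case POS tags, lemmas, dependencies, noun chunks, and entities come out empty. The variable names (`first_tokens`, `sentences`, `pos_counts`, `kept`) are illustrative choices.

```python
# Minimal sketch for exercises 1-6 and 8-11 of "First steps".
# Assumptions: spaCy v3 is installed; "en_core_web_sm" is used when available,
# otherwise a blank English tokenizer + rule-based sentencizer (no POS tags,
# lemmas, dependencies, noun chunks or entities in that fallback).
from collections import Counter

import spacy

text = (
    "The Aurel Vlaicu Băneasa Airport will reopen on August 1, with scheduled "
    "commercial flights resuming after a nine-year hiatus, George Dorobanțu, the "
    "director of the Bucharest National Airports Company (CNAB), announced in an "
    "interview with the public radio. Three companies are already ready to start "
    "scheduled and charter flights on Băneasa, namely Ryanair, Air Connect, and "
    "Fly One, the director said."
)

try:
    nlp = spacy.load("en_core_web_sm")
except OSError:  # pipeline package not installed
    nlp = spacy.blank("en")
    nlp.add_pipe("sentencizer")

doc = nlp(text)

# 1. First 10 tokens (slicing a Doc returns a Span).
first_tokens = [token.text for token in doc[:10]]
print(first_tokens)

# 2. Number of tokens.
print("Number of tokens:", len(doc))

# 3. Noun chunks; Doc.noun_chunks requires a dependency parse.
noun_chunks = [chunk.text for chunk in doc.noun_chunks] if doc.has_annotation("DEP") else []
print(noun_chunks)

# 4-6. Doc.sents is a generator, so build a list before counting or indexing.
sentences = list(doc.sents)
for sentence in sentences:
    print(sentence.text)
print("Number of sentences:", len(sentences))
second = sentences[1]
print("Second sentence:", second.text)

# 8. Token text, POS tag, lemma and dependency label in four aligned columns.
for token in second:
    print(f"{token.text}\t{token.pos_}\t{token.lemma_}\t{token.dep_}".expandtabs(12))

# 9. Frequency of each coarse-grained POS tag.
pos_counts = Counter(token.pos_ for token in doc)
for pos, count in pos_counts.most_common():
    print(f"{pos:<8}{count:>4}")

# 10. Keep tokens that are not stop words, digits or punctuation.
kept = [token for token in doc if not (token.is_stop or token.is_digit or token.is_punct)]
print("Tokens before:", len(doc), "after:", len(kept))

# 11. Entities: text, label and spaCy's description of the label.
for ent in doc.ents:
    print(f"{ent.text:<40}{ent.label_:<10}{spacy.explain(ent.label_)}")
```

Exercise 10 builds a filtered list of `Token` objects rather than a new `Doc`. Swapping `token.is_digit` for `token.like_num` would also drop numerals written with separators or as words.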

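Exercises 7 and 12 ask for visualizations, which spaCy provides through its built-in displaCy visualizer. This is a minimal sketch under the same assumptions as before: spaCy v3, with `en_core_web_sm` if installed and a blank fallback otherwise. In the fallback case, the dependency figure has no arcs and no entities are highlighted. `style="dep"` draws the dependency tree of a sentence, and `style="ent"` highlights named entities in the text. Passing `jupyter=False` makes `displacy.render` return the markup as a string, so the sketch also runs as a plain script. In a Jupyter notebook, leaving that argument out displays the figures inline.

```python
# Minimal sketch for exercises 7 and 12 with displaCy.
# Assumptions: spaCy v3; "en_core_web_sm" if installed, otherwise a blank
# English tokenizer + sentencizer (no arcs, no entities to highlight).
import spacy
from spacy import displacy

text = (
    "The Aurel Vlaicu Băneasa Airport will reopen on August 1, with scheduled "
    "commercial flights resuming after a nine-year hiatus, George Dorobanțu, the "
    "director of the Bucharest National Airports Company (CNAB), announced in an "
    "interview with the public radio. Three companies are already ready to start "
    "scheduled and charter flights on Băneasa, namely Ryanair, Air Connect, and "
    "Fly One, the director said."
)

try:
    nlp = spacy.load("en_core_web_sm")
except OSError:  # pipeline package not installed
    nlp = spacy.blank("en")
    nlp.add_pipe("sentencizer")

doc = nlp(text)
second = list(doc.sents)[1]

# 7. Dependency tree of the second sentence, returned as SVG markup.
dep_svg = displacy.render(second, style="dep", jupyter=False)

# 12. Entities highlighted in the full text, returned as HTML markup.
ent_html = displacy.render(doc, style="ent", jupyter=False)

print(len(dep_svg), len(ent_html))
```

The returned strings can be written to `.svg` or `.html` files if the figures are needed outside the notebook.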