![](images/EscUpmPolit_p.gif "UPM")

# Course Notes for Learning Intelligent Systems

Department of Telematic Engineering Systems, Universidad Politécnica de Madrid, © Carlos A. Iglesias

# Table of Contents
* [Objectives](#Objectives)
* [Transformers](#Transformers)
* [References](#References)
* [Licence](#Licence)

# Objectives

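In this session we are going to learn how to use pretrained transformer models for common NLP tasks, such as sentiment analysis, translation, summarization and question answering.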

# Transformers
As we saw, transformers are an extremely powerful architecture capable of performing many popular NLP tasks.

A well-known repository of transformer models is the Hugging Face Hub, available at https://huggingface.co/.

Let's see how to use it. To go deeper, consult the Hugging Face course (https://huggingface.co/learn/nlp-course/chapter1/1).

The `transformers` package requires either PyTorch or TensorFlow to be installed. Check the installation instructions if you want to configure your environment properly. For learning purposes, we are going to install PyTorch.


First of all, you should install the Hugging Face `transformers` library together with PyTorch. Execute:
* pip install torch transformers

In [1]:
!pip install torch transformers
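
Once the installation finishes, it can be useful to confirm that both packages are visible to the notebook kernel. The following is a minimal sketch that relies only on the Python standard library; the helper name `installed_version` is our own and is not part of `transformers`.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version of a package, or None if it is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


# Report the status of the two packages installed above.
for name in ("torch", "transformers"):
    version = installed_version(name)
    print(name, version if version is not None else "NOT INSTALLED")
```

If either package is reported as missing, restarting the kernel after the installation is usually the first thing to try.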



## Use cases: how to use `pipeline`

### Sentiment Analysis
Let's classify the sentiment of some sentences.

In [2]:
from transformers import pipeline

from transformers import logging

logging.set_verbosity_error()
#logging.set_verbosity_warning()

model_sentiment = "cardiffnlp/twitter-roberta-base-sentiment-latest"

sentiment_pipe = pipeline("sentiment-analysis", model=model_sentiment)

print(sentiment_pipe("I love LLMs."))

# We can also pass a list of sentences
print(sentiment_pipe(["I hate LLMs.", "I don't care about LLMs"]))

[{'label': 'positive', 'score': 0.974217414855957}]
[{'label': 'negative', 'score': 0.9310991168022156}, {'label': 'neutral', 'score': 0.5152537226676941}]
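
The `score` values above are probabilities. For a single-label classifier such as this one, the model typically produces one raw score (logit) per label, and a softmax turns them into values that sum to 1. The sketch below illustrates that last step in plain Python; the logits are invented for illustration, and the label list simply mirrors the three labels seen in the output above.

```python
import math

LABELS = ["negative", "neutral", "positive"]


def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    highest = max(logits)
    exps = [math.exp(x - highest) for x in logits]  # shift for numerical stability
    total = sum(exps)
    return [e / total for e in exps]


def top_label(logits):
    """Return the most probable label and its probability."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return {"label": LABELS[best], "score": probs[best]}


# Hypothetical logits, chosen only for illustration.
print(top_label([-1.5, 0.2, 3.1]))
```

This also explains why the neutral prediction above has a score close to 0.5: the probability mass is spread over several labels, so the model is far less certain.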


### Translation
Let's translate a sentence

In [3]:
from transformers import pipeline

#if no model is specified, it uses google-t5
translator_en_fr = pipeline("translation_en_to_fr")

print(translator_en_fr("This is the course of Natural Language Processing", max_length=40))

[{'translation_text': 'Il s’agit du cours de traitement des langues naturelles'}]


### Conversation
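Let's create a chatbot. Each call below starts a new `Conversation`, so the model does not remember the previous turns. Note that the `conversational` task and the `Conversation` class were removed in recent releases of `transformers`, so this cell needs an older version of the library.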

In [4]:
from transformers import pipeline, Conversation

chatbot = pipeline(task = "conversational", model = "facebook/blenderbot-400M-distill")

conversation = Conversation("Hi, I'm Peter, how are you?")
conversation = chatbot(conversation)
print(conversation)

conversation = Conversation("Can i have a lunch with you?")
conversation = chatbot(conversation)
print(conversation)

conversation = Conversation('Do you like Paella?')
conversation = chatbot(conversation)
print(conversation)

conversation = Conversation('What do you know about Paella?')
conversation = chatbot(conversation)
print(conversation)

Conversation id: e0ab78fe-bdbf-4941-a623-e4b8705e9e2e
user: Hi, I'm Peter, how are you?
assistant: I'm doing well. How are you doing this evening? I just got home from work.

Conversation id: 20180c79-dd50-4541-a432-c92378b8bc31
user: Can i have a lunch with you?
assistant: Sure, what do you want to eat? I'll make you a sandwich. I love sandwiches.

Conversation id: a7b3d314-6ff0-4756-8a0b-25806c9281a5
user: Do you like Paella?
assistant: I love it! I make it at least once a week. It's one of my favorite dishes.

Conversation id: c9b6dfd0-10a6-4d5e-abfc-7f090119e63a
user: What do you know about Paella?
assistant: I know that it is a traditional Italian dish consisting of rice and meatballs.



### Masked word completion
Let's predict the words that best fill a masked position.

In [5]:
from transformers import pipeline
unmasker = pipeline('fill-mask', model='bert-base-uncased')
unmasker("Hello, Im am eating at a [MASK].")

[{'score': 0.32434332370758057,
 'token': 2795,
 'token_str': 'table',
 'sequence': 'hello, im am eating at a table.'},
 {'score': 0.3150143623352051,
 'token': 4825,
 'token_str': 'restaurant',
 'sequence': 'hello, im am eating at a restaurant.'},
 {'score': 0.07178690284490585,
 'token': 3347,
 'token_str': 'bar',
 'sequence': 'hello, im am eating at a bar.'},
 {'score': 0.04275984689593315,
 'token': 15736,
 'token_str': 'diner',
 'sequence': 'hello, im am eating at a diner.'},
 {'score': 0.032276701182127,
 'token': 28305,
 'token_str': 'buffet',
 'sequence': 'hello, im am eating at a buffet.'}]
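
The result is a list of candidate dictionaries, one per predicted word. As a small sketch of how such a result could be post-processed, the code below keeps only the candidates whose score reaches a threshold. The data is copied from the output above (with rounded scores); the helper name `confident_tokens` and the threshold of 0.05 are arbitrary choices made for this illustration.

```python
# Candidates copied from the output above (scores rounded to four decimals).
predictions = [
    {"token_str": "table", "score": 0.3243},
    {"token_str": "restaurant", "score": 0.3150},
    {"token_str": "bar", "score": 0.0718},
    {"token_str": "diner", "score": 0.0428},
    {"token_str": "buffet", "score": 0.0323},
]


def confident_tokens(candidates, threshold):
    """Return the predicted words whose score is at least `threshold`."""
    return [c["token_str"] for c in candidates if c["score"] >= threshold]


print(confident_tokens(predictions, threshold=0.05))
# ['table', 'restaurant', 'bar']
```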

### NER
Let's detect named entities.

In [17]:
from transformers import pipeline

ner = pipeline("ner", aggregation_strategy="simple")
ner("Peter has studied at Universidad Politécnica de Madrid in Madrid, Spain")

[{'entity_group': 'PER',
 'score': 0.9992756,
 'word': 'Peter',
 'start': 0,
 'end': 5},
 {'entity_group': 'ORG',
 'score': 0.98043567,
 'word': 'Universidad Politécnica de Madrid',
 'start': 21,
 'end': 54},
 {'entity_group': 'LOC',
 'score': 0.9985493,
 'word': 'Madrid',
 'start': 58,
 'end': 64},
 {'entity_group': 'LOC',
 'score': 0.99971014,
 'word': 'Spain',
 'start': 66,
 'end': 71}]
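
The `start` and `end` fields are character offsets into the input sentence, so each entity can be recovered by slicing the original text. The sketch below groups the detected spans by entity type, using the offsets copied from the output above; the helper name `spans_by_type` is our own.

```python
text = "Peter has studied at Universidad Politécnica de Madrid in Madrid, Spain"

# Entity groups and offsets copied from the output above.
entities = [
    {"entity_group": "PER", "start": 0, "end": 5},
    {"entity_group": "ORG", "start": 21, "end": 54},
    {"entity_group": "LOC", "start": 58, "end": 64},
    {"entity_group": "LOC", "start": 66, "end": 71},
]


def spans_by_type(text, entities):
    """Group the text spans of the entities by entity type."""
    grouped = {}
    for entity in entities:
        span = text[entity["start"]:entity["end"]]
        grouped.setdefault(entity["entity_group"], []).append(span)
    return grouped


print(spans_by_type(text, entities))
```

Working with offsets rather than with the `word` field is convenient when the entities must be highlighted or replaced in the original text.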

### Summarization
Let's generate a summary.

In [9]:
from transformers import pipeline

summarizer = pipeline("summarization", model="facebook/bart-large-cnn")

article = """
Europe’s climate chief has warned against politicians trying to use the climate crisis as a wedge issue in the forthcoming EU parliament elections, calling instead for climate policy that will bring wider economic benefits.
Wopke Hoekstra, the EU commissioner for climate action, said Europe had no choice but to press ahead with strong measures to cut greenhouse gases, whoever was in power, but added that more attention was needed to help businesses thrive in a low-carbon world.
He said: “There is no alternative than to continue with climate action. We need to continue in the direction of travel we have set. We need to speed up our pace.”
Rightwing parties are forecast in polls to do well in the election, to be held from 6 to 9 June, largely at the expense of the Greens and socialist parties. Protests by farmers in EU capitals have attacked climate policies, and some rightwing parties have stepped up anti-green rhetoric.
"""
print(summarizer(article, max_length=130, min_length=30, do_sample=False)) 

[{'summary_text': 'Wopke Hoekstra, the EU commissioner for climate action, said Europe had no choice but to press ahead with strong measures to cut greenhouse gases. He said more attention was needed to help businesses thrive in a low-carbon world.'}]


### Zero-shot classification
Classification without training examples! The candidate labels are supplied at inference time.

In [10]:
from transformers import pipeline
classifier = pipeline('zero-shot-classification', model='roberta-large-mnli')

sequence_to_classify = "one day I will see the world"
candidate_labels = ['travel', 'cooking', 'dancing']
classifier(sequence_to_classify, candidate_labels)

{'sequence': 'one day I will see the world',
 'labels': ['travel', 'cooking', 'dancing'],
 'scores': [0.979964017868042, 0.010604988783597946, 0.00943098682910204]}

In [11]:
sequence_to_classify = "The CEO had a strong handshake."
candidate_labels = ['male', 'female']
hypothesis_template = "This text speaks about a {} profession."
classifier(sequence_to_classify, candidate_labels, hypothesis_template=hypothesis_template)

{'sequence': 'The CEO had a strong handshake.',
 'labels': ['male', 'female'],
 'scores': [0.8384838104248047, 0.16151617467403412]}
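
With a natural language inference model such as the one used here, zero-shot classification works by pairing the input text (the premise) with one hypothesis per candidate label, built from the hypothesis template. The label whose hypothesis is most strongly entailed receives the highest score. The sketch below shows the hypothesis construction and, in a simplified form, how entailment scores can be normalised across labels; the entailment logits are invented for illustration, and the function names are our own.

```python
import math


def build_hypotheses(candidate_labels, hypothesis_template):
    """Fill the template once per candidate label."""
    return [hypothesis_template.format(label) for label in candidate_labels]


def rank_labels(candidate_labels, entailment_logits):
    """Softmax the entailment logits across labels and sort by score."""
    highest = max(entailment_logits)
    exps = [math.exp(x - highest) for x in entailment_logits]
    total = sum(exps)
    scores = [e / total for e in exps]
    pairs = zip(candidate_labels, scores)
    return sorted(pairs, key=lambda pair: pair[1], reverse=True)


labels = ["male", "female"]
template = "This text speaks about a {} profession."
print(build_hypotheses(labels, template))

# Hypothetical entailment logits, one per hypothesis, for illustration only.
print(rank_labels(labels, [1.2, -0.4]))
```

Since the sentence never mentions gender, the skewed scores in the output above reflect associations the model learned from its training data, which makes this a simple probe for bias.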

In [12]:
sentences = ["Nadal has won the last match", "There is an election in Bulgaria", "The oil price is very high", "The new film by Almodovar has been just released"]
candidate_labels = ['sport', 'politics', 'culture', 'economics']
classifier(sentences, candidate_labels)

[{'sequence': 'Nadal has won the last match',
 'labels': ['sport', 'culture', 'economics', 'politics'],
 'scores': [0.8608443140983582,
 0.07932569086551666,
 0.03197338432073593,
 0.027856575325131416]},
 {'sequence': 'There is an election in Bulgaria',
 'labels': ['politics', 'culture', 'economics', 'sport'],
 'scores': [0.962326169013977,
 0.01514720730483532,
 0.012851395644247532,
 0.009675216861069202]},
 {'sequence': 'The oil price is very high',
 'labels': ['economics', 'culture', 'politics', 'sport'],
 'scores': [0.8462415933609009,
 0.06119668856263161,
 0.04652851074934006,
 0.04603322222828865]},
 {'sequence': 'The new film by Almodovar has been just released',
 'labels': ['culture', 'politics', 'sport', 'economics'],
 'scores': [0.711652934551239,
 0.12886476516723633,
 0.10017038881778717,
 0.0593118779361248]}]
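
When a list of sentences is classified, the result is a list with one dictionary per sentence. A short sketch of how the winning label of each sentence could be extracted is shown below, using the labels and scores copied from the output above (scores rounded); the helper name `top_prediction` is our own.

```python
# Labels and scores copied from the output above (rounded to four decimals).
results = [
    {"sequence": "Nadal has won the last match",
     "labels": ["sport", "culture", "economics", "politics"],
     "scores": [0.8608, 0.0793, 0.0320, 0.0279]},
    {"sequence": "There is an election in Bulgaria",
     "labels": ["politics", "culture", "economics", "sport"],
     "scores": [0.9623, 0.0151, 0.0129, 0.0097]},
    {"sequence": "The oil price is very high",
     "labels": ["economics", "culture", "politics", "sport"],
     "scores": [0.8462, 0.0612, 0.0465, 0.0460]},
    {"sequence": "The new film by Almodovar has been just released",
     "labels": ["culture", "politics", "sport", "economics"],
     "scores": [0.7117, 0.1289, 0.1002, 0.0593]},
]


def top_prediction(result):
    """Return the (label, score) pair with the highest score."""
    return max(zip(result["labels"], result["scores"]), key=lambda pair: pair[1])


for result in results:
    label, score = top_prediction(result)
    print(f"{result['sequence']!r} -> {label} ({score:.2f})")
```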

### Text generation
Let's generate text from a prompt.

In [18]:
from transformers import pipeline

generation = pipeline("text-generation")

generation("This articles aims evaluating transformers' capabilities")


[{'generated_text': "This articles aims evaluating transformers' capabilities, effectiveness and cost of use."}]

### Question-Answering
Let's build a question-answering system!

In [28]:
from transformers import pipeline

qa = pipeline("question-answering")

qa(
 question = "Where was born Penelope Cruz?",
 context = '''
 Cruz was born on April 28, 1974, in Alcobendas, Madrid, Spain. 
 In July 2010, Cruz married her Vicky Cristina Barcelona co-star, 
 Spanish actor Javier Bardem. The couple had begun dating early into filming, in 2007.
 '''
)

{'score': 0.8235691785812378,
 'start': 52,
 'end': 77,
 'answer': 'Alcobendas, Madrid, Spain'}

In [32]:
qa(
 question = "Who is Penelope Cruz' husband?",
 context = '''
 Cruz was born on April 28, 1974, in Alcobendas, Madrid, Spain. 
 In July 2010, Cruz married her Vicky Cristina Barcelona co-star,
 Spanish actor Javier Bardem. The couple had begun dating early into filming, in 2007.
 '''
)

{'score': 0.4068852663040161,
 'start': 126,
 'end': 187,
 'answer': 'Vicky Cristina Barcelona co-star, Spanish actor Javier Bardem'}
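
The pipeline performs extractive question answering: the answer is always a span copied from the context, which is why `start` and `end` are returned. Conceptually, the model scores every token as a possible start and as a possible end of the answer, and the best valid pair is selected. The sketch below is a simplified illustration of that selection step, not the library's exact implementation; the token list and the scores are invented, and `best_span` is a hypothetical helper.

```python
def best_span(start_scores, end_scores, max_answer_len):
    """Return (start, end, score) for the highest-scoring valid span.

    A span is valid when it does not end before it starts and
    contains at most `max_answer_len` tokens.
    """
    best = None
    for i, start_score in enumerate(start_scores):
        last = min(i + max_answer_len, len(end_scores))
        for j in range(i, last):
            score = start_score + end_scores[j]
            if best is None or score > best[2]:
                best = (i, j, score)
    return best


tokens = ["Cruz", "was", "born", "in", "Alcobendas", ",", "Madrid", ",", "Spain", "."]
# Hypothetical per-token scores, for illustration only.
start_scores = [0.1, 0.0, 0.2, 0.3, 4.0, 0.1, 1.5, 0.0, 1.0, 0.0]
end_scores = [0.0, 0.1, 0.0, 0.2, 1.2, 0.0, 1.4, 0.1, 3.8, 0.3]

start, end, score = best_span(start_scores, end_scores, max_answer_len=8)
print(" ".join(tokens[start:end + 1]))
```

A low score, as in the second question above, suggests that the start and end of the answer were not clearly identified, and the returned span indeed includes more text than the name alone.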

### Text-to-Speech

In [16]:
from transformers import pipeline

pipe = pipeline("text-to-speech", model="suno/bark-small")
text = "[clears throat] This is a test ... and I just took a long pause."
output = pipe(text)

from IPython.display import Audio 

Audio(output["audio"], rate=output["sampling_rate"])
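
The pipeline returns a waveform together with its sampling rate, and both are needed to play or store the audio. As a sketch of what that pair represents, the code below encodes a list of float samples as a 16-bit WAV file in memory using only the standard library. A synthetic tone stands in for `output["audio"]`, the rate of 24000 Hz is an assumed value for illustration (in practice, use `output["sampling_rate"]`), and `to_wav_bytes` is a hypothetical helper.

```python
import io
import math
import struct
import wave


def to_wav_bytes(samples, sampling_rate):
    """Encode mono float samples in [-1, 1] as 16-bit PCM WAV bytes."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav_file:
        wav_file.setnchannels(1)   # mono
        wav_file.setsampwidth(2)   # 2 bytes = 16 bits per sample
        wav_file.setframerate(sampling_rate)
        clipped = (max(-1.0, min(1.0, s)) for s in samples)
        frames = b"".join(struct.pack("<h", int(c * 32767)) for c in clipped)
        wav_file.writeframes(frames)
    return buffer.getvalue()


# Stand-in for output["audio"]: half a second of a 440 Hz tone.
rate = 24000  # assumed value; use output["sampling_rate"] in practice
tone = [0.3 * math.sin(2 * math.pi * 440 * n / rate) for n in range(rate // 2)]
wav_bytes = to_wav_bytes(tone, rate)
print(len(wav_bytes))
```

Playing the same samples with a different sampling rate would change both the pitch and the duration, which is why the rate is passed explicitly to `Audio`.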

## References



* [Hugging Face Tutorial](https://huggingface.co/learn/nlp-course/chapter1/1) 

## Licence

The notebook is freely licensed under the [Creative Commons Attribution Share-Alike license](https://creativecommons.org/licenses/by/2.0/).

© Carlos A. Iglesias, Universidad Politécnica de Madrid.