Mirror of https://github.com/gsi-upm/sitc, synced 2025-09-13 10:22:20 +00:00

Compare commits

108 Commits

Author SHA1 Message Date
Carlos A. Iglesias
9844820e66 Delete xai/readme 2025-06-06 17:24:29 +03:00
Carlos A. Iglesias
d10434362e Add files via upload 2025-06-06 17:24:05 +03:00
Carlos A. Iglesias
fb2135cea6 Create readme 2025-06-06 17:23:37 +03:00
Carlos A. Iglesias
ba6e533e0b Add files via upload
XAI notebook
2025-06-06 17:23:16 +03:00
Carlos A. Iglesias
4f5e976918 Create readme 2025-06-06 17:22:33 +03:00
Carlos A. Iglesias
b58370a19a Update .gitignore 2025-06-02 17:23:44 +03:00
Carlos A. Iglesias
5c203b0884 Update spiral.py
Fixed typo
2025-06-02 17:22:55 +03:00
Carlos A. Iglesias
5bf815f60f Update 2_4_2_Exercise_Optional.ipynb
Changed image path
2025-06-02 17:22:16 +03:00
Carlos A. Iglesias
90a3ff098b Update 2_4_1_Exercise.ipynb
Changed image path
2025-06-02 17:21:25 +03:00
Carlos A. Iglesias
945a8a7fb6 Update 2_4_0_Intro_NN.ipynb
Changed image path
2025-06-02 17:19:19 +03:00
Carlos A. Iglesias
6532ef1b27 Update 2_8_Conclusions.ipynb
Changed image path
2025-06-02 17:18:31 +03:00
Carlos A. Iglesias
3a73b2b286 Update 2_7_Model_Persistence.ipynb
Changed image path
2025-06-02 17:17:43 +03:00
Carlos A. Iglesias
2e4ec3cfdc Update 2_6_Model_Tuning.ipynb 2025-06-02 17:16:53 +03:00
Carlos A. Iglesias
21e7ae2f57 Update 2_5_2_Decision_Tree_Model.ipynb
Changed image path
2025-06-02 17:13:49 +03:00
Carlos A. Iglesias
7b4d16964d Update 2_5_1_kNN_Model.ipynb
Changed image path
2025-06-02 17:11:45 +03:00
Carlos A. Iglesias
c5967746ea Update 2_5_0_Machine_Learning.ipynb 2025-06-02 17:09:42 +03:00
Carlos A. Iglesias
ed7f0f3e1c Update 2_5_0_Machine_Learning.ipynb 2025-06-02 17:09:13 +03:00
Carlos A. Iglesias
9324516c19 Update 2_5_0_Machine_Learning.ipynb
Changed image path
2025-06-02 17:08:03 +03:00
Carlos A. Iglesias
6fc5565ea0 Update 2_2_Read_Data.ipynb 2025-06-02 17:05:17 +03:00
Carlos A. Iglesias
1113485833 Add files via upload 2025-06-02 17:03:20 +03:00
Carlos A. Iglesias
0c3f317a85 Add files via upload 2025-06-02 17:02:46 +03:00
Carlos A. Iglesias
0b550c837b Update 2_2_Read_Data.ipynb
Added figures
2025-06-02 17:00:58 +03:00
Carlos A. Iglesias
d7ce6df7fe Update 2_2_Read_Data.ipynb 2025-06-02 16:57:54 +03:00
Carlos A. Iglesias
e2edae6049 Update 2_2_Read_Data.ipynb 2025-06-02 16:54:37 +03:00
Carlos A. Iglesias
4ea0146def Update 2_2_Read_Data.ipynb 2025-06-02 16:54:06 +03:00
Carlos A. Iglesias
e7b2cee795 Add files via upload 2025-06-02 16:31:20 +03:00
Carlos A. Iglesias
9e1d0e5534 Add files via upload 2025-06-02 16:30:13 +03:00
Carlos A. Iglesias
f82203f371 Update 2_4_Preprocessing.ipynb
Changed image path
2025-06-02 16:29:26 +03:00
Carlos A. Iglesias
b9ecccdeab Update 2_3_1_Advanced_Visualisation.ipynb 2025-06-02 16:28:06 +03:00
Carlos A. Iglesias
44a555ac2d Update 2_3_1_Advanced_Visualisation.ipynb
Changed image path
2025-06-02 16:09:55 +03:00
Carlos A. Iglesias
ec11ff2d5e Update 2_3_0_Visualisation.ipynb
Changed image path
2025-06-02 16:06:53 +03:00
Carlos A. Iglesias
ec02125396 Update 2_2_Read_Data.ipynb 2025-06-02 16:04:57 +03:00
Carlos A. Iglesias
b5f1a7dd22 Update 2_0_0_Intro_ML.ipynb 2025-06-02 16:03:03 +03:00
Carlos A. Iglesias
1cc1e45673 Update 2_2_Read_Data.ipynb
Changed image path
2025-06-02 16:02:45 +03:00
Carlos A. Iglesias
a2ad2c0e92 Update 2_1_Intro_ScikitLearn.ipynb
Changed images path
2025-06-02 16:00:59 +03:00
Carlos A. Iglesias
1add6a4c8e Update 2_0_1_Objectives.ipynb
Changed image path
2025-06-02 15:58:32 +03:00
Carlos A. Iglesias
af78e6480d Update 2_0_0_Intro_ML.ipynb
changed path to image
2025-06-02 15:57:25 +03:00
Carlos A. Iglesias
cae7d8cbb2 Updated LLM 2025-05-05 16:39:20 +02:00
Carlos A. Iglesias
f58aa6c0b8 Delete nlp/0_1_LLM.ipynb 2025-05-05 16:38:41 +02:00
Carlos A. Iglesias
6e8448f22f Update 0_2_NLP_Assignment.ipynb 2025-04-24 18:31:56 +02:00
Carlos A. Iglesias
8f2a5c17d8 Update 0_1_NLP_Slides.ipynb 2025-04-24 18:30:18 +02:00
Carlos A. Iglesias
36d117e417 Delete nlp/spacy/readme.md 2025-04-21 18:59:11 +02:00
Carlos A. Iglesias
2fc057f6f9 Add files via upload 2025-04-21 18:58:47 +02:00
Carlos A. Iglesias
5b0d4f2a5d Add files via upload 2025-04-21 18:58:15 +02:00
Carlos A. Iglesias
7afa2b3b22 Create readme.md 2025-04-21 18:57:59 +02:00
Carlos A. Iglesias
4e0f9159e8 Update 2_5_1_Exercise.ipynb 2025-04-03 18:54:52 +02:00
Carlos A. Iglesias
82aa552976 Update 2_5_1_Exercise.ipynb 2025-04-03 18:53:35 +02:00
Carlos A. Iglesias
3ebff69cf8 Update 2_5_1_Exercise.ipynb 2025-04-03 18:43:58 +02:00
Carlos A. Iglesias
0f228bbec3 Update 2_5_1_Exercise.ipynb 2025-04-03 18:43:34 +02:00
Carlos A. Iglesias
64c8854741 Update 2_5_1_Exercise.ipynb 2025-04-03 18:41:49 +02:00
Carlos A. Iglesias
3e081e5d83 Update 2_5_1_Exercise.ipynb 2025-04-03 18:38:26 +02:00
Carlos A. Iglesias
065797b886 Update 2_5_1_Exercise.ipynb 2025-04-03 18:37:26 +02:00
Carlos A. Iglesias
8d2f625b7e Update 2_5_1_Exercise.ipynb 2025-04-03 18:36:31 +02:00
Carlos A. Iglesias
26eda30a71 Update 2_5_1_Exercise.ipynb 2025-04-03 18:35:53 +02:00
Carlos A. Iglesias
55365ae927 Update 2_5_1_Exercise.ipynb 2025-04-03 18:34:50 +02:00
Carlos A. Iglesias
152125b3da Update 2_5_1_Exercise.ipynb 2025-04-03 18:33:47 +02:00
Carlos A. Iglesias
97362545ea Update 2_5_1_Exercise.ipynb
Added https://sklearn-genetic-opt.readthedocs.io/en/stable/index.html
2025-04-03 18:32:32 +02:00
cif
c49c866a2e Update notebook with pivot_table examples 2025-03-06 16:05:16 +01:00
Carlos A. Iglesias
3f7694e330 Add files via upload
Added ttl
2025-02-20 19:14:13 +01:00
Carlos A. Iglesias
bf684d6e6e Updated index 2024-06-07 17:54:18 +03:00
Carlos A. Iglesias
d935b85b26 Add files via upload
Added images
2024-06-03 14:44:28 +02:00
Carlos A. Iglesias
1d8e777236 Create .p 2024-06-03 15:42:13 +03:00
Carlos A. Iglesias
23ebe2f390 Update 3_1_Read_Data.ipynb
Updated table markdown
2024-05-21 14:30:26 +02:00
Carlos A. Iglesias
01eb89ada4 New notebook about transformers 2024-05-14 09:55:02 +02:00
Carlos A. Iglesias
e4fdcd65a1 Update 2_6_1_Q-Learning_Basic.ipynb
Updated installation with new version of gymnasium
2024-04-24 18:46:54 +02:00
Carlos A. Iglesias
9f46c534f7 Update 2_5_1_Exercise.ipynb
Added optional exercises.
2024-04-18 18:04:43 +02:00
Carlos A. Iglesias
743c57691f Delete sna/t.txt 2024-04-17 17:24:12 +02:00
Carlos A. Iglesias
2c53b81299 Uploaded SNA files 2024-04-17 17:23:28 +02:00
Carlos A. Iglesias
dd6c053109 Add files via upload 2024-04-17 17:22:36 +02:00
Carlos A. Iglesias
e35e0a11e9 Create t.txt 2024-04-17 17:22:20 +02:00
Carlos A. Iglesias
7315b681e4 Update README.md 2024-04-17 17:21:21 +02:00
Carlos A. Iglesias
3fac9c6f78 Add files via upload 2024-04-04 18:27:48 +02:00
Carlos A. Iglesias
21819abeae Added visualization notebooks 2024-04-03 22:53:02 +02:00
Carlos A. Iglesias
0d4c0c706d Added images 2024-04-03 22:51:58 +02:00
Carlos A. Iglesias
8de629b495 Create .gitkeep 2024-04-03 22:51:19 +02:00
Carlos A. Iglesias
86114b4a56 Added preprocessing notebooks 2024-04-03 22:50:36 +02:00
Carlos A. Iglesias
1a3f618995 Add files via upload 2024-04-03 21:52:25 +02:00
Carlos A. Iglesias
a1121c03a5 Create .gitkeep - Added preprocessing notebooks 2024-04-03 21:51:34 +02:00
Carlos A. Iglesias
715d0cb77f Create .gitkeep
Added new set of exercises
2024-04-03 21:50:50 +02:00
Carlos A. Iglesias
0150ce7cf7 Update 3_7_SVM.ipynb
Updated formatted table
2024-02-22 12:23:08 +01:00
Carlos A. Iglesias
08dfe5c147 Update 3_4_Visualisation_Pandas.ipynb
Updated code to last version of seaborn
2024-02-22 11:55:35 +01:00
Carlos A. Iglesias
78e62af098 Update 3_3_Data_Munging_with_Pandas.ipynb
Updated to last version of scikit
2024-02-21 12:29:04 +01:00
Carlos A. Iglesias
3f5eba3e84 Update 3_2_Pandas.ipynb
Updated links
2024-02-21 12:16:12 +01:00
Carlos A. Iglesias
2de1cda8f1 Update 3_1_Read_Data.ipynb
Updated links
2024-02-21 12:14:25 +01:00
Carlos A. Iglesias
cc442c35f3 Update 3_0_0_Intro_ML_2.ipynb
Updated links
2024-02-21 12:12:14 +01:00
Carlos A. Iglesias
1100c352fa Update 2_6_Model_Tuning.ipynb
updated links
2024-02-21 11:47:34 +01:00
Carlos A. Iglesias
9b573d292d Update 2_5_2_Decision_Tree_Model.ipynb
Updated links
2024-02-21 11:41:42 +01:00
Carlos A. Iglesias
dd8a4f50d8 Update 2_5_2_Decision_Tree_Model.ipynb
Updated links
2024-02-21 11:40:59 +01:00
Carlos A. Iglesias
47148f2ccc Update util_ds.py
Updated links
2024-02-21 11:40:06 +01:00
Carlos A. Iglesias
8ffda8123a Update 2_5_1_kNN_Model.ipynb
Updated links
2024-02-21 11:07:38 +01:00
Carlos A. Iglesias
6629837e7d Update 2_5_0_Machine_Learning.ipynb
Updated links
2024-02-21 11:06:21 +01:00
Carlos A. Iglesias
ba08a9a264 Update 2_4_Preprocessing.ipynb
Updated links
2024-02-21 11:02:09 +01:00
Carlos A. Iglesias
4b8fd30f42 Update 2_3_1_Advanced_Visualisation.ipynb
Updated links
2024-02-21 11:00:53 +01:00
Carlos A. Iglesias
d879369930 Update 2_3_0_Visualisation.ipynb
Updated links
2024-02-21 10:57:34 +01:00
Carlos A. Iglesias
4da01f3ae6 Update 2_0_0_Intro_ML.ipynb
Updated links
2024-02-21 10:44:43 +01:00
Carlos A. Iglesias
da9a01e26b Update 2_0_1_Objectives.ipynb
Updated links
2024-02-21 10:43:40 +01:00
Carlos A. Iglesias
dc23b178d7 Delete python/plurals.py 2024-02-08 18:32:43 +01:00
Carlos A. Iglesias
5410d6115d Delete python/catalog.py 2024-02-08 18:32:18 +01:00
Carlos A. Iglesias
6749aa5deb Added files for modules 2024-02-08 18:26:08 +01:00
Carlos A. Iglesias
c31e6c1676 Update 1_2_Numbers_Strings.ipynb 2024-02-08 17:47:42 +01:00
Carlos A. Iglesias
1c7496c8ac Update 1_2_Numbers_Strings.ipynb
Improved formatting.
2024-02-08 17:46:18 +01:00
Carlos A. Iglesias
35b1ae4ec8 Update 1_8_Classes.ipynb
Improved formatting.
2024-02-08 17:43:25 +01:00
Carlos A. Iglesias
58fc6f5e9c Update 1_4_Sets.ipynb
Typo corrected.
2024-02-08 17:42:45 +01:00
Carlos A. Iglesias
91147becee Update 1_3_Sequences.ipynb
Formatting improvement.
2024-02-08 17:41:15 +01:00
Carlos A. Iglesias
1530995243 Update 1_0_Intro_Python.ipynb
Updated links.
2024-02-08 17:36:46 +01:00
Carlos A. Iglesias
0c0960cec7 Update 1_7_Variables.ipynb typo in bold markdown
Typo in bold markdown
2024-02-08 17:33:48 +01:00
cif
3363c953f4 Deleted previous version 2023-04-27 15:43:44 +02:00
cif
542ce2708d Updated the lab assignment to gymnasium and extended it 2023-04-27 15:42:01 +02:00
106 changed files with 71593 additions and 682 deletions

View File

@@ -1,7 +1,7 @@
# sitc
Exercises for Intelligent Systems Course at Universidad Politécnica de Madrid, Telecommunication Engineering School. This material is used in the subjects
- SITC (Sistemas Inteligentes y Tecnologías del Conocimiento) - Master Universitario de Ingeniería de Telecomunicación (MUIT)
- TIAD (Tecnologías Inteligentes de Análisis de Datos) - Master Universitario en Ingeniería de Redes y Servicios Telemáticos
- CDAW (Ciencia de datos y aprendizaje en automático en la web de datos) - Master Universitario de Ingeniería de Telecomunicación (MUIT)
- ABID (Analítica de Big Data) - Master Universitario en Ingeniería de Redes y Servicios Telemáticos
For following this course:
- Follow the instructions to install the environment: https://github.com/gsi-upm/sitc/blob/master/python/1_1_Notebooks.ipynb (Just install 'conda')
@@ -9,11 +9,13 @@ For following this course:
- Run in a terminal in the folder sitc: jupyter notebook (and enjoy)
Topics
* Python: quick introduction to Python
* Python: a quick introduction to Python
* ML-1: introduction to machine learning with scikit-learn
* ML-2: introduction to machine learning with pandas and scikit-learn
* ML-21: preprocessing and visualization
* ML-3: introduction to machine learning. Neural Computing
* ML-4: introduction to Evolutionary Computing
* ML-5: introduction to Reinforcement Learning
* NLP: introduction to NLP
* LOD: Linked Open Data, exercises and example code
* SNA: Social Network Analysis

New files added in this commit range:

* images/.p (new file, one blank line)
* images/EscUpmPolit_p.gif (binary, 3.1 KiB)
* images/cart.png (binary, 95 KiB)
* images/data-chart-type.png (binary, 34 KiB)
* binary file, name not shown (54 KiB)
* images/frozenlake-world.png (binary, 67 KiB)
* images/gym-maze.gif (binary, 222 KiB)
* images/iris-classes.png (binary, 1.4 MiB)
* images/iris-dataset.jpg (binary, 44 KiB)
* images/iris-features.png (binary, 944 KiB)
* binary file, name not shown (237 KiB)
* binary file, name not shown (87 KiB)
* binary file, name not shown (56 KiB)
* binary file, name not shown (87 KiB)
* binary file, name not shown (58 KiB)
* images/qlearning-algo.png (binary, 85 KiB)
* images/recording.gif (binary, 1.8 MiB)
* images/titanic.jpg (binary, 152 KiB)
* lod/BeatlesMusicians.ttl (new file, 4352 lines; diff suppressed because it is too large)

View File

@@ -4,7 +4,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"![](files/images/EscUpmPolit_p.gif \"UPM\")"
"![](./images/EscUpmPolit_p.gif \"UPM\")"
]
},
{
@@ -71,7 +71,6 @@
"source": [
"* [Scikit-learn web page](http://scikit-learn.org/stable/)\n",
"* [Scikit-learn videos](http://blog.kaggle.com/author/kevin-markham/) and [notebooks](https://github.com/justmarkham/scikit-learn-videos) by Kevin Marham\n",
"* [scikit-learn : Machine Learning Simplified](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), Raúl Garreta; Guillermo Moncecchi, Packt Publishing, 2017.\n",
"* [Python Machine Learning](https://learning.oreilly.com/library/view/python-machine-learning/9781789955750/), Sebastian Raschka, Packt Publishing, 2019."
]
},
@@ -80,7 +79,7 @@
"metadata": {},
"source": [
"## Licence\n",
"The notebook is freely licensed under under the [Creative Commons Attribution Share-Alike license](https://creativecommons.org/licenses/by/2.0/). \n",
"The notebook is freely licensed under the [Creative Commons Attribution Share-Alike license](https://creativecommons.org/licenses/by/2.0/). \n",
"\n",
"© Carlos A. Iglesias, Universidad Politécnica de Madrid."
]

View File

@@ -4,7 +4,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"![](files/images/EscUpmPolit_p.gif \"UPM\")"
"![](./images/EscUpmPolit_p.gif \"UPM\")"
]
},
{
@@ -40,10 +40,10 @@
"\n",
"* Learn to use scikit-learn\n",
"* Learn the basic steps to apply machine learning techniques: dataset analysis, load, preprocessing, training, validation, optimization and persistence.\n",
"* Learn how to do a exploratory data analysis\n",
"* Learn how to do an exploratory data analysis\n",
"* Learn how to visualise a dataset\n",
"* Learn how to load a bundled dataset\n",
"* Learn how to separate the dataset into traning and testing datasets\n",
"* Learn how to separate the dataset into training and testing datasets\n",
"* Learn how to train a classifier\n",
"* Learn how to predict with a trained classifier\n",
"* Learn how to evaluate the predictions\n",
@@ -63,9 +63,7 @@
"metadata": {},
"source": [
"* [Scikit-learn web page](http://scikit-learn.org/stable/)\n",
"* [Scikit-learn videos](http://blog.kaggle.com/author/kevin-markham/) and [notebooks](https://github.com/justmarkham/scikit-learn-videos) by Kevin Marham\n",
"* [scikit-learn : Machine Learning Simplified](https://learning.oreilly.com/library/view/scikit-learn-machine/9781788833479/), Raúl Garreta; Guillermo Moncecchi, Packt Publishing, 2017.\n",
"* [Python Machine Learning](https://learning.oreilly.com/library/view/python-machine-learning/9781789955750/), Sebastian Raschka, Packt Publishing, 2019."
"* [Scikit-learn videos](http://blog.kaggle.com/author/kevin-markham/) and [notebooks](https://github.com/justmarkham/scikit-learn-videos) by Kevin Marham\n"
]
},
{
@@ -73,7 +71,7 @@
"metadata": {},
"source": [
"## LIcence\n",
"The notebook is freely licensed under under the [Creative Commons Attribution Share-Alike license](https://creativecommons.org/licenses/by/2.0/). \n",
"The notebook is freely licensed under the [Creative Commons Attribution Share-Alike license](https://creativecommons.org/licenses/by/2.0/). \n",
"\n",
"© Carlos A. Iglesias, Universidad Politécnica de Madrid."
]
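
The objectives listed in this notebook map onto a short end-to-end workflow. A minimal sketch of those steps with scikit-learn's bundled Iris data (variable names and the seed are illustrative choices, not the notebook's own code):

```python
# Sketch of the listed steps: load, split, preprocess, train, predict, evaluate.
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score

iris = datasets.load_iris()
x_train, x_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.25, random_state=33)

scaler = StandardScaler().fit(x_train)   # learn scaling on training data only
x_train, x_test = scaler.transform(x_train), scaler.transform(x_test)

model = KNeighborsClassifier().fit(x_train, y_train)
print(accuracy_score(y_test, model.predict(x_test)))
```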

View File

@@ -4,7 +4,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"![](files/images/EscUpmPolit_p.gif \"UPM\")"
"![](./images/EscUpmPolit_p.gif \"UPM\")"
]
},
{
@@ -87,7 +87,7 @@
"metadata": {},
"source": [
"Scikit-learn provides algorithms for solving the following problems:\n",
"* **Classification**: Identifying to which category an object belongs to. Some of the available [classification algorithms](http://scikit-learn.org/stable/supervised_learning.html#supervised-learning) are decision trees (ID3, C4.5, ...), kNN, SVM, Random forest, Perceptron, etc. \n",
"* **Classification**: Identifying to which category an object belongs. Some of the available [classification algorithms](http://scikit-learn.org/stable/supervised_learning.html#supervised-learning) are decision trees (ID3, C4.5, ...), kNN, SVM, Random forest, Perceptron, etc. \n",
"* **Clustering**: Automatic grouping of similar objects into sets. Some of the available [clustering algorithms](http://scikit-learn.org/stable/modules/clustering.html#clustering) are k-Means, Affinity propagation, etc.\n",
"* **Regression**: Predicting a continuous-valued attribute associated with an object. Some of the available [regression algorithms](http://scikit-learn.org/stable/supervised_learning.html#supervised-learning) are linear regression, logistic regression, etc.\n",
"* **Dimensionality reduction**: Reducing the number of random variables to consider. Some of the available [dimensionality reduction algorithms](http://scikit-learn.org/stable/modules/decomposition.html#decompositions) are SVD, PCA, etc."
@@ -105,7 +105,7 @@
"metadata": {},
"source": [
"In addition, scikit-learn helps in several tasks:\n",
"* **Model selection**: Comparing, validating, choosing parameters and models, and persisting models. Some of the [available functionalities](http://scikit-learn.org/stable/model_selection.html#model-selection) are cross-validation or grid search for optimizing the parameters. \n",
"* **Model selection**: Comparing, validating, choosing parameters and models, and persisting models. Some [available functionalities](http://scikit-learn.org/stable/model_selection.html#model-selection) are cross-validation or grid search for optimizing the parameters. \n",
"* **Preprocessing**: Several common utility functions and transformer classes to change raw feature vectors into a representation that is more suitable for the downstream estimators. Some of the available [preprocessing functions](http://scikit-learn.org/stable/modules/preprocessing.html#preprocessing) are scaling and normalizing data, or imputing missing values."
]
},
@@ -128,9 +128,9 @@
"\n",
"If it is not installed, install it with conda: `conda install scikit-learn`.\n",
"\n",
"If you have installed scipy and numpy, you can also installed using pip: `pip install -U scikit-learn`.\n",
"If you have installed scipy and numpy, you can also install using pip: `pip install -U scikit-learn`.\n",
"\n",
"It is not recommended to use pip for installing scipy and numpy. Instead, use conda or install the linux package *python-sklearn*."
"It is not recommended to use pip to install scipy and numpy. Instead, use conda or install the Linux package *python-sklearn*."
]
},
{
@@ -156,7 +156,7 @@
"metadata": {},
"source": [
"## Licence\n",
"The notebook is freely licensed under under the [Creative Commons Attribution Share-Alike license](https://creativecommons.org/licenses/by/2.0/). \n",
"The notebook is freely licensed under the [Creative Commons Attribution Share-Alike license](https://creativecommons.org/licenses/by/2.0/). \n",
"\n",
"© Carlos A. Iglesias, Universidad Politécnica de Madrid."
]
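
For reference, the four problem families named in this notebook (classification, clustering, regression, dimensionality reduction) each correspond to a scikit-learn estimator. A minimal sketch, with estimator choices that are ours rather than the notebook's:

```python
# One illustrative estimator per task family discussed above.
from sklearn import datasets
from sklearn.tree import DecisionTreeClassifier    # classification
from sklearn.cluster import KMeans                 # clustering
from sklearn.linear_model import LinearRegression  # regression
from sklearn.decomposition import PCA              # dimensionality reduction

X, y = datasets.load_iris(return_X_y=True)
print(DecisionTreeClassifier().fit(X, y).predict(X[:3]))   # class labels
print(KMeans(n_clusters=3, n_init=10).fit_predict(X)[:3])  # cluster ids
print(LinearRegression().fit(X[:, :3], X[:, 3]).coef_)     # predict petal width from the rest
print(PCA(n_components=2).fit_transform(X).shape)          # (150, 2)
```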

View File

@@ -4,7 +4,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"![](files/images/EscUpmPolit_p.gif \"UPM\")\n",
"![](./images/EscUpmPolit_p.gif \"UPM\")\n",
"\n",
"# Course Notes for Learning Intelligent Systems\n",
"\n",
@@ -34,11 +34,11 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"The goal of this notebook is to learn how to read and load a sample dataset.\n",
"This notebook aims to learn how to read and load a sample dataset.\n",
"\n",
"Scikit-learn comes with some bundled [datasets](https://scikit-learn.org/stable/datasets.html): iris, digits, boston, etc.\n",
"\n",
"In this notebook we are going to use the Iris dataset."
"In this notebook, we will use the Iris dataset."
]
},
{
@@ -54,16 +54,25 @@
"source": [
"The [Iris flower dataset](https://en.wikipedia.org/wiki/Iris_flower_data_set), available at [UCI dataset repository](https://archive.ics.uci.edu/ml/datasets/Iris), is a classic dataset for classification.\n",
"\n",
"The dataset consists of 50 samples from each of three species of Iris (Iris setosa, Iris virginica and Iris versicolor). Four features were measured from each sample: the length and the width of the sepals and petals, in centimetres. Based on the combination of these four features, a machine learning model will learn to differentiate the species of Iris.\n",
"The dataset consists of 50 samples from each of three species of Iris (Iris setosa, Iris virginica, and Iris versicolor). Four features were measured from each sample: the length and the width of the sepals and petals, in centimetres. Based on the combination of these four features, a machine learning model will learn to differentiate the species of Iris.\n",
"\n",
"![Iris](files/images/iris-dataset.jpg)"
"![Iris dataset](./images/iris-dataset.jpg \"Iris\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"In order to read the dataset, we import the datasets bundle and then load the Iris dataset. "
"Here you can see the species and the features.\n",
"![Iris features](./images/iris-features.png \"Iris features\")\n",
"![Iris classes](./images/iris-classes.png \"Iris classes\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"To read the dataset, we import the datasets bundle and then load the Iris dataset. "
]
},
{
@@ -180,7 +189,7 @@
"metadata": {},
"outputs": [],
"source": [
"#Using numpy, I can print the dimensions (here we are working with 2D matriz)\n",
"#Using numpy, I can print the dimensions (here we are working with a 2D matrix)\n",
"print(iris.data.ndim)"
]
},
@@ -218,7 +227,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"In following sessions we will learn how to load a dataset from a file (csv, excel, ...) using the pandas library."
"In the following sessions, we will learn how to load a dataset from a file (CSV, Excel, ...) using the pandas library."
]
},
{
@@ -246,7 +255,7 @@
"source": [
"## Licence\n",
"\n",
"The notebook is freely licensed under under the [Creative Commons Attribution Share-Alike license](https://creativecommons.org/licenses/by/2.0/). \n",
"The notebook is freely licensed under the [Creative Commons Attribution Share-Alike license](https://creativecommons.org/licenses/by/2.0/). \n",
"\n",
"© Carlos A. Iglesias, Universidad Politécnica de Madrid."
]
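
The loading step this diff touches is small enough to show in full. A sketch of reading the bundled dataset, plus the pandas-based file loading the notebook says is coming in later sessions (the CSV file name is hypothetical):

```python
# Load the bundled Iris dataset and inspect its shape, as the notebook does.
from sklearn import datasets

iris = datasets.load_iris()
print(iris.data.shape)    # (150, 4): n_samples x n_features
print(iris.data.ndim)     # 2: a 2D matrix
print(iris.target_names)  # ['setosa' 'versicolor' 'virginica']

# Later sessions read files instead; with pandas that is a single call.
# import pandas as pd
# df = pd.read_csv('iris.csv')  # hypothetical file name
```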

View File

@@ -4,7 +4,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"![](files/images/EscUpmPolit_p.gif \"UPM\")"
"![](./images/EscUpmPolit_p.gif \"UPM\")"
]
},
{
@@ -49,7 +49,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"The goal of this notebook is to learn how to analyse a dataset. We will cover other tasks such as cleaning or munging (changing the format) the dataset in other sessions."
"This notebook aims to learn how to analyse a dataset. We will cover other tasks such as cleaning or munging (changing the format) the dataset in other sessions."
]
},
{
@@ -65,13 +65,13 @@
"source": [
"This section covers different ways to inspect the distribution of samples per feature.\n",
"\n",
"First of all, let's see how many samples of each class we have, using a [histogram](https://en.wikipedia.org/wiki/Histogram). \n",
"First of all, let's see how many samples we have in each class using a [histogram](https://en.wikipedia.org/wiki/Histogram). \n",
"\n",
"A histogram is a graphical representation of the distribution of numerical data. It is an estimation of the probability distribution of a continuous variable (quantitative variable). \n",
"A histogram is a graphical representation of the distribution of numerical data. It estimates the probability distribution of a continuous variable (quantitative variable). \n",
"\n",
"For building a histogram, we need first to 'bin' the range of values—that is, divide the entire range of values into a series of intervals—and then count how many values fall into each interval. \n",
"For building a histogram, we need to 'bin' the range of values—that is, divide the entire range of values into a series of intervals—and then count how many values fall into each interval. \n",
"\n",
"In our case, since the values are not continuous and we have only three values, we do not need to bin them."
"Since the values are not continuous and we have only three values, we do not need to bin them."
]
},
{
@@ -115,7 +115,7 @@
"metadata": {},
"source": [
"As can be seen, we have the same distribution of samples for every class.\n",
"The next step is to see the distribution of the features"
"The next step is to see the distribution of the features."
]
},
{
@@ -184,7 +184,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"As we can see, the Setosa class seems to be linearly separable with these two features.\n",
"As we can see, the Setosa class seems linearly separable with these two features.\n",
"\n",
"Another nice visualisation is given below."
]
@@ -228,7 +228,6 @@
"source": [
"* [Feature selection](http://scikit-learn.org/stable/modules/feature_selection.html)\n",
"* [Classification probability](http://scikit-learn.org/stable/auto_examples/classification/plot_classification_probability.html)\n",
"* [Mastering Pandas](https://learning.oreilly.com/library/view/mastering-pandas/9781789343236/), Femi Anthony, Packt Publishing, 2015.\n",
"* [Matplotlib web page](http://matplotlib.org/index.html)\n",
"* [Using matlibplot in IPython](http://ipython.readthedocs.org/en/stable/interactive/plotting.html)\n",
"* [Seaborn Tutorial](https://stanford.edu/~mwaskom/software/seaborn/tutorial.html)\n",
@@ -242,7 +241,7 @@
"source": [
"## Licence\n",
"\n",
"The notebook is freely licensed under under the [Creative Commons Attribution Share-Alike license](https://creativecommons.org/licenses/by/2.0/). \n",
"The notebook is freely licensed under the [Creative Commons Attribution Share-Alike license](https://creativecommons.org/licenses/by/2.0/). \n",
"\n",
"© Carlos A. Iglesias, Universidad Politécnica de Madrid."
]
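
As a rough illustration of the histogram discussion above (a sketch, not the notebook's cells): the target takes only three discrete values, so no binning is needed.

```python
# Sketch: samples per class as a histogram of the discrete target values.
import matplotlib.pyplot as plt
from sklearn import datasets

iris = datasets.load_iris()
plt.hist(iris.target, bins=3)
plt.xticks([0.33, 1.0, 1.66], iris.target_names)  # approximate bin centres
plt.xlabel('class')
plt.ylabel('number of samples')
plt.show()
```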

View File

@@ -4,7 +4,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"![](files/images/EscUpmPolit_p.gif \"UPM\")"
"![](./images/EscUpmPolit_p.gif \"UPM\")"
]
},
{
@@ -52,11 +52,11 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"In the previous notebook we developed plots with the [matplotlib](http://matplotlib.org/) plotting library.\n",
"In the previous notebook, we developed plots with the [matplotlib](http://matplotlib.org/) plotting library.\n",
"\n",
"This notebook introduces another plotting library, [**seaborn**](https://stanford.edu/~mwaskom/software/seaborn/), which provides advanced facilities for data visualization.\n",
"\n",
"*Seaborn* is a library for making attractive and informative statistical graphics in Python. It is built on top of *matplotlib* and tightly integrated with the *PyData* stack, including support for *numpy* and *pandas* data structures and statistical routines from *scipy* and *statsmodels*.\n",
"*Seaborn* is a library that makes attractive and informative statistical graphics in Python. It is built on top of *matplotlib* and tightly integrated with the *PyData* stack, including support for *numpy* and *pandas* data structures and statistical routines from *scipy* and *statsmodels*.\n",
"\n",
"*Seaborn* requires its input to be *DataFrames* (a structure created with the library *pandas*)."
]
@@ -197,9 +197,9 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"A very common way to use this plot colors the observations by a separate categorical variable. For example, the iris dataset has four measurements for each of the three different species of iris flowers.\n",
"A widespread way to use this plot colors the observations by a separate categorical variable. For example, the iris dataset has four measurements for each of the three different species of iris flowers.\n",
"\n",
"We are going to color each class, so that we can easily identify **clustering** and **linear relationships**."
"We are going to color each class, so we can easily identify **clustering** and **linear relationships**."
]
},
{
@@ -220,7 +220,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"By default every numeric column in the dataset is used, but you can focus on particular relationships if you want."
"By default, every numeric column in the dataset is used, but you can focus on particular relationships if you want."
]
},
{
@@ -321,7 +321,7 @@
"metadata": {},
"outputs": [],
"source": [
"# One way we can extend this plot is adding a layer of individual points on top of\n",
"# One way we can extend this plot is by adding a layer of individual points on top of\n",
"# it through Seaborn's striplot\n",
"# \n",
"# We'll use jitter=True so that all the points don't fall in single vertical lines\n",
@@ -347,7 +347,7 @@
"outputs": [],
"source": [
"# A violin plot combines the benefits of the previous two plots and simplifies them\n",
"# Denser regions of the data are fatter, and sparser thiner in a violin plot\n",
"# Denser regions of the data are fatter, and sparser thinner in a violin plot\n",
"sns.violinplot(x=\"species\", y=\"petal length (cm)\", data=iris_df, size=6)"
]
},
@@ -389,10 +389,10 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"Depending on the data, we can choose which visualisation suits better. the following [diagram](http://www.labnol.org/software/find-right-chart-type-for-your-data/6523/) guides this selection.\n",
"Depending on the data, we can choose which visualisation suits us better. the following [diagram](http://www.labnol.org/software/find-right-chart-type-for-your-data/6523/) guides this selection.\n",
"\n",
"\n",
"![](files/images/data-chart-type.png \"Graphs\")"
"![](./images/data-chart-type.png \"Graphs\")"
]
},
{
@@ -408,7 +408,6 @@
"source": [
"* [Feature selection](http://scikit-learn.org/stable/modules/feature_selection.html)\n",
"* [Classification probability](http://scikit-learn.org/stable/auto_examples/classification/plot_classification_probability.html)\n",
"* [Mastering Pandas](https://learning.oreilly.com/library/view/mastering-pandas/9781789343236/), Femi Anthony, Packt Publishing, 2015.\n",
"* [Matplotlib web page](http://matplotlib.org/index.html)\n",
"* [Using matlibplot in IPython](http://ipython.readthedocs.org/en/stable/interactive/plotting.html)\n",
"* [Seaborn Tutorial](https://stanford.edu/~mwaskom/software/seaborn/tutorial.html)\n",
@@ -422,7 +421,7 @@
"metadata": {},
"source": [
"## Licence\n",
"The notebook is freely licensed under under the [Creative Commons Attribution Share-Alike license](https://creativecommons.org/licenses/by/2.0/). \n",
"The notebook is freely licensed under the [Creative Commons Attribution Share-Alike license](https://creativecommons.org/licenses/by/2.0/). \n",
"\n",
"© Carlos A. Iglesias, Universidad Politécnica de Madrid."
]
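
The seaborn calls referenced in this diff (a pair plot coloured by a categorical variable, a violin plot per class) look roughly like this; the DataFrame construction is an assumption about how the notebook builds `iris_df`:

```python
# Sketch: seaborn expects a pandas DataFrame, then one call per plot.
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns
from sklearn import datasets

iris = datasets.load_iris()
iris_df = pd.DataFrame(iris.data, columns=iris.feature_names)
iris_df['species'] = [iris.target_names[t] for t in iris.target]

sns.pairplot(iris_df, hue='species')  # colour by class to spot clusters and linear relationships
sns.violinplot(x='species', y='petal length (cm)', data=iris_df)
plt.show()
```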

View File

@@ -4,7 +4,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"![](files/images/EscUpmPolit_p.gif \"UPM\")"
"![](./images/EscUpmPolit_p.gif \"UPM\")"
]
},
{
@@ -76,7 +76,7 @@
"source": [
"A common practice in machine learning to evaluate an algorithm is to split the data at hand into two sets, one that we call the **training set** on which we learn data properties and one that we call the **testing set** on which we test these properties. \n",
"\n",
"We are going to use *scikit-learn* to split the data into random training and testing sets. We follow the ratio 75% for training and 25% for testing. We use `random_state` to ensure that the result is always the same and it is reproducible. (Otherwise, we would get different training and testing sets every time)."
"We will use *scikit-learn* to split the data into random training and testing sets. We follow the ratio 75% for training and 25% for testing. We use `random_state` to ensure that the result is always the same and it is reproducible. (Otherwise, we would get different training and testing sets every time)."
]
},
{
@@ -122,9 +122,9 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"Standardization of datasets is a common requirement for many machine learning estimators implemented in the scikit; they might behave badly if the individual features do not more or less look like standard normally distributed data: Gaussian with zero mean and unit variance.\n",
"Standardization of datasets is a common requirement for many machine learning estimators implemented in the scikit; they might misbehave if the individual features do not more or less look like standard normally distributed data: Gaussian with zero mean and unit variance.\n",
"\n",
"The preprocessing module further provides a utility class `StandardScaler` to compute the mean and standard deviation on a training set. Later, the same transformation will be applied on the testing set."
"The preprocessing module further provides a utility class `StandardScaler` to compute a training set's mean and standard deviation. Later, the same transformation will be applied on the testing set."
]
},
{
@@ -163,7 +163,6 @@
"source": [
"* [Feature selection](http://scikit-learn.org/stable/modules/feature_selection.html)\n",
"* [Classification probability](http://scikit-learn.org/stable/auto_examples/classification/plot_classification_probability.html)\n",
"* [Mastering Pandas](https://learning.oreilly.com/library/view/mastering-pandas/9781789343236/), Femi Anthony, Packt Publishing, 2015.\n",
"* [Matplotlib web page](http://matplotlib.org/index.html)\n",
"* [Using matlibplot in IPython](http://ipython.readthedocs.org/en/stable/interactive/plotting.html)\n",
"* [Seaborn Tutorial](https://stanford.edu/~mwaskom/software/seaborn/tutorial.html)"
@@ -174,7 +173,7 @@
"metadata": {},
"source": [
"### Licences\n",
"The notebook is freely licensed under under the [Creative Commons Attribution Share-Alike license](https://creativecommons.org/licenses/by/2.0/). \n",
"The notebook is freely licensed under the [Creative Commons Attribution Share-Alike license](https://creativecommons.org/licenses/by/2.0/). \n",
"\n",
"© Carlos A. Iglesias, Universidad Politécnica de Madrid."
]
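
The split-then-standardise workflow described in this diff, as a minimal sketch (the 75/25 ratio is the notebook's; the seed value here is an assumption):

```python
# Reproducible 75/25 split, then StandardScaler fitted on the training set
# only and applied unchanged to the test set, as described above.
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

x_iris, y_iris = datasets.load_iris(return_X_y=True)
x_train, x_test, y_train, y_test = train_test_split(
    x_iris, y_iris, test_size=0.25, random_state=33)  # seed is illustrative

scaler = StandardScaler().fit(x_train)  # mean and std from training data
x_train = scaler.transform(x_train)
x_test = scaler.transform(x_test)       # same transformation on the test set
print(x_train.mean(axis=0).round(2), x_train.std(axis=0).round(2))  # ~0 and ~1
```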

View File

@@ -4,7 +4,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"![](files/images/EscUpmPolit_p.gif \"UPM\")"
"![](./images/EscUpmPolit_p.gif \"UPM\")"
]
},
{
@@ -53,9 +53,9 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"This is an introduction of general ideas about machine learning and the interface of scikit-learn, taken from the [scikit-learn tutorial](http://www.astroml.org/sklearn_tutorial/general_concepts.html). \n",
"This is an introduction to general ideas about machine learning and the interface of scikit-learn, taken from the [scikit-learn tutorial](http://www.astroml.org/sklearn_tutorial/general_concepts.html). \n",
"\n",
"You can skip it during the lab session and read it later,"
"You can skip it during the lab session and read it later."
]
},
{
@@ -69,21 +69,21 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"Machine learning algorithms are programs that learn a model from a dataset with the aim of making predictions or learning structures to organize the data.\n",
"Machine learning algorithms are programs that learn a model from a dataset to make predictions or learn structures to organize the data.\n",
"\n",
"In scikit-learn, machine learning algorithms take as an input a *numpy* array (n_samples, n_features), where\n",
"* **n_samples**: number of samples. Each sample is an item to process (i.e. classify). A sample can be a document, a picture, a sound, a video, a row in database or CSV file, or whatever you can describe with a fixed set of quantitative traits.\n",
"* **n_features**: The number of features or distinct traits that can be used to describe each item in a quantitative manner.\n",
"In scikit-learn, machine learning algorithms take as input a *numpy* array (n_samples, n_features), where\n",
"* **n_samples**: number of samples. Each sample is an item to process (i.e., classify). A sample can be a document, a picture, a sound, a video, a row in a database or CSV file, or whatever you can describe with a fixed set of quantitative traits.\n",
"* **n_features**: The number of features or distinct traits that can be used to describe each item quantitatively.\n",
"\n",
"The number of features should be defined in advance. There is a specific type of feature sets that are high dimensional (e.g. millions of features), but most of the values are zero for a given sample. Using (numpy) arrays, all those values that are zero would also take up memory. For this reason, these feature sets are often represented with sparse matrices (scipy.sparse) instead of (numpy) arrays.\n",
"The number of features should be defined in advance. A specific type of feature set is high-dimensional (e.g., millions of features), but most values are zero for a given sample. Using (numpy) arrays, all those zero values would also take up memory. For this reason, these feature sets are often represented with sparse matrices (scipy.sparse) instead of (numpy) arrays.\n",
"\n",
"The first step in machine learning is **identifying the relevant features** from the input data, and the second step is **extracting the features** from the input data. \n",
"\n",
"[Machine learning algorithms](http://machinelearningmastery.com/a-tour-of-machine-learning-algorithms/) can be classified according to learning style into:\n",
"* **Supervised learning**: input data (training dataset) has a known label or result. Example problems are classification and regression. A model is prepared through a training process where it is required to make predictions and is corrected when those predictions are wrong. The training process continues until the model achieves a desired level of accuracy on the training data.\n",
"* **Unsupervised learning**: input data is not labeled. A model is prepared by deducing structures present in the input data. This may be to extract general rules. Example problems are clustering, dimensionality reduction and association rule learning.\n",
"* **Semi-supervised learning**:i nput data is a mixture of labeled and unlabeled examples. There is a desired prediction problem but the model must learn the structures to organize the data as well as make predictions. Example problems are classification and regression."
]
"* **Unsupervised learning**: input data is not labeled. A model is prepared by deducing structures present in the input data. This may be to extract general rules. Example problems are clustering, dimensionality reduction, and association rule learning.\n",
"* **Semi-supervised learning**: input data is a mixture of labeled and unlabeled examples. There is a desired prediction problem, but the model must learn the structures to organize the data and make predictions. Example problems are classification and regression."
]
},
{
"cell_type": "markdown",
@@ -96,8 +96,8 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"In *supervised machine learning models*, the machine learning algorithm takes as an input a training dataset, composed of feature vectors and labels, and produces a predictive model which is used for make prediction on new data.\n",
"![](files/images/plot_ML_flow_chart_1.png)"
"In *supervised machine learning models*, the machine learning algorithm takes as input a training dataset, composed of feature vectors and labels, and produces a predictive model used to predict new data.\n",
"![](./images/plot_ML_flow_chart_1.png)"
]
},
{
@@ -111,7 +111,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"In *unsupervised machine learning models*, the machine learning model algorithm takes as an input the feature vectors and produces a predictive model that is used to fit its parameters so as to best summarize regularities found in the data.\n",
"In *unsupervised machine learning models*, the machine learning model algorithm takes as input the feature vectors. It produces a predictive model that is used to fit its parameters to summarize the best regularities found in the data.\n",
"![](files/images/plot_ML_flow_chart_3.png)"
]
},
@@ -129,15 +129,15 @@
"scikit-learn has a uniform interface for all the estimators, some methods are only available if the estimator is supervised or unsupervised:\n",
"\n",
"* Available in *all estimators*:\n",
" * **model.fit()**: fit training data. For supervised learning applications, this accepts two arguments: the data X and the labels y (e.g. model.fit(X, y)). For unsupervised learning applications, this accepts only a single argument, the data X (e.g. model.fit(X)).\n",
" * **model.fit()**: fit training data. For supervised learning applications, this accepts two arguments: the data X and the labels y (e.g., model.fit(X, y)). For unsupervised learning applications, this accepts only a single argument, the data X (e.g. model.fit(X)).\n",
"\n",
"* Available in *supervised estimators*:\n",
" * **model.predict()**: given a trained model, predict the label of a new set of data. This method accepts one argument, the new data X_new (e.g. model.predict(X_new)), and returns the learned label for each object in the array.\n",
" * **model.predict()**: given a trained model, predict the label of a new dataset. This method accepts one argument, the new data X_new (e.g., model.predict(X_new)), and returns the learned label for each object in the array.\n",
" * **model.predict_proba()**: For classification problems, some estimators also provide this method, which returns the probability that a new observation has each categorical label. In this case, the label with the highest probability is returned by model.predict().\n",
"\n",
"* Available in *unsupervised estimators*:\n",
" * **model.transform()**: given an unsupervised model, transform new data into the new basis. This also accepts one argument X_new, and returns the new representation of the data based on the unsupervised model.\n",
" * **model.fit_transform()**: some estimators implement this method, which performs a fit and a transform on the same input data.\n",
" * **model.fit_transform()**: Some estimators implement this method, which performs a fit and a transform on the same input data.\n",
"\n",
"\n",
"![](files/images/plot_ML_flow_chart_2.png)"
@@ -154,7 +154,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"* [General concepts of machine learning with scikit-learn](https://ogrisel.github.io/scikit-learn.org/sklearn-tutorial/auto_examples/tutorial/plot_ML_flow_chart.html)\n",
"* [Python Data Science Handbook](https://jakevdp.github.io/PythonDataScienceHandbook/index.html)\n",
"* [A Tour of Machine Learning Algorithms](http://machinelearningmastery.com/a-tour-of-machine-learning-algorithms/)"
]
},
@@ -169,7 +169,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"The notebook is freely licensed under under the [Creative Commons Attribution Share-Alike license](https://creativecommons.org/licenses/by/2.0/). \n",
"The notebook is freely licensed under the [Creative Commons Attribution Share-Alike license](https://creativecommons.org/licenses/by/2.0/). \n",
"\n",
"© Carlos A. Iglesias, Universidad Politécnica de Madrid."
]
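
The uniform estimator interface summarised in this notebook can be shown in a few lines; KNeighborsClassifier and PCA are our stand-ins for a supervised and an unsupervised estimator:

```python
# Supervised estimators: fit(X, y), predict, predict_proba.
# Unsupervised estimators: fit(X), transform, fit_transform.
from sklearn import datasets
from sklearn.neighbors import KNeighborsClassifier
from sklearn.decomposition import PCA

X, y = datasets.load_iris(return_X_y=True)

clf = KNeighborsClassifier()
clf.fit(X, y)                    # supervised: data and labels
print(clf.predict(X[:2]))        # learned labels for new data
print(clf.predict_proba(X[:2]))  # per-class probabilities

pca = PCA(n_components=2)
pca.fit(X)                       # unsupervised: data only
print(pca.transform(X[:2]))      # new representation of the data
print(pca.fit_transform(X)[:2])  # fit and transform in one step
```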

View File

@@ -4,7 +4,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"![](files/images/EscUpmPolit_p.gif \"UPM\")"
"![](./images/EscUpmPolit_p.gif \"UPM\")"
]
},
{
@@ -55,7 +55,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"The goal of this notebook is to learn how to train a model, make predictions with that model and evaluate these predictions.\n",
"The goal of this notebook is to learn how to train a model, make predictions with that model, and evaluate these predictions.\n",
"\n",
"The notebook uses the [kNN (k nearest neighbors) algorithm](https://en.wikipedia.org/wiki/K-nearest_neighbors_algorithm)."
]
@@ -212,14 +212,14 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"### Precision, recall and f-score"
"### Precision, recall, and f-score"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"For evaluating classification algorithms, we usually calculate three metrics: precision, recall and F1-score\n",
"For evaluating classification algorithms, we usually calculate three metrics: precision, recall, and F1-score\n",
"\n",
"* **Precision**: This computes the proportion of instances predicted as positives that were correctly evaluated (it measures how right our classifier is when it says that an instance is positive).\n",
"* **Recall**: This counts the proportion of positive instances that were correctly evaluated (measuring how right our classifier is when faced with a positive instance).\n",
@@ -246,7 +246,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"Another useful metric is the confusion matrix"
"Another useful metric is the confusion matrix."
]
},
{
@@ -262,7 +262,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"We see we classify well all the 'setosa' and 'versicolor' samples. "
"We classify all the 'setosa' and 'versicolor' samples well. "
]
},
{
@@ -276,7 +276,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"In order to avoid bias in the training and testing dataset partition, it is recommended to use **k-fold validation**."
"To avoid bias in the training and testing dataset partition, it is recommended to use **k-fold validation**."
]
},
{
@@ -298,7 +298,7 @@
"# create a k-fold cross validation iterator of k=10 folds\n",
"cv = KFold(10, shuffle=True, random_state=33)\n",
"\n",
"# by default the score used is the one returned by score method of the estimator (accuracy)\n",
"# by default the score used is the one returned by the score method of the estimator (accuracy)\n",
"scores = cross_val_score(model, x_iris, y_iris, cv=cv)\n",
"print(scores)"
]
@@ -307,7 +307,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"We get an array of k scores. We can calculate the mean and the standard error to obtain a final figure"
"We get an array of k scores. We can calculate the mean and the standard error to obtain a final figure."
]
},
{
@@ -340,7 +340,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"We are going to tune the algorithm, and calculate which is the best value for the k hyperparameter."
"We will tune the algorithm and calculate the best value for the k hyperparameter."
]
},
{
@@ -365,7 +365,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"The result is very dependent of the input data. Execute again the train_test_split and test again how the result changes with k."
"The result is very dependent on the input data. Execute the train_test_split again and test how the result changes with k."
]
},
{
@@ -379,8 +379,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"* [KNeighborsClassifier API scikit-learn](http://scikit-learn.org/stable/modules/generated/sklearn.neighbors.KNeighborsClassifier.html)\n",
"* [Learning scikit-learn: Machine Learning in Python](https://learning.oreilly.com/library/view/scikit-learn-machine/9781788833479/), Raúl Garreta; Guillermo Moncecchi, Packt Publishing, 2013.\n"
"* [KNeighborsClassifier API scikit-learn](http://scikit-learn.org/stable/modules/generated/sklearn.neighbors.KNeighborsClassifier.html)\n"
]
},
{
@@ -388,7 +387,7 @@
"metadata": {},
"source": [
"## Licence\n",
"The notebook is freely licensed under under the [Creative Commons Attribution Share-Alike license](https://creativecommons.org/licenses/by/2.0/). \n",
"The notebook is freely licensed under the [Creative Commons Attribution Share-Alike license](https://creativecommons.org/licenses/by/2.0/). \n",
"\n",
"© Carlos A. Iglesias, Universidad Politécnica de Madrid."
]
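
The evaluation pipeline this diff touches (precision/recall/F1, confusion matrix, 10-fold cross-validation with mean and standard error) can be sketched as follows; the KFold arguments mirror the notebook's own cells, the rest is illustrative:

```python
# Evaluate a kNN classifier: per-class metrics, confusion matrix, and
# k-fold cross-validation summarised as mean +/- standard error.
import numpy as np
from scipy.stats import sem
from sklearn import datasets
from sklearn.model_selection import train_test_split, KFold, cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import classification_report, confusion_matrix

x_iris, y_iris = datasets.load_iris(return_X_y=True)
x_train, x_test, y_train, y_test = train_test_split(
    x_iris, y_iris, test_size=0.25, random_state=33)

model = KNeighborsClassifier(n_neighbors=5).fit(x_train, y_train)
y_pred = model.predict(x_test)
print(classification_report(y_test, y_pred))  # precision, recall, f1-score
print(confusion_matrix(y_test, y_pred))

cv = KFold(10, shuffle=True, random_state=33)  # as in the notebook
scores = cross_val_score(model, x_iris, y_iris, cv=cv)
print("Mean score: {0:.3f} (+/- {1:.3f})".format(np.mean(scores), sem(scores)))
```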

View File

@@ -4,7 +4,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"![](files/images/EscUpmPolit_p.gif \"UPM\")"
"![](./images/EscUpmPolit_p.gif \"UPM\")"
]
},
{
@@ -56,9 +56,9 @@
"source": [
"The goal of this notebook is to learn how to create a classification object using a [decision tree learning algorithm](https://en.wikipedia.org/wiki/Decision_tree_learning). \n",
"\n",
"There are a number of well known machine learning algorithms for decision tree learning, such as ID3, C4.5, C5.0 and CART. The scikit-learn uses an optimised version of the [CART (Classification and Regression Trees) algorithm](https://en.wikipedia.org/wiki/Predictive_analytics#Classification_and_regression_trees).\n",
"There are several well-known machine learning algorithms for decision tree learning, such as ID3, C4.5, C5.0, and CART. The scikit-learn uses an optimised version of the [CART (Classification and Regression Trees) algorithm](https://en.wikipedia.org/wiki/Predictive_analytics#Classification_and_regression_trees).\n",
"\n",
"This notebook will follow the same steps that the previous notebook for learning using the [kNN Model](2_5_1_kNN_Model.ipynb), and details some peculiarities of the decision tree algorithms.\n",
"This notebook will follow the same steps as the previous notebook for learning using the [kNN Model](2_5_1_kNN_Model.ipynb), and details some peculiarities of the decision tree algorithms.\n",
"\n",
"You need to install pydotplus: `conda install pydotplus` for the visualization."
]
@@ -69,7 +69,7 @@
"source": [
"## Load data and preprocessing\n",
"\n",
"Here we repeat the same operations for loading data and preprocessing than in the previous notebooks."
"Here we repeat the same operations for loading data and preprocessing as in the previous notebooks."
]
},
{
@@ -262,8 +262,8 @@
"The current version of pydot does not work well in Python 3.\n",
"For obtaining an image, you need to install `pip install pydotplus` and then `conda install graphviz`.\n",
"\n",
"You can skip this example. Since it can require installing additional packages, we include here the result.\n",
"![Decision Tree](files/images/cart.png)"
"You can skip this example. Since it can require installing additional packages, we have included the result here.\n",
"![Decision Tree](./images/cart.png)"
]
},
{
@@ -330,7 +330,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"Next we are going to export the pseudocode of the the learnt decision tree."
"Next, we will export the pseudocode of the learnt decision tree."
]
},
{
@@ -378,14 +378,14 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"### Precision, recall and f-score"
"### Precision, recall, and f-score"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"For evaluating classification algorithms, we usually calculate three metrics: precision, recall and F1-score\n",
"For evaluating classification algorithms, we usually calculate three metrics: precision, recall, and F1-score\n",
"\n",
"* **Precision**: This computes the proportion of instances predicted as positives that were correctly evaluated (it measures how right our classifier is when it says that an instance is positive).\n",
"* **Recall**: This counts the proportion of positive instances that were correctly evaluated (measuring how right our classifier is when faced with a positive instance).\n",
@@ -412,7 +412,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"Another useful metric is the confusion matrix"
"Another useful metric is the confusion matrix."
]
},
{
@@ -428,7 +428,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"We see we classify well all the 'setosa' and 'versicolor' samples. "
"We classify all the 'setosa' and 'versicolor' samples well. "
]
},
{
@@ -442,7 +442,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"In order to avoid bias in the training and testing dataset partition, it is recommended to use **k-fold validation**.\n",
"To avoid bias in the training and testing dataset partition, it is recommended to use **k-fold validation**.\n",
"\n",
"Sklearn comes with other strategies for [cross validation](http://scikit-learn.org/stable/modules/cross_validation.html#cross-validation), such as stratified K-fold, label k-fold, Leave-One-Out, Leave-P-Out, Leave-One-Label-Out, Leave-P-Label-Out or Shuffle & Split."
]
@@ -466,7 +466,7 @@
"# create a k-fold cross validation iterator of k=10 folds\n",
"cv = KFold(10, shuffle=True, random_state=33)\n",
"\n",
"# by default the score used is the one returned by score method of the estimator (accuracy)\n",
"# by default the score used is the one returned by the score method of the estimator (accuracy)\n",
"scores = cross_val_score(model, x_iris, y_iris, cv=cv)\n",
"print(scores)"
]
@@ -475,7 +475,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"We get an array of k scores. We can calculate the mean and the standard error to obtain a final figure"
"We get an array of k scores. We can calculate the mean and the standard error to obtain a final figure."
]
},
{
@@ -509,8 +509,6 @@
"metadata": {},
"source": [
"* [Plot the decision surface of a decision tree on the iris dataset](https://scikit-learn.org/stable/auto_examples/tree/plot_iris_dtc.html)\n",
"* [scikit-learn : Machine Learning Simplified](https://learning.oreilly.com/library/view/scikit-learn-machine/9781788833479/), Raúl Garreta; Guillermo Moncecchi, Packt Publishing, 2017.\n",
"* [Python Machine Learning](https://learning.oreilly.com/library/view/python-machine-learning/9781789955750/), Sebastian Raschka, Packt Publishing, 2019.\n",
"* [Parameter estimation using grid search with cross-validation](https://scikit-learn.org/stable/auto_examples/model_selection/plot_grid_search_digits.html)\n",
"* [Decision trees in python with scikit-learn and pandas](http://chrisstrelioff.ws/sandbox/2015/06/08/decision_trees_in_python_with_scikit_learn_and_pandas.html)"
]
@@ -520,7 +518,7 @@
"metadata": {},
"source": [
"## Licence\n",
"The notebook is freely licensed under under the [Creative Commons Attribution Share-Alike license](https://creativecommons.org/licenses/by/2.0/). \n",
"The notebook is freely licensed under the [Creative Commons Attribution Share-Alike license](https://creativecommons.org/licenses/by/2.0/). \n",
"\n",
"© Carlos A. Iglesias, Universidad Politécnica de Madrid."
]
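
A compact sketch of the decision-tree steps in this diff: fit scikit-learn's optimised CART and export the learnt rules as text. The notebook uses its own pseudocode helper; `export_text` is a stand-in we substitute here:

```python
# Fit a CART decision tree and print a text rendering of its rules.
from sklearn import datasets
from sklearn.tree import DecisionTreeClassifier, export_text

iris = datasets.load_iris()
model = DecisionTreeClassifier(max_depth=3, random_state=1)  # depth is illustrative
model.fit(iris.data, iris.target)
print(export_text(model, feature_names=list(iris.feature_names)))
```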

View File

@@ -4,7 +4,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"![](files/images/EscUpmPolit_p.gif \"UPM\")"
"![](./images/EscUpmPolit_p.gif \"UPM\")"
]
},
{
@@ -58,7 +58,7 @@
"source": [
"In the previous [notebook](2_5_2_Decision_Tree_Model.ipynb), we got an accuracy of 9.47. Could we get a better accuracy if we tune the hyperparameters of the estimator?\n",
"\n",
"The goal of this notebook is to learn how to tune an algorithm by opimizing its hyperparameters using grid search."
"This notebook aims to learn how to tune an algorithm by optimizing its hyperparameters using grid search."
]
},
{
@@ -137,7 +137,7 @@
"# create a k-fold cross validation iterator of k=10 folds\n",
"cv = KFold(10, shuffle=True, random_state=33)\n",
"\n",
"# by default the score used is the one returned by score method of the estimator (accuracy)\n",
"# by default the score used is the one returned by the score method of the estimator (accuracy)\n",
"scores = cross_val_score(model, x_iris, y_iris, cv=cv)\n",
"\n",
"from scipy.stats import sem\n",
@@ -189,7 +189,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"We can get the list of parameters of the model. As you will observe, the parameters of the estimators in the pipeline can be accessed using the <estimator>__<parameter> syntax. We will use this for tuning the parameters."
"We can get the list of model parameters. As you will observe, the parameters of the estimators in the pipeline can be accessed using the <estimator>__<parameter> syntax. We will use this for tuning the parameters."
]
},
{
@@ -205,7 +205,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"Let's see what happens if we change a parameter"
"Let's see what happens if we change a parameter."
]
},
{
@@ -284,7 +284,7 @@
"\n",
"Look at the [API](http://scikit-learn.org/stable/modules/generated/sklearn.tree.DecisionTreeClassifier.html) of *scikit-learn* to understand better the algorithm, as well as which parameters can be tuned. As you see, we can change several ones, such as *criterion*, *splitter*, *max_features*, *max_depth*, *min_samples_split*, *class_weight*, etc.\n",
"\n",
"We can get the full list parameters of an estimator with the method *get_params()*. "
"We can get an estimator's full list of parameters with the method *get_params()*. "
]
},
{
@@ -314,16 +314,16 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"Changing manually the hyperparameters to find their optimal values is not practical. Instead, we can consider to find the optimal value of the hyperparameters as an *optimization problem*. \n",
"Changing manually the hyperparameters to find their optimal values is not practical. Instead, we can consider finding the optimal value of the hyperparameters as an *optimization problem*. \n",
"\n",
"The sklearn comes with several optimization techniques for this purpose, such as **grid search** and **randomized search**. In this notebook we are going to introduce the former one."
"Sklearn has several optimization techniques, such as **grid search** and **randomized search**. In this notebook, we are going to introduce the former one."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The sklearn provides an object that, given data, computes the score during the fit of an estimator on a hyperparameter grid and chooses the hyperparameters to maximize the cross-validation score. "
"Sklearn provides an object that, given data, computes the score during the fit of an estimator on a hyperparameter grid and chooses the hyperparameters to maximize the cross-validation score. "
]
},
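{
"cell_type": "markdown",
"metadata": {},
"source": [
"A minimal sketch of this idea (assuming the pipeline *model* and the data *x_iris*, *y_iris* from above; the step name *clf* and the candidate values are illustrative):"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from sklearn.model_selection import GridSearchCV\n",
"\n",
"# Candidate hyperparameters; nested parameters use the <estimator>__<parameter> syntax\n",
"param_grid = {'clf__max_depth': [3, 4, 5], 'clf__criterion': ['gini', 'entropy']}\n",
"\n",
"# Evaluate every combination with cross-validation and keep the best one\n",
"gs = GridSearchCV(model, param_grid, cv=10)\n",
"gs.fit(x_iris, y_iris)\n",
"print(gs.best_params_, gs.best_score_)"
]
},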
{
@@ -351,7 +351,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"Now we are going to show the results of grid search"
"Now we are going to show the results of the grid search"
]
},
{
@@ -392,7 +392,7 @@
"# create a k-fold cross validation iterator of k=10 folds\n",
"cv = KFold(10, shuffle=True, random_state=33)\n",
"\n",
"# by default the score used is the one returned by score method of the estimator (accuracy)\n",
"# by default the score used is the one returned by the score method of the estimator (accuracy)\n",
"scores = cross_val_score(model, x_iris, y_iris, cv=cv)\n",
"def mean_score(scores):\n",
" return (\"Mean score: {0:.3f} (+/- {1:.3f})\").format(np.mean(scores), sem(scores))\n",
@@ -405,7 +405,7 @@
"source": [
"We have got an *improvement* from 0.947 to 0.953 with k-fold.\n",
"\n",
"We are now to try to fit the best combination of the hyperparameters of the algorithm. It can take some time to compute it."
"We are now trying to fit the best combination of the hyperparameters of the algorithm. It can take some time to compute it."
]
},
{
@@ -492,7 +492,7 @@
"# create a k-fold cross validation iterator of k=10 folds\n",
"cv = KFold(10, shuffle=True, random_state=33)\n",
"\n",
"# by default the score used is the one returned by score method of the estimator (accuracy)\n",
"# by default the score used is the one returned by the score method of the estimator (accuracy)\n",
"scores = cross_val_score(model, x_iris, y_iris, cv=cv)\n",
"def mean_score(scores):\n",
" return (\"Mean score: {0:.3f} (+/- {1:.3f})\").format(np.mean(scores), sem(scores))\n",
@@ -518,8 +518,6 @@
"metadata": {},
"source": [
"* [Plot the decision surface of a decision tree on the iris dataset](https://scikit-learn.org/stable/auto_examples/tree/plot_iris_dtc.html)\n",
"* [scikit-learn : Machine Learning Simplified](https://learning.oreilly.com/library/view/scikit-learn-machine/9781788833479/), Raúl Garreta; Guillermo Moncecchi, Packt Publishing, 2017.\n",
"* [Python Machine Learning](https://learning.oreilly.com/library/view/python-machine-learning/9781789955750/), Sebastian Raschka, Packt Publishing, 2019.\n",
"* [Hyperparameter estimation using grid search with cross-validation](http://scikit-learn.org/stable/auto_examples/model_selection/grid_search_digits.html)\n",
"* [Decision trees in python with scikit-learn and pandas](http://chrisstrelioff.ws/sandbox/2015/06/08/decision_trees_in_python_with_scikit_learn_and_pandas.html)"
]
@@ -535,7 +533,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"The notebook is freely licensed under under the [Creative Commons Attribution Share-Alike license](https://creativecommons.org/licenses/by/2.0/). \n",
"The notebook is freely licensed under the [Creative Commons Attribution Share-Alike license](https://creativecommons.org/licenses/by/2.0/). \n",
"\n",
"© Carlos A. Iglesias, Universidad Politécnica de Madrid."
]


@@ -4,7 +4,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"![](files/images/EscUpmPolit_p.gif \"UPM\")"
"![](./images/EscUpmPolit_p.gif \"UPM\")"
]
},
{
@@ -48,9 +48,9 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"The goal of this notebook is to learn how to save a model in the the scikit by using Pythons built-in persistence model, namely pickle\n",
"The goal of this notebook is to learn how to save a model in the scikit by using Pythons built-in persistence model, namely pickle\n",
"\n",
"First we recap the previous tasks: load data, preprocess and train the model."
"First, we recap the previous tasks: load data, preprocess, and train the model."
]
},
{
@@ -107,7 +107,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"A more efficient alternative to pickle is joblib, especially for big data problems. In this case the model can only be saved to a file and not to a string."
"A more efficient alternative to pickle is joblib, especially for big data problems. In this case, the model can only be saved to a file and not to a string."
]
},
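{
"cell_type": "markdown",
"metadata": {},
"source": [
"A minimal sketch of this approach (assuming *model* is the fitted estimator from above; the file name is illustrative):"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import joblib\n",
"\n",
"# Persist the fitted model to a file and load it back\n",
"joblib.dump(model, 'model.pkl')\n",
"model_restored = joblib.load('model.pkl')"
]
},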
{
@@ -146,7 +146,7 @@
"metadata": {},
"source": [
"## Licence\n",
"The notebook is freely licensed under under the [Creative Commons Attribution Share-Alike license](https://creativecommons.org/licenses/by/2.0/). \n",
"The notebook is freely licensed under the [Creative Commons Attribution Share-Alike license](https://creativecommons.org/licenses/by/2.0/). \n",
"\n",
"© Carlos A. Iglesias, Universidad Politécnica de Madrid."
]


@@ -4,7 +4,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"![](files/images/EscUpmPolit_p.gif \"UPM\")"
"![](./images/EscUpmPolit_p.gif \"UPM\")"
]
},
{
@@ -52,7 +52,7 @@
"\n",
"Particularly in high-dimensional spaces, data can more easily be separated linearly and the simplicity of classifiers such as naive Bayes and linear SVMs might lead to better generalization than is achieved by other classifiers.\n",
"\n",
"The plots show training points in solid colors and testing points semi-transparent. The lower right shows the classification accuracy on the test set.\n",
"The plots show training points in solid colors and testing points in semi-transparent colors. The lower right shows the classification accuracy on the test set.\n",
"\n",
"The [DummyClassifier](http://scikit-learn.org/stable/modules/generated/sklearn.dummy.DummyClassifier.html#sklearn.dummy.DummyClassifier) is a classifier that makes predictions using simple rules. It is useful as a simple baseline to compare with other (real) classifiers. \n",
"\n",
@@ -94,7 +94,7 @@
"metadata": {},
"source": [
"## Licence\n",
"The notebook is freely licensed under under the [Creative Commons Attribution Share-Alike license](https://creativecommons.org/licenses/by/2.0/). \n",
"The notebook is freely licensed under the [Creative Commons Attribution Share-Alike license](https://creativecommons.org/licenses/by/2.0/). \n",
"\n",
"© Carlos A. Iglesias, Universidad Politécnica de Madrid."
]

BIN ml1/images/iris-classes.png (new binary file, 1.4 MiB, not shown)

BIN (new binary file, 944 KiB, not shown)


@@ -47,7 +47,7 @@ def get_code(tree, feature_names, target_names,
recurse(left, right, threshold, features, 0, 0)
# Taken from http://scikit-learn.org/stable/auto_examples/tree/plot_iris.html#example-tree-plot-iris-py
# Taken from https://scikit-learn.org/stable/auto_examples/tree/plot_iris_dtc.html
import numpy as np
import matplotlib.pyplot as plt
@@ -114,4 +114,4 @@ def plot_tree_iris():
plt.suptitle("Decision surface of a decision tree using paired features")
plt.legend()
plt.show()
plt.show()


@@ -74,9 +74,7 @@
"metadata": {},
"source": [
"* [IPython Notebook Tutorial for Titanic: Machine Learning from Disaster](https://www.kaggle.com/c/titanic/forums/t/5105/ipython-notebook-tutorial-for-titanic-machine-learning-from-disaster)\n",
"* [Scikit-learn videos and notebooks](https://github.com/justmarkham/scikit-learn-videos) by Kevin Marham\n",
"* [Hands-On Machine Learning with scikit-learn and Scientific Python Toolkits](https://learning.oreilly.com/library/view/hands-on-machine-learning/9781838826048/), Tarek Amr, Packt Publishing, 2020.\n",
"* [Python Machine Learning](https://learning.oreilly.com/library/view/python-machine-learning/9781789955750/), Sebastian Raschka and Vahid Mirjalili, Packt Publishing, 2019."
"* [Scikit-learn videos and notebooks](https://github.com/justmarkham/scikit-learn-videos) by Kevin Marham\n"
]
},
{


@@ -50,30 +50,30 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"In this session we will work with the Titanic dataset. This dataset is provided by [Kaggle](http://www.kaggle.com). Kaggle is a crowdsourcing platform that organizes competitions where researchers and companies post their data and users compete to obtain the best models.\n",
"In this session, we will work with the Titanic dataset. This dataset is provided by [Kaggle](http://www.kaggle.com). Kaggle is a crowdsourcing platform that organizes competitions where researchers and companies post their data and users compete to obtain the best models.\n",
"\n",
"![Titanic](images/titanic.jpg)\n",
"\n",
"\n",
"The main objective is predicting which passengers survived the sinking of the Titanic.\n",
"The main objective is to predict which passengers survived the sinking of the Titanic.\n",
"\n",
"The data is available [here](https://www.kaggle.com/c/titanic/data). There are two files, one for training ([train.csv](files/data-titanic/train.csv)) and another file for testing [test.csv](files/data-titanic/test.csv). A local copy has been included in this notebook under the folder *data-titanic*.\n",
"\n",
"\n",
"Here follows a description of the variables.\n",
"\n",
"|Variable | Description| Values|\n",
"|-------------------------------|\n",
"| survival| Survival| (0 = No; 1 = Yes)|\n",
"|Pclass |Name | |\n",
"|Sex |Sex | male, female|\n",
"|Age |Age|\n",
"|SibSp |Number of Siblings/Spouses Aboard||\n",
"|Parch |Number of Parents/Children Aboard||\n",
"|Ticket|Ticket Number||\n",
"|Fare |Passenger Fare||\n",
"|Cabin |Cabin||\n",
"|Embarked |Port of Embarkation| (C = Cherbourg; Q = Queenstown; S = Southampton)|\n",
"| Variable | Description | Values |\n",
"|------------|---------------------------------|-----------------|\n",
"| survival | Survival |(0 = No; 1 = Yes)|\n",
"| Pclass | Name | |\n",
"| Sex | Sex | male, female |\n",
"| Age | Age | |\n",
"| SibSp |Number of Siblings/Spouses Aboard| |\n",
"| Parch |Number of Parents/Children Aboard| |\n",
"| Ticket | Ticket Number | |\n",
"| Fare | Passenger Fare | |\n",
"| Cabin | Cabin | |\n",
"| Embarked | Port of Embarkation | (C = Cherbourg; Q = Queenstown; S = Southampton)|\n",
"\n",
"\n",
"The definitions used for SibSp and Parch are:\n",
@@ -213,8 +213,7 @@
"* [Pandas API input-output](http://pandas.pydata.org/pandas-docs/stable/api.html#input-output)\n",
"* [Pandas API - pandas.read_csv](http://pandas.pydata.org/pandas-docs/stable/generated/pandas.read_csv.html)\n",
"* [DataFrame](http://pandas.pydata.org/pandas-docs/stable/dsintro.html)\n",
"* [An introduction to NumPy and Scipy](https://sites.engineering.ucsb.edu/~shell/che210d/numpy.pdf)\n",
"* [NumPy tutorial](https://numpy.org/doc/stable/)"
"* [An introduction to NumPy and Scipy](https://sites.engineering.ucsb.edu/~shell/che210d/numpy.pdf)\n"
]
},
{


@@ -433,7 +433,6 @@
"metadata": {},
"source": [
"* [Pandas](http://pandas.pydata.org/)\n",
"* [Learning Pandas, Michael Heydt, Packt Publishing, 2017](https://learning.oreilly.com/library/view/learning-pandas/9781787123137/)\n",
"* [Pandas. Introduction to Data Structures](https://pandas.pydata.org/pandas-docs/stable/user_guide/dsintro.html)\n",
"* [Introducing Pandas Objects](https://www.oreilly.com/learning/introducing-pandas-objects)\n",
"* [Boolean Operators in Pandas](https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#boolean-operators)"


@@ -373,8 +373,8 @@
"source": [
"#Mean age of passengers per Passenger class\n",
"\n",
"#First we calculate the mean\n",
"df.groupby('Pclass').mean()"
"#First we calculate the mean for the numeric columns\n",
"df.select_dtypes(np.number).groupby('Pclass').mean()"
]
},
{
@@ -451,7 +451,10 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"Pivot tables are an intuitive way to analyze data, and alternative to group columns."
"Pivot tables are an intuitive way to analyze data, and an alternative to group columns.\n",
"\n",
"This command makes a table with rows Sex and columns Pclass, and\n",
"averages the result of the column Survived, thereby giving the percentage of survivors in each grouping."
]
},
{
@@ -460,7 +463,14 @@
"metadata": {},
"outputs": [],
"source": [
"pd.pivot_table(df, index='Sex')"
"pd.pivot_table(df, index='Sex', columns='Pclass', values=['Survived'])"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Now we want to analyze multi-index, the percentage of survivoers, given sex and age, and distributed by Pclass."
]
},
{
@@ -469,7 +479,14 @@
"metadata": {},
"outputs": [],
"source": [
"pd.pivot_table(df, index=['Sex', 'Pclass'])"
"pd.pivot_table(df, index=['Sex', 'Age'], columns=['Pclass'], values=['Survived'])"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Nevertheless, this is not very useful since we have a row per age. Thus, we define a partition."
]
},
{
@@ -478,7 +495,8 @@
"metadata": {},
"outputs": [],
"source": [
"pd.pivot_table(df, index=['Sex', 'Pclass'], values=['Age', 'SibSp'])"
"# Partition each of the passengers into 3 categories based on their age\n",
"age = pd.cut(df['Age'], [0,12,18,80])"
]
},
{
@@ -487,7 +505,14 @@
"metadata": {},
"outputs": [],
"source": [
"pd.pivot_table(df, index=['Sex', 'Pclass'], values=['Age', 'SibSp'], aggfunc=np.mean)"
"pd.pivot_table(df, index=['Sex', age], columns=['Pclass'], values=['Survived'])"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We can change the function used for aggregating each group."
]
},
{
@@ -496,8 +521,18 @@
"metadata": {},
"outputs": [],
"source": [
"# Try np.sum, np.size, len\n",
"pd.pivot_table(df, index=['Sex', 'Pclass'], values=['Age', 'SibSp'], aggfunc=[np.mean, np.sum])"
"# default\n",
"pd.pivot_table(df, index=['Sex', age], columns=['Pclass'], values=['Survived'], aggfunc=np.mean)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Two agg functions\n",
"pd.pivot_table(df, index=['Sex', age], columns=['Pclass'], values=['Survived'], aggfunc=[np.mean, np.sum])"
]
},
{
@@ -972,7 +1007,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.8.12"
"version": "3.11.5"
},
"latex_envs": {
"LaTeX_envs_menu_present": true,


@@ -220,7 +220,7 @@
"metadata": {},
"outputs": [],
"source": [
"# Analise distributon\n",
"# Analise distribution\n",
"df.hist(figsize=(10,10))\n",
"plt.show()"
]
@@ -233,7 +233,7 @@
"source": [
"# We can see the pairwise correlation between variables. A value near 0 means low correlation\n",
"# while a value near -1 or 1 indicates strong correlation.\n",
"df.corr()"
"df.corr(numeric_only = True)"
]
},
{
@@ -249,11 +249,10 @@
"metadata": {},
"outputs": [],
"source": [
"# General description of relationship betweek variables uwing Seaborn PairGrid\n",
"# General description of relationship between variables uwing Seaborn PairGrid\n",
"# We use df_clean, since the null values of df would gives us an error, you can check it.\n",
"g = sns.PairGrid(df_clean, hue=\"Survived\")\n",
"g.map_diag(plt.hist)\n",
"g.map_offdiag(plt.scatter)\n",
"g.map(sns.scatterplot)\n",
"g.add_legend()"
]
},


@@ -351,10 +351,10 @@
"We can obtain more information from the confussion matrix and the metric F1-score.\n",
"In a confussion matrix, we can see:\n",
"\n",
"||**Predicted**: 0| **Predicted: 1**|\n",
"|---------------------------|\n",
"|**Actual: 0**| TN | FP |\n",
"|**Actual: 1**| FN|TP|\n",
"| |**Predicted**: 0| **Predicted: 1**|\n",
"|-------------|----------------|-----------------|\n",
"|**Actual: 0**| TN | FP |\n",
"|**Actual: 1**| FN | TP |\n",
"\n",
"* **True negatives (TN)**: actual negatives that were predicted as negatives\n",
"* **False positives (FP)**: actual negatives that were predicted as positives\n",

BIN ml2/images/iris-classes.png (new binary file, 1.4 MiB, not shown)

BIN (new binary file, 944 KiB, not shown)

ml21/.gitkeep (new empty file)

@@ -0,0 +1 @@


@@ -0,0 +1 @@


@@ -0,0 +1,157 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "skip"
}
},
"source": [
"![](images/EscUpmPolit_p.gif \"UPM\")"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "skip"
}
},
"source": [
"# Course Notes for Learning Intelligent Systems"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "skip"
}
},
"source": [
"Department of Telematic Engineering Systems, Universidad Politécnica de Madrid, © Carlos A. Iglesias"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"# Introduction to Preprocessing\n",
"In this session, we will get more insight regarding how to preprocess data.\n"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"# Objectives\n",
"The main objectives of this session are:\n",
"* Understanding the need for preprocessing\n",
"* Understanding different preprocessing techniques\n",
"* Experimenting with several environments for preprocessing"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"# Table of Contents"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"1. [Home](00_Intro_Preprocessing.ipynb)\n",
"3. [Initial Check](02_Initial_Check.ipynb)\n",
"4. [Filter Data](03_Filter_Data.ipynb)\n",
"5. [Unknown values](04_Unknown_Values.ipynb)\n",
"6. [Duplicated values](05_Duplicated_Values.ipynb)\n",
"7. [Rescaling Data](06_Rescaling_Data.ipynb)\n",
"8. [Binarize Data](07_Binarize_Data.ipynb)\n",
"9. [Categorial features](08_Categorical.ipynb)\n",
"10. [String Data](09_String_Data.ipynb)\n",
"12. [Handy libraries for preprocessing](11_0_Handy.ipynb)"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "skip"
}
},
"source": [
"## Licence\n",
"The notebook is freely licensed under under the [Creative Commons Attribution Share-Alike license](https://creativecommons.org/licenses/by/2.0/). \n",
"\n",
"© Carlos A. Iglesias, Universidad Politécnica de Madrid."
]
}
],
"metadata": {
"celltoolbar": "Slideshow",
"datacleaner": {
"position": {
"top": "50px"
},
"python": {
"varRefreshCmd": "try:\n print(_datacleaner.dataframe_metadata())\nexcept:\n print([])"
},
"window_display": false
},
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.7"
},
"latex_envs": {
"LaTeX_envs_menu_present": true,
"autocomplete": true,
"bibliofile": "biblio.bib",
"cite_by": "apalike",
"current_citInitial": 1,
"eqLabelWithNumbers": true,
"eqNumInitial": 1,
"hotkeys": {
"equation": "Ctrl-E",
"itemize": "Ctrl-I"
},
"labels_anchors": false,
"latex_user_defs": false,
"report_style_numbering": false,
"user_envs_cfg": false
}
},
"nbformat": 4,
"nbformat_minor": 4
}


@@ -0,0 +1,714 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "skip"
}
},
"source": [
"![](images/EscUpmPolit_p.gif \"UPM\")"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "skip"
}
},
"source": [
"# Course Notes for Learning Intelligent Systems"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "skip"
}
},
"source": [
"Department of Telematic Engineering Systems, Universidad Politécnica de Madrid, © Carlos A. Iglesias"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "skip"
}
},
"source": [
"## [Introduction to Preprocessing](00_Intro_Preprocessing.ipynb)"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"# Initial Check with Pandas\n",
"\n",
"We can start with a quick quality check."
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"## Load and check data\n",
"Check which data you are loading."
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>PassengerId</th>\n",
" <th>Survived</th>\n",
" <th>Pclass</th>\n",
" <th>Name</th>\n",
" <th>Sex</th>\n",
" <th>Age</th>\n",
" <th>SibSp</th>\n",
" <th>Parch</th>\n",
" <th>Ticket</th>\n",
" <th>Fare</th>\n",
" <th>Cabin</th>\n",
" <th>Embarked</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" <td>3</td>\n",
" <td>Braund, Mr. Owen Harris</td>\n",
" <td>male</td>\n",
" <td>22.0</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" <td>A/5 21171</td>\n",
" <td>7.2500</td>\n",
" <td>NaN</td>\n",
" <td>S</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>2</td>\n",
" <td>1</td>\n",
" <td>1</td>\n",
" <td>Cumings, Mrs. John Bradley (Florence Briggs Th...</td>\n",
" <td>female</td>\n",
" <td>38.0</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" <td>PC 17599</td>\n",
" <td>71.2833</td>\n",
" <td>C85</td>\n",
" <td>C</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>3</td>\n",
" <td>1</td>\n",
" <td>3</td>\n",
" <td>Heikkinen, Miss. Laina</td>\n",
" <td>female</td>\n",
" <td>26.0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>STON/O2. 3101282</td>\n",
" <td>7.9250</td>\n",
" <td>NaN</td>\n",
" <td>S</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>4</td>\n",
" <td>1</td>\n",
" <td>1</td>\n",
" <td>Futrelle, Mrs. Jacques Heath (Lily May Peel)</td>\n",
" <td>female</td>\n",
" <td>35.0</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" <td>113803</td>\n",
" <td>53.1000</td>\n",
" <td>C123</td>\n",
" <td>S</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>5</td>\n",
" <td>0</td>\n",
" <td>3</td>\n",
" <td>Allen, Mr. William Henry</td>\n",
" <td>male</td>\n",
" <td>35.0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>373450</td>\n",
" <td>8.0500</td>\n",
" <td>NaN</td>\n",
" <td>S</td>\n",
" </tr>\n",
" <tr>\n",
" <th>5</th>\n",
" <td>6</td>\n",
" <td>0</td>\n",
" <td>3</td>\n",
" <td>Moran, Mr. James</td>\n",
" <td>male</td>\n",
" <td>NaN</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>330877</td>\n",
" <td>8.4583</td>\n",
" <td>NaN</td>\n",
" <td>Q</td>\n",
" </tr>\n",
" <tr>\n",
" <th>6</th>\n",
" <td>7</td>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" <td>McCarthy, Mr. Timothy J</td>\n",
" <td>male</td>\n",
" <td>54.0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>17463</td>\n",
" <td>51.8625</td>\n",
" <td>E46</td>\n",
" <td>S</td>\n",
" </tr>\n",
" <tr>\n",
" <th>7</th>\n",
" <td>8</td>\n",
" <td>0</td>\n",
" <td>3</td>\n",
" <td>Palsson, Master. Gosta Leonard</td>\n",
" <td>male</td>\n",
" <td>2.0</td>\n",
" <td>3</td>\n",
" <td>1</td>\n",
" <td>349909</td>\n",
" <td>21.0750</td>\n",
" <td>NaN</td>\n",
" <td>S</td>\n",
" </tr>\n",
" <tr>\n",
" <th>8</th>\n",
" <td>9</td>\n",
" <td>1</td>\n",
" <td>3</td>\n",
" <td>Johnson, Mrs. Oscar W (Elisabeth Vilhelmina Berg)</td>\n",
" <td>female</td>\n",
" <td>27.0</td>\n",
" <td>0</td>\n",
" <td>2</td>\n",
" <td>347742</td>\n",
" <td>11.1333</td>\n",
" <td>NaN</td>\n",
" <td>S</td>\n",
" </tr>\n",
" <tr>\n",
" <th>9</th>\n",
" <td>10</td>\n",
" <td>1</td>\n",
" <td>2</td>\n",
" <td>Nasser, Mrs. Nicholas (Adele Achem)</td>\n",
" <td>female</td>\n",
" <td>14.0</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" <td>237736</td>\n",
" <td>30.0708</td>\n",
" <td>NaN</td>\n",
" <td>C</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" PassengerId Survived Pclass \\\n",
"0 1 0 3 \n",
"1 2 1 1 \n",
"2 3 1 3 \n",
"3 4 1 1 \n",
"4 5 0 3 \n",
"5 6 0 3 \n",
"6 7 0 1 \n",
"7 8 0 3 \n",
"8 9 1 3 \n",
"9 10 1 2 \n",
"\n",
" Name Sex Age SibSp \\\n",
"0 Braund, Mr. Owen Harris male 22.0 1 \n",
"1 Cumings, Mrs. John Bradley (Florence Briggs Th... female 38.0 1 \n",
"2 Heikkinen, Miss. Laina female 26.0 0 \n",
"3 Futrelle, Mrs. Jacques Heath (Lily May Peel) female 35.0 1 \n",
"4 Allen, Mr. William Henry male 35.0 0 \n",
"5 Moran, Mr. James male NaN 0 \n",
"6 McCarthy, Mr. Timothy J male 54.0 0 \n",
"7 Palsson, Master. Gosta Leonard male 2.0 3 \n",
"8 Johnson, Mrs. Oscar W (Elisabeth Vilhelmina Berg) female 27.0 0 \n",
"9 Nasser, Mrs. Nicholas (Adele Achem) female 14.0 1 \n",
"\n",
" Parch Ticket Fare Cabin Embarked \n",
"0 0 A/5 21171 7.2500 NaN S \n",
"1 0 PC 17599 71.2833 C85 C \n",
"2 0 STON/O2. 3101282 7.9250 NaN S \n",
"3 0 113803 53.1000 C123 S \n",
"4 0 373450 8.0500 NaN S \n",
"5 0 330877 8.4583 NaN Q \n",
"6 0 17463 51.8625 E46 S \n",
"7 1 349909 21.0750 NaN S \n",
"8 2 347742 11.1333 NaN S \n",
"9 0 237736 30.0708 NaN C "
]
},
"execution_count": 2,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"import pandas as pd\n",
"df = pd.read_csv('https://raw.githubusercontent.com/gsi-upm/sitc/master/ml2/data-titanic/train.csv')\n",
"df.head(10)"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"# Check number of columns and rows"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [
{
"data": {
"text/plain": [
"(891, 12)"
]
},
"execution_count": 3,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df.shape"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"## Check names and types of columns\n",
"Check the data and type, for example if dates are of strings or what."
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Index(['PassengerId', 'Survived', 'Pclass', 'Name', 'Sex', 'Age', 'SibSp',\n",
" 'Parch', 'Ticket', 'Fare', 'Cabin', 'Embarked'],\n",
" dtype='object')\n"
]
},
{
"data": {
"text/plain": [
"PassengerId int64\n",
"Survived int64\n",
"Pclass int64\n",
"Name object\n",
"Sex object\n",
"Age float64\n",
"SibSp int64\n",
"Parch int64\n",
"Ticket object\n",
"Fare float64\n",
"Cabin object\n",
"Embarked object\n",
"dtype: object"
]
},
"execution_count": 6,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Get column names\n",
"print(df.columns)\n",
"# Get column data types\n",
"df.dtypes"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"## Check if the column is unique"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"PassengerId is unique: True\n",
"Survived is unique: False\n",
"Pclass is unique: False\n",
"Name is unique: True\n",
"Sex is unique: False\n",
"Age is unique: False\n",
"SibSp is unique: False\n",
"Parch is unique: False\n",
"Ticket is unique: False\n",
"Fare is unique: False\n",
"Cabin is unique: False\n",
"Embarked is unique: False\n"
]
}
],
"source": [
"for i in column_names:\n",
" print('{} is unique: {}'.format(i, df[i].is_unique))"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"## Check if the dataframe has an index\n",
"We will need it to do joins or merges."
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [
{
"data": {
"text/plain": [
"RangeIndex(start=0, stop=891, step=1)"
]
},
"execution_count": 4,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# check if there is an index. If not, you will get 'AtributeError: function object has no atribute index'\n",
"df.index"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [
{
"data": {
"text/plain": [
"array([ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12,\n",
" 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25,\n",
" 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38,\n",
" 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51,\n",
" 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64,\n",
" 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77,\n",
" 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90,\n",
" 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103,\n",
" 104, 105, 106, 107, 108, 109, 110, 111, 112, 113, 114, 115, 116,\n",
" 117, 118, 119, 120, 121, 122, 123, 124, 125, 126, 127, 128, 129,\n",
" 130, 131, 132, 133, 134, 135, 136, 137, 138, 139, 140, 141, 142,\n",
" 143, 144, 145, 146, 147, 148, 149, 150, 151, 152, 153, 154, 155,\n",
" 156, 157, 158, 159, 160, 161, 162, 163, 164, 165, 166, 167, 168,\n",
" 169, 170, 171, 172, 173, 174, 175, 176, 177, 178, 179, 180, 181,\n",
" 182, 183, 184, 185, 186, 187, 188, 189, 190, 191, 192, 193, 194,\n",
" 195, 196, 197, 198, 199, 200, 201, 202, 203, 204, 205, 206, 207,\n",
" 208, 209, 210, 211, 212, 213, 214, 215, 216, 217, 218, 219, 220,\n",
" 221, 222, 223, 224, 225, 226, 227, 228, 229, 230, 231, 232, 233,\n",
" 234, 235, 236, 237, 238, 239, 240, 241, 242, 243, 244, 245, 246,\n",
" 247, 248, 249, 250, 251, 252, 253, 254, 255, 256, 257, 258, 259,\n",
" 260, 261, 262, 263, 264, 265, 266, 267, 268, 269, 270, 271, 272,\n",
" 273, 274, 275, 276, 277, 278, 279, 280, 281, 282, 283, 284, 285,\n",
" 286, 287, 288, 289, 290, 291, 292, 293, 294, 295, 296, 297, 298,\n",
" 299, 300, 301, 302, 303, 304, 305, 306, 307, 308, 309, 310, 311,\n",
" 312, 313, 314, 315, 316, 317, 318, 319, 320, 321, 322, 323, 324,\n",
" 325, 326, 327, 328, 329, 330, 331, 332, 333, 334, 335, 336, 337,\n",
" 338, 339, 340, 341, 342, 343, 344, 345, 346, 347, 348, 349, 350,\n",
" 351, 352, 353, 354, 355, 356, 357, 358, 359, 360, 361, 362, 363,\n",
" 364, 365, 366, 367, 368, 369, 370, 371, 372, 373, 374, 375, 376,\n",
" 377, 378, 379, 380, 381, 382, 383, 384, 385, 386, 387, 388, 389,\n",
" 390, 391, 392, 393, 394, 395, 396, 397, 398, 399, 400, 401, 402,\n",
" 403, 404, 405, 406, 407, 408, 409, 410, 411, 412, 413, 414, 415,\n",
" 416, 417, 418, 419, 420, 421, 422, 423, 424, 425, 426, 427, 428,\n",
" 429, 430, 431, 432, 433, 434, 435, 436, 437, 438, 439, 440, 441,\n",
" 442, 443, 444, 445, 446, 447, 448, 449, 450, 451, 452, 453, 454,\n",
" 455, 456, 457, 458, 459, 460, 461, 462, 463, 464, 465, 466, 467,\n",
" 468, 469, 470, 471, 472, 473, 474, 475, 476, 477, 478, 479, 480,\n",
" 481, 482, 483, 484, 485, 486, 487, 488, 489, 490, 491, 492, 493,\n",
" 494, 495, 496, 497, 498, 499, 500, 501, 502, 503, 504, 505, 506,\n",
" 507, 508, 509, 510, 511, 512, 513, 514, 515, 516, 517, 518, 519,\n",
" 520, 521, 522, 523, 524, 525, 526, 527, 528, 529, 530, 531, 532,\n",
" 533, 534, 535, 536, 537, 538, 539, 540, 541, 542, 543, 544, 545,\n",
" 546, 547, 548, 549, 550, 551, 552, 553, 554, 555, 556, 557, 558,\n",
" 559, 560, 561, 562, 563, 564, 565, 566, 567, 568, 569, 570, 571,\n",
" 572, 573, 574, 575, 576, 577, 578, 579, 580, 581, 582, 583, 584,\n",
" 585, 586, 587, 588, 589, 590, 591, 592, 593, 594, 595, 596, 597,\n",
" 598, 599, 600, 601, 602, 603, 604, 605, 606, 607, 608, 609, 610,\n",
" 611, 612, 613, 614, 615, 616, 617, 618, 619, 620, 621, 622, 623,\n",
" 624, 625, 626, 627, 628, 629, 630, 631, 632, 633, 634, 635, 636,\n",
" 637, 638, 639, 640, 641, 642, 643, 644, 645, 646, 647, 648, 649,\n",
" 650, 651, 652, 653, 654, 655, 656, 657, 658, 659, 660, 661, 662,\n",
" 663, 664, 665, 666, 667, 668, 669, 670, 671, 672, 673, 674, 675,\n",
" 676, 677, 678, 679, 680, 681, 682, 683, 684, 685, 686, 687, 688,\n",
" 689, 690, 691, 692, 693, 694, 695, 696, 697, 698, 699, 700, 701,\n",
" 702, 703, 704, 705, 706, 707, 708, 709, 710, 711, 712, 713, 714,\n",
" 715, 716, 717, 718, 719, 720, 721, 722, 723, 724, 725, 726, 727,\n",
" 728, 729, 730, 731, 732, 733, 734, 735, 736, 737, 738, 739, 740,\n",
" 741, 742, 743, 744, 745, 746, 747, 748, 749, 750, 751, 752, 753,\n",
" 754, 755, 756, 757, 758, 759, 760, 761, 762, 763, 764, 765, 766,\n",
" 767, 768, 769, 770, 771, 772, 773, 774, 775, 776, 777, 778, 779,\n",
" 780, 781, 782, 783, 784, 785, 786, 787, 788, 789, 790, 791, 792,\n",
" 793, 794, 795, 796, 797, 798, 799, 800, 801, 802, 803, 804, 805,\n",
" 806, 807, 808, 809, 810, 811, 812, 813, 814, 815, 816, 817, 818,\n",
" 819, 820, 821, 822, 823, 824, 825, 826, 827, 828, 829, 830, 831,\n",
" 832, 833, 834, 835, 836, 837, 838, 839, 840, 841, 842, 843, 844,\n",
" 845, 846, 847, 848, 849, 850, 851, 852, 853, 854, 855, 856, 857,\n",
" 858, 859, 860, 861, 862, 863, 864, 865, 866, 867, 868, 869, 870,\n",
" 871, 872, 873, 874, 875, 876, 877, 878, 879, 880, 881, 882, 883,\n",
" 884, 885, 886, 887, 888, 889, 890])"
]
},
"execution_count": 5,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# # Check the index values\n",
"df.index.values"
]
},
{
"cell_type": "raw",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"# If index does not exist\n",
"df.set_index('column_name_to_use', inplace=True)"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [
{
"data": {
"text/plain": [
"PassengerId 0\n",
"Survived 0\n",
"Pclass 0\n",
"Name 0\n",
"Sex 0\n",
"Age 177\n",
"SibSp 0\n",
"Parch 0\n",
"Ticket 0\n",
"Fare 0\n",
"Cabin 687\n",
"Embarked 2\n",
"dtype: int64"
]
},
"execution_count": 6,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Count missing vales per column\n",
"df.isnull().sum()"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "skip"
}
},
"source": [
"# References\n",
"* [Cleaning and Prepping Data with Python for Data Science — Best Practices and Helpful Packages](https://medium.com/@rrfd/cleaning-and-prepping-data-with-python-for-data-science-best-practices-and-helpful-packages-af1edfbe2a3), DeFilippi, 2019, \n",
"* [Data Preprocessing for Machine learning in Python, GeeksForGeeks](https://www.geeksforgeeks.org/data-preprocessing-machine-learning-python/)"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "skip"
}
},
"source": [
"## Licence\n",
"The notebook is freely licensed under under the [Creative Commons Attribution Share-Alike license](https://creativecommons.org/licenses/by/2.0/). \n",
"\n",
"© Carlos A. Iglesias, Universidad Politécnica de Madrid."
]
}
],
"metadata": {
"celltoolbar": "Slideshow",
"datacleaner": {
"position": {
"top": "50px"
},
"python": {
"varRefreshCmd": "try:\n print(_datacleaner.dataframe_metadata())\nexcept:\n print([])"
},
"window_display": false
},
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.7"
},
"latex_envs": {
"LaTeX_envs_menu_present": true,
"autocomplete": true,
"bibliofile": "biblio.bib",
"cite_by": "apalike",
"current_citInitial": 1,
"eqLabelWithNumbers": true,
"eqNumInitial": 1,
"hotkeys": {
"equation": "Ctrl-E",
"itemize": "Ctrl-I"
},
"labels_anchors": false,
"latex_user_defs": false,
"report_style_numbering": false,
"user_envs_cfg": false
}
},
"nbformat": 4,
"nbformat_minor": 4
}


@@ -0,0 +1,150 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "skip"
}
},
"source": [
"![](images/EscUpmPolit_p.gif \"UPM\")"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "skip"
}
},
"source": [
"# Course Notes for Learning Intelligent Systems"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "skip"
}
},
"source": [
"Department of Telematic Engineering Systems, Universidad Politécnica de Madrid, © Carlos A. Iglesias"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "skip"
}
},
"source": [
"## [Introduction to Preprocessing](00_Intro_Preprocessing.ipynb)"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"# Filter Data\n",
"\n",
"Select the columns you want and delete the others."
]
},
{
"cell_type": "raw",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"source": [
"# Create list comprehension of the columns you want to lose\n",
"columns_to_drop = [column_names[i] for i in [1, 3, 5]]\n",
"# Drop unwanted columns \n",
"df.drop(columns_to_drop, inplace=True, axis=1)"
]
},
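{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a runnable sketch with the Titanic dataset used in the other notebooks (the choice of columns to drop is only illustrative):"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd\n",
"\n",
"df = pd.read_csv('https://raw.githubusercontent.com/gsi-upm/sitc/master/ml2/data-titanic/train.csv')\n",
"# Drop two columns we do not want to keep\n",
"df = df.drop(['Ticket', 'Cabin'], axis=1)\n",
"df.columns"
]
},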
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "skip"
}
},
"source": [
"# References\n",
"* [Cleaning and Prepping Data with Python for Data Science — Best Practices and Helpful Packages](https://medium.com/@rrfd/cleaning-and-prepping-data-with-python-for-data-science-best-practices-and-helpful-packages-af1edfbe2a3), DeFilippi, 2019, \n",
"* [Data Preprocessing for Machine learning in Python, GeeksForGeeks](https://www.geeksforgeeks.org/data-preprocessing-machine-learning-python/)"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "skip"
}
},
"source": [
"## Licence\n",
"The notebook is freely licensed under under the [Creative Commons Attribution Share-Alike license](https://creativecommons.org/licenses/by/2.0/). \n",
"\n",
"© Carlos A. Iglesias, Universidad Politécnica de Madrid."
]
}
],
"metadata": {
"celltoolbar": "Slideshow",
"datacleaner": {
"position": {
"top": "50px"
},
"python": {
"varRefreshCmd": "try:\n print(_datacleaner.dataframe_metadata())\nexcept:\n print([])"
},
"window_display": false
},
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.13"
},
"latex_envs": {
"LaTeX_envs_menu_present": true,
"autocomplete": true,
"bibliofile": "biblio.bib",
"cite_by": "apalike",
"current_citInitial": 1,
"eqLabelWithNumbers": true,
"eqNumInitial": 1,
"hotkeys": {
"equation": "Ctrl-E",
"itemize": "Ctrl-I"
},
"labels_anchors": false,
"latex_user_defs": false,
"report_style_numbering": false,
"user_envs_cfg": false
}
},
"nbformat": 4,
"nbformat_minor": 4
}


@@ -0,0 +1,591 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "skip"
}
},
"source": [
"![](images/EscUpmPolit_p.gif \"UPM\")"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "skip"
}
},
"source": [
"# Course Notes for Learning Intelligent Systems"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "skip"
}
},
"source": [
"Department of Telematic Engineering Systems, Universidad Politécnica de Madrid, © Carlos A. Iglesias"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "skip"
}
},
"source": [
"## [Introduction to Preprocessing](00_Intro_Preprocessing.ipynb)"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"# Unknown values\n",
"\n",
"Two possible approaches are **remove** these rows or **fill** them. It depends on every case."
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [],
"source": [
"import pandas as pd\n",
"import numpy as np"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"## Filling NaN values\n",
"If we need to fill errors or blanks, we can use the methods **fillna()** or **dropna()**.\n",
"\n",
"* For **string** fields, we can fill NaN with **' '**.\n",
"\n",
"* For **numbers**, we can fill with the **mean** or **median** value. \n"
]
},
{
"cell_type": "raw",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"# Fill NaN with ' '\n",
"df['col'] = df['col'].fillna(' ')\n",
"# Fill NaN with 99\n",
"df['col'] = df['col'].fillna(99)\n",
"# Fill NaN with the mean of the column\n",
"df['col'] = df['col'].fillna(df['col'].mean())"
]
},
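{
"cell_type": "markdown",
"metadata": {},
"source": [
"A small runnable example of the last pattern (the toy column and its values are illustrative):"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"demo = pd.DataFrame({'col': [1.0, np.nan, 3.0, np.nan]})\n",
"# Fill NaN with the mean of the column (here, 2.0)\n",
"demo['col'] = demo['col'].fillna(demo['col'].mean())\n",
"demo"
]
},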
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"## Propagate non-null values forward or backward\n",
"You can also **propagate** non-null values with these methods:\n",
"\n",
"* **ffill**: Fill values by propagating the last valid observation to the next valid.\n",
"* **bfill**: Fill values using the following valid observation to fill the gap.\n",
"* **interpolate**: Fill NaN values using interpolation.\n",
"\n",
"It will fill the next value in the dataframe with the previous non-NaN value. \n",
"\n",
"You may want to fill in one value (**limit=1**) or all the values. You can also indicate inplace=True to fill in-place."
]
},
{
"cell_type": "code",
"execution_count": 17,
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"outputs": [],
"source": [
"df = pd.DataFrame(data={'col1':[np.nan, np.nan, 2,3,4, np.nan, np.nan]})"
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>col1</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>2.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>3.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>4.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>5</th>\n",
" <td>NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>6</th>\n",
" <td>NaN</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" col1\n",
"0 NaN\n",
"1 NaN\n",
"2 2.0\n",
"3 3.0\n",
"4 4.0\n",
"5 NaN\n",
"6 NaN"
]
},
"execution_count": 11,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We fill forward the value 4.0 and fill the next one (limit = 1)"
]
},
{
"cell_type": "code",
"execution_count": 12,
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>col1</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>2.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>3.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>4.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>5</th>\n",
" <td>4.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>6</th>\n",
" <td>NaN</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" col1\n",
"0 NaN\n",
"1 NaN\n",
"2 2.0\n",
"3 3.0\n",
"4 4.0\n",
"5 4.0\n",
"6 NaN"
]
},
"execution_count": 12,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
" df.ffill(limit = 1)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"df.ffill()"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"We can also backfilling with **bfill**. Since we do not include *limit*, we fill all the values."
]
},
{
"cell_type": "code",
"execution_count": 13,
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>col1</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>2.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>2.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>2.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>3.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>4.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>5</th>\n",
" <td>NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>6</th>\n",
" <td>NaN</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" col1\n",
"0 2.0\n",
"1 2.0\n",
"2 2.0\n",
"3 3.0\n",
"4 4.0\n",
"5 NaN\n",
"6 NaN"
]
},
"execution_count": 13,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df.bfill()"
]
},
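{
"cell_type": "markdown",
"metadata": {},
"source": [
"We can also fill the gaps with **interpolate**. A minimal sketch on the same dataframe (with the default settings, the leading NaN values remain, since there is no earlier value to interpolate from):"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Linear interpolation between the surrounding non-NaN values\n",
"df.interpolate()"
]
},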
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"## Removing NaN values\n",
"We can remove them by row or column (use inplace=True if you want to modify the DataFrame)."
]
},
{
"cell_type": "code",
"execution_count": 26,
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>col1</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>2.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>3.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>4.0</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" col1\n",
"2 2.0\n",
"3 3.0\n",
"4 4.0"
]
},
"execution_count": 26,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Drop any rows which have any nans\n",
"df1 = df.dropna()\n",
"# Drop columns that have any nans (axis = 1 -> drop columns, axis = 0 -> drop rows)\n",
"df2 = df.dropna(axis=1)\n",
"# Only drop columns which have at least 90% non-NaNs \n",
"df3 = df.dropna(thresh=int(df.shape[0] * .9), axis=1)\n",
"df1"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "skip"
}
},
"source": [
"# References\n",
"* [Cleaning and Prepping Data with Python for Data Science — Best Practices and Helpful Packages](https://medium.com/@rrfd/cleaning-and-prepping-data-with-python-for-data-science-best-practices-and-helpful-packages-af1edfbe2a3), DeFilippi, 2019, \n",
"* [Data Preprocessing for Machine learning in Python, GeeksForGeeks](https://www.geeksforgeeks.org/data-preprocessing-machine-learning-python/)"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "skip"
}
},
"source": [
"## Licence\n",
"The notebook is freely licensed under under the [Creative Commons Attribution Share-Alike license](https://creativecommons.org/licenses/by/2.0/). \n",
"\n",
"© Carlos A. Iglesias, Universidad Politécnica de Madrid."
]
}
],
"metadata": {
"celltoolbar": "Slideshow",
"datacleaner": {
"position": {
"top": "50px"
},
"python": {
"varRefreshCmd": "try:\n print(_datacleaner.dataframe_metadata())\nexcept:\n print([])"
},
"window_display": false
},
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.13"
},
"latex_envs": {
"LaTeX_envs_menu_present": true,
"autocomplete": true,
"bibliofile": "biblio.bib",
"cite_by": "apalike",
"current_citInitial": 1,
"eqLabelWithNumbers": true,
"eqNumInitial": 1,
"hotkeys": {
"equation": "Ctrl-E",
"itemize": "Ctrl-I"
},
"labels_anchors": false,
"latex_user_defs": false,
"report_style_numbering": false,
"user_envs_cfg": false
}
},
"nbformat": 4,
"nbformat_minor": 4
}

File diff suppressed because it is too large

File diff suppressed because one or more lines are too long


@@ -0,0 +1,198 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "skip"
}
},
"source": [
"![](images/EscUpmPolit_p.gif \"UPM\")"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "skip"
}
},
"source": [
"# Course Notes for Learning Intelligent Systems"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "skip"
}
},
"source": [
"Department of Telematic Engineering Systems, Universidad Politécnica de Madrid, © Carlos A. Iglesias"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "skip"
}
},
"source": [
"## [Introduction to Preprocessing](00_Intro_Preprocessing.ipynb)"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"# Binarize Data\n",
"* We can transform our data using a binary threshold. All values above the threshold are marked 1, and all values equal to or below are marked 0.\n",
"* This is called binarizing your data or thresholding your data. \n",
"\n",
"* It can be helpful when you have probabilities that you want to make crisp values."
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"## Binarize Data with Scikit-Learn\n",
"We can create new binary attributes in Python using Scikit-learn with the Binarizer class.\n",
"I"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [],
"source": [
"from sklearn.preprocessing import Binarizer\n",
"\n",
"X = [[ 1., -1., 2.],\n",
" [ 2., 0., 0.],\n",
" [ 0., 1.1, -1.]]"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [],
"source": [
"transformer = Binarizer(threshold=1.0).fit(X) # threshold 1.0"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [
{
"data": {
"text/plain": [
"array([[0., 0., 1.],\n",
" [1., 0., 0.],\n",
" [0., 1., 0.]])"
]
},
"execution_count": 6,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"transformer.transform(X)"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "skip"
}
},
"source": [
"# References\n",
"* [Cleaning and Prepping Data with Python for Data Science — Best Practices and Helpful Packages](https://medium.com/@rrfd/cleaning-and-prepping-data-with-python-for-data-science-best-practices-and-helpful-packages-af1edfbe2a3), DeFilippi, 2019, \n",
"* [Binarizer](https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.Binarizer.html), Scikit Learn"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "skip"
}
},
"source": [
"## Licence\n",
"The notebook is freely licensed under under the [Creative Commons Attribution Share-Alike license](https://creativecommons.org/licenses/by/2.0/). \n",
"\n",
"© Carlos A. Iglesias, Universidad Politécnica de Madrid."
]
}
],
"metadata": {
"celltoolbar": "Slideshow",
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.13"
},
"latex_envs": {
"LaTeX_envs_menu_present": true,
"autocomplete": true,
"bibliofile": "biblio.bib",
"cite_by": "apalike",
"current_citInitial": 1,
"eqLabelWithNumbers": true,
"eqNumInitial": 1,
"hotkeys": {
"equation": "Ctrl-E",
"itemize": "Ctrl-I"
},
"labels_anchors": false,
"latex_user_defs": false,
"report_style_numbering": false,
"user_envs_cfg": false
}
},
"nbformat": 4,
"nbformat_minor": 4
}


@@ -0,0 +1,812 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "skip"
}
},
"source": [
"![](images/EscUpmPolit_p.gif \"UPM\")"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "skip"
}
},
"source": [
"# Course Notes for Learning Intelligent Systems"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "skip"
}
},
"source": [
"Department of Telematic Engineering Systems, Universidad Politécnica de Madrid, © Carlos A. Iglesias"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "skip"
}
},
"source": [
"## [Introduction to Preprocessing](00_Intro_Preprocessing.ipynb)"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"# Categorical Data\n",
"\n",
"For many ML algorithms, we need to transform categorical data into numbers.\n",
"\n",
"For example:\n",
"* **'Sex'** with values *'M'*, *'F'*, *'Unknown'*. \n",
"* **'Position'** with values 'phD', *'Professor'*, *'TA'*, *'graduate'*.\n",
"* **'Temperature'** with values *'low'*, *'medium'*, *'high'*.\n",
"\n",
"There are two main approaches:\n",
"* Integer encoding\n",
"* One hot encoding"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"## Integer Encoding\n",
"We assign a number to every value:\n",
"\n",
"['M', 'F', 'Unknown', 'M'] --> [0, 1, 2, 0]\n",
"\n",
"['phD', 'Professor', 'TA','graduate', 'phD'] --> [0, 1, 2, 3, 0]\n",
"\n",
"['low', 'medium', 'high', 'low'] --> [0, 1, 2, 0]\n",
"\n",
"The main problem with this representation is integers have a natural order, and some ML algorithms can be confused. \n",
"\n",
"In our examples, this representation can be suitable for **temperature**, but not for the other two."
]
},
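{
"cell_type": "markdown",
"metadata": {},
"source": [
"A minimal sketch of integer encoding with pandas (*factorize* assigns an integer to each distinct value in order of appearance; the toy data mirrors the example above):"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd\n",
"\n",
"codes, uniques = pd.factorize(['M', 'F', 'Unknown', 'M'])\n",
"print(codes) # [0 1 2 0]\n",
"print(uniques) # ['M' 'F' 'Unknown']"
]
},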
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"## One Hot Encoding\n",
"A binary column is created for each value of the categorical variable."
]
},
{
"cell_type": "raw",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"source": [
"Sex M F U\n",
"----- ---------\n",
"M 1 0 0\n",
"F is transformed into 0 1 0\n",
"Unknown 0 0 1\n",
"M 1 0 0 "
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"## Transforming categorical data with Scikit-Learn\n",
"\n",
"We can use:\n",
"* **get_dummies()** (one hot encoding)\n",
"* **LabelEncoder** (integer encoding) and **OneHotEncoder** (one hot encoding). \n",
"\n",
"We are going to learn the first approach."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### One Hot Encoding\n",
"We can use Pandas (*get_dummies*) or Scikit-Learn (*OneHotEncoder*)."
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
" Name Age Sex Position\n",
"0 Marius 18 Male graduate\n",
"1 Maria 19 Female professor\n",
"2 John 20 Male TA\n",
"3 Carla 30 Female phD\n"
]
}
],
"source": [
"import pandas as pd\n",
"\n",
"data = {\"Name\": [\"Marius\", \"Maria\", \"John\", \"Carla\"],\n",
" \"Age\": [18, 19, 20, 30],\n",
"\t\t\"Sex\": [\"Male\", \"Female\", \"Male\", \"Female\"],\n",
" \"Position\": [\"graduate\", \"professor\", \"TA\", \"phD\"]\n",
" }\n",
"df = pd.DataFrame(data)\n",
"print(df)"
]
},
{
"cell_type": "code",
"execution_count": 18,
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>Name</th>\n",
" <th>Age</th>\n",
" <th>sex_encoded</th>\n",
" <th>position_encoded</th>\n",
" <th>Sex_Female</th>\n",
" <th>Sex_Male</th>\n",
" <th>Position_TA</th>\n",
" <th>Position_graduate</th>\n",
" <th>Position_phD</th>\n",
" <th>Position_professor</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>Marius</td>\n",
" <td>18</td>\n",
" <td>1</td>\n",
" <td>1</td>\n",
" <td>False</td>\n",
" <td>True</td>\n",
" <td>False</td>\n",
" <td>True</td>\n",
" <td>False</td>\n",
" <td>False</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>Maria</td>\n",
" <td>19</td>\n",
" <td>0</td>\n",
" <td>3</td>\n",
" <td>True</td>\n",
" <td>False</td>\n",
" <td>False</td>\n",
" <td>False</td>\n",
" <td>False</td>\n",
" <td>True</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>John</td>\n",
" <td>20</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" <td>False</td>\n",
" <td>True</td>\n",
" <td>True</td>\n",
" <td>False</td>\n",
" <td>False</td>\n",
" <td>False</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>Carla</td>\n",
" <td>30</td>\n",
" <td>0</td>\n",
" <td>2</td>\n",
" <td>True</td>\n",
" <td>False</td>\n",
" <td>False</td>\n",
" <td>False</td>\n",
" <td>True</td>\n",
" <td>False</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" Name Age sex_encoded position_encoded Sex_Female Sex_Male \\\n",
"0 Marius 18 1 1 False True \n",
"1 Maria 19 0 3 True False \n",
"2 John 20 1 0 False True \n",
"3 Carla 30 0 2 True False \n",
"\n",
" Position_TA Position_graduate Position_phD Position_professor \n",
"0 False True False False \n",
"1 False False False True \n",
"2 True False False False \n",
"3 False False True False "
]
},
"execution_count": 18,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df_onehot = pd.get_dummies(df, columns=['Sex', 'Position'])\n",
"df_onehot"
]
},
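{
"cell_type": "markdown",
"metadata": {},
"source": [
"A variant worth knowing (a minimal sketch): *drop_first=True* drops the first category of each encoded variable, which avoids redundant, perfectly correlated dummy columns:\n",
"\n",
"```python\n",
"# One column fewer per variable: 'Sex' becomes a single boolean column\n",
"pd.get_dummies(df, columns=['Sex', 'Position'], drop_first=True)\n",
"```"
]
},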
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We can also use *OneHotEncoder* from Scikit."
]
},
{
"cell_type": "code",
"execution_count": 27,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>Sex_Female</th>\n",
" <th>Sex_Male</th>\n",
" <th>Position_TA</th>\n",
" <th>Position_graduate</th>\n",
" <th>Position_phD</th>\n",
" <th>Position_professor</th>\n",
" <th>Name</th>\n",
" <th>Age</th>\n",
" <th>sex_encoded</th>\n",
" <th>position_encoded</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>0.0</td>\n",
" <td>1.0</td>\n",
" <td>0.0</td>\n",
" <td>1.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>Marius</td>\n",
" <td>18</td>\n",
" <td>1</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>1.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>1.0</td>\n",
" <td>Maria</td>\n",
" <td>19</td>\n",
" <td>0</td>\n",
" <td>3</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>0.0</td>\n",
" <td>1.0</td>\n",
" <td>1.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>John</td>\n",
" <td>20</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>1.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>1.0</td>\n",
" <td>0.0</td>\n",
" <td>Carla</td>\n",
" <td>30</td>\n",
" <td>0</td>\n",
" <td>2</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" Sex_Female Sex_Male Position_TA Position_graduate Position_phD \\\n",
"0 0.0 1.0 0.0 1.0 0.0 \n",
"1 1.0 0.0 0.0 0.0 0.0 \n",
"2 0.0 1.0 1.0 0.0 0.0 \n",
"3 1.0 0.0 0.0 0.0 1.0 \n",
"\n",
" Position_professor Name Age sex_encoded position_encoded \n",
"0 0.0 Marius 18 1 1 \n",
"1 1.0 Maria 19 0 3 \n",
"2 0.0 John 20 1 0 \n",
"3 0.0 Carla 30 0 2 "
]
},
"execution_count": 27,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"from sklearn.preprocessing import OneHotEncoder\n",
"from sklearn.compose import make_column_transformer\n",
"\n",
"df_onehotencoder = df\n",
"# create OneHotEncoder object\n",
"encoder = OneHotEncoder()\n",
"\n",
"# Transformer for several columns\n",
"transformer = make_column_transformer(\n",
" (OneHotEncoder(), ['Sex', 'Position']),\n",
" remainder='passthrough',\n",
" verbose_feature_names_out=False)\n",
"\n",
"# transform\n",
"transformed = transformer.fit_transform(df_onehotencoder)\n",
"\n",
"df_onehotencoder = pd.DataFrame(\n",
" transformed,\n",
" columns=transformer.get_feature_names_out())\n",
"df_onehotencoder"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Pandas' get_dummy is easier for transforming DataFrames. OneHotEncoder is more efficient and can be good for integrating the step in a machine learning pipeline."
]
},
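{
"cell_type": "markdown",
"metadata": {},
"source": [
"A minimal sketch of that pipeline integration (the classifier and the target vector *y* are hypothetical, added only for illustration):\n",
"\n",
"```python\n",
"from sklearn.pipeline import make_pipeline\n",
"from sklearn.compose import make_column_transformer\n",
"from sklearn.preprocessing import OneHotEncoder\n",
"from sklearn.linear_model import LogisticRegression\n",
"\n",
"# One-hot encode the categorical columns, drop the rest, then classify\n",
"pipe = make_pipeline(\n",
"    make_column_transformer(\n",
"        (OneHotEncoder(handle_unknown='ignore'), ['Sex', 'Position']),\n",
"        remainder='drop'),\n",
"    LogisticRegression())\n",
"# pipe.fit(df[['Sex', 'Position']], y)  # y: hypothetical target vector\n",
"```"
]
},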
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Integer encoding\n",
"We will use **LabelEncoder**. It is possible to get the original values with *inverse_transform*. See [LabelEncoder](https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.LabelEncoder.html)"
]
},
{
"cell_type": "code",
"execution_count": 14,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>Name</th>\n",
" <th>Age</th>\n",
" <th>Sex</th>\n",
" <th>Position</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>Marius</td>\n",
" <td>18</td>\n",
" <td>Male</td>\n",
" <td>graduate</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>Maria</td>\n",
" <td>19</td>\n",
" <td>Female</td>\n",
" <td>professor</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>John</td>\n",
" <td>20</td>\n",
" <td>Male</td>\n",
" <td>TA</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>Carla</td>\n",
" <td>30</td>\n",
" <td>Female</td>\n",
" <td>phD</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" Name Age Sex Position\n",
"0 Marius 18 Male graduate\n",
"1 Maria 19 Female professor\n",
"2 John 20 Male TA\n",
"3 Carla 30 Female phD"
]
},
"execution_count": 14,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"from sklearn.preprocessing import LabelEncoder\n",
"# creating instance of labelencoder\n",
"labelencoder = LabelEncoder()\n",
"df_encoded = df\n",
"# Assigning numerical values and storing in another column\n",
"sex_values = ('Male', 'Female')\n",
"position_values = ('graduate', 'professor', 'TA', 'phD')\n",
"df_encoded"
]
},
{
"cell_type": "code",
"execution_count": 16,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>Name</th>\n",
" <th>Age</th>\n",
" <th>Sex</th>\n",
" <th>Position</th>\n",
" <th>sex_encoded</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>Marius</td>\n",
" <td>18</td>\n",
" <td>Male</td>\n",
" <td>graduate</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>Maria</td>\n",
" <td>19</td>\n",
" <td>Female</td>\n",
" <td>professor</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>John</td>\n",
" <td>20</td>\n",
" <td>Male</td>\n",
" <td>TA</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>Carla</td>\n",
" <td>30</td>\n",
" <td>Female</td>\n",
" <td>phD</td>\n",
" <td>0</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" Name Age Sex Position sex_encoded\n",
"0 Marius 18 Male graduate 1\n",
"1 Maria 19 Female professor 0\n",
"2 John 20 Male TA 1\n",
"3 Carla 30 Female phD 0"
]
},
"execution_count": 16,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df_encoded['sex_encoded'] = labelencoder.fit_transform(df_encoded['Sex'])\n",
"df_encoded"
]
},
{
"cell_type": "code",
"execution_count": 17,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>Name</th>\n",
" <th>Age</th>\n",
" <th>Sex</th>\n",
" <th>Position</th>\n",
" <th>sex_encoded</th>\n",
" <th>position_encoded</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>Marius</td>\n",
" <td>18</td>\n",
" <td>Male</td>\n",
" <td>graduate</td>\n",
" <td>1</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>Maria</td>\n",
" <td>19</td>\n",
" <td>Female</td>\n",
" <td>professor</td>\n",
" <td>0</td>\n",
" <td>3</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>John</td>\n",
" <td>20</td>\n",
" <td>Male</td>\n",
" <td>TA</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>Carla</td>\n",
" <td>30</td>\n",
" <td>Female</td>\n",
" <td>phD</td>\n",
" <td>0</td>\n",
" <td>2</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" Name Age Sex Position sex_encoded position_encoded\n",
"0 Marius 18 Male graduate 1 1\n",
"1 Maria 19 Female professor 0 3\n",
"2 John 20 Male TA 1 0\n",
"3 Carla 30 Female phD 0 2"
]
},
"execution_count": 17,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df_encoded['position_encoded'] = labelencoder.fit_transform(df_encoded['Position'])\n",
"df_encoded"
]
},
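{
"cell_type": "markdown",
"metadata": {},
"source": [
"As noted above, *inverse_transform* recovers the original labels. A quick check (the encoder was last fitted on 'Position', so it maps the position codes back):\n",
"\n",
"```python\n",
"labelencoder.inverse_transform(df_encoded['position_encoded'])\n",
"# -> ['graduate', 'professor', 'TA', 'phD']\n",
"```"
]
},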
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "skip"
}
},
"source": [
"# References\n",
"* [Cleaning and Prepping Data with Python for Data Science — Best Practices and Helpful Packages](https://medium.com/@rrfd/cleaning-and-prepping-data-with-python-for-data-science-best-practices-and-helpful-packages-af1edfbe2a3), DeFilippi, 2019, \n",
"* [Binarizer](https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.Binarizer.html), Scikit Learn"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "skip"
}
},
"source": [
"## Licence\n",
"The notebook is freely licensed under under the [Creative Commons Attribution Share-Alike license](https://creativecommons.org/licenses/by/2.0/). \n",
"\n",
"© Carlos A. Iglesias, Universidad Politécnica de Madrid."
]
}
],
"metadata": {
"celltoolbar": "Slideshow",
"datacleaner": {
"position": {
"top": "50px"
},
"python": {
"varRefreshCmd": "try:\n print(_datacleaner.dataframe_metadata())\nexcept:\n print([])"
},
"window_display": false
},
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.13"
},
"latex_envs": {
"LaTeX_envs_menu_present": true,
"autocomplete": true,
"bibliofile": "biblio.bib",
"cite_by": "apalike",
"current_citInitial": 1,
"eqLabelWithNumbers": true,
"eqNumInitial": 1,
"hotkeys": {
"equation": "Ctrl-E",
"itemize": "Ctrl-I"
},
"labels_anchors": false,
"latex_user_defs": false,
"report_style_numbering": false,
"user_envs_cfg": false
}
},
"nbformat": 4,
"nbformat_minor": 4
}


@@ -0,0 +1,652 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "skip"
}
},
"source": [
"![](images/EscUpmPolit_p.gif \"UPM\")"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "skip"
}
},
"source": [
"# Course Notes for Learning Intelligent Systems"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "skip"
}
},
"source": [
"Department of Telematic Engineering Systems, Universidad Politécnica de Madrid, © Carlos A. Iglesias"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "skip"
}
},
"source": [
"## [Introduction to Preprocessing](00_Intro_Preprocessing.ipynb)"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"# String Data\n",
"It is widespread to clean string columns to follow a predefined format (e.g., emails, URLs, ...).\n",
"\n",
"We can do it using regular expressions or specific libraries."
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"## Beautifier\n",
"A simple [library](https://github.com/labtocat/beautifier) to cleanup and prettify URL patterns, domains, and so on. The library helps to clean Unicode, special characters, and unnecessary redirection patterns from the URLs and gives you a clean date.\n",
"\n",
"Install with **'pip install beautifier'**."
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"## Email cleanup"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [],
"source": [
"from beautifier import Email\n",
"email = Email('me@imsach.in')"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [
{
"data": {
"text/plain": [
"'imsach.in'"
]
},
"execution_count": 2,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"email.domain"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [
{
"data": {
"text/plain": [
"'me'"
]
},
"execution_count": 3,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"email.username"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [
{
"data": {
"text/plain": [
"False"
]
},
"execution_count": 4,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"email.is_free_email"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [],
"source": [
"email2 = Email('This my address')"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [
{
"data": {
"text/plain": [
"False"
]
},
"execution_count": 6,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"email2.is_valid"
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [],
"source": [
"email3 = Email('pepe@gmail.com')"
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [
{
"data": {
"text/plain": [
"True"
]
},
"execution_count": 8,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"email3.is_valid"
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [
{
"data": {
"text/plain": [
"True"
]
},
"execution_count": 9,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"email3.is_free_email"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"## URL cleanup"
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [],
"source": [
"from beautifier import Url\n",
"url = Url('https://in.linkedin.com/in/sachinphilip?authtoken=887nasdadasd6hasdtg21&secret=98jy766yhhuhnjk')"
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [
{
"data": {
"text/plain": [
"'https://in.linkedin.com/in/sachinphilip'"
]
},
"execution_count": 11,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"url.cleanup"
]
},
{
"cell_type": "code",
"execution_count": 12,
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [
{
"data": {
"text/plain": [
"'in.linkedin.com'"
]
},
"execution_count": 12,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"url.domain"
]
},
{
"cell_type": "code",
"execution_count": 13,
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [
{
"data": {
"text/plain": [
"['authtoken=887nasdadasd6hasdtg21', 'secret=98jy766yhhuhnjk']"
]
},
"execution_count": 13,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"url.param"
]
},
{
"cell_type": "code",
"execution_count": 14,
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [
{
"data": {
"text/plain": [
"'authtoken=887nasdadasd6hasdtg21&secret=98jy766yhhuhnjk'"
]
},
"execution_count": 14,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"url.parameters"
]
},
{
"cell_type": "code",
"execution_count": 15,
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [
{
"data": {
"text/plain": [
"'sachinphilip'"
]
},
"execution_count": 15,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"url.username"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"## Unicode\n",
"Problem: Some unicode code has been broken. We see the character in a different character dataset.\n",
"\n",
"A **mojibake** is a character displayed in an unintended character encoding. Example: \"<22>\").\n",
"\n",
"We will use the library **ftfy** (fixed text for you) to fix it.\n",
"\n",
"First, you should install the library: **conda install ftfy** (or **pip install ftfy**)."
]
},
{
"cell_type": "code",
"execution_count": 16,
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"¯\\_(ツ)_/¯\n",
"Party\n",
"I'm\n"
]
}
],
"source": [
"import ftfy\n",
"foo = '&macr;\\\\_(ã\\x83\\x84)_/&macr;'\n",
"bar = '\\ufeffParty'\n",
"baz = '\\001\\033[36;44mI&#x92;m'\n",
"print(ftfy.fix_text(foo))\n",
"print(ftfy.fix_text(bar))\n",
"print(ftfy.fix_text(baz))"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"We can understand which heuristics ftfy is using."
]
},
{
"cell_type": "code",
"execution_count": 17,
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"U+0026 & [Po] AMPERSAND\n",
"U+006D m [Ll] LATIN SMALL LETTER M\n",
"U+0061 a [Ll] LATIN SMALL LETTER A\n",
"U+0063 c [Ll] LATIN SMALL LETTER C\n",
"U+0072 r [Ll] LATIN SMALL LETTER R\n",
"U+003B ; [Po] SEMICOLON\n",
"U+005C \\ [Po] REVERSE SOLIDUS\n",
"U+005F _ [Pc] LOW LINE\n",
"U+0028 ( [Ps] LEFT PARENTHESIS\n",
"U+00E3 ã [Ll] LATIN SMALL LETTER A WITH TILDE\n",
"U+0083 \\x83 [Cc] <unknown>\n",
"U+0084 \\x84 [Cc] <unknown>\n",
"U+0029 ) [Pe] RIGHT PARENTHESIS\n",
"U+005F _ [Pc] LOW LINE\n",
"U+002F / [Po] SOLIDUS\n",
"U+0026 & [Po] AMPERSAND\n",
"U+006D m [Ll] LATIN SMALL LETTER M\n",
"U+0061 a [Ll] LATIN SMALL LETTER A\n",
"U+0063 c [Ll] LATIN SMALL LETTER C\n",
"U+0072 r [Ll] LATIN SMALL LETTER R\n",
"U+003B ; [Po] SEMICOLON\n"
]
}
],
"source": [
"ftfy.explain_unicode(foo)"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"## Dates\n",
"Sometimes we want to extract date from text. We can use regular expressions or handy packages, such as [**python-dateutil**](https://dateutil.readthedocs.io/en/stable/). An alternative is [arrow](https://arrow.readthedocs.io/en/latest/).\n",
"\n",
"Install the library: **pip install python-dateutil**."
]
},
{
"cell_type": "code",
"execution_count": 18,
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"2019-08-22 10:22:46+00:00\n"
]
}
],
"source": [
"from dateutil.parser import parse\n",
"now = parse(\"Thu Aug 22 10:22:46 UTC 2019\")\n",
"print(now)"
]
},
{
"cell_type": "code",
"execution_count": 19,
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"2019-08-08 10:20:00\n"
]
}
],
"source": [
"dt = parse(\"Today is Thursday 8, 2019 at 10:20:00AM\", fuzzy=True)\n",
"print(dt)"
]
},
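{
"cell_type": "markdown",
"metadata": {},
"source": [
"A minimal sketch of the same parsing with *arrow* (assumes **pip install arrow**; the format string is our assumption for this timestamp):\n",
"\n",
"```python\n",
"import arrow\n",
"\n",
"dt = arrow.get('2019-08-22 10:22:46', 'YYYY-MM-DD HH:mm:ss')\n",
"print(dt)  # 2019-08-22T10:22:46+00:00\n",
"```"
]
},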
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "skip"
}
},
"source": [
"# References\n",
"* [Cleaning and Prepping Data with Python for Data Science — Best Practices and Helpful Packages](https://medium.com/@rrfd/cleaning-and-prepping-data-with-python-for-data-science-best-practices-and-helpful-packages-af1edfbe2a3), DeFilippi, 2019, \n",
"* [Data Preprocessing for Machine learning in Python, GeeksForGeeks](https://www.geeksforgeeks.org/data-preprocessing-machine-learning-python/), , A. Sharma, 2018.\n",
"* [Beautifier](https://github.com/labtocat/beautifier) package\n",
"* [Ftfy](https://ftfy.readthedocs.io/en/latest/) package\n",
"* [python-dateutil](https://dateutil.readthedocs.io/en/stable/)package"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "skip"
}
},
"source": [
"## Licence\n",
"The notebook is freely licensed under under the [Creative Commons Attribution Share-Alike license](https://creativecommons.org/licenses/by/2.0/). \n",
"\n",
"© Carlos A. Iglesias, Universidad Politécnica de Madrid."
]
}
],
"metadata": {
"celltoolbar": "Slideshow",
"datacleaner": {
"position": {
"top": "50px"
},
"python": {
"varRefreshCmd": "try:\n print(_datacleaner.dataframe_metadata())\nexcept:\n print([])"
},
"window_display": false
},
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.13"
},
"latex_envs": {
"LaTeX_envs_menu_present": true,
"autocomplete": true,
"bibliofile": "biblio.bib",
"cite_by": "apalike",
"current_citInitial": 1,
"eqLabelWithNumbers": true,
"eqNumInitial": 1,
"hotkeys": {
"equation": "Ctrl-E",
"itemize": "Ctrl-I"
},
"labels_anchors": false,
"latex_user_defs": false,
"report_style_numbering": false,
"user_envs_cfg": false
}
},
"nbformat": 4,
"nbformat_minor": 4
}


@@ -0,0 +1,139 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "skip"
}
},
"source": [
"![](images/EscUpmPolit_p.gif \"UPM\")"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "skip"
}
},
"source": [
"# Course Notes for Learning Intelligent Systems"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "skip"
}
},
"source": [
"Department of Telematic Engineering Systems, Universidad Politécnica de Madrid, © Carlos A. Iglesias"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "skip"
}
},
"source": [
"## [Introduction to Preprocessing](00_Intro_Preprocessing.ipynb)"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"# Handy libraries\n",
"Libraries that help in several preprocessing tasks.\n",
"\n",
"* [datacleaner](11_1_datacleaner.ipynb)\n",
"* [autoclean](11_3_autoclean.ipynb)"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "skip"
}
},
"source": [
"# References\n",
"* [Cleaning and Prepping Data with Python for Data Science — Best Practices and Helpful Packages](https://medium.com/@rrfd/cleaning-and-prepping-data-with-python-for-data-science-best-practices-and-helpful-packages-af1edfbe2a3), DeFilippi, 2019, \n",
"* [Data Preprocessing for Machine learning in Python, GeeksForGeeks](https://www.geeksforgeeks.org/data-preprocessing-machine-learning-python/), A. Sharma, 2018.\n",
"* [Handy Python Libraries for Formatting and Cleaning Data](https://mode.com/blog/python-data-cleaning-libraries), M. Bierly, 2016\n"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "skip"
}
},
"source": [
"## Licence\n",
"The notebook is freely licensed under under the [Creative Commons Attribution Share-Alike license](https://creativecommons.org/licenses/by/2.0/). \n",
"\n",
"© Carlos A. Iglesias, Universidad Politécnica de Madrid."
]
}
],
"metadata": {
"celltoolbar": "Slideshow",
"datacleaner": {
"position": {
"top": "50px"
},
"python": {
"varRefreshCmd": "try:\n print(_datacleaner.dataframe_metadata())\nexcept:\n print([])"
},
"window_display": false
},
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.7"
},
"latex_envs": {
"LaTeX_envs_menu_present": true,
"autocomplete": true,
"bibliofile": "biblio.bib",
"cite_by": "apalike",
"current_citInitial": 1,
"eqLabelWithNumbers": true,
"eqNumInitial": 1,
"hotkeys": {
"equation": "Ctrl-E",
"itemize": "Ctrl-I"
},
"labels_anchors": false,
"latex_user_defs": false,
"report_style_numbering": false,
"user_envs_cfg": false
}
},
"nbformat": 4,
"nbformat_minor": 4
}


@@ -0,0 +1,673 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "skip"
}
},
"source": [
"![](images/EscUpmPolit_p.gif \"UPM\")"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "skip"
}
},
"source": [
"# Course Notes for Learning Intelligent Systems"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "skip"
}
},
"source": [
"Department of Telematic Engineering Systems, Universidad Politécnica de Madrid, © Carlos A. Iglesias"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "skip"
}
},
"source": [
"## [Introduction to Preprocessing](00_Intro_Preprocessing.ipynb)"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"# Datacleaner\n",
"[Datacleaner](https://github.com/rhiever/datacleaner) supports:\n",
"\n",
"* drop rows with missing values\n",
"* replace missing values with the mode or median on a column-by-column basis\n",
"* encode non-numeric variables with numerical equivalents\n",
"\n",
"\n",
"Install with\n",
"\n",
"**pip install datacleaner**"
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>PassengerId</th>\n",
" <th>Survived</th>\n",
" <th>Pclass</th>\n",
" <th>Name</th>\n",
" <th>Sex</th>\n",
" <th>Age</th>\n",
" <th>SibSp</th>\n",
" <th>Parch</th>\n",
" <th>Ticket</th>\n",
" <th>Fare</th>\n",
" <th>Cabin</th>\n",
" <th>Embarked</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" <td>3</td>\n",
" <td>Braund, Mr. Owen Harris</td>\n",
" <td>male</td>\n",
" <td>22.0</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" <td>A/5 21171</td>\n",
" <td>7.2500</td>\n",
" <td>NaN</td>\n",
" <td>S</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>2</td>\n",
" <td>1</td>\n",
" <td>1</td>\n",
" <td>Cumings, Mrs. John Bradley (Florence Briggs Th...</td>\n",
" <td>female</td>\n",
" <td>38.0</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" <td>PC 17599</td>\n",
" <td>71.2833</td>\n",
" <td>C85</td>\n",
" <td>C</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>3</td>\n",
" <td>1</td>\n",
" <td>3</td>\n",
" <td>Heikkinen, Miss. Laina</td>\n",
" <td>female</td>\n",
" <td>26.0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>STON/O2. 3101282</td>\n",
" <td>7.9250</td>\n",
" <td>NaN</td>\n",
" <td>S</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>4</td>\n",
" <td>1</td>\n",
" <td>1</td>\n",
" <td>Futrelle, Mrs. Jacques Heath (Lily May Peel)</td>\n",
" <td>female</td>\n",
" <td>35.0</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" <td>113803</td>\n",
" <td>53.1000</td>\n",
" <td>C123</td>\n",
" <td>S</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>5</td>\n",
" <td>0</td>\n",
" <td>3</td>\n",
" <td>Allen, Mr. William Henry</td>\n",
" <td>male</td>\n",
" <td>35.0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>373450</td>\n",
" <td>8.0500</td>\n",
" <td>NaN</td>\n",
" <td>S</td>\n",
" </tr>\n",
" <tr>\n",
" <th>...</th>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>886</th>\n",
" <td>887</td>\n",
" <td>0</td>\n",
" <td>2</td>\n",
" <td>Montvila, Rev. Juozas</td>\n",
" <td>male</td>\n",
" <td>27.0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>211536</td>\n",
" <td>13.0000</td>\n",
" <td>NaN</td>\n",
" <td>S</td>\n",
" </tr>\n",
" <tr>\n",
" <th>887</th>\n",
" <td>888</td>\n",
" <td>1</td>\n",
" <td>1</td>\n",
" <td>Graham, Miss. Margaret Edith</td>\n",
" <td>female</td>\n",
" <td>19.0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>112053</td>\n",
" <td>30.0000</td>\n",
" <td>B42</td>\n",
" <td>S</td>\n",
" </tr>\n",
" <tr>\n",
" <th>888</th>\n",
" <td>889</td>\n",
" <td>0</td>\n",
" <td>3</td>\n",
" <td>Johnston, Miss. Catherine Helen \"Carrie\"</td>\n",
" <td>female</td>\n",
" <td>NaN</td>\n",
" <td>1</td>\n",
" <td>2</td>\n",
" <td>W./C. 6607</td>\n",
" <td>23.4500</td>\n",
" <td>NaN</td>\n",
" <td>S</td>\n",
" </tr>\n",
" <tr>\n",
" <th>889</th>\n",
" <td>890</td>\n",
" <td>1</td>\n",
" <td>1</td>\n",
" <td>Behr, Mr. Karl Howell</td>\n",
" <td>male</td>\n",
" <td>26.0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>111369</td>\n",
" <td>30.0000</td>\n",
" <td>C148</td>\n",
" <td>C</td>\n",
" </tr>\n",
" <tr>\n",
" <th>890</th>\n",
" <td>891</td>\n",
" <td>0</td>\n",
" <td>3</td>\n",
" <td>Dooley, Mr. Patrick</td>\n",
" <td>male</td>\n",
" <td>32.0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>370376</td>\n",
" <td>7.7500</td>\n",
" <td>NaN</td>\n",
" <td>Q</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"<p>891 rows × 12 columns</p>\n",
"</div>"
],
"text/plain": [
" PassengerId Survived Pclass \\\n",
"0 1 0 3 \n",
"1 2 1 1 \n",
"2 3 1 3 \n",
"3 4 1 1 \n",
"4 5 0 3 \n",
".. ... ... ... \n",
"886 887 0 2 \n",
"887 888 1 1 \n",
"888 889 0 3 \n",
"889 890 1 1 \n",
"890 891 0 3 \n",
"\n",
" Name Sex Age SibSp \\\n",
"0 Braund, Mr. Owen Harris male 22.0 1 \n",
"1 Cumings, Mrs. John Bradley (Florence Briggs Th... female 38.0 1 \n",
"2 Heikkinen, Miss. Laina female 26.0 0 \n",
"3 Futrelle, Mrs. Jacques Heath (Lily May Peel) female 35.0 1 \n",
"4 Allen, Mr. William Henry male 35.0 0 \n",
".. ... ... ... ... \n",
"886 Montvila, Rev. Juozas male 27.0 0 \n",
"887 Graham, Miss. Margaret Edith female 19.0 0 \n",
"888 Johnston, Miss. Catherine Helen \"Carrie\" female NaN 1 \n",
"889 Behr, Mr. Karl Howell male 26.0 0 \n",
"890 Dooley, Mr. Patrick male 32.0 0 \n",
"\n",
" Parch Ticket Fare Cabin Embarked \n",
"0 0 A/5 21171 7.2500 NaN S \n",
"1 0 PC 17599 71.2833 C85 C \n",
"2 0 STON/O2. 3101282 7.9250 NaN S \n",
"3 0 113803 53.1000 C123 S \n",
"4 0 373450 8.0500 NaN S \n",
".. ... ... ... ... ... \n",
"886 0 211536 13.0000 NaN S \n",
"887 0 112053 30.0000 B42 S \n",
"888 2 W./C. 6607 23.4500 NaN S \n",
"889 0 111369 30.0000 C148 C \n",
"890 0 370376 7.7500 NaN Q \n",
"\n",
"[891 rows x 12 columns]"
]
},
"execution_count": 10,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"import pandas as pd\n",
"import numpy as np\n",
"\n",
"from datacleaner import autoclean\n",
"\n",
"df = pd.read_csv('https://raw.githubusercontent.com/gsi-upm/sitc/master/ml2/data-titanic/train.csv')\n",
"df"
]
},
{
"cell_type": "code",
"execution_count": 12,
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>PassengerId</th>\n",
" <th>Survived</th>\n",
" <th>Pclass</th>\n",
" <th>Name</th>\n",
" <th>Sex</th>\n",
" <th>Age</th>\n",
" <th>SibSp</th>\n",
" <th>Parch</th>\n",
" <th>Ticket</th>\n",
" <th>Fare</th>\n",
" <th>Cabin</th>\n",
" <th>Embarked</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" <td>3</td>\n",
" <td>108</td>\n",
" <td>1</td>\n",
" <td>22.0</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" <td>523</td>\n",
" <td>7.2500</td>\n",
" <td>47</td>\n",
" <td>2</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>2</td>\n",
" <td>1</td>\n",
" <td>1</td>\n",
" <td>190</td>\n",
" <td>0</td>\n",
" <td>38.0</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" <td>596</td>\n",
" <td>71.2833</td>\n",
" <td>81</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>3</td>\n",
" <td>1</td>\n",
" <td>3</td>\n",
" <td>353</td>\n",
" <td>0</td>\n",
" <td>26.0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>669</td>\n",
" <td>7.9250</td>\n",
" <td>47</td>\n",
" <td>2</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>4</td>\n",
" <td>1</td>\n",
" <td>1</td>\n",
" <td>272</td>\n",
" <td>0</td>\n",
" <td>35.0</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" <td>49</td>\n",
" <td>53.1000</td>\n",
" <td>55</td>\n",
" <td>2</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>5</td>\n",
" <td>0</td>\n",
" <td>3</td>\n",
" <td>15</td>\n",
" <td>1</td>\n",
" <td>35.0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>472</td>\n",
" <td>8.0500</td>\n",
" <td>47</td>\n",
" <td>2</td>\n",
" </tr>\n",
" <tr>\n",
" <th>...</th>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>886</th>\n",
" <td>887</td>\n",
" <td>0</td>\n",
" <td>2</td>\n",
" <td>548</td>\n",
" <td>1</td>\n",
" <td>27.0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>101</td>\n",
" <td>13.0000</td>\n",
" <td>47</td>\n",
" <td>2</td>\n",
" </tr>\n",
" <tr>\n",
" <th>887</th>\n",
" <td>888</td>\n",
" <td>1</td>\n",
" <td>1</td>\n",
" <td>303</td>\n",
" <td>0</td>\n",
" <td>19.0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>14</td>\n",
" <td>30.0000</td>\n",
" <td>30</td>\n",
" <td>2</td>\n",
" </tr>\n",
" <tr>\n",
" <th>888</th>\n",
" <td>889</td>\n",
" <td>0</td>\n",
" <td>3</td>\n",
" <td>413</td>\n",
" <td>0</td>\n",
" <td>28.0</td>\n",
" <td>1</td>\n",
" <td>2</td>\n",
" <td>675</td>\n",
" <td>23.4500</td>\n",
" <td>47</td>\n",
" <td>2</td>\n",
" </tr>\n",
" <tr>\n",
" <th>889</th>\n",
" <td>890</td>\n",
" <td>1</td>\n",
" <td>1</td>\n",
" <td>81</td>\n",
" <td>1</td>\n",
" <td>26.0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>8</td>\n",
" <td>30.0000</td>\n",
" <td>60</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>890</th>\n",
" <td>891</td>\n",
" <td>0</td>\n",
" <td>3</td>\n",
" <td>220</td>\n",
" <td>1</td>\n",
" <td>32.0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>466</td>\n",
" <td>7.7500</td>\n",
" <td>47</td>\n",
" <td>1</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"<p>891 rows × 12 columns</p>\n",
"</div>"
],
"text/plain": [
" PassengerId Survived Pclass Name Sex Age SibSp Parch Ticket \\\n",
"0 1 0 3 108 1 22.0 1 0 523 \n",
"1 2 1 1 190 0 38.0 1 0 596 \n",
"2 3 1 3 353 0 26.0 0 0 669 \n",
"3 4 1 1 272 0 35.0 1 0 49 \n",
"4 5 0 3 15 1 35.0 0 0 472 \n",
".. ... ... ... ... ... ... ... ... ... \n",
"886 887 0 2 548 1 27.0 0 0 101 \n",
"887 888 1 1 303 0 19.0 0 0 14 \n",
"888 889 0 3 413 0 28.0 1 2 675 \n",
"889 890 1 1 81 1 26.0 0 0 8 \n",
"890 891 0 3 220 1 32.0 0 0 466 \n",
"\n",
" Fare Cabin Embarked \n",
"0 7.2500 47 2 \n",
"1 71.2833 81 0 \n",
"2 7.9250 47 2 \n",
"3 53.1000 55 2 \n",
"4 8.0500 47 2 \n",
".. ... ... ... \n",
"886 13.0000 47 2 \n",
"887 30.0000 30 2 \n",
"888 23.4500 47 2 \n",
"889 30.0000 60 0 \n",
"890 7.7500 47 1 \n",
"\n",
"[891 rows x 12 columns]"
]
},
"execution_count": 12,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df_clean = autoclean(df, copy=True)\n",
"df_clean"
]
},
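{
"cell_type": "markdown",
"metadata": {},
"source": [
"When you have separate train and test sets, datacleaner also provides *autoclean_cv*, which cleans the training set first and applies the same transformations to the test set. A sketch under that assumption (the file paths are hypothetical):\n",
"\n",
"```python\n",
"import pandas as pd\n",
"from datacleaner import autoclean_cv\n",
"\n",
"train = pd.read_csv('train.csv')  # hypothetical paths\n",
"test = pd.read_csv('test.csv')\n",
"# Fit the cleaning on train, then reuse it on test to avoid leakage\n",
"train_clean, test_clean = autoclean_cv(train, test, copy=True)\n",
"```"
]
},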
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "skip"
}
},
"source": [
"# References\n",
"* [Cleaning and Prepping Data with Python for Data Science — Best Practices and Helpful Packages](https://medium.com/@rrfd/cleaning-and-prepping-data-with-python-for-data-science-best-practices-and-helpful-packages-af1edfbe2a3), DeFilippi, 2019, \n",
"* [Data Preprocessing for Machine learning in Python, GeeksForGeeks](https://www.geeksforgeeks.org/data-preprocessing-machine-learning-python/), A. Sharma, 2018.\n",
"* [Handy Python Libraries for Formatting and Cleaning Data](https://mode.com/blog/python-data-cleaning-libraries), M. Bierly, 2016\n"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "skip"
}
},
"source": [
"## Licence\n",
"The notebook is freely licensed under under the [Creative Commons Attribution Share-Alike license](https://creativecommons.org/licenses/by/2.0/). \n",
"\n",
"© Carlos A. Iglesias, Universidad Politécnica de Madrid."
]
}
],
"metadata": {
"celltoolbar": "Slideshow",
"datacleaner": {
"position": {
"top": "50px"
},
"python": {
"varRefreshCmd": "try:\n print(_datacleaner.dataframe_metadata())\nexcept:\n print([])"
},
"window_display": true
},
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.7"
},
"latex_envs": {
"LaTeX_envs_menu_present": true,
"autocomplete": true,
"bibliofile": "biblio.bib",
"cite_by": "apalike",
"current_citInitial": 1,
"eqLabelWithNumbers": true,
"eqNumInitial": 1,
"hotkeys": {
"equation": "Ctrl-E",
"itemize": "Ctrl-I"
},
"labels_anchors": false,
"latex_user_defs": false,
"report_style_numbering": false,
"user_envs_cfg": false
}
},
"nbformat": 4,
"nbformat_minor": 4
}


@@ -0,0 +1,578 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "849ad57e-6adb-4c2e-afd6-73db37eef572",
"metadata": {},
"source": [
"![](images/EscUpmPolit_p.gif \"UPM\")"
]
},
{
"cell_type": "markdown",
"id": "179cc802-9f1d-40b0-bf0c-9d4fb7ea1262",
"metadata": {},
"source": [
"# Course Notes for Learning Intelligent Systems"
]
},
{
"cell_type": "markdown",
"id": "9858d815-0390-4e77-a5ff-a8d2a1960981",
"metadata": {},
"source": [
"Department of Telematic Engineering Systems, Universidad Politécnica de Madrid, © Carlos A. Iglesias"
]
},
{
"cell_type": "markdown",
"id": "238bab60-75f0-4d29-ab05-66afc463b506",
"metadata": {},
"source": [
"# Autoclean\n",
"A simple library to clean data. [Autoclean](https://github.com/elisemercury/AutoClean) supports:\n",
"AutoClean supports:\n",
"\n",
"* Handling of duplicates\n",
"* Various imputation methods for missing values\n",
"* Handling of outliers\n",
"* Encoding of categorical data (OneHot, Label)\n",
"* Extraction of data time values\n",
"\n",
"Install the package: **pip install py-AutoClean**.\n",
"\n",
"Parameters:\n",
"\n",
"* **duplicates**\n",
" * default: False,\n",
" * other values: 'auto', True\n",
"* **missing_num**\n",
" * default:False,\n",
" * other values:\t'auto', 'linreg', 'knn', 'mean', 'median', 'most_frequent', 'delete', False\n",
"* **missing_categ**\n",
" * default: False,\n",
" * other values:\t'auto', 'logreg', 'knn', 'most_frequent', 'delete', False\n",
"* **encode_categ**\n",
" * default: False,\n",
" * other values:\t'auto', ['onehot'], ['label'], False ; to encode only specific columns add a list of column names or indexes: ['auto', ['col1', 2]]\n",
"* **extract_datetime**\n",
" * default:\tFalse,\n",
" * other values:\t'auto', 'D', 'M', 'Y', 'h', 'm', 's'\n",
"* **outliers**\n",
" * default:\tFalse,\n",
" * other values:\t'auto', 'winz', 'delete'\n",
"* **outlier_param**\tdefault:\t1.5, other values:\tany int or float, False\n",
"* **logfile**\n",
" * default: True,\n",
" * other values:\tFalse\n",
"* **verbose**\n",
" * default: False,\n",
" * other values:\tTrue"
]
},
{
"cell_type": "code",
"execution_count": 29,
"id": "491b034b-994e-4f06-b4bc-df0590a62aab",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>PassengerId</th>\n",
" <th>Survived</th>\n",
" <th>Pclass</th>\n",
" <th>Name</th>\n",
" <th>Sex</th>\n",
" <th>Age</th>\n",
" <th>SibSp</th>\n",
" <th>Parch</th>\n",
" <th>Ticket</th>\n",
" <th>Fare</th>\n",
" <th>Cabin</th>\n",
" <th>Embarked</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" <td>3</td>\n",
" <td>Braund, Mr. Owen Harris</td>\n",
" <td>male</td>\n",
" <td>22.0</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" <td>A/5 21171</td>\n",
" <td>7.2500</td>\n",
" <td>NaN</td>\n",
" <td>S</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>2</td>\n",
" <td>1</td>\n",
" <td>1</td>\n",
" <td>Cumings, Mrs. John Bradley (Florence Briggs Th...</td>\n",
" <td>female</td>\n",
" <td>38.0</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" <td>PC 17599</td>\n",
" <td>71.2833</td>\n",
" <td>C85</td>\n",
" <td>C</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>3</td>\n",
" <td>1</td>\n",
" <td>3</td>\n",
" <td>Heikkinen, Miss. Laina</td>\n",
" <td>female</td>\n",
" <td>26.0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>STON/O2. 3101282</td>\n",
" <td>7.9250</td>\n",
" <td>NaN</td>\n",
" <td>S</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>4</td>\n",
" <td>1</td>\n",
" <td>1</td>\n",
" <td>Futrelle, Mrs. Jacques Heath (Lily May Peel)</td>\n",
" <td>female</td>\n",
" <td>35.0</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" <td>113803</td>\n",
" <td>53.1000</td>\n",
" <td>C123</td>\n",
" <td>S</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>5</td>\n",
" <td>0</td>\n",
" <td>3</td>\n",
" <td>Allen, Mr. William Henry</td>\n",
" <td>male</td>\n",
" <td>35.0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>373450</td>\n",
" <td>8.0500</td>\n",
" <td>NaN</td>\n",
" <td>S</td>\n",
" </tr>\n",
" <tr>\n",
" <th>...</th>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>886</th>\n",
" <td>887</td>\n",
" <td>0</td>\n",
" <td>2</td>\n",
" <td>Montvila, Rev. Juozas</td>\n",
" <td>male</td>\n",
" <td>27.0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>211536</td>\n",
" <td>13.0000</td>\n",
" <td>NaN</td>\n",
" <td>S</td>\n",
" </tr>\n",
" <tr>\n",
" <th>887</th>\n",
" <td>888</td>\n",
" <td>1</td>\n",
" <td>1</td>\n",
" <td>Graham, Miss. Margaret Edith</td>\n",
" <td>female</td>\n",
" <td>19.0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>112053</td>\n",
" <td>30.0000</td>\n",
" <td>B42</td>\n",
" <td>S</td>\n",
" </tr>\n",
" <tr>\n",
" <th>888</th>\n",
" <td>889</td>\n",
" <td>0</td>\n",
" <td>3</td>\n",
" <td>Johnston, Miss. Catherine Helen \"Carrie\"</td>\n",
" <td>female</td>\n",
" <td>NaN</td>\n",
" <td>1</td>\n",
" <td>2</td>\n",
" <td>W./C. 6607</td>\n",
" <td>23.4500</td>\n",
" <td>NaN</td>\n",
" <td>S</td>\n",
" </tr>\n",
" <tr>\n",
" <th>889</th>\n",
" <td>890</td>\n",
" <td>1</td>\n",
" <td>1</td>\n",
" <td>Behr, Mr. Karl Howell</td>\n",
" <td>male</td>\n",
" <td>26.0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>111369</td>\n",
" <td>30.0000</td>\n",
" <td>C148</td>\n",
" <td>C</td>\n",
" </tr>\n",
" <tr>\n",
" <th>890</th>\n",
" <td>891</td>\n",
" <td>0</td>\n",
" <td>3</td>\n",
" <td>Dooley, Mr. Patrick</td>\n",
" <td>male</td>\n",
" <td>32.0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>370376</td>\n",
" <td>7.7500</td>\n",
" <td>NaN</td>\n",
" <td>Q</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"<p>891 rows × 12 columns</p>\n",
"</div>"
],
"text/plain": [
" PassengerId Survived Pclass \\\n",
"0 1 0 3 \n",
"1 2 1 1 \n",
"2 3 1 3 \n",
"3 4 1 1 \n",
"4 5 0 3 \n",
".. ... ... ... \n",
"886 887 0 2 \n",
"887 888 1 1 \n",
"888 889 0 3 \n",
"889 890 1 1 \n",
"890 891 0 3 \n",
"\n",
" Name Sex Age SibSp \\\n",
"0 Braund, Mr. Owen Harris male 22.0 1 \n",
"1 Cumings, Mrs. John Bradley (Florence Briggs Th... female 38.0 1 \n",
"2 Heikkinen, Miss. Laina female 26.0 0 \n",
"3 Futrelle, Mrs. Jacques Heath (Lily May Peel) female 35.0 1 \n",
"4 Allen, Mr. William Henry male 35.0 0 \n",
".. ... ... ... ... \n",
"886 Montvila, Rev. Juozas male 27.0 0 \n",
"887 Graham, Miss. Margaret Edith female 19.0 0 \n",
"888 Johnston, Miss. Catherine Helen \"Carrie\" female NaN 1 \n",
"889 Behr, Mr. Karl Howell male 26.0 0 \n",
"890 Dooley, Mr. Patrick male 32.0 0 \n",
"\n",
" Parch Ticket Fare Cabin Embarked \n",
"0 0 A/5 21171 7.2500 NaN S \n",
"1 0 PC 17599 71.2833 C85 C \n",
"2 0 STON/O2. 3101282 7.9250 NaN S \n",
"3 0 113803 53.1000 C123 S \n",
"4 0 373450 8.0500 NaN S \n",
".. ... ... ... ... ... \n",
"886 0 211536 13.0000 NaN S \n",
"887 0 112053 30.0000 B42 S \n",
"888 2 W./C. 6607 23.4500 NaN S \n",
"889 0 111369 30.0000 C148 C \n",
"890 0 370376 7.7500 NaN Q \n",
"\n",
"[891 rows x 12 columns]"
]
},
"execution_count": 29,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"import pandas as pd\n",
"import numpy as np\n",
"\n",
"from AutoClean import AutoClean\n",
"\n",
"df = pd.read_csv('https://raw.githubusercontent.com/gsi-upm/sitc/master/ml2/data-titanic/train.csv')\n",
"df"
]
},
{
"cell_type": "code",
"execution_count": 36,
"id": "d842eedf-3971-4966-a8b4-543bb56dd60d",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"AutoClean process completed in 0.289385 seconds\n",
"Logfile saved to: /home/cif/GoogleDrive/cursos/summer-school-romania/2019/notebooks/preprocessing/autoclean.log\n"
]
}
],
"source": [
"autoclean = AutoClean(df, mode='auto')\n",
"\n",
"# We can control the preprocessing\n",
"#autoclean = AutoClean(df, mode='auto', duplicates=False, missing_num=False, missing_categ=False, encode_categ=False, extract_datetime=False, outliers=False, outlier_param=1.5, logfile=True, verbose=False)\n"
]
},
{
"cell_type": "code",
"execution_count": 38,
"id": "4ede7c55-475a-4748-8cc4-788f46c88b26",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>PassengerId</th>\n",
" <th>Survived</th>\n",
" <th>Pclass</th>\n",
" <th>Name</th>\n",
" <th>Sex</th>\n",
" <th>Age</th>\n",
" <th>SibSp</th>\n",
" <th>Parch</th>\n",
" <th>Ticket</th>\n",
" <th>Fare</th>\n",
" <th>Cabin</th>\n",
" <th>Embarked</th>\n",
" <th>Sex_female</th>\n",
" <th>Sex_male</th>\n",
" <th>Embarked_C</th>\n",
" <th>Embarked_Q</th>\n",
" <th>Embarked_S</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" <td>3</td>\n",
" <td>Braund, Mr. Owen Harris</td>\n",
" <td>male</td>\n",
" <td>22.0</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" <td>A/5 21171</td>\n",
" <td>7.2500</td>\n",
" <td>C128</td>\n",
" <td>S</td>\n",
" <td>False</td>\n",
" <td>True</td>\n",
" <td>False</td>\n",
" <td>False</td>\n",
" <td>True</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>2</td>\n",
" <td>1</td>\n",
" <td>1</td>\n",
" <td>Cumings, Mrs. John Bradley (Florence Briggs Th...</td>\n",
" <td>female</td>\n",
" <td>38.0</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" <td>PC 17599</td>\n",
" <td>65.6344</td>\n",
" <td>C85</td>\n",
" <td>C</td>\n",
" <td>True</td>\n",
" <td>False</td>\n",
" <td>True</td>\n",
" <td>False</td>\n",
" <td>False</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>3</td>\n",
" <td>1</td>\n",
" <td>3</td>\n",
" <td>Heikkinen, Miss. Laina</td>\n",
" <td>female</td>\n",
" <td>26.0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>STON/O2. 3101282</td>\n",
" <td>7.9250</td>\n",
" <td>C128</td>\n",
" <td>S</td>\n",
" <td>True</td>\n",
" <td>False</td>\n",
" <td>False</td>\n",
" <td>False</td>\n",
" <td>True</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>4</td>\n",
" <td>1</td>\n",
" <td>1</td>\n",
" <td>Futrelle, Mrs. Jacques Heath (Lily May Peel)</td>\n",
" <td>female</td>\n",
" <td>35.0</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" <td>113803</td>\n",
" <td>53.1000</td>\n",
" <td>C123</td>\n",
" <td>S</td>\n",
" <td>True</td>\n",
" <td>False</td>\n",
" <td>False</td>\n",
" <td>False</td>\n",
" <td>True</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>5</td>\n",
" <td>0</td>\n",
" <td>3</td>\n",
" <td>Allen, Mr. William Henry</td>\n",
" <td>male</td>\n",
" <td>35.0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>373450</td>\n",
" <td>8.0500</td>\n",
" <td>C128</td>\n",
" <td>S</td>\n",
" <td>False</td>\n",
" <td>True</td>\n",
" <td>False</td>\n",
" <td>False</td>\n",
" <td>True</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" PassengerId Survived Pclass \\\n",
"0 1 0 3 \n",
"1 2 1 1 \n",
"2 3 1 3 \n",
"3 4 1 1 \n",
"4 5 0 3 \n",
"\n",
" Name Sex Age SibSp \\\n",
"0 Braund, Mr. Owen Harris male 22.0 1 \n",
"1 Cumings, Mrs. John Bradley (Florence Briggs Th... female 38.0 1 \n",
"2 Heikkinen, Miss. Laina female 26.0 0 \n",
"3 Futrelle, Mrs. Jacques Heath (Lily May Peel) female 35.0 1 \n",
"4 Allen, Mr. William Henry male 35.0 0 \n",
"\n",
" Parch Ticket Fare Cabin Embarked Sex_female Sex_male \\\n",
"0 0 A/5 21171 7.2500 C128 S False True \n",
"1 0 PC 17599 65.6344 C85 C True False \n",
"2 0 STON/O2. 3101282 7.9250 C128 S True False \n",
"3 0 113803 53.1000 C123 S True False \n",
"4 0 373450 8.0500 C128 S False True \n",
"\n",
" Embarked_C Embarked_Q Embarked_S \n",
"0 False False True \n",
"1 True False False \n",
"2 False False True \n",
"3 False False True \n",
"4 False False True "
]
},
"execution_count": 38,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df_clean = autoclean.output\n",
"df_clean[0:5]"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.7"
}
},
"nbformat": 4,
"nbformat_minor": 5
}


@@ -0,0 +1,502 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "skip"
}
},
"source": [
"![](images/EscUpmPolit_p.gif \"UPM\")"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "skip"
}
},
"source": [
"# Course Notes for Learning Intelligent Systems"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "skip"
}
},
"source": [
"Department of Telematic Engineering Systems, Universidad Politécnica de Madrid, © Carlos A. Iglesias"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "skip"
}
},
"source": [
"## [Introduction to Preprocessing](00_Intro_Preprocessing.ipynb)"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"# Duplicated values\n",
"\n",
"There are two possible approaches: **remove** these rows or **filling** them. It depends on every case.\n",
"\n",
"\n"
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [],
"source": [
"import pandas as pd\n",
"import numpy as np"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"## Filling NaN values\n",
"If we need to fill errors or blanks, we can use the methods **fillna()** or **dropna()**.\n",
"\n",
"* For **string** fields, we can fill NaN with **' '**.\n",
"\n",
"* For **numbers**, we can fill with the **mean** or **median** value. \n"
]
},
{
"cell_type": "raw",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"source": [
"# Fill NaN with ' '\n",
"df['col'] = df['col'].fillna(' ')\n",
"# Fill NaN with 99\n",
"df['col'] = df['col'].fillna(99)\n",
"# Fill NaN with the mean of the column\n",
"df['col'] = df['col'].fillna(df['col'].mean())"
]
},
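{
"cell_type": "markdown",
"metadata": {},
"source": [
"A runnable version of the mean fill on a toy series (the values are made up):\n",
"\n",
"```python\n",
"import pandas as pd\n",
"import numpy as np\n",
"\n",
"s = pd.Series([1.0, np.nan, 3.0])\n",
"print(s.fillna(s.mean()))  # the NaN becomes 2.0, the mean of 1.0 and 3.0\n",
"```"
]
},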
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"## Propagate non-null values forward or backwards\n",
"You can also propagate non-null values forward or backwards by putting\n",
"method=pad as the method argument. It will fill the next value in the\n",
"dataframe with the previous non-NaN value. Maybe you just want to fill one\n",
"value ( limit=1 )or you want to fill all the values."
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"outputs": [],
"source": [
"df = pd.DataFrame(data={'col1':[np.nan, np.nan, 2,3,4, np.nan, np.nan]})"
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>col1</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>2.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>3.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>4.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>5</th>\n",
" <td>NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>6</th>\n",
" <td>NaN</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" col1\n",
"0 NaN\n",
"1 NaN\n",
"2 2.0\n",
"3 3.0\n",
"4 4.0\n",
"5 NaN\n",
"6 NaN"
]
},
"execution_count": 7,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df"
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>col1</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>2.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>3.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>4.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>5</th>\n",
" <td>4.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>6</th>\n",
" <td>NaN</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" col1\n",
"0 NaN\n",
"1 NaN\n",
"2 2.0\n",
"3 3.0\n",
"4 4.0\n",
"5 4.0\n",
"6 NaN"
]
},
"execution_count": 9,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# We fill forward the value 4.0 and fill the next one (limit = 1)\n",
"df.fillna(method='pad', limit=1)"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"We can also backfilling with **bfill**."
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>col1</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>2.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>2.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>2.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>3.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>4.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>5</th>\n",
" <td>NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>6</th>\n",
" <td>NaN</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" col1\n",
"0 2.0\n",
"1 2.0\n",
"2 2.0\n",
"3 3.0\n",
"4 4.0\n",
"5 NaN\n",
"6 NaN"
]
},
"execution_count": 10,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Fill the first two NaN values with the first available value\n",
"df.fillna(method='bfill')"
]
},
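{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"source": [
"A hedged note: recent pandas releases deprecate **fillna(method=...)** in favour of the dedicated methods **ffill()** and **bfill()**, which accept the same **limit** argument. The two previous examples can then be written as follows."
]
},
{
"cell_type": "raw",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"source": [
"# Forward fill, at most one value per gap (same as method='pad', limit=1)\n",
"df.ffill(limit=1)\n",
"# Backward fill (same as method='bfill')\n",
"df.bfill()"
]
},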
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"## Removing NaN values\n",
"We can remove them by row or column."
]
},
{
"cell_type": "raw",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"source": [
"/# Drop any rows which have any nans\n",
"df.dropna()\n",
"/# Drop columns that have any nans\n",
"df.dropna(axis=1)\n",
"/# Only drop columns which have at least 90% non-NaNs\n",
"df.dropna(thresh=int(df.shape[0] * .9), axis=1)"
]
},
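{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"source": [
"A couple of further options worth knowing (a sketch, assuming a DataFrame **df** with a column **col**): **how='all'** drops only rows that are entirely NaN, and **subset** restricts the check to given columns."
]
},
{
"cell_type": "raw",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"source": [
"# Drop rows only if every column is NaN\n",
"df.dropna(how='all')\n",
"# Drop rows with NaN in a specific column\n",
"df.dropna(subset=['col'])"
]
},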
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "skip"
}
},
"source": [
"# References\n",
"* [Cleaning and Prepping Data with Python for Data Science — Best Practices and Helpful Packages](https://medium.com/@rrfd/cleaning-and-prepping-data-with-python-for-data-science-best-practices-and-helpful-packages-af1edfbe2a3), DeFilippi, 2019, \n",
"* [Data Preprocessing for Machine learning in Python, GeeksForGeeks](https://www.geeksforgeeks.org/data-preprocessing-machine-learning-python/)"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "skip"
}
},
"source": [
"## Licence\n",
"The notebook is freely licensed under under the [Creative Commons Attribution Share-Alike license](https://creativecommons.org/licenses/by/2.0/). \n",
"\n",
"© Carlos A. Iglesias, Universidad Politécnica de Madrid."
]
}
],
"metadata": {
"celltoolbar": "Slideshow",
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.7.4"
},
"latex_envs": {
"LaTeX_envs_menu_present": true,
"autocomplete": true,
"bibliofile": "biblio.bib",
"cite_by": "apalike",
"current_citInitial": 1,
"eqLabelWithNumbers": true,
"eqNumInitial": 1,
"hotkeys": {
"equation": "Ctrl-E",
"itemize": "Ctrl-I"
},
"labels_anchors": false,
"latex_user_defs": false,
"report_style_numbering": false,
"user_envs_cfg": false
}
},
"nbformat": 4,
"nbformat_minor": 1
}

View File

@@ -0,0 +1,619 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "skip"
}
},
"source": [
"![](images/EscUpmPolit_p.gif \"UPM\")"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "skip"
}
},
"source": [
"# Course Notes for Learning Intelligent Systems"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "skip"
}
},
"source": [
"Department of Telematic Engineering Systems, Universidad Politécnica de Madrid, © Carlos A. Iglesias"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "skip"
}
},
"source": [
"## [Introduction to Preprocessing](00_Intro_Preprocessing.ipynb)"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"# String Data\n",
"It is common to clean string columns so that they follow a predefined format (e.g. emails, URLs, ...).\n",
"\n",
"We can do it using regular expressions or specific libraries."
]
},
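{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"source": [
"As a minimal regular-expression sketch (the column names and the email pattern below are illustrative, not from a real dataset), pandas string methods accept regexes directly:"
]
},
{
"cell_type": "raw",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"source": [
"# Keep only rows whose 'email' column matches a simple email pattern\n",
"df = df[df['email'].str.match(r'^[\\w.+-]+@[\\w-]+\\.[\\w.]+$')]\n",
"# Normalise whitespace in a text column\n",
"df['text'] = df['text'].str.replace(r'\\s+', ' ', regex=True).str.strip()"
]
},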
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"## Beautifier\n",
"Simple [library](https://github.com/labtocat/beautifier) to cleanup and prettify url patterns, domains and so on. Library helps to clean unicodes, special characters and unnecessary redirection patterns from the urls and gives you clean date.\n",
"\n",
"Install with **'pip install beautifier'**."
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"## Email cleanup"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [],
"source": [
"from beautifier import Email\n",
"email = Email('me@imsach.in')"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [
{
"data": {
"text/plain": [
"'imsach.in'"
]
},
"execution_count": 5,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"email.domain"
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [
{
"data": {
"text/plain": [
"'me'"
]
},
"execution_count": 7,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"email.username"
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [
{
"data": {
"text/plain": [
"False"
]
},
"execution_count": 9,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"email.is_free_email"
]
},
{
"cell_type": "code",
"execution_count": 13,
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [],
"source": [
"email2 = Email('This my address')"
]
},
{
"cell_type": "code",
"execution_count": 15,
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [
{
"data": {
"text/plain": [
"False"
]
},
"execution_count": 15,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"email2.is_valid"
]
},
{
"cell_type": "code",
"execution_count": 23,
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [],
"source": [
"email3 = Email('pepe@gmail.com')"
]
},
{
"cell_type": "code",
"execution_count": 18,
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [
{
"data": {
"text/plain": [
"True"
]
},
"execution_count": 18,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"email3.is_valid"
]
},
{
"cell_type": "code",
"execution_count": 27,
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [
{
"data": {
"text/plain": [
"True"
]
},
"execution_count": 27,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"email3.is_free_email"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"## URL cleanup"
]
},
{
"cell_type": "code",
"execution_count": 29,
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [],
"source": [
"from beautifier import Url\n",
"url = Url('https://in.linkedin.com/in/sachinphilip?authtoken=887nasdadasd6hasdtg21&secret=98jy766yhhuhnjk')"
]
},
{
"cell_type": "code",
"execution_count": 31,
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [
{
"data": {
"text/plain": [
"'https://in.linkedin.com/in/sachinphilip'"
]
},
"execution_count": 31,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"url.cleanup"
]
},
{
"cell_type": "code",
"execution_count": 33,
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [
{
"data": {
"text/plain": [
"'in.linkedin.com'"
]
},
"execution_count": 33,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"url.domain"
]
},
{
"cell_type": "code",
"execution_count": 35,
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [
{
"data": {
"text/plain": [
"['authtoken=887nasdadasd6hasdtg21', 'secret=98jy766yhhuhnjk']"
]
},
"execution_count": 35,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"url.param"
]
},
{
"cell_type": "code",
"execution_count": 37,
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [
{
"data": {
"text/plain": [
"'authtoken=887nasdadasd6hasdtg21&secret=98jy766yhhuhnjk'"
]
},
"execution_count": 37,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"url.parameters"
]
},
{
"cell_type": "code",
"execution_count": 39,
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [
{
"data": {
"text/plain": [
"'sachinphilip'"
]
},
"execution_count": 39,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"url.username"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"## Unicode\n",
"Problem: Some unicode code has been broken. We see the character in a different character dataset.\n",
"\n",
"A **mojibake** is a character displayed in an unintended character enconding. Example: \"<22>\").\n",
"\n",
"We will use the library **ftfy** (fixed text for you) to fix it.\n",
"\n",
"First, you should install the library: ***conda install ftfy**. "
]
},
{
"cell_type": "code",
"execution_count": 41,
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"¯\\_(ツ)_/¯\n",
"Party\n",
"I'm\n"
]
}
],
"source": [
"import ftfy\n",
"foo = '&macr;\\\\_(ã\\x83\\x84)_/&macr;'\n",
"bar = '\\ufeffParty'\n",
"baz = '\\001\\033[36;44mI&#x92;m'\n",
"print(ftfy.fix_text(foo))\n",
"print(ftfy.fix_text(bar))\n",
"print(ftfy.fix_text(baz))"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"We can understand which heuristics ftfy is using."
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [
{
"ename": "NameError",
"evalue": "name 'ftfy' is not defined",
"output_type": "error",
"traceback": [
"\u001b[0;31m---------------------------------------------------------------------------\u001b[0m",
"\u001b[0;31mNameError\u001b[0m Traceback (most recent call last)",
"\u001b[0;32m<ipython-input-1-4030b963ff0a>\u001b[0m in \u001b[0;36m<module>\u001b[0;34m\u001b[0m\n\u001b[0;32m----> 1\u001b[0;31m \u001b[0mftfy\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mexplain_unicode\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mfoo\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m",
"\u001b[0;31mNameError\u001b[0m: name 'ftfy' is not defined"
]
}
],
"source": [
"ftfy.explain_unicode(foo)"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"## Dates\n",
"Sometimes we want to extract date from text. We can use regular expressions or handy packages, such as **python-dateutil**.\n",
"\n",
"Install the library: **pip install python-dateutil**."
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"2019-08-22 10:22:46+00:00\n"
]
}
],
"source": [
"from dateutil.parser import parse\n",
"now = parse(\"Thu Aug 22 10:22:46 UTC 2019\")\n",
"print(now)"
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"2019-08-22 10:20:00\n"
]
}
],
"source": [
"dt = parse(\"Today is Thursday 8, 2019 at 10:20:00AM\", fuzzy=True)\n",
"print(dt)"
]
},
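{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"source": [
"A small caveat, sketched below: numeric dates such as '08/07/2019' are ambiguous, and **parse()** assumes month-first by default; the **dayfirst** flag switches the interpretation."
]
},
{
"cell_type": "raw",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"source": [
"# Month-first by default\n",
"print(parse('08/07/2019'))                 # 2019-08-07 00:00:00\n",
"# Day-first interpretation\n",
"print(parse('08/07/2019', dayfirst=True))  # 2019-07-08 00:00:00"
]
},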
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "skip"
}
},
"source": [
"# References\n",
"* [Cleaning and Prepping Data with Python for Data Science — Best Practices and Helpful Packages](https://medium.com/@rrfd/cleaning-and-prepping-data-with-python-for-data-science-best-practices-and-helpful-packages-af1edfbe2a3), DeFilippi, 2019, \n",
"* [Data Preprocessing for Machine learning in Python, GeeksForGeeks](https://www.geeksforgeeks.org/data-preprocessing-machine-learning-python/)\n",
"* Beautifier https://github.com/labtocat/beautifier\n",
"* Ftfy https://ftfy.readthedocs.io/en/latest/\n",
"* python-dateutil https://dateutil.readthedocs.io/en/stable/"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Licence\n",
"The notebook is freely licensed under under the [Creative Commons Attribution Share-Alike license](https://creativecommons.org/licenses/by/2.0/). \n",
"\n",
"© Carlos A. Iglesias, Universidad Politécnica de Madrid."
]
}
],
"metadata": {
"celltoolbar": "Slideshow",
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.7.4"
},
"latex_envs": {
"LaTeX_envs_menu_present": true,
"autocomplete": true,
"bibliofile": "biblio.bib",
"cite_by": "apalike",
"current_citInitial": 1,
"eqLabelWithNumbers": true,
"eqNumInitial": 1,
"hotkeys": {
"equation": "Ctrl-E",
"itemize": "Ctrl-I"
},
"labels_anchors": false,
"latex_user_defs": false,
"report_style_numbering": false,
"user_envs_cfg": false
}
},
"nbformat": 4,
"nbformat_minor": 1
}

Binary file not shown.

After

Width:  |  Height:  |  Size: 3.1 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 152 KiB

View File

@@ -0,0 +1 @@

View File

@@ -0,0 +1,185 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "skip"
}
},
"source": [
"![](images/EscUpmPolit_p.gif \"UPM\")"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "skip"
}
},
"source": [
"# Course Notes for Learning Intelligent Systems"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "skip"
}
},
"source": [
"Department of Telematic Engineering Systems, Universidad Politécnica de Madrid, © Carlos A. Iglesias"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"# Introduction to Visualization\n",
" \n",
"In this session, we will get more insight regarding how to visualize data.\n",
"\n",
"# Objectives\n",
"\n",
"The main objectives of this session are:\n",
"* Understanding how to visualize data\n",
"* Understanding the purpose of different charts \n",
"* Experimenting with several environments for visualizing data\n",
"\n"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"# Seaborn\n",
"\n",
"Seaborn is a library that visualizes data in Python. The main characteristics are:\n",
"\n",
"* A dataset-oriented API for examining relationships between multiple variables\n",
"* Specialized support for using categorical variables to show observations or aggregate statistics\n",
"* Options for visualizing univariate or bivariate distributions and for comparing them between subsets of data\n",
"* Automatic estimation and plotting of linear regression models for different kinds of dependent variables\n",
"* Convenient views of the overall structure of complex datasets\n",
"* High-level abstractions for structuring multi-plot grids that let you quickly build complex visualizations\n",
"* Concise control over matplotlib figure styling with several built-in themes\n",
"* Tools for choosing color palettes that faithfully reveal patterns in your data\n"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"## Install\n",
"Use:\n",
"\n",
"**conda install seaborn**\n",
"\n",
"or \n",
"\n",
"**pip install seaborn**"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"# Table of Contents"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"source": [
"1. [Home](00_Intro_Visualization.ipynb)\n",
"2. [Dataset](01_Dataset.ipynb)\n",
"3. [Comparison Charts](02_Comparison_Charts.ipynb)\n",
" 1. [More Comparison Charts](02_01_More_Comparison_Charts.ipynb)\n",
"4. [Distribution Charts](03_Distribution_Charts.ipynb)\n",
"5. [Hierarchical charts](04_Hierarchical_Charts.ipynb)\n",
"6. [Relational charts](05_Relational_Charts.ipynb)\n",
"7. [Spatial charts](06_Spatial_Charts.ipynb)\n",
"8. [Temporal charts](07_Temporal_Charts.ipynb)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Licence\n",
"The notebook is freely licensed under under the [Creative Commons Attribution Share-Alike license](https://creativecommons.org/licenses/by/2.0/). \n",
"\n",
"© Carlos A. Iglesias, Universidad Politécnica de Madrid."
]
}
],
"metadata": {
"datacleaner": {
"position": {
"top": "50px"
},
"python": {
"varRefreshCmd": "try:\n print(_datacleaner.dataframe_metadata())\nexcept:\n print([])"
},
"window_display": false
},
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.7"
},
"latex_envs": {
"LaTeX_envs_menu_present": true,
"autocomplete": true,
"bibliofile": "biblio.bib",
"cite_by": "apalike",
"current_citInitial": 1,
"eqLabelWithNumbers": true,
"eqNumInitial": 1,
"hotkeys": {
"equation": "Ctrl-E",
"itemize": "Ctrl-I"
},
"labels_anchors": false,
"latex_user_defs": false,
"report_style_numbering": false,
"user_envs_cfg": false
}
},
"nbformat": 4,
"nbformat_minor": 4
}

View File

@@ -0,0 +1,363 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "skip"
}
},
"source": [
"![](images/EscUpmPolit_p.gif \"UPM\")"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "skip"
}
},
"source": [
"# Course Notes for Learning Intelligent Systems"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "skip"
}
},
"source": [
"Department of Telematic Engineering Systems, Universidad Politécnica de Madrid, © Carlos A. Iglesias"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "skip"
}
},
"source": [
"## [Introduction to Visualization](00_Intro_Visualization.ipynb)"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"# Dataset\n",
"Seaborn includes several datasets. We can consult the available datasets and load them. \n",
"\n",
"The datasets are also available at https://github.com/mwaskom/seaborn-data."
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [],
"source": [
"import pandas as pd\n",
"from matplotlib import pyplot as plt\n",
"import seaborn as sns"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"outputs": [
{
"data": {
"text/plain": [
"['anagrams',\n",
" 'anscombe',\n",
" 'attention',\n",
" 'brain_networks',\n",
" 'car_crashes',\n",
" 'diamonds',\n",
" 'dots',\n",
" 'dowjones',\n",
" 'exercise',\n",
" 'flights',\n",
" 'fmri',\n",
" 'geyser',\n",
" 'glue',\n",
" 'healthexp',\n",
" 'iris',\n",
" 'mpg',\n",
" 'penguins',\n",
" 'planets',\n",
" 'seaice',\n",
" 'taxis',\n",
" 'tips',\n",
" 'titanic']"
]
},
"execution_count": 2,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"sns.get_dataset_names()"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>total_bill</th>\n",
" <th>tip</th>\n",
" <th>sex</th>\n",
" <th>smoker</th>\n",
" <th>day</th>\n",
" <th>time</th>\n",
" <th>size</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>16.99</td>\n",
" <td>1.01</td>\n",
" <td>Female</td>\n",
" <td>No</td>\n",
" <td>Sun</td>\n",
" <td>Dinner</td>\n",
" <td>2</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>10.34</td>\n",
" <td>1.66</td>\n",
" <td>Male</td>\n",
" <td>No</td>\n",
" <td>Sun</td>\n",
" <td>Dinner</td>\n",
" <td>3</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>21.01</td>\n",
" <td>3.50</td>\n",
" <td>Male</td>\n",
" <td>No</td>\n",
" <td>Sun</td>\n",
" <td>Dinner</td>\n",
" <td>3</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>23.68</td>\n",
" <td>3.31</td>\n",
" <td>Male</td>\n",
" <td>No</td>\n",
" <td>Sun</td>\n",
" <td>Dinner</td>\n",
" <td>2</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>24.59</td>\n",
" <td>3.61</td>\n",
" <td>Female</td>\n",
" <td>No</td>\n",
" <td>Sun</td>\n",
" <td>Dinner</td>\n",
" <td>4</td>\n",
" </tr>\n",
" <tr>\n",
" <th>5</th>\n",
" <td>25.29</td>\n",
" <td>4.71</td>\n",
" <td>Male</td>\n",
" <td>No</td>\n",
" <td>Sun</td>\n",
" <td>Dinner</td>\n",
" <td>4</td>\n",
" </tr>\n",
" <tr>\n",
" <th>6</th>\n",
" <td>8.77</td>\n",
" <td>2.00</td>\n",
" <td>Male</td>\n",
" <td>No</td>\n",
" <td>Sun</td>\n",
" <td>Dinner</td>\n",
" <td>2</td>\n",
" </tr>\n",
" <tr>\n",
" <th>7</th>\n",
" <td>26.88</td>\n",
" <td>3.12</td>\n",
" <td>Male</td>\n",
" <td>No</td>\n",
" <td>Sun</td>\n",
" <td>Dinner</td>\n",
" <td>4</td>\n",
" </tr>\n",
" <tr>\n",
" <th>8</th>\n",
" <td>15.04</td>\n",
" <td>1.96</td>\n",
" <td>Male</td>\n",
" <td>No</td>\n",
" <td>Sun</td>\n",
" <td>Dinner</td>\n",
" <td>2</td>\n",
" </tr>\n",
" <tr>\n",
" <th>9</th>\n",
" <td>14.78</td>\n",
" <td>3.23</td>\n",
" <td>Male</td>\n",
" <td>No</td>\n",
" <td>Sun</td>\n",
" <td>Dinner</td>\n",
" <td>2</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" total_bill tip sex smoker day time size\n",
"0 16.99 1.01 Female No Sun Dinner 2\n",
"1 10.34 1.66 Male No Sun Dinner 3\n",
"2 21.01 3.50 Male No Sun Dinner 3\n",
"3 23.68 3.31 Male No Sun Dinner 2\n",
"4 24.59 3.61 Female No Sun Dinner 4\n",
"5 25.29 4.71 Male No Sun Dinner 4\n",
"6 8.77 2.00 Male No Sun Dinner 2\n",
"7 26.88 3.12 Male No Sun Dinner 4\n",
"8 15.04 1.96 Male No Sun Dinner 2\n",
"9 14.78 3.23 Male No Sun Dinner 2"
]
},
"execution_count": 3,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df = sns.load_dataset('tips')\n",
"df.head(10)"
]
},
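{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"As a quick sketch of what comes next, a single Seaborn call is enough for a first look at the loaded dataset (using the **df**, **sns**, and **plt** names imported above):"
]
},
{
"cell_type": "raw",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"source": [
"# Scatter plot of tip against total bill, coloured by smoker status\n",
"sns.scatterplot(data=df, x='total_bill', y='tip', hue='smoker')\n",
"plt.show()"
]
},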
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "skip"
}
},
"source": [
"# References\n",
"* [Seaborn](http://seaborn.pydata.org/index.html) documentation"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "skip"
}
},
"source": [
"## Licence\n",
"The notebook is freely licensed under under the [Creative Commons Attribution Share-Alike license](https://creativecommons.org/licenses/by/2.0/). \n",
"\n",
"© Carlos A. Iglesias, Universidad Politécnica de Madrid."
]
}
],
"metadata": {
"datacleaner": {
"position": {
"top": "50px"
},
"python": {
"varRefreshCmd": "try:\n print(_datacleaner.dataframe_metadata())\nexcept:\n print([])"
},
"window_display": false
},
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.13"
},
"latex_envs": {
"LaTeX_envs_menu_present": true,
"autocomplete": true,
"bibliofile": "biblio.bib",
"cite_by": "apalike",
"current_citInitial": 1,
"eqLabelWithNumbers": true,
"eqNumInitial": 1,
"hotkeys": {
"equation": "Ctrl-E",
"itemize": "Ctrl-I"
},
"labels_anchors": false,
"latex_user_defs": false,
"report_style_numbering": false,
"user_envs_cfg": false
}
},
"nbformat": 4,
"nbformat_minor": 4
}

File diff suppressed because one or more lines are too long

File diff suppressed because one or more lines are too long

File diff suppressed because one or more lines are too long

File diff suppressed because one or more lines are too long

File diff suppressed because one or more lines are too long

File diff suppressed because one or more lines are too long

File diff suppressed because one or more lines are too long

Binary file not shown.

After

Width:  |  Height:  |  Size: 3.1 KiB

View File

@@ -4,7 +4,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"![](images/EscUpmPolit_p.gif \"UPM\")"
"![](./images/EscUpmPolit_p.gif \"UPM\")"
]
},
{
@@ -27,14 +27,14 @@
"source": [
"# Introduction to Neural Networks\n",
" \n",
"In this lab session, we are going to learn how to train a neural network.\n",
"In this lab session, we will learn how to train a neural network.\n",
"\n",
"# Objectives\n",
"\n",
"The main objectives of this session are:\n",
"* Put in practice the notions learn in class about neural computing\n",
"* Understand what an MLP is\n",
"* Learn to use some libraries, such as scikit-learn "
"* Learn to use some libraries, such as Scikit-learn."
]
},
{
@@ -58,7 +58,7 @@
"metadata": {},
"source": [
"## Licence\n",
"The notebook is freely licensed under under the [Creative Commons Attribution Share-Alike license](https://creativecommons.org/licenses/by/2.0/). \n",
"The notebook is freely licensed under the [Creative Commons Attribution Share-Alike license](https://creativecommons.org/licenses/by/2.0/). \n",
"\n",
"© Carlos A. Iglesias, Universidad Politécnica de Madrid."
]

View File

@@ -4,7 +4,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"![](images/EscUpmPolit_p.gif \"UPM\")"
"![](./images/EscUpmPolit_p.gif \"UPM\")"
]
},
{
@@ -39,7 +39,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"Multilayer perceptrons, also called feedforward neural networks or deep feedforward networks, are the most basic deep learning models."
"Multilayer perceptrons, called feedforward neural networks or deep feedforward networks, are the most basic deep learning models."
]
},
{
@@ -58,7 +58,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"In this notebook we are going to try the spiral dataset with different algorthms. In particular, we are going to focus our attention on the MLP classifier.\n",
"In this notebook, we will try the spiral dataset with different algorithms. In particular, we are going to focus our attention on the MLP classifier.\n",
"\n",
"\n",
"Answer directly in your copy of the exercise and submit it as a moodle task."

View File

@@ -4,7 +4,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"![](images/EscUpmPolit_p.gif \"UPM\")"
"![](./images/EscUpmPolit_p.gif \"UPM\")"
]
},
{
@@ -39,10 +39,10 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"In this notebook we are going to apply a MLP to a simple regression task: learning the Fresnel functions.\n",
"In this notebook, we are going to apply an MLP to a simple regression task: learning the Fresnel functions.\n",
"\n",
"\n",
"Answer directly in your copy of the exercise and submit it as a moodle task."
"Answer directly in your copy of the exercise and submit it as a Moodle task."
]
},
{
@@ -92,7 +92,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"Change this variables to change the train and test dataset."
"Change these variables to change the train and test dataset."
]
},
{

View File

@@ -15,7 +15,7 @@ def gen_spiral_dataset(n_examples=500, n_classes=2, a=None, b=None, pi_space=3):
theta = np.linspace(0,pi_space*pi, num=n_examples)
xy = np.zeros((n_examples,2))
# logaritmic spirals
# logarithmic spirals
x_golden_parametric = lambda a, b, theta: a**(theta*b) * cos(theta)
y_golden_parametric = lambda a, b, theta: a**(theta*b) * sin(theta)
x_golden_parametric = np.vectorize(x_golden_parametric)

View File

@@ -48,7 +48,7 @@
"# Introduction\n",
"The purpose of this practice is to understand better how GAs work. \n",
"\n",
"There are many libraries that implement GAs, you can find some of then in the [References](#References) section."
"There are many libraries that implement GAs; you can find some of them in the [References](#References) section."
]
},
{
@@ -56,7 +56,7 @@
"metadata": {},
"source": [
"# Genetic Algorithms\n",
"In this section we are going to use the library DEAP [[References](#References)] for implementing a genetic algorithms.\n",
"In this section, we are going to use the library [DEAP](https://github.com/DEAP/deap/tree/master) for implementing a genetic algorithms.\n",
"\n",
"We are going to implement the OneMax problem as seen in class.\n",
"\n",
@@ -187,9 +187,9 @@
"metadata": {},
"source": [
"## Comparing\n",
"Your task is modify the previous code to canonical GA configuration from Holland (look at the lesson's slides). In addition you should consult the [DEAP API](http://deap.readthedocs.io/en/master/api/tools.html#operators).\n",
"Your task is to modify the previous code to canonical GA configuration from Holland (look at the lesson's slides). In addition you should consult the [DEAP API](http://deap.readthedocs.io/en/master/api/tools.html#operators).\n",
"\n",
"Submit your notebook and include a the modified code, and a comparison of the effects of these changes. \n",
"Submit your notebook and include a modified code and a comparison of the effects of these changes. \n",
"\n",
"Discuss your findings."
]
@@ -198,33 +198,24 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"## Optimizing ML hyperparameters\n",
"## Optional. Optimizing ML hyperparameters\n",
"\n",
"One of the applications of Genetic Algorithms is the optimization of ML hyperparameters. Previously we have used GridSearch from Scikit. Using (sklearn-deap)[[References](#References)], optimize the Titatic hyperparameters using both GridSearch and Genetic Algorithms. \n",
"One of the applications of Genetic Algorithms is the optimization of ML hyperparameters. Previously, we have used GridSearch from Scikit. Using [sklearn-deap](https://github.com/rsteca/sklearn-deap), optimize the Titatic hyperparameters using both GridSearch and Genetic Algorithms. \n",
"\n",
"The same exercise (using the digits dataset) can be found in this [notebook](https://github.com/rsteca/sklearn-deap/blob/master/test.ipynb).\n",
"\n",
"Submit a notebook where you include well-crafted conclusions about the exercises, discussing the pros and cons of using genetic algorithms for this purpose.\n",
"Since there is a problem with Scikit version 0.24, you can just comment on the different approaches.",
"\n",
"Note: There is a problem with the version 0.24 of scikit. Just comment the different approaches."
"Alternatively, you can also use the library [sklearn-genetic-opt](https://sklearn-genetic-opt.readthedocs.io/en/stable/index.html) and discuss the digit classification example included in the library: [digits decision tree](https://sklearn-genetic-opt.readthedocs.io/en/stable/notebooks/Digits_decision_tree.html)."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Optional exercises\n",
"## Optional. Optimizing an ML pipeline with a genetic algorithm\n",
"\n",
"Here there is a proposed optional exercise."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Optimizing a ML pipeline with a genetic algorithm\n",
"\n",
"The library [TPOT](#References) optimizes ML pipelines and comes with a lot of (examples)[https://epistasislab.github.io/tpot/examples/] and even notebooks, for example for the [iris dataset](https://github.com/EpistasisLab/tpot/blob/master/tutorials/IRIS.ipynb).\n",
"The library [TPOT](https://epistasislab.github.io/tpot/latest/) optimizes ML pipelines and comes with a lot of [examples](https://epistasislab.github.io/tpot/latest/Tutorial/9_Genetic_Algorithm_Overview/) and even notebooks, for example for the [iris dataset](https://github.com/EpistasisLab/tpot/blob/master/tutorials/IRIS.ipynb).\n",
"\n",
"Your task is to apply TPOT to the intermediate challenge and write a short essay explaining:\n",
"* what TPOT does (with your own words).\n",
@@ -242,7 +233,8 @@
"* [tpot](http://epistasislab.github.io/tpot/)\n",
"* [gplearn](http://gplearn.readthedocs.io/en/latest/index.html)\n",
"* [scikit-allel](https://scikit-allel.readthedocs.io/en/latest/)\n",
"* [scklearn-genetic](https://github.com/manuel-calzolari/sklearn-genetic)"
"* [sklearn-genetic](https://github.com/manuel-calzolari/sklearn-genetic)\n",
"* [sklearn-genetic-opt](https://sklearn-genetic-opt.readthedocs.io/en/stable/)"
]
},
{
@@ -256,7 +248,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"The notebook is freely licensed under under the [Creative Commons Attribution Share-Alike license](https://creativecommons.org/licenses/by/2.0/). \n",
"The notebook is freely licensed under the [Creative Commons Attribution Share-Alike license](https://creativecommons.org/licenses/by/2.0/). \n",
"\n",
"© Carlos A. Iglesias, Universidad Politécnica de Madrid."
]

View File

@@ -48,7 +48,9 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"1. [Q-Learning](2_6_1_Q-Learning.ipynb)"
"1. [Q-Learning](2_6_1_Q-Learning_Basic.ipynb)\n",
"1. [Visualization](2_6_1_Q-Learning_Visualization.ipynb)\n",
"1. [Exercises](2_6_1_Q-Learning_Exercises.ipynb)"
]
},
{
@@ -64,7 +66,7 @@
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
@@ -78,7 +80,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.7.1"
"version": "3.10.10"
},
"latex_envs": {
"LaTeX_envs_menu_present": true,

View File

@@ -1,455 +0,0 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"![](images/EscUpmPolit_p.gif \"UPM\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Course Notes for Learning Intelligent Systems"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Department of Telematic Engineering Systems, Universidad Politécnica de Madrid, © 2018 Carlos A. Iglesias"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## [Introduction to Machine Learning V](2_6_0_Intro_RL.ipynb)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Table of Contents\n",
"\n",
"* [Introduction](#Introduction)\n",
"* [Getting started with OpenAI Gym](#Getting-started-with-OpenAI-Gym)\n",
"* [The Frozen Lake scenario](#The-Frozen-Lake-scenario)\n",
"* [Q-Learning with the Frozen Lake scenario](#Q-Learning-with-the-Frozen-Lake-scenario)\n",
"* [Exercises](#Exercises)\n",
"* [Optional exercises](#Optional-exercises)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Introduction\n",
"The purpose of this practice is to understand better Reinforcement Learning (RL) and, in particular, Q-Learning.\n",
"\n",
"We are going to use [OpenAI Gym](https://gym.openai.com/). OpenAI is a toolkit for developing and comparing RL algorithms.Take a loot at ther [website](https://gym.openai.com/).\n",
"\n",
"It implements [algorithm imitation](http://gym.openai.com/envs/#algorithmic), [classic control problems](http://gym.openai.com/envs/#classic_control), [Atari games](http://gym.openai.com/envs/#atari), [Box2D continuous control](http://gym.openai.com/envs/#box2d), [robotics with MuJoCo, Multi-Joint dynamics with Contact](http://gym.openai.com/envs/#mujoco), and [simple text based environments](http://gym.openai.com/envs/#toy_text).\n",
"\n",
"This notebook is based on * [Diving deeper into Reinforcement Learning with Q-Learning](https://medium.freecodecamp.org/diving-deeper-into-reinforcement-learning-with-q-learning-c18d0db58efe).\n",
"\n",
"First of all, install the OpenAI Gym library:\n",
"\n",
"```console\n",
"foo@bar:~$ pip install gym\n",
"```\n",
"\n",
"\n",
"If you get the error message 'NotImplementedError: abstract', [execute](https://github.com/openai/gym/issues/775) \n",
"```console\n",
"foo@bar:~$ pip install pyglet==1.2.4\n",
"```\n",
"\n",
"If you want to try the Atari environment, it is better that you opt for the full installation from the source. Follow the instructions at [https://github.com/openai/gym#id15](OpenGym).\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Getting started with OpenAI Gym\n",
"\n",
"First of all, read the [introduction](http://gym.openai.com/docs/#getting-started-with-gym) of OpenAI Gym."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Environments\n",
"OpenGym provides a number of problems called *environments*. \n",
"\n",
"Try the 'CartPole-v0' (or 'MountainCar)."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import gym\n",
"\n",
"env = gym.make(\"CartPole-v1\")\n",
"#env = gym.make('MountainCar-v0')\n",
"#env = gym.make('Taxi-v2')\n",
"\n",
"observation = env.reset()\n",
"for _ in range(1000):\n",
" env.render()\n",
" action = env.action_space.sample() # your agent here (this takes random actions)\n",
" observation, reward, done, info = env.step(action)\n",
"\n",
" if done:\n",
" observation = env.reset()\n",
"env.close()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"This will launch an external window with the game. If you cannot close that window, just execute in a code cell:\n",
"\n",
"```python\n",
"env.close()\n",
"```\n",
"\n",
"The full list of available environments can be found printing the environment registry as follows."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from gym import envs\n",
"print(envs.registry.all())"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The environments **step** function returns four values. These are:\n",
"\n",
"* **observation (object):** an environment-specific object representing your observation of the environment. For example, pixel data from a camera, joint angles and joint velocities of a robot, or the board state in a board game.\n",
"* **reward (float):** amount of reward achieved by the previous action. The scale varies between environments, but the goal is always to increase your total reward.\n",
"* **done (boolean):** whether its time to reset the environment again. Most (but not all) tasks are divided up into well-defined episodes, and done being True indicates the episode has terminated. (For example, perhaps the pole tipped too far, or you lost your last life.).\n",
"* **info (dict):** diagnostic information useful for debugging. It can sometimes be useful for learning (for example, it might contain the raw probabilities behind the environments last state change). However, official evaluations of your agent are not allowed to use this for learning.\n",
"\n",
"The typical agent loop consists in first calling the method *reset* which provides an initial observation. Then the agent executes an action, and receives the reward, the new observation, and if the episode has finished (done is true). \n",
"\n",
"For example, analyze this sample of agent loop for 100 ms. The details of the previous variables for this game as described [here](https://github.com/openai/gym/wiki/CartPole-v0) are:\n",
"* **observation**: Cart Position, Cart Velocity, Pole Angle, Pole Velocity.\n",
"* **action**: 0\t(Push cart to the left), 1\t(Push cart to the right).\n",
"* **reward**: 1 for every step taken, including the termination step."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import gym\n",
"env = gym.make('CartPole-v0')\n",
"for i_episode in range(20):\n",
" observation = env.reset()\n",
" for t in range(100):\n",
" env.render()\n",
" print(observation)\n",
" action = env.action_space.sample()\n",
" print(\"Action \", action)\n",
" observation, reward, done, info = env.step(action)\n",
" print(\"Observation \", observation, \", reward \", reward, \", done \", done, \", info \" , info)\n",
" if done:\n",
" print(\"Episode finished after {} timesteps\".format(t+1))\n",
" break"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# The Frozen Lake scenario\n",
"We are going to play to the [Frozen Lake](http://gym.openai.com/envs/FrozenLake-v0/) game.\n",
"\n",
"The problem is a grid where you should go from the 'start' (S) position to the 'goal position (G) (the pizza!). You can only walk through the 'frozen tiles' (F). Unfortunately, you can fall in a 'hole' (H).\n",
"![](images/frozenlake-problem.png \"Frozen lake problem\")\n",
"\n",
"The episode ends when you reach the goal or fall in a hole. You receive a reward of 1 if you reach the goal, and zero otherwise. The possible actions are going left, right, up or down. However, the ice is slippery, so you won't always move in the direction you intend.\n",
"\n",
"![](images/frozenlake-world.png \"Frozen lake world\")\n",
"\n",
"\n",
"Here you can see several episodes. A full recording is available at [Frozen World](http://gym.openai.com/envs/FrozenLake-v0/).\n",
"\n",
"![](images/recording.gif \"Example running\")\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Q-Learning with the Frozen Lake scenario\n",
"We are now going to apply Q-Learning for the Frozen Lake scenario. This part of the notebook is taken from [here](https://github.com/simoninithomas/Deep_reinforcement_learning_Course/blob/master/Q%20learning/Q%20Learning%20with%20FrozenLake.ipynb).\n",
"\n",
"First we create the environment and a Q-table inizializated with zeros to store the value of each action in a given state. "
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"import gym\n",
"import random\n",
"\n",
"env = gym.make(\"FrozenLake-v0\")\n",
"\n",
"\n",
"action_size = env.action_space.n\n",
"state_size = env.observation_space.n\n",
"\n",
"\n",
"qtable = np.zeros((state_size, action_size))\n",
"print(qtable)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Now we define the hyperparameters."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Q-Learning hyperparameters\n",
"total_episodes = 10000 # Total episodes\n",
"learning_rate = 0.8 # Learning rate\n",
"max_steps = 99 # Max steps per episode\n",
"gamma = 0.95 # Discounting rate\n",
"\n",
"# Exploration hyperparameters\n",
"epsilon = 1.0 # Exploration rate\n",
"max_epsilon = 1.0 # Exploration probability at start\n",
"min_epsilon = 0.01 # Minimum exploration probability \n",
"decay_rate = 0.01 # Exponential decay rate for exploration prob"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"And now we implement the Q-Learning algorithm.\n",
"\n",
"![](images/qlearning-algo.png \"Q-Learning algorithm\")"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# List of rewards\n",
"rewards = []\n",
"\n",
"# 2 For life or until learning is stopped\n",
"for episode in range(total_episodes):\n",
" # Reset the environment\n",
" state = env.reset()\n",
" step = 0\n",
" done = False\n",
" total_rewards = 0\n",
" \n",
" for step in range(max_steps):\n",
" # 3. Choose an action a in the current world state (s)\n",
" ## First we randomize a number\n",
" exp_exp_tradeoff = random.uniform(0, 1)\n",
" \n",
" ## If this number > greater than epsilon --> exploitation (taking the biggest Q value for this state)\n",
" if exp_exp_tradeoff > epsilon:\n",
" action = np.argmax(qtable[state,:])\n",
"\n",
" # Else doing a random choice --> exploration\n",
" else:\n",
" action = env.action_space.sample()\n",
"\n",
" # Take the action (a) and observe the outcome state(s') and reward (r)\n",
" new_state, reward, done, info = env.step(action)\n",
"\n",
" # Update Q(s,a):= Q(s,a) + lr [R(s,a) + gamma * max Q(s',a') - Q(s,a)]\n",
" # qtable[new_state,:] : all the actions we can take from new state\n",
" qtable[state, action] = qtable[state, action] + learning_rate * (reward + gamma * np.max(qtable[new_state, :]) - qtable[state, action])\n",
" \n",
" total_rewards += reward\n",
" \n",
" # Our new state is state\n",
" state = new_state\n",
" \n",
" # If done (if we're dead) : finish episode\n",
" if done == True: \n",
" break\n",
" \n",
" episode += 1\n",
" # Reduce epsilon (because we need less and less exploration)\n",
" epsilon = min_epsilon + (max_epsilon - min_epsilon)*np.exp(-decay_rate*episode) \n",
" rewards.append(total_rewards)\n",
"\n",
"print (\"Score over time: \" + str(sum(rewards)/total_episodes))\n",
"print(qtable)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Finally, we use the learnt Q-table for playing the Frozen World game."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"\n",
"env.reset()\n",
"\n",
"for episode in range(5):\n",
" state = env.reset()\n",
" step = 0\n",
" done = False\n",
" print(\"****************************************************\")\n",
" print(\"EPISODE \", episode)\n",
"\n",
" for step in range(max_steps):\n",
" env.render()\n",
" # Take the action (index) that have the maximum expected future reward given that state\n",
" action = np.argmax(qtable[state,:])\n",
" \n",
" new_state, reward, done, info = env.step(action)\n",
" \n",
" if done:\n",
" break\n",
" state = new_state\n",
"env.close()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Exercises\n",
"\n",
"## Taxi\n",
"Analyze the [Taxi problem](http://gym.openai.com/envs/Taxi-v2/) and solve it applying Q-Learning. You can find a solution as the one previously presented [here](https://www.oreilly.com/learning/introduction-to-reinforcement-learning-and-openai-gym).\n",
"\n",
"Analyze the impact of not changing the learning rate (alfa or epsilon, depending on the book) or changing it in a different way."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Optional exercises\n",
"\n",
"## Doom\n",
"Read this [article](https://medium.freecodecamp.org/an-introduction-to-deep-q-learning-lets-play-doom-54d02d8017d8) and execute the companion [notebook](https://github.com/simoninithomas/Deep_reinforcement_learning_Course/blob/master/Deep%20Q%20Learning/Doom/Deep%20Q%20learning%20with%20Doom.ipynb). Analyze the results and provide conclusions about DQN."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## References\n",
"* [Diving deeper into Reinforcement Learning with Q-Learning, Thomas Simonini](https://medium.freecodecamp.org/diving-deeper-into-reinforcement-learning-with-q-learning-c18d0db58efe).\n",
"* Illustrations by [Thomas Simonini](https://github.com/simoninithomas/Deep_reinforcement_learning_Course) and [Sung Kim](https://www.youtube.com/watch?v=xgoO54qN4lY).\n",
"* [Frozen Lake solution with TensorFlow](https://analyticsindiamag.com/openai-gym-frozen-lake-beginners-guide-reinforcement-learning/)\n",
"* [Deep Q-Learning for Doom](https://medium.freecodecamp.org/an-introduction-to-deep-q-learning-lets-play-doom-54d02d8017d8)\n",
"* [Intro OpenAI Gym with Random Search and the Cart Pole scenario](http://www.pinchofintelligence.com/getting-started-openai-gym/)\n",
"* [Q-Learning for the Taxi scenario](https://www.oreilly.com/learning/introduction-to-reinforcement-learning-and-openai-gym)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Licence"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The notebook is freely licensed under under the [Creative Commons Attribution Share-Alike license](https://creativecommons.org/licenses/by/2.0/). \n",
"\n",
"© 2018 Carlos A. Iglesias, Universidad Politécnica de Madrid."
]
}
],
"metadata": {
"datacleaner": {
"position": {
"top": "50px"
},
"python": {
"varRefreshCmd": "try:\n print(_datacleaner.dataframe_metadata())\nexcept:\n print([])"
},
"window_display": false
},
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.7.9"
},
"latex_envs": {
"LaTeX_envs_menu_present": true,
"autocomplete": true,
"bibliofile": "biblio.bib",
"cite_by": "apalike",
"current_citInitial": 1,
"eqLabelWithNumbers": true,
"eqNumInitial": 1,
"hotkeys": {
"equation": "Ctrl-E",
"itemize": "Ctrl-I"
},
"labels_anchors": false,
"latex_user_defs": false,
"report_style_numbering": false,
"user_envs_cfg": false
}
},
"nbformat": 4,
"nbformat_minor": 1
}

File diff suppressed because it is too large Load Diff

View File

@@ -0,0 +1,138 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"![](images/EscUpmPolit_p.gif \"UPM\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Course Notes for Learning Intelligent Systems"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Department of Telematic Engineering Systems, Universidad Politécnica de Madrid, © Carlos Á. Iglesias"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## [Introduction to Machine Learning V](2_6_0_Intro_RL.ipynb)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Exercises\n",
"\n",
"\n",
"## Taxi\n",
"Analyze the [Taxi problem](https://gymnasium.farama.org/environments/toy_text/taxi/) and solve it applying Q-Learning. You can find a solution as the one previously presented [here](https://www.oreilly.com/learning/introduction-to-reinforcement-learning-and-openai-gym), and the notebook is [here](https://github.com/wagonhelm/Reinforcement-Learning-Introduction/blob/master/Reinforcement%20Learning%20Introduction.ipynb). Take into account that Gymnasium has changed, so you will have to adapt the code.\n",
"\n",
"Analyze the impact of not changing the learning rate or changing it in a different way. "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Optional exercises\n",
"Select one of the following exercises.\n",
"\n",
"## Blackjack\n",
"Analyze how to appy Q-Learning for solving Blackjack.\n",
"You can find information in this [article](https://gymnasium.farama.org/tutorials/training_agents/blackjack_tutorial/).\n",
"\n",
"## Doom\n",
"Read this [article](https://medium.freecodecamp.org/an-introduction-to-deep-q-learning-lets-play-doom-54d02d8017d8) and execute the companion [notebook](https://github.com/simoninithomas/Deep_reinforcement_learning_Course/blob/master/Deep%20Q%20Learning/Doom/Deep%20Q%20learning%20with%20Doom.ipynb). Analyze the results and provide conclusions about DQN.\n",
"\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## References\n",
"* [Gymnasium documentation](https://gymnasium.farama.org/).\n",
"* [Diving deeper into Reinforcement Learning with Q-Learning, Thomas Simonini](https://medium.freecodecamp.org/diving-deeper-into-reinforcement-learning-with-q-learning-c18d0db58efe).\n",
"* Illustrations by [Thomas Simonini](https://github.com/simoninithomas/Deep_reinforcement_learning_Course) and [Sung Kim](https://www.youtube.com/watch?v=xgoO54qN4lY).\n",
"* [Frozen Lake solution with TensorFlow](https://analyticsindiamag.com/openai-gym-frozen-lake-beginners-guide-reinforcement-learning/)\n",
"* [Deep Q-Learning for Doom](https://medium.freecodecamp.org/an-introduction-to-deep-q-learning-lets-play-doom-54d02d8017d8)\n",
"* [Intro OpenAI Gym with Random Search and the Cart Pole scenario](http://www.pinchofintelligence.com/getting-started-openai-gym/)\n",
"* [Q-Learning for the Taxi scenario](https://www.oreilly.com/learning/introduction-to-reinforcement-learning-and-openai-gym)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Licence"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The notebook is freely licensed under under the [Creative Commons Attribution Share-Alike license](https://creativecommons.org/licenses/by/2.0/). \n",
"\n",
"© Carlos Á. Iglesias, Universidad Politécnica de Madrid."
]
}
],
"metadata": {
"datacleaner": {
"position": {
"top": "50px"
},
"python": {
"varRefreshCmd": "try:\n print(_datacleaner.dataframe_metadata())\nexcept:\n print([])"
},
"window_display": false
},
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.10"
},
"latex_envs": {
"LaTeX_envs_menu_present": true,
"autocomplete": true,
"bibliofile": "biblio.bib",
"cite_by": "apalike",
"current_citInitial": 1,
"eqLabelWithNumbers": true,
"eqNumInitial": 1,
"hotkeys": {
"equation": "Ctrl-E",
"itemize": "Ctrl-I"
},
"labels_anchors": false,
"latex_user_defs": false,
"report_style_numbering": false,
"user_envs_cfg": false
}
},
"nbformat": 4,
"nbformat_minor": 1
}

File diff suppressed because one or more lines are too long

274
ml5/qlearning.py Normal file
View File

@@ -0,0 +1,274 @@
# Class definition of QLearning
from pathlib import Path
from typing import NamedTuple

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns
from tqdm import tqdm

import gymnasium as gym
from gymnasium.envs.toy_text.frozen_lake import generate_random_map


# Params
class Params(NamedTuple):
    total_episodes: int  # Total episodes
    learning_rate: float  # Learning rate
    gamma: float  # Discounting rate
    epsilon: float  # Exploration probability
    map_size: int  # Number of tiles of one side of the squared environment
    seed: int  # Define a seed so that we get reproducible results
    is_slippery: bool  # If true, the player moves in the intended direction with probability 1/3, else in either perpendicular direction with probability 1/3 each
    n_runs: int  # Number of runs
    action_size: int  # Number of possible actions
    state_size: int  # Number of possible states
    proba_frozen: float  # Probability that a tile is frozen
    savefig_folder: Path  # Root folder where plots are saved
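
# Example (hypothetical values, for illustration only) of building a Params instance;
# action_size and state_size are placeholders here, since run_frozen_maps()
# overwrites them from the environment via params._replace(...):
#
# params = Params(total_episodes=2000, learning_rate=0.8, gamma=0.95, epsilon=0.1,
#                 map_size=4, seed=123, is_slippery=False, n_runs=10,
#                 action_size=4, state_size=16, proba_frozen=0.9,
#                 savefig_folder=Path("./figs"))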
class Qlearning:
def __init__(self, learning_rate, gamma, state_size, action_size):
self.state_size = state_size
self.action_size = action_size
self.learning_rate = learning_rate
self.gamma = gamma
self.reset_qtable()
def update(self, state, action, reward, new_state):
"""Update Q(s,a):= Q(s,a) + lr [R(s,a) + gamma * max Q(s',a') - Q(s,a)]"""
delta = (
reward
+ self.gamma * np.max(self.qtable[new_state][:])
- self.qtable[state][action]
)
q_update = self.qtable[state][action] + self.learning_rate * delta
return q_update
def reset_qtable(self):
"""Reset the Q-table."""
self.qtable = np.zeros((self.state_size, self.action_size))
class EpsilonGreedy:
def __init__(self, epsilon, rng):
self.epsilon = epsilon
self.rng = rng
def choose_action(self, action_space, state, qtable):
"""Choose an action `a` in the current world state (s)."""
# First we randomize a number
explor_exploit_tradeoff = self.rng.uniform(0, 1)
# Exploration
if explor_exploit_tradeoff < self.epsilon:
action = action_space.sample()
# Exploitation (taking the biggest Q-value for this state)
else:
# Break ties randomly
# If all actions are the same for this state we choose a random one
# (otherwise `np.argmax()` would always take the first one)
            if np.all(qtable[state][:] == qtable[state][0]):
action = action_space.sample()
else:
action = np.argmax(qtable[state][:])
return action
def run_frozen_maps(maps, params, rng):
"""Run FrozenLake in maps and plot results"""
map_sizes = maps
res_all = pd.DataFrame()
st_all = pd.DataFrame()
for map_size in map_sizes:
env = gym.make(
"FrozenLake-v1",
is_slippery=params.is_slippery,
render_mode="rgb_array",
desc=generate_random_map(
size=map_size, p=params.proba_frozen, seed=params.seed
),
)
params = params._replace(action_size=env.action_space.n)
params = params._replace(state_size=env.observation_space.n)
env.action_space.seed(
params.seed
) # Set the seed to get reproducible results when sampling the action space
learner = Qlearning(
learning_rate=params.learning_rate,
gamma=params.gamma,
state_size=params.state_size,
action_size=params.action_size,
)
explorer = EpsilonGreedy(
epsilon=params.epsilon,
rng=rng
)
print(f"Map size: {map_size}x{map_size}")
rewards, steps, episodes, qtables, all_states, all_actions = run_env(env, params, learner, explorer)
# Save the results in dataframes
res, st = postprocess(episodes, params, rewards, steps, map_size)
res_all = pd.concat([res_all, res])
st_all = pd.concat([st_all, st])
qtable = qtables.mean(axis=0) # Average the Q-table between runs
plot_states_actions_distribution(
states=all_states, actions=all_actions, map_size=map_size, params=params
) # Sanity check
plot_q_values_map(qtable, env, map_size, params)
env.close()
return res_all, st_all
def run_env(env, params, learner, explorer):
rewards = np.zeros((params.total_episodes, params.n_runs))
steps = np.zeros((params.total_episodes, params.n_runs))
episodes = np.arange(params.total_episodes)
qtables = np.zeros((params.n_runs, params.state_size, params.action_size))
all_states = []
all_actions = []
for run in range(params.n_runs): # Run several times to account for stochasticity
learner.reset_qtable() # Reset the Q-table between runs
for episode in tqdm(
episodes, desc=f"Run {run}/{params.n_runs} - Episodes", leave=False
):
state = env.reset(seed=params.seed)[0] # Reset the environment
step = 0
done = False
total_rewards = 0
while not done:
action = explorer.choose_action(
action_space=env.action_space, state=state, qtable=learner.qtable
)
# Log all states and actions
all_states.append(state)
all_actions.append(action)
# Take the action (a) and observe the outcome state(s') and reward (r)
new_state, reward, terminated, truncated, info = env.step(action)
done = terminated or truncated
learner.qtable[state, action] = learner.update(
state, action, reward, new_state
)
total_rewards += reward
step += 1
                # Move to the new state
state = new_state
# Log all rewards and steps
rewards[episode, run] = total_rewards
steps[episode, run] = step
qtables[run, :, :] = learner.qtable
return rewards, steps, episodes, qtables, all_states, all_actions
def postprocess(episodes, params, rewards, steps, map_size):
"""Convert the results of the simulation in dataframes."""
res = pd.DataFrame(
data={
"Episodes": np.tile(episodes, reps=params.n_runs),
"Rewards": rewards.flatten(),
"Steps": steps.flatten(),
}
)
res["cum_rewards"] = rewards.cumsum(axis=0).flatten(order="F")
res["map_size"] = np.repeat(f"{map_size}x{map_size}", res.shape[0])
st = pd.DataFrame(data={"Episodes": episodes, "Steps": steps.mean(axis=1)})
st["map_size"] = np.repeat(f"{map_size}x{map_size}", st.shape[0])
return res, st
def qtable_directions_map(qtable, map_size):
"""Get the best learned action & map it to arrows."""
qtable_val_max = qtable.max(axis=1).reshape(map_size, map_size)
qtable_best_action = np.argmax(qtable, axis=1).reshape(map_size, map_size)
directions = {0: "", 1: "", 2: "", 3: ""}
qtable_directions = np.empty(qtable_best_action.flatten().shape, dtype=str)
eps = np.finfo(float).eps # Minimum float number on the machine
for idx, val in enumerate(qtable_best_action.flatten()):
if qtable_val_max.flatten()[idx] > eps:
# Assign an arrow only if a minimal Q-value has been learned as best action
# otherwise since 0 is a direction, it also gets mapped on the tiles where
# it didn't actually learn anything
qtable_directions[idx] = directions[val]
qtable_directions = qtable_directions.reshape(map_size, map_size)
return qtable_val_max, qtable_directions
def plot_q_values_map(qtable, env, map_size, params):
"""Plot the last frame of the simulation and the policy learned."""
qtable_val_max, qtable_directions = qtable_directions_map(qtable, map_size)
# Plot the last frame
fig, ax = plt.subplots(nrows=1, ncols=2, figsize=(15, 5))
ax[0].imshow(env.render())
ax[0].axis("off")
ax[0].set_title("Last frame")
# Plot the policy
sns.heatmap(
qtable_val_max,
annot=qtable_directions,
fmt="",
ax=ax[1],
cmap=sns.color_palette("Blues", as_cmap=True),
linewidths=0.7,
linecolor="black",
xticklabels=[],
yticklabels=[],
annot_kws={"fontsize": "xx-large"},
).set(title="Learned Q-values\nArrows represent best action")
for _, spine in ax[1].spines.items():
spine.set_visible(True)
spine.set_linewidth(0.7)
spine.set_color("black")
img_title = f"frozenlake_q_values_{map_size}x{map_size}.png"
fig.savefig(params.savefig_folder / img_title, bbox_inches="tight")
plt.show()
def plot_states_actions_distribution(states, actions, map_size, params):
"""Plot the distributions of states and actions."""
labels = {"LEFT": 0, "DOWN": 1, "RIGHT": 2, "UP": 3}
fig, ax = plt.subplots(nrows=1, ncols=2, figsize=(15, 5))
sns.histplot(data=states, ax=ax[0], kde=True)
ax[0].set_title("States")
sns.histplot(data=actions, ax=ax[1])
ax[1].set_xticks(list(labels.values()), labels=labels.keys())
ax[1].set_title("Actions")
fig.tight_layout()
img_title = f"frozenlake_states_actions_distrib_{map_size}x{map_size}.png"
fig.savefig(params.savefig_folder / img_title, bbox_inches="tight")
plt.show()
def plot_steps_and_rewards(rewards_df, steps_df, params):
"""Plot the steps and rewards from dataframes."""
fig, ax = plt.subplots(nrows=1, ncols=2, figsize=(15, 5))
sns.lineplot(
data=rewards_df, x="Episodes", y="cum_rewards", hue="map_size", ax=ax[0]
)
ax[0].set(ylabel="Cumulated rewards")
sns.lineplot(data=steps_df, x="Episodes", y="Steps", hue="map_size", ax=ax[1])
ax[1].set(ylabel="Averaged steps number")
for axi in ax:
axi.legend(title="map size")
fig.tight_layout()
img_title = "frozenlake_steps_and_rewards.png"
fig.savefig(params.savefig_folder / img_title, bbox_inches="tight")
plt.show()
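
A minimal driver sketch for the module above (not part of the committed file): the hyperparameter values, the list of map sizes, and the img output folder are illustrative assumptions rather than values taken from the course notebooks.

# Hypothetical driver for ml5/qlearning.py; all values are illustrative.
from pathlib import Path

import numpy as np

from qlearning import Params, plot_steps_and_rewards, run_frozen_maps

params = Params(
    total_episodes=2000,
    learning_rate=0.8,
    gamma=0.95,
    epsilon=0.1,
    map_size=5,
    seed=123,
    is_slippery=False,
    n_runs=20,
    action_size=None,   # filled in by run_frozen_maps from the env
    state_size=None,    # filled in by run_frozen_maps from the env
    proba_frozen=0.9,
    savefig_folder=Path("img"),
)
params.savefig_folder.mkdir(parents=True, exist_ok=True)
rng = np.random.default_rng(params.seed)  # shared RNG for epsilon-greedy

res_all, st_all = run_frozen_maps([4, 7, 9, 11], params, rng)
plot_steps_and_rewards(res_all, st_all, params)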

742
nlp/0_1_LLM.ipynb Normal file

File diff suppressed because one or more lines are too long


@@ -89,7 +89,7 @@
}
},
"source": [
"In this session we are going to learn to process text so that can apply machine learning techniques."
"In this session, we are going to learn to process text so that we can apply machine learning techniques."
]
},
{
@@ -101,7 +101,7 @@
},
"source": [
"# NLP Basics\n",
"In this notebook we are going to use two popular NLP libraries:\n",
"In this notebook, we are going to use two popular NLP libraries:\n",
"* NLTK (Natural Language Toolkit, https://www.nltk.org/) \n",
"* Spacy (https://spacy.io/)"
]
@@ -116,7 +116,7 @@
"source": [
"Main characteristics:\n",
"* both are open source and very popular\n",
"* NLTK was released in 2001 while Spacy was in 2015\n",
"* NLTK was released in 2001, while Spacy was in 2015\n",
"* Spacy provides very efficient implementations"
]
},
@@ -130,7 +130,7 @@
"source": [
"# Spacy installation\n",
"\n",
"You need to install previously spacy if not installed:\n",
"You need to install spacy if not installed:\n",
"* `pip install spacy`\n",
"* or `conda install -c conda-forge spacy`\n",
"\n",
@@ -148,7 +148,7 @@
"source": [
"# Spacy pipelines\n",
"\n",
"The function **nlp** takes a raw text and perform several operations (tokenization, tagger, NER, ...)\n",
"The function **nlp** takes a raw text and performs several operations (tokenization, tagger, NER, ...)\n",
"![](spacy/spacy-pipeline.svg \"Spacy pipelines\")"
]
},
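
To make the repaired sentence above concrete, here is a minimal sketch of the pipeline (assuming the small English model has been installed with python -m spacy download en_core_web_sm):

import spacy

nlp = spacy.load("en_core_web_sm")        # build the pipeline
doc = nlp("Spacy was released in 2015.")  # raw text -> processed Doc

for token in doc:
    print(token.text, token.pos_, token.dep_)         # tokenizer + tagger + parser
print([(ent.text, ent.label_) for ent in doc.ents])   # NER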
@@ -160,7 +160,7 @@
}
},
"source": [
"From text to doc trough the pipeline"
"From text to doc through the pipeline"
]
},
{
@@ -205,7 +205,7 @@
"\n",
"* **Tokenizer exception:** Special-case rule to split a string into several tokens or prevent a token from being split when punctuation rules are applied.\n",
"* **Prefix:** Character(s) at the beginning, e.g. $, (, “, ¿.\n",
"* **Suffix:** Character(s) at the end, e.g. km, ), ”, !.\n",
"* **Suffix:** Character(s) at the end, e.g. km, ”, !.\n",
"* **Infix:** Character(s) in between, e.g. -, --, /, …."
]
},
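
These rules are easy to verify interactively; a short sketch using the classic example sentence from the spaCy documentation (same en_core_web_sm assumption):

import spacy

nlp = spacy.load("en_core_web_sm")
# "Let's" is split by a tokenizer exception, "N.Y." stays whole,
# and the trailing "!" is split off as a suffix.
print([t.text for t in nlp("Let's go to N.Y.!")])
# ['Let', "'s", 'go', 'to', 'N.Y.', '!']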


@@ -82,7 +82,7 @@
}
},
"source": [
"### 1. List the first 10 tokens of the doc"
"### 1. List the first 10 tokens of the doc."
]
},
{
@@ -149,7 +149,7 @@
}
},
"source": [
"### 7. Visualize the dependency grammar analysis of the second sentence"
"### 7. Visualize the dependency grammar analysis of the second sentence."
]
},
{
@@ -178,7 +178,7 @@
}
},
"source": [
"### 9. List frequencies of POS in the document in a table "
"### 9. List the frequencies of POS in the document in a table."
]
},
{
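
One possible approach to this exercise, sketched with a counter over a parsed doc (the variable name is a placeholder):

from collections import Counter

pos_counts = Counter(token.pos_ for token in doc)  # doc = nlp(text)
for pos, freq in pos_counts.most_common():
    print(f"{pos:10} {freq}")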
@@ -191,7 +191,7 @@
"source": [
"### 10. Preprocessing\n",
"\n",
"Remove from the doc stopwords, digits and punctuation.\n",
"Remove from the doc stopwords, digits, and punctuation.\n",
"\n",
"Hint: check the token api https://spacy.io/api/token\n",
"\n",
@@ -207,7 +207,7 @@
},
"source": [
"### 11. Entities of the document\n",
"Print the entities of the document, the type of the entity and what the explanation of the entity in a table with three columns.\n",
"Print the entities of the document, the type of the entity, and the explanation of the entity in a table with three columns.\n",
"\n",
"Example:\n",
"\n",
@@ -223,7 +223,7 @@
},
"source": [
"### 12. Visualize the entities\n",
"Show the entities in a graph."
"Show the entities highlighted in the text."
]
},
{
@@ -236,7 +236,7 @@
"source": [
"# Movie review\n",
"\n",
"Classify the rmoview reviews from the following dataset https://data.world/rajeevsharma993/movie-reviews"
"Classify the movie reviews from the following dataset https://data.world/rajeevsharma993/movie-reviews"
]
},
{


@@ -0,0 +1,33 @@
<svg xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" width="1155" height="221" viewBox="0 0 1155 221">
<defs>
<rect id="a" width="735" height="170" x="210" y="25" rx="30"/>
<mask id="b" width="735" height="170" x="0" y="0" fill="#fff" maskContentUnits="userSpaceOnUse" maskUnits="objectBoundingBox">
<use xlink:href="#a"/>
</mask>
</defs>
<g fill="none" fill-rule="evenodd" transform="translate(0 26)">
<rect width="145" height="80" x="2.5" y="2.5" fill="#D8D8D8" stroke="#6A6A6A" stroke-width="5" rx="10" transform="translate(0 70)"/>
<path fill="#3D4251" fill-rule="nonzero" d="M55.4 99.7v3.9h-7.6V125H43v-21.4h-7.7v-3.9h20zm10.2 7c1 0 2.1.2 3 .6a6.8 6.8 0 014.1 4.1 9.6 9.6 0 01.6 4.3l-.2.5-.3.3H61.3c0 2 .6 3.3 1.4 4.1.9.9 2 1.3 3.5 1.3a6 6 0 001.8-.2l1.3-.6 1-.5.8-.3c.2 0 .3 0 .5.2l.3.2 1.3 1.6c-.5.6-1 1-1.6 1.4a9 9 0 01-3.9 1.4l-2 .2c-1.2 0-2.3-.2-3.4-.7-1-.4-2-1-2.8-1.8a8.6 8.6 0 01-1.9-3 11.6 11.6 0 010-7.6c.3-1.1.9-2 1.6-2.8a8 8 0 012.7-2 9 9 0 013.7-.6zm0 3.2a4 4 0 00-3 1c-.6.7-1 1.8-1.3 3h8.1c0-.5 0-1-.2-1.5-.1-.5-.4-1-.7-1.3-.3-.4-.7-.7-1.2-1a4 4 0 00-1.7-.2zm15.5 5.8l-5.9-8.7h4.2c.3 0 .5 0 .7.2l.4.4 3.7 6a4.9 4.9 0 01.6-1.2l3-4.7.4-.5.6-.2h4l-6 8.5L93 125h-4.2c-.3 0-.5 0-.7-.2l-.5-.6-3.8-6.3-.4 1.1-3.4 5.2-.5.5a1 1 0 01-.7.3H75l6-9.3zm20.5 9.6c-1.5 0-2.7-.5-3.5-1.3a5 5 0 01-1.3-3.7v-10H95c-.3 0-.5 0-.6-.2-.2-.2-.3-.4-.3-.7v-1.7l2.9-.5 1-5c0-.1 0-.3.2-.5l.7-.2h2.2v5.7h4.7v3h-4.7v9.8c0 .6.2 1 .4 1.3.3.3.7.5 1.2.5l.6-.1a3.7 3.7 0 00.9-.4l.3-.1.3.1.3.3 1.2 2c-.6.6-1.3 1-2.1 1.3a8 8 0 01-2.6.4z"/>
<rect width="145" height="80" x="2.5" y="2.5" fill="#D7CCF4" stroke="#8978B5" stroke-width="5" rx="10" transform="translate(1005 70)"/>
<path fill="#3D4251" fill-rule="nonzero" d="M1050.3 101.5a58.8 58.8 0 016.8-.4c2.2 0 4 .4 5.4 1 1.4.6 2.5 1.5 3.4 2.6a10 10 0 011.7 4 23.2 23.2 0 010 9.6c-.3 1.5-1 2.9-1.8 4-.8 1.3-2 2.2-3.5 3-1.5.7-3.4 1-5.8 1a37.3 37.3 0 01-5-.1l-1.2-.2v-24.5zm7 4a15.6 15.6 0 00-2.3 0V122h.5a158 158 0 001.6.1 6 6 0 003.2-.7c.8-.5 1.4-1.2 1.8-2 .4-.8.7-1.8.8-2.8a27.3 27.3 0 000-5.8 8 8 0 00-.7-2.6c-.4-.8-1-1.5-1.8-2-.7-.5-1.8-.8-3.1-.8zm13.4 11.8c0-1.5.2-2.8.7-4a8 8 0 014.8-4.7c1.1-.4 2.4-.6 3.8-.6 1.5 0 2.8.2 4 .7 1 .4 2 1 2.9 1.8.8.9 1.4 1.8 1.8 3 .4 1.1.6 2.4.6 3.7 0 1.5-.2 2.8-.7 4a8 8 0 01-4.8 4.7c-1.1.4-2.4.6-3.8.6a11 11 0 01-4-.7c-1-.4-2-1-2.9-1.8a7.9 7.9 0 01-1.8-3c-.4-1.1-.6-2.4-.6-3.8zm4.7 0c0 .7.1 1.4.3 2 .2.7.5 1.3 1 1.8a4.1 4.1 0 003.3 1.5c1.4 0 2.5-.4 3.3-1.3.9-.8 1.3-2.2 1.3-4a6 6 0 00-1.2-4c-.8-1-2-1.4-3.4-1.4-.7 0-1.3 0-1.8.3-.6.2-1 .5-1.5 1-.4.4-.7 1-1 1.6-.2.7-.3 1.5-.3 2.4zm34.2 7c-1 .7-2 1.3-3.3 1.6-1.3.4-2.7.6-4 .6-1.6 0-3-.2-4.1-.7-1.2-.4-2.2-1-3-1.8a8 8 0 01-1.8-3 10.9 10.9 0 010-7.7 8.2 8.2 0 015.2-4.7 14.3 14.3 0 017.6-.2l2.6 1v6.1h-3.8v-3.2l-2.2-.3c-.7 0-1.3.1-2 .3a4.8 4.8 0 00-2.9 2.6c-.3.7-.5 1.4-.5 2.3 0 .8.2 1.5.4 2.1a5 5 0 002.8 2.8 8.2 8.2 0 005.6-.2l1.9-1 1.5 3.4z"/>
<use stroke="#3AC" stroke-dasharray="5 10" stroke-width="10" mask="url(#b)" xlink:href="#a"/>
<g transform="translate(540)">
<rect width="95" height="50" x="2.5" y="2.5" fill="#C3E7F1" stroke="#3AC" stroke-width="5" rx="10"/>
<path fill="#3D4251" fill-rule="nonzero" d="M27.8 24.5h4.4l.3 1.6h.1a5.2 5.2 0 014.2-2c.7 0 1.3.1 1.8.3.6.2 1 .4 1.4.8.4.4.7 1 1 1.6.1.6.3 1.5.3 2.4V37H38v-7.1c0-1-.2-1.8-.7-2.2-.4-.5-1-.7-1.7-.7-.6 0-1.2.2-1.7.6-.5.3-.9.8-1 1.3V37h-3.3v-9.8h-1.8v-2.7zm16.9-5H50v11.6c0 1.2.2 2.1.5 2.6s.8.8 1.5.8c.5 0 1 0 1.3-.2l1-.4 1.2 2.2a15.3 15.3 0 01-1.8 1 6.1 6.1 0 01-2.3.3c-1.5 0-2.7-.4-3.5-1.3-.8-.8-1.1-1.9-1.1-3.4V22.3h-2.1v-2.7zm12.8 5h4.3L62 26h.1c.9-1.2 2.3-1.9 4.2-1.9a6 6 0 012.1.4c.7.3 1.2.6 1.7 1.1.4.6.8 1.2 1 2 .3.8.4 1.7.4 2.8 0 1-.1 2-.4 3-.3.8-.7 1.5-1.2 2.1-.6.6-1.2 1-2 1.4-.7.3-1.6.5-2.6.5-.5 0-1 0-1.5-.2-.5 0-1-.2-1.3-.3V42h-3.2V27.2h-1.9v-2.7zm8 2.4c-.7 0-1.3.2-1.8.5s-.9.8-1 1.4V34c.2.2.5.3 1 .4l1.3.2c.4 0 .9 0 1.3-.2s.7-.4 1-.8c.3-.4.6-.8.7-1.3.2-.6.3-1.2.3-2 0-1-.3-1.9-.8-2.5-.6-.6-1.2-.9-2-.9z"/>
</g>
<path fill="#3AC" d="M205 112.5L180 125v-25z"/>
<path stroke="#3AC" stroke-linecap="square" stroke-width="5" d="M180 112.5h-23.1"/>
<path fill="#3AC" d="M1000 112.5L975 125v-25z"/>
<path stroke="#3AC" stroke-linecap="square" stroke-width="5" d="M975 112.5h-23.1"/>
<path fill="#EAC1CC" stroke="#F03969" stroke-linejoin="round" stroke-width="3.8" d="M230 75h135l23.5 43.4L365 160H230l23.5-41.5z"/>
<path fill="#F2D7B2" stroke="#F0A439" stroke-linejoin="round" stroke-width="3.8" d="M395 75h135l23.5 43.4L530 160H395l23.5-41.5z"/>
<path fill="#F2E7A6" stroke="#CDB217" stroke-linejoin="round" stroke-width="3.8" d="M515 75h135l23.5 43.4L650 160H515l23.5-41.5z"/>
<path fill="#D7E99A" stroke="#B2D73A" stroke-linejoin="round" stroke-width="3.8" d="M640 75h135l23.5 43.4L775 160H640l23.5-41.5z"/>
<path fill="#B5F3D4" stroke="#3AD787" stroke-linejoin="round" stroke-width="3.8" d="M765 75h135l23.5 43.4L900 160H765l23.5-41.5z"/>
<path fill="#3D4251" fill-rule="nonzero" d="M265.9 125.2c-1.1 0-2-.3-2.6-1-.6-.6-.9-1.4-.9-2.5v-7.2h-1.3c-.2 0-.3 0-.4-.2-.2 0-.2-.2-.2-.5v-1.2l2-.3.7-3.5.2-.4.5-.2h1.6v4h3.4v2.3h-3.4v7c0 .3 0 .6.3.9.2.2.5.3.8.3h.5a2.6 2.6 0 00.6-.3l.2-.1h.2l.2.3 1 1.5-1.6.8-1.8.3zm10.9-13.2c1 0 1.8.1 2.6.4a5.6 5.6 0 013.3 3.4c.3.8.4 1.8.4 2.8 0 1-.1 1.9-.4 2.7a5.5 5.5 0 01-3.3 3.4 7 7 0 01-2.6.5 7 7 0 01-2.6-.5 5.6 5.6 0 01-3.3-3.4 7.8 7.8 0 010-5.5c.3-.8.7-1.5 1.3-2 .5-.6 1.2-1 2-1.4a7 7 0 012.6-.4zm0 10.8c1 0 1.9-.3 2.4-1 .5-.8.7-1.8.7-3.2 0-1.4-.2-2.4-.7-3.2-.5-.7-1.3-1-2.4-1-1 0-1.9.3-2.4 1-.5.8-.8 1.8-.8 3.2 0 1.4.3 2.4.8 3.1.5.8 1.3 1.1 2.4 1.1zm11.9-16.4v10.7h.5l.5-.1.4-.3 3.2-4 .4-.4.7-.1h2.8l-4 4.7-.4.5-.5.4.4.4.4.6 4.3 6.2h-2.8l-.6-.1c-.2-.1-.3-.2-.4-.5l-3.3-4.8a1 1 0 00-.4-.4h-1.2v5.8h-3.1v-18.6h3zm16 5.6c.7 0 1.5.1 2.2.4a4.9 4.9 0 012.9 3 6.9 6.9 0 01.3 3v.3l-.3.2h-8.3c.1 1.4.5 2.4 1.1 3 .6.6 1.4.9 2.4.9.6 0 1 0 1.3-.2a22 22 0 001.7-.8l.6-.1h.3l.3.3.9 1c-.4.5-.8.8-1.2 1a6.4 6.4 0 01-2.7 1c-.5.2-1 .2-1.4.2-1 0-1.7-.2-2.5-.5s-1.4-.7-2-1.3c-.6-.5-1-1.3-1.4-2.1a8.3 8.3 0 010-5.5 5.7 5.7 0 013.2-3.4c.7-.3 1.6-.4 2.5-.4zm0 2.2c-1 0-1.6.2-2.1.8-.5.5-.9 1.2-1 2.1h5.8c0-.4 0-.8-.2-1.1 0-.4-.2-.7-.5-1l-.8-.6-1.2-.2zm8 10.8v-12.8h1.9c.4 0 .6.2.8.5l.2 1a7 7 0 011.7-1.2 4.6 4.6 0 012.2-.5c.7 0 1.4 0 1.9.3l1.4 1 .8 1.6c.2.6.3 1.2.3 2v8.1h-3.1v-8.2c0-.7-.2-1.4-.6-1.8-.3-.4-.9-.6-1.6-.6l-1.5.3c-.5.3-1 .6-1.3 1v9.3h-3.1zm17.5-12.8V125H327v-12.8h3zm.4-3.8l-.1.8a2 2 0 01-1 1 2 2 0 01-2.2-.4 2 2 0 01-.4-.6l-.2-.8a2 2 0 01.6-1.4 2 2 0 011.3-.5l.8.1a2 2 0 011 1l.3.8zm12.3 5v.7l-.3.5-6.2 8h6.4v2.4h-10v-1.3l.2-.5c0-.2.1-.4.3-.5l6.1-8.2h-6.2v-2.3h9.8v1.3zm7.8-1.4c.8 0 1.6.1 2.2.4a4.9 4.9 0 013 3 6.9 6.9 0 01.3 3v.3l-.3.2h-8.3c.1 1.4.5 2.4 1 3 .7.6 1.5.9 2.5.9.5 0 1 0 1.3-.2a22 22 0 001.7-.8l.6-.1h.3l.3.3.8 1c-.3.5-.7.8-1.1 1a6.4 6.4 0 01-2.7 1c-.5.2-1 .2-1.4.2-1 0-1.8-.2-2.5-.5-.8-.3-1.5-.7-2-1.3-.6-.5-1-1.3-1.4-2.1a8.3 8.3 0 010-5.5 5.7 5.7 0 013.2-3.4c.7-.3 1.6-.4 2.5-.4zm0 2.2c-.8 0-1.5.2-2 .8-.5.5-.9 1.2-1 2.1h5.8c0-.4 0-.8-.2-1.1 0-.4-.2-.7-.5-1l-.8-.6-1.2-.2zm8 10.8v-12.8h1.9l.6.1c.2.2.3.4.3.7l.2 1.5a6 6 0 011.6-1.9c.6-.4 1.3-.7 2-.7s1.2.2 1.6.5l-.3 2.3-.2.3-.3.1h-.6l-.8-.2c-.7 0-1.2.2-1.7.6a4 4 0 00-1.1 1.5v8h-3.1z"/>
<path fill="#3D4251" fill-rule="nonzero" d="M440.9 125.2c-1.1 0-2-.3-2.6-1-.6-.6-.9-1.4-.9-2.5v-7.2h-1.3c-.2 0-.3 0-.4-.2-.2 0-.2-.2-.2-.5v-1.2l2-.3.7-3.5.2-.4.5-.2h1.6v4h3.4v2.3h-3.4v7c0 .3 0 .6.3.9.2.2.5.3.8.3h.5a2.6 2.6 0 00.6-.3l.2-.1h.2l.2.3 1 1.5-1.6.8-1.8.3zm15.5-.2H455l-.7-.1c-.2-.1-.3-.3-.4-.6l-.3-.9a10.6 10.6 0 01-1.9 1.3 5 5 0 01-1 .4 6.4 6.4 0 01-2.8-.1l-1.2-.7a3 3 0 01-.7-1c-.2-.5-.3-1-.3-1.6 0-.5.1-1 .4-1.4.2-.5.6-.9 1.2-1.3s1.4-.7 2.4-1c1-.2 2.2-.3 3.7-.3v-.8c0-.9-.2-1.5-.6-2-.3-.3-.9-.5-1.6-.5a3.8 3.8 0 00-2 .5l-.8.4c-.2.2-.4.2-.6.2-.3 0-.4 0-.6-.2l-.3-.3-.6-1c1.5-1.4 3.2-2 5.3-2 .8 0 1.4 0 2 .3a4.3 4.3 0 012.5 2.6c.2.6.3 1.3.3 2v8.1zm-6-2h.9a3.3 3.3 0 001.4-.7l.7-.6v-2.2c-1 0-1.7.1-2.3.3a6 6 0 00-1.5.4l-.7.6c-.2.2-.3.5-.3.8 0 .5.2.9.5 1.1.3.3.8.4 1.3.4zm13.5-11l1.5.1 1.3.5h3.7v1.2l-.1.4-.6.2-1.1.3a4 4 0 01.3 1.4 3.8 3.8 0 01-1.5 3c-.4.4-1 .7-1.6.9a6.5 6.5 0 01-3.4.1c-.4.3-.6.5-.6.8 0 .3.2.5.4.6l1 .3h1.3a27.5 27.5 0 013 .3l1.3.5 1 1c.2.3.3.8.3 1.5 0 .5-.2 1-.4 1.6-.3.5-.7 1-1.2 1.4-.6.4-1.2.8-2 1a10.1 10.1 0 01-5.2.1 6 6 0 01-1.7-.7c-.5-.3-.9-.7-1-1.1-.3-.4-.4-.8-.4-1.3 0-.6.1-1 .5-1.5.4-.4.9-.7 1.5-1-.3-.1-.5-.4-.7-.7a2 2 0 01-.3-1.1v-.6l.4-.6.5-.6.8-.5a3.7 3.7 0 01-2-3.5 3.8 3.8 0 011.3-3l1.6-.8c.6-.2 1.3-.3 2-.3zm3.3 13.6c0-.3 0-.5-.2-.6-.1-.2-.3-.3-.6-.4l-1-.2a16.7 16.7 0 00-2.2-.2H462c-.4.1-.6.4-.8.6-.3.3-.4.6-.4 1 0 .2 0 .4.2.6l.5.5 1 .3 1.4.1 1.5-.1c.4-.1.8-.2 1-.4l.7-.5.1-.7zm-3.3-7.3c.3 0 .7 0 1-.2l.7-.4.4-.7.1-.8a2 2 0 00-.5-1.5c-.4-.4-1-.6-1.8-.6-.7 0-1.3.2-1.7.6a2 2 0 00-.5 1.5l.1.8a1.8 1.8 0 001.2 1.1l1 .2zm12.9-6.3l1.5.1 1.4.5h3.7v1.2l-.2.4-.5.2-1.2.3a4 4 0 01.3 1.4 3.8 3.8 0 01-1.4 3c-.5.4-1 .7-1.6.9a6.5 6.5 0 01-3.4.1c-.4.3-.6.5-.6.8 0 .3 0 .5.3.6l1 .3h1.3a27.5 27.5 0 013 .3l1.3.5 1 1c.2.3.3.8.3 1.5 0 .5-.1 1-.4 1.6-.3.5-.7 1-1.2 1.4-.5.4-1.2.8-2 1a10.1 10.1 0 01-5.2.1 6 6 0 01-1.7-.7c-.5-.3-.8-.7-1-1.1-.3-.4-.4-.8-.4-1.3 0-.6.2-1 .5-1.5.4-.4 1-.7 1.6-1-.3-.1-.6-.4-.8-.7a2 2 0 01-.3-1.1l.1-.6.3-.6.6-.6.7-.5a3.7 3.7 0 01-2-3.5 3.8 3.8 0 011.3-3c.5-.3 1-.6 1.7-.8.6-.2 1.3-.3 2-.3zm3.4 13.6c0-.3-.1-.5-.3-.6-.1-.2-.3-.3-.6-.4l-.9-.2a16.7 16.7 0 00-2.3-.2H475l-.8.6c-.2.3-.3.6-.3 1l.1.6.6.5 1 .3 1.4.1 1.5-.1 1-.4c.3-.1.5-.3.6-.5l.2-.7zm-3.4-7.3c.4 0 .7 0 1-.2.3 0 .5-.2.7-.4l.4-.7.2-.8a2 2 0 00-.6-1.5c-.4-.4-1-.6-1.7-.6-.8 0-1.3.2-1.7.6a2 2 0 00-.6 1.5c0 .3 0 .6.2.8a1.8 1.8 0 001 1.1l1 .2zm13.8-6.3c.8 0 1.5.1 2.2.4a4.9 4.9 0 013 3 6.9 6.9 0 01.3 3l-.1.3-.2.2h-8.3c0 1.4.4 2.4 1 3 .7.6 1.5.9 2.5.9.5 0 1 0 1.3-.2a22 22 0 001.7-.8l.6-.1h.3l.2.3 1 1c-.4.5-.8.8-1.2 1a6.4 6.4 0 01-2.8 1c-.4.2-1 .2-1.4.2-.9 0-1.7-.2-2.4-.5-.8-.3-1.5-.7-2-1.3-.6-.5-1-1.3-1.4-2.1a8.3 8.3 0 010-5.5 5.7 5.7 0 013.2-3.4c.7-.3 1.5-.4 2.5-.4zm0 2.2c-.9 0-1.6.2-2 .8-.6.5-.9 1.2-1 2.1h5.8c0-.4 0-.8-.2-1.1l-.5-1-.8-.6-1.3-.2zm8 10.8v-12.8h1.9l.6.1c.2.2.2.4.3.7l.2 1.5a6 6 0 011.6-1.9c.6-.4 1.3-.7 2-.7s1.2.2 1.6.5l-.4 2.3-.1.3-.4.1h-.5l-.8-.2c-.7 0-1.2.2-1.7.6a4 4 0 00-1.2 1.5v8h-3z"/>
<path fill="#3D4251" fill-rule="nonzero" d="M556.6 129.2v-17h2l.4.1c.2.1.3.2.3.4l.3 1.2c.5-.6 1-1 1.8-1.4a4.8 4.8 0 014.2-.1c.6.3 1.1.7 1.5 1.2a6 6 0 011 2 10.3 10.3 0 010 5.6c-.3.8-.7 1.5-1.1 2a5.1 5.1 0 01-6 1.7l-1.3-1v5.3h-3zm6-14.8c-.6 0-1.1.1-1.6.4-.4.3-.9.6-1.3 1.1v5.8a3 3 0 002.5 1.1c.5 0 .9 0 1.3-.2l1-.8c.2-.4.4-.8.5-1.4a8.6 8.6 0 000-3.8c0-.5-.2-1-.4-1.3a2 2 0 00-.9-.7c-.3-.2-.6-.2-1-.2zm18.2 10.6h-1.3l-.7-.1c-.2-.1-.3-.3-.4-.6l-.3-.9a10.6 10.6 0 01-2 1.3 5 5 0 01-1 .4 6.4 6.4 0 01-2.7-.1c-.5-.2-.9-.4-1.2-.7a3 3 0 01-.8-1c-.2-.5-.3-1-.3-1.6 0-.5.2-1 .4-1.4.3-.5.7-.9 1.3-1.3.6-.4 1.4-.7 2.4-1 1-.2 2.2-.3 3.6-.3v-.8c0-.9-.2-1.5-.5-2-.4-.3-1-.5-1.6-.5a3.8 3.8 0 00-2.1.5l-.7.4c-.2.2-.4.2-.7.2-.2 0-.4 0-.5-.2-.2 0-.3-.2-.4-.3l-.5-1c1.4-1.4 3.2-2 5.3-2a4.3 4.3 0 014.4 3c.2.5.3 1.2.3 1.9v8.1zm-6-2h1a3.3 3.3 0 001.4-.7l.6-.6v-2.2c-.9 0-1.6.1-2.2.3a6 6 0 00-1.5.4l-.8.6-.2.8c0 .5.2.9.5 1.1.3.3.7.4 1.2.4zm9 2v-12.8h1.9l.6.1c.2.2.3.4.3.7l.2 1.5a6 6 0 011.6-1.9c.6-.4 1.3-.7 2-.7s1.2.2 1.6.5l-.4 2.3-.1.3-.4.1h-.5l-.8-.2c-.7 0-1.2.2-1.7.6a4 4 0 00-1.1 1.5v8h-3.1zm17.9-10.3l-.3.3h-.8a32.9 32.9 0 00-1.4-.7h-1c-.6 0-1 0-1.4.3-.4.3-.5.6-.5 1 0 .3 0 .5.2.7l.7.5 1 .4a33 33 0 012.3.8c.4.2.8.4 1 .7.4.2.6.5.8 1l.2 1.2c0 .7 0 1.2-.3 1.7-.2.6-.6 1-1 1.4-.4.4-1 .7-1.6.9a7 7 0 01-3.5.2 7.6 7.6 0 01-2.3-.8l-.8-.7.7-1.1c0-.2.2-.3.3-.4h1a12 12 0 001.4.8l1.2.1h1l.6-.4c.1-.2.3-.3.3-.5l.1-.6c0-.3 0-.6-.2-.8l-.7-.5-1-.3a33.5 33.5 0 01-2.4-.9 4 4 0 01-1-.7 3 3 0 01-.7-1 3.7 3.7 0 011-4.2c.4-.3.9-.6 1.5-.8.6-.2 1.3-.3 2-.3 1 0 1.8.1 2.5.4.7.3 1.3.7 1.8 1.2l-.7 1zm8.6-2.7c.8 0 1.6.1 2.2.4a4.9 4.9 0 013 3 6.9 6.9 0 01.3 3v.3l-.3.2h-8.3c.1 1.4.5 2.4 1 3 .7.6 1.5.9 2.5.9.5 0 1 0 1.3-.2a22 22 0 001.7-.8l.6-.1h.3l.3.3.9 1c-.4.5-.8.8-1.2 1a6.4 6.4 0 01-2.7 1c-.5.2-1 .2-1.4.2-1 0-1.8-.2-2.5-.5-.8-.3-1.5-.7-2-1.3-.6-.5-1-1.3-1.4-2.1a8.3 8.3 0 010-5.5 5.7 5.7 0 013.2-3.4c.7-.3 1.6-.4 2.5-.4zm0 2.2c-.8 0-1.5.2-2 .8-.5.5-.9 1.2-1 2.1h5.8c0-.4 0-.8-.2-1.1 0-.4-.2-.7-.5-1l-.8-.6-1.2-.2zm8 10.8v-12.8h1.9l.6.1c.2.2.3.4.3.7l.2 1.5a6 6 0 011.6-1.9c.6-.4 1.3-.7 2-.7s1.2.2 1.6.5l-.4 2.3-.1.3-.4.1h-.5l-.8-.2c-.7 0-1.2.2-1.7.6a4 4 0 00-1.1 1.5v8h-3.1z"/>
<path fill="#3D4251" fill-rule="nonzero" d="M701.6 125v-12.8h2c.3 0 .6.2.7.5l.2 1a7 7 0 011.8-1.2 4.6 4.6 0 012.2-.5c.7 0 1.3 0 1.9.3.5.3 1 .6 1.3 1 .4.5.7 1 .8 1.6.2.6.3 1.2.3 2v8.1h-3v-8.2c0-.7-.2-1.4-.6-1.8-.4-.4-1-.6-1.6-.6-.6 0-1 .1-1.5.3l-1.4 1v9.3h-3zm19.6-13c.8 0 1.5.1 2.2.4a4.9 4.9 0 012.9 3 6.9 6.9 0 01.4 3l-.1.3-.3.2H718c.2 1.4.5 2.4 1.1 3 .7.6 1.5.9 2.5.9.5 0 1 0 1.3-.2a22 22 0 001.7-.8l.5-.1h.4l.2.3.9 1c-.3.5-.7.8-1.1 1a6.4 6.4 0 01-2.8 1c-.5.2-1 .2-1.4.2-.9 0-1.7-.2-2.5-.5-.7-.3-1.4-.7-2-1.3-.5-.5-1-1.3-1.3-2.1a8.3 8.3 0 010-5.5 5.7 5.7 0 013.2-3.4c.6-.3 1.5-.4 2.5-.4zm0 2.2c-.9 0-1.6.2-2 .8-.6.5-1 1.2-1 2.1h5.7l-.1-1.1-.5-1-.9-.6-1.2-.2zm8 10.8v-12.8h1.8l.7.1.3.7.1 1.5a6 6 0 011.6-1.9c.7-.4 1.4-.7 2.1-.7.7 0 1.2.2 1.6.5l-.4 2.3c0 .1 0 .2-.2.3l-.3.1h-.5l-.9-.2c-.6 0-1.2.2-1.6.6a4 4 0 00-1.2 1.5v8h-3z"/>
<path fill="#3D4251" fill-rule="nonzero" d="M831 123.3a2 2 0 01.5-1.3 2 2 0 011.3-.6 1.9 1.9 0 011.4.6 1.9 1.9 0 01.3 2 1.8 1.8 0 01-1 1 2 2 0 01-2-.4c-.2-.1-.3-.3-.4-.6a2 2 0 01-.2-.7zm5.5 0a2 2 0 01.6-1.3 2 2 0 011.3-.6 1.9 1.9 0 011.4.6 1.9 1.9 0 01.4 2 1.8 1.8 0 01-1 1 2 2 0 01-2-.4c-.3-.1-.4-.3-.5-.6a2 2 0 01-.2-.7zm5.7 0a2 2 0 01.5-1.3 2 2 0 011.4-.6 1.9 1.9 0 011.3.6 1.9 1.9 0 01.4 2 1.8 1.8 0 01-1 1 2 2 0 01-2-.4c-.3-.1-.4-.3-.5-.6a2 2 0 01-.1-.7z"/>
</g>
</svg>


305
nlp/spacy/tokenization.svg Normal file

@@ -0,0 +1,305 @@
<svg xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" width="598" height="386" viewBox="0 0 598 386">
<defs>
<path id="a" d="M51.3 10.9a4.3 4.3 0 01-.6-2.2c0-.6.2-1.2.5-1.9a6 6 0 011.4-1.6l.6.4.1.2-.1.3a7.5 7.5 0 00-.6 1l-.2.5a2.5 2.5 0 000 1.4l.3.8.1.2c0 .2 0 .3-.3.4l-1.2.5zm3.4 0a4.3 4.3 0 01-.7-2.2c0-.6.2-1.2.5-1.9A6 6 0 0156 5.2l.6.4h.1v.5a7.5 7.5 0 00-.7 1l-.2.5a2.5 2.5 0 000 1.4l.4.8v.2c0 .2 0 .3-.2.4l-1.2.5zm7.4 9.3H69V22h-9V6.2h2.2v14zM75 10.7a5 5 0 011.9.3 4.1 4.1 0 012.4 2.5c.2.7.3 1.4.3 2.2v.6l-.4.1h-7.5c0 .7.1 1.4.3 1.9.2.5.4 1 .7 1.3l1.1.8 1.5.2c.5 0 .9 0 1.2-.2a6 6 0 001.6-.7l.5-.2.3.2.6.7-.9.8-1 .5a6.9 6.9 0 01-4.6 0c-.7-.2-1.2-.6-1.7-1-.5-.6-.8-1.2-1.1-2a7.6 7.6 0 010-4.7 4.7 4.7 0 012.7-3c.6-.2 1.3-.3 2.1-.3zm0 1.4a3 3 0 00-2.2.8c-.5.6-.9 1.3-1 2.3h6l-.1-1.2c-.1-.4-.3-.7-.6-1-.2-.3-.5-.5-.9-.7a3 3 0 00-1.2-.2zm10.5 10c-.9 0-1.6-.2-2-.7-.5-.5-.7-1.2-.7-2v-6.9h-1.4l-.3-.1-.1-.3v-.8l1.8-.2.5-3.5.1-.3h1.3V11H88v1.4h-3.2v6.7c0 .5.1.8.4 1 .2.3.5.4.8.4l.6-.1a2.3 2.3 0 00.6-.4h.2l.3.1.6 1c-.3.3-.8.5-1.2.7l-1.5.3zm6.2-16.7a4.1 4.1 0 01.6 2.1 4 4 0 01-.5 2 6 6 0 01-1.3 1.6l-.6-.4-.2-.1v-.1l.1-.3a5.1 5.1 0 00.7-1l.2-.5a2.5 2.5 0 000-1.4l-.4-.8v-.2c0-.2 0-.3.2-.4l1.2-.5zm9.7 7.3c-.1.2-.3.2-.4.2h-.4a8.6 8.6 0 00-1.2-.6 3.4 3.4 0 00-2 0l-.6.3-.4.5-.2.7c0 .2.1.5.3.7l.6.5 1 .3a49.6 49.6 0 013 1.3l.7.8.2 1.2c0 .5 0 1-.3 1.4-.2.5-.4.8-.8 1.1a4 4 0 01-1.3.8c-.5.2-1.1.3-1.8.3a5.6 5.6 0 01-3.7-1.4l.4-.7.2-.2.4-.1.4.1.5.4.8.3 1 .2c.5 0 .8 0 1-.2.4 0 .6-.2.8-.4l.4-.6.2-.7c0-.3-.1-.5-.3-.7a2 2 0 00-.6-.6l-1-.3a68.4 68.4 0 01-2.1-.8l-1-.5-.6-.9c-.2-.3-.2-.7-.2-1.2a3 3 0 011-2.2c.3-.3.7-.6 1.2-.8a5 5 0 011.7-.2c.8 0 1.4.1 2 .3l1.5 1-.4.7z"/>
<path id="b" d="M183.5 10.7l1.4.1 1.1.5h3v.7c0 .3 0 .4-.4.5l-1.3.2c.3.4.4 1 .4 1.6a3.4 3.4 0 01-1.2 2.6c-.3.3-.8.5-1.3.7a5.6 5.6 0 01-3.2 0l-.5.5-.2.5c0 .3.1.5.4.6.2.2.5.3.8.3l1.2.1a36.1 36.1 0 012.7.3c.5 0 .9.2 1.2.4.4.2.7.4.9.7a3 3 0 010 2.6c-.3.5-.6 1-1 1.3-.5.3-1 .6-1.7.8a8.4 8.4 0 01-4.3 0 5 5 0 01-1.6-.6c-.4-.2-.7-.6-.9-1-.2-.3-.3-.6-.3-1 0-.6.2-1 .5-1.4.4-.4.9-.7 1.5-1a2 2 0 01-.8-.5c-.2-.3-.3-.6-.3-1l.1-.5c0-.2.2-.4.3-.5a2.9 2.9 0 011-1c-.5-.2-1-.6-1.2-1.2-.3-.5-.5-1-.5-1.7a3.3 3.3 0 011.2-2.6 4 4 0 011.3-.8l1.7-.2zm3.5 12c0-.4 0-.6-.2-.8l-.7-.4-.9-.2a13.9 13.9 0 00-2.2-.1l-1.2-.1-1 .7a1.5 1.5 0 00-.2 1.7l.6.6c.3.1.6.3 1 .3l1.4.2c.6 0 1 0 1.4-.2.5 0 .8-.2 1.1-.4.3-.1.5-.3.7-.6l.2-.8zm-3.5-6.1l1-.2c.4-.1.6-.3.8-.5l.5-.7.2-.9c0-.7-.2-1.2-.7-1.6-.4-.4-1-.6-1.8-.6s-1.4.2-1.8.6c-.4.4-.6 1-.6 1.6 0 .3 0 .6.2 1a2 2 0 001.2 1c.3.2.6.3 1 .3zm12-6c.8 0 1.6.2 2.2.5a4.7 4.7 0 012.8 3c.2.7.3 1.5.3 2.3 0 .9 0 1.7-.3 2.4s-.6 1.3-1 1.8c-.6.5-1.1.9-1.8 1.2-.6.2-1.4.4-2.2.4-.8 0-1.5-.2-2.2-.4-.6-.3-1.2-.7-1.7-1.2-.4-.5-.8-1.1-1-1.8a7 7 0 01-.4-2.4c0-.8.1-1.6.4-2.3.2-.8.6-1.4 1-1.9.5-.5 1-.8 1.7-1.1.7-.3 1.4-.4 2.2-.4zm0 10c1.1 0 2-.3 2.5-1 .5-.8.8-1.8.8-3.2 0-1.3-.3-2.3-.8-3-.5-.8-1.4-1.2-2.5-1.2-.5 0-1 .1-1.4.3-.4.2-.8.5-1 .8l-.7 1.4-.2 1.7.2 1.8.6 1.3c.3.4.7.6 1 .8l1.5.3z"/>
<path id="c" d="M250.4 22.2c-.8 0-1.5-.3-2-.8s-.7-1.2-.7-2v-6.9h-1.3l-.3-.1-.2-.3v-.8l1.9-.2.4-3.5c0-.1 0-.2.2-.3h1.3V11h3.2v1.4h-3.2v6.7c0 .5 0 .8.3 1 .2.3.5.4.9.4l.5-.1a2.3 2.3 0 00.7-.4h.2l.3.1.5 1a4.1 4.1 0 01-2.7 1zm9.4-11.5c.8 0 1.5.1 2.2.4a4.7 4.7 0 012.7 3 7.2 7.2 0 010 4.7c-.2.7-.6 1.3-1 1.8-.5.5-1 .9-1.7 1.2-.7.2-1.4.4-2.2.4-.8 0-1.6-.2-2.2-.4-.7-.3-1.2-.7-1.7-1.2s-.8-1.1-1-1.8a7 7 0 01-.4-2.4c0-.8 0-1.6.3-2.3a4.7 4.7 0 012.7-3c.7-.3 1.5-.4 2.3-.4zm0 10c1 0 2-.4 2.4-1.2.6-.7.9-1.7.9-3 0-1.4-.3-2.4-.9-3.2-.5-.7-1.3-1-2.4-1-.6 0-1 0-1.5.2l-1 .8c-.3.4-.5.8-.6 1.4l-.2 1.7c0 .7 0 1.3.2 1.8.1.5.3 1 .6 1.3.3.4.6.6 1 .8.4.2 1 .3 1.5.3z"/>
<path id="d" d="M347.6 6.2l.5.1.3.3 9.1 11.9a7.5 7.5 0 010-1.1V6.2h1.8V22h-1a1 1 0 01-.5 0 1 1 0 01-.3-.4l-9.1-11.9a14.1 14.1 0 010 1V22h-1.9V6.2h1.1zm14.6 14.6a1.4 1.4 0 01.4-1 1.3 1.3 0 011-.4l.5.1c.2 0 .4.2.5.3a1.3 1.3 0 01.4 1 1.4 1.4 0 01-.4 1 1.3 1.3 0 01-1 .4 1.4 1.4 0 01-1-.4 1.4 1.4 0 01-.4-1zm10-5V22h-2v-6.3l-5.9-9.5h2c.1 0 .3 0 .4.2l.3.3 3.6 6.2a7.6 7.6 0 01.6 1.4 13 13 0 01.6-1.4l3.6-6.2.3-.3.4-.2h2l-5.9 9.5zm5.2 5a1.4 1.4 0 01.4-1 1.3 1.3 0 011-.4l.5.1c.2 0 .3.2.4.3a1.3 1.3 0 01.4 1 1.4 1.4 0 01-.4 1 1.3 1.3 0 01-1 .4 1.4 1.4 0 01-1-.4 1.4 1.4 0 01-.3-1zm8.4-14.6v6.3a27.8 27.8 0 01-.2 4h-1.4a66.4 66.4 0 01-.2-4V6.2h1.8zm-2.3 14.6a1.4 1.4 0 01.4-1 1.3 1.3 0 011-.4l.5.1c.2 0 .3.2.4.3a1.3 1.3 0 01.4 1 1.4 1.4 0 01-.4 1 1.3 1.3 0 01-1 .4 1.4 1.4 0 01-1-.4 1.4 1.4 0 01-.3-1zm8.1-15.4a4.1 4.1 0 01.6 2.1 4 4 0 01-.5 2 6 6 0 01-1.3 1.6l-.6-.4-.2-.1v-.1l.1-.3a5.1 5.1 0 00.7-1l.2-.5a2.5 2.5 0 000-1.4l-.4-.8v-.2c0-.2 0-.3.2-.4l1.2-.5zm3.4 0a4.1 4.1 0 01.6 2.1 4 4 0 01-.5 2 6 6 0 01-1.4 1.6l-.6-.4-.1-.1v-.1-.3a5.1 5.1 0 00.7-1l.2-.5a2.5 2.5 0 000-1.4l-.4-.8v-.2c0-.2 0-.3.3-.4l1.2-.5z"/>
<path id="e" d="M13.5 77.8c-.5-.8-.8-1.6-.8-2.5 0-.7.2-1.4.6-2 .3-.7.9-1.4 1.6-2l.8.6.2.2V72.4l-.1.2a5.7 5.7 0 00-.6.8l-.2.6-.1.7.1.8c0 .3.2.5.4.9l.1.3c0 .2-.1.4-.4.5l-1.6.6zm3.6 0c-.5-.8-.7-1.6-.7-2.5 0-.7.2-1.4.5-2 .4-.7 1-1.4 1.6-2l.9.6.1.2v.5a5.7 5.7 0 00-.7.8l-.2.6v1.5l.5.9v.3c0 .2 0 .4-.3.5l-1.7.6z"/>
<path id="f" d="M80.8 86.8h6.8v1.7h-9V72.8h2.2v14zm12.9-9.6a5 5 0 011.8.4A4.1 4.1 0 0198 80c.2.6.3 1.3.3 2.1l-.1.6-.4.2h-7.4c0 .7.1 1.3.3 1.8.2.5.4 1 .7 1.3.3.4.7.6 1.1.8l1.5.3 1.2-.2a6 6 0 001.6-.7l.4-.2c.2 0 .3 0 .4.2l.6.7-1 .7c-.2.3-.6.4-1 .6a6.9 6.9 0 01-4.5 0c-.7-.3-1.2-.6-1.7-1.1-.5-.5-.9-1.2-1.1-1.9a7.6 7.6 0 010-4.7c.2-.7.5-1.3 1-1.8.4-.5 1-.9 1.6-1.2.6-.2 1.4-.4 2.2-.4zm0 1.5a3 3 0 00-2.2.8c-.5.5-.9 1.3-1 2.3h6l-.1-1.3-.6-1c-.2-.2-.5-.5-.9-.6a3 3 0 00-1.2-.2zm10.5 10c-.9 0-1.6-.2-2-.7-.5-.5-.8-1.2-.8-2.1V79h-1.6l-.1-.4v-.8l1.8-.2.5-3.4.1-.3.3-.1h1v3.8h3.2V79h-3.2v6.7c0 .5.1.8.3 1 .3.3.6.4 1 .4h.4a2.3 2.3 0 00.7-.4l.2-.1c.1 0 .2 0 .3.2l.6 1-1.3.7-1.4.2zm6.2-16.7a4.1 4.1 0 01.6 2 4 4 0 01-.5 2 6 6 0 01-1.4 1.6l-.6-.3V77h-.1l.1-.3a5.1 5.1 0 00.6-1l.2-.6a2.5 2.5 0 000-1.3c0-.3-.2-.6-.3-.8l-.1-.3c0-.1 0-.3.3-.4l1.2-.4zm9.6 7.2c0 .2-.2.3-.4.3l-.3-.1a8.6 8.6 0 00-1.3-.6 3.4 3.4 0 00-1.8 0l-.7.4-.5.5-.1.6c0 .3 0 .5.2.7l.7.5 1 .4a49.6 49.6 0 013 1.3c.2.2.5.5.6.8.2.3.3.7.3 1.1l-.3 1.5c-.2.4-.5.8-.8 1a4 4 0 01-1.3.8l-1.8.3a5.6 5.6 0 01-3.8-1.3l.5-.8.2-.2h.8l.5.4.7.4 1.2.1 1-.1.7-.4.4-.6.1-.7c0-.3 0-.6-.2-.8a2 2 0 00-.7-.5l-.9-.4a68.4 68.4 0 01-2.1-.7l-1-.6-.6-.8-.3-1.2a3 3 0 011-2.3l1.3-.7a5 5 0 011.7-.3c.7 0 1.4.1 2 .4.6.2 1 .5 1.5 1l-.5.6z"/>
<path id="g" d="M182.2 77.2c.4 0 .9 0 1.3.2.4 0 .8.2 1.2.4h3v.8c0 .2-.2.4-.5.4l-1.2.2c.2.5.3 1 .3 1.6a3.4 3.4 0 01-1.1 2.6c-.4.3-.8.6-1.4.7-.5.2-1 .3-1.6.3-.6 0-1 0-1.5-.2l-.5.5c-.2.2-.2.3-.2.5s0 .4.3.6l.8.3h1.2a36.1 36.1 0 012.8.3l1.2.4.8.8c.2.3.3.7.3 1.2s0 1-.3 1.4c-.3.5-.6.9-1 1.2a7 7 0 01-3.8 1.1c-.9 0-1.6 0-2.2-.2a5 5 0 01-1.5-.6l-1-1-.2-1c0-.6.1-1.1.5-1.5.3-.4.8-.7 1.4-1a2 2 0 01-.7-.5c-.2-.2-.3-.6-.3-1v-.5l.3-.5a2.9 2.9 0 011.1-.9c-.5-.3-1-.7-1.3-1.2-.3-.5-.5-1.1-.5-1.8 0-.5.1-1 .4-1.5l.8-1.1a4 4 0 011.4-.7c.5-.2 1-.3 1.7-.3zm3.4 12c0-.3 0-.6-.2-.7-.1-.2-.4-.3-.6-.4l-1-.2a13.9 13.9 0 00-2.2-.2h-1.1c-.4.1-.8.4-1 .6a1.5 1.5 0 00-.2 1.8c.1.2.3.4.6.5.2.2.6.3 1 .4l1.4.1 1.4-.1c.4-.1.8-.2 1-.4l.7-.6c.2-.3.2-.6.2-.8zm-3.4-6.1c.4 0 .7 0 1-.2.3 0 .6-.2.8-.4l.4-.7.2-1c0-.6-.2-1.2-.6-1.6-.4-.4-1-.6-1.8-.6s-1.4.2-1.8.6a2.5 2.5 0 00-.5 2.5 2 2 0 001.2 1.2l1 .2zm12-5.9c.8 0 1.5.2 2.2.4a4.7 4.7 0 012.7 3c.2.7.4 1.5.4 2.4 0 .8-.2 1.6-.4 2.3-.2.7-.6 1.3-1 1.8-.5.5-1 1-1.7 1.2-.7.3-1.4.4-2.2.4-.8 0-1.6-.1-2.2-.4-.7-.3-1.3-.7-1.7-1.2-.5-.5-.8-1-1-1.8a7 7 0 01-.5-2.3c0-.9.2-1.7.4-2.4.3-.7.6-1.3 1-1.8.5-.5 1.1-.9 1.8-1.2.6-.2 1.4-.4 2.2-.4zm0 10c1 0 1.9-.4 2.4-1.1.6-.8.8-1.8.8-3.1s-.2-2.4-.8-3.1c-.5-.8-1.3-1.1-2.4-1.1-.6 0-1 0-1.5.3-.4.1-.7.4-1 .8-.3.3-.5.8-.6 1.3-.2.5-.2 1.1-.2 1.8 0 .6 0 1.2.2 1.8.1.5.3 1 .6 1.3l1 .8c.4.2 1 .3 1.5.3z"/>
<path id="h" d="M249 88.7c-1 0-1.6-.2-2-.7-.6-.5-.8-1.2-.8-2.1V79h-1.6l-.2-.4v-.8l1.9-.2.4-3.4c0-.2 0-.2.2-.3l.3-.1h1v3.8h3.2V79h-3.2v6.7c0 .5 0 .8.3 1 .2.3.5.4.9.4h.5a2.3 2.3 0 00.7-.4l.2-.1c.1 0 .2 0 .3.2l.5 1-1.2.7-1.5.2zm9.3-11.5c.8 0 1.5.2 2.2.4a4.7 4.7 0 012.7 3c.3.7.4 1.5.4 2.4 0 .8-.1 1.6-.4 2.3-.2.7-.6 1.3-1 1.8-.5.5-1 1-1.7 1.2-.7.3-1.4.4-2.2.4-.8 0-1.6-.1-2.2-.4-.7-.3-1.2-.7-1.7-1.2s-.8-1-1-1.8a7 7 0 01-.4-2.3c0-.9 0-1.7.3-2.4s.6-1.3 1.1-1.8c.5-.5 1-.9 1.7-1.2.6-.2 1.4-.4 2.2-.4zm0 10c1 0 2-.4 2.4-1.1.6-.8.9-1.8.9-3.1s-.3-2.4-.9-3.1c-.5-.8-1.3-1.1-2.4-1.1-.6 0-1 0-1.5.3-.4.1-.7.4-1 .8-.3.3-.5.8-.6 1.3L255 83c0 .6 0 1.2.2 1.8.1.5.3 1 .6 1.3l1 .8c.4.2 1 .3 1.5.3z"/>
<path id="i" d="M347.2 72.8h.5l.3.3 9.1 12a7.5 7.5 0 010-1.2V72.8h1.8v15.7h-1a1 1 0 01-.5 0 1 1 0 01-.3-.3l-9.1-12a14.1 14.1 0 010 1.1v11.2h-1.9V72.8h1.1zm14.6 14.5a1.4 1.4 0 01.4-1 1.3 1.3 0 011-.4c.2 0 .4 0 .5.2.2 0 .3.1.5.3a1.3 1.3 0 01.4 1 1.4 1.4 0 01-.4 1 1.3 1.3 0 01-1 .3 1.4 1.4 0 01-1-.4 1.4 1.4 0 01-.4-1zm10-5v6.2h-2v-6.2l-5.9-9.5h2l.4.1.2.4 3.7 6.1a7.6 7.6 0 01.6 1.4 13 13 0 01.6-1.4l3.6-6.1.3-.4.4-.1h2l-5.9 9.5zm5.2 5a1.4 1.4 0 01.4-1 1.3 1.3 0 011-.4c.1 0 .3 0 .5.2.2 0 .3.1.4.3a1.3 1.3 0 01.4 1 1.4 1.4 0 01-.4 1 1.3 1.3 0 01-1 .3 1.4 1.4 0 01-1-.4 1.4 1.4 0 01-.3-1zm8.4-14.5V79a27.8 27.8 0 01-.3 4h-1.3a66.4 66.4 0 01-.3-4v-6.3h2zm-2.3 14.5a1.4 1.4 0 01.4-1 1.3 1.3 0 011-.4c.1 0 .3 0 .5.2l.4.3a1.3 1.3 0 01.4 1 1.4 1.4 0 01-.4 1 1.3 1.3 0 01-1 .3 1.4 1.4 0 01-1-.4 1.4 1.4 0 01-.3-1zm8.1-15.3a4.1 4.1 0 01.6 2 4 4 0 01-.5 2 6 6 0 01-1.3 1.6l-.6-.3-.2-.2.1-.3a5.1 5.1 0 00.7-1l.2-.6a2.5 2.5 0 000-1.3l-.4-.8v-.3c0-.1 0-.3.2-.4l1.2-.4zm3.3 0a4.1 4.1 0 01.7 2 4 4 0 01-.5 2 6 6 0 01-1.4 1.6l-.6-.3-.1-.2v-.3a5.1 5.1 0 00.7-1l.2-.6a2.5 2.5 0 000-1.3c0-.3-.2-.6-.4-.8v-.3c0-.1 0-.3.2-.4l1.2-.4z"/>
<path id="j" d="M13.5 141c-.5-.7-.8-1.6-.8-2.4 0-.7.2-1.4.6-2.1.3-.7.9-1.3 1.6-1.8l.8.5.2.1v.4l-.1.2a5.7 5.7 0 00-.6.8l-.2.6-.1.6.1.8c0 .3.2.6.4 1l.1.2c0 .3-.1.4-.4.5l-1.6.7zm3.6 0c-.5-.7-.7-1.6-.7-2.4 0-.7.2-1.4.5-2.1.4-.7 1-1.3 1.6-1.8l.9.5.1.1V136a5.7 5.7 0 00-.7.8l-.2.6v1.4l.5 1v.2c0 .3 0 .4-.3.5l-1.7.7z"/>
<path id="k" d="M60 149.4h6.3v2.4h-9.4V136h3v13.5zm12.4-9c.7 0 1.4 0 2 .3a4.3 4.3 0 012.5 2.6 6 6 0 01.4 2.7l-.1.3-.2.2h-7.3c0 1.2.4 2 1 2.6a3 3 0 002 .8c.5 0 1 0 1.2-.2l.9-.3.6-.4.5-.1h.3l.2.2.8 1-1 1a5.7 5.7 0 01-2.4.8h-1.3a6 6 0 01-2.1-.3c-.7-.3-1.3-.7-1.8-1.2s-.9-1.1-1.2-1.9a7.3 7.3 0 010-4.7c.2-.7.6-1.3 1-1.8a5 5 0 011.7-1.2c.7-.3 1.5-.4 2.3-.4zm0 1.9c-.7 0-1.3.2-1.8.7-.4.4-.7 1-.8 1.9h5v-1l-.5-.8a2 2 0 00-.8-.6l-1-.2zm10.8 9.7c-1 0-1.7-.3-2.2-.8-.6-.6-.8-1.4-.8-2.3v-6.3H79c-.1 0-.3 0-.4-.2l-.1-.4v-1l1.8-.4.6-3c0-.2 0-.3.2-.4l.4-.1h1.4v3.5h3v2h-3v6c0 .4 0 .7.2 1l.8.2h.4a2.3 2.3 0 00.5-.3h.4l.2.2.8 1.3c-.4.3-.9.6-1.4.7a5 5 0 01-1.6.3z"/>
<path id="l" d="M183 140.3c.4 0 .9 0 1.3.2.4 0 .8.2 1.2.4h3.2v1l-.1.4c0 .1-.2.2-.5.2l-1 .2a3.5 3.5 0 01.3 1.3 3.4 3.4 0 01-1.3 2.6c-.4.4-.9.6-1.4.8a5.7 5.7 0 01-3 .1c-.3.2-.5.5-.5.7 0 .3 0 .4.3.5l.8.3h1.2a24.2 24.2 0 012.6.3l1.2.4.8.8c.2.4.3.8.3 1.4 0 .5 0 1-.3 1.4l-1 1.3-1.8.9a8.9 8.9 0 01-4.5 0c-.7-.1-1.2-.3-1.6-.6l-1-1-.2-1c0-.6.1-1 .4-1.4.4-.4.8-.7 1.4-.9-.3-.1-.5-.3-.7-.6-.2-.3-.2-.6-.2-1v-.5l.3-.6.5-.5.6-.4a3.3 3.3 0 01-1.8-3 3.4 3.4 0 011.2-2.7c.4-.3 1-.5 1.5-.7a6 6 0 011.8-.3zm3 12c0-.2-.1-.4-.3-.5 0-.2-.3-.3-.5-.3a14.7 14.7 0 00-2.8-.3l-1-.1c-.4.1-.6.3-.8.6-.2.2-.3.5-.3.8 0 .2 0 .3.2.5 0 .2.2.3.4.5l.9.2 1.2.2c.5 0 1 0 1.3-.2.4 0 .7-.1 1-.3l.5-.5.1-.6zm-3-6.4l.8-.1.7-.4.3-.6.2-.7c0-.6-.2-1-.5-1.4-.4-.3-.9-.5-1.5-.5-.7 0-1.2.2-1.5.5-.4.4-.5.8-.5 1.4v.7a1.6 1.6 0 001 1l1 .1zm12.3-5.5c.8 0 1.6 0 2.2.4a5 5 0 013 3c.2.7.3 1.5.3 2.4a7 7 0 01-.4 2.4 4.9 4.9 0 01-2.9 3 6.2 6.2 0 01-4.6 0 5 5 0 01-2.8-3c-.3-.7-.4-1.6-.4-2.4 0-1 0-1.7.4-2.5.2-.7.6-1.3 1-1.8a5 5 0 011.9-1.1c.6-.3 1.4-.4 2.3-.4zm0 9.5c.9 0 1.6-.3 2-1 .5-.6.7-1.5.7-2.7 0-1.2-.2-2.2-.7-2.8-.4-.6-1.1-1-2-1-1 0-1.7.4-2.2 1-.4.6-.6 1.6-.6 2.8 0 1.2.2 2.1.6 2.7.5.7 1.2 1 2.2 1z"/>
<path id="m" d="M251.3 152c-1 0-1.7-.3-2.2-.8-.6-.6-.8-1.4-.8-2.3v-6.3H247c-.1 0-.2 0-.3-.2l-.2-.4v-1l1.8-.4.6-3c0-.2 0-.3.2-.4l.4-.1h1.4v3.5h3v2h-3v6c0 .4 0 .7.3 1l.7.2h.4a2.3 2.3 0 00.5-.3h.4l.2.2.8 1.3c-.4.3-.9.6-1.4.7a5 5 0 01-1.6.3zm9.7-11.6c.8 0 1.6 0 2.2.4a5 5 0 013 3c.2.7.3 1.5.3 2.4a7 7 0 01-.4 2.4 4.9 4.9 0 01-2.9 3 6.2 6.2 0 01-4.6 0 5 5 0 01-2.8-3c-.3-.7-.4-1.6-.4-2.4 0-1 0-1.7.4-2.5.2-.7.6-1.3 1-1.8a5 5 0 011.9-1.1c.6-.3 1.4-.4 2.3-.4zm0 9.5c.9 0 1.6-.3 2-1 .5-.6.7-1.5.7-2.7 0-1.2-.2-2.2-.7-2.8-.4-.6-1.1-1-2-1-1 0-1.7.4-2.2 1-.4.6-.6 1.6-.6 2.8 0 1.2.2 2.1.6 2.7.5.7 1.2 1 2.2 1z"/>
<path id="n" d="M347.7 136l.5.1.3.3 9.1 11.9a7.5 7.5 0 010-1V136h1.8v15.7h-1a1 1 0 01-.5 0 1 1 0 01-.3-.4l-9.1-11.8a14.1 14.1 0 010 1v11.2h-1.9v-15.7h1.1zm14.6 14.6a1.4 1.4 0 01.4-1 1.3 1.3 0 011-.4l.5.1.5.3a1.3 1.3 0 01.4 1 1.4 1.4 0 01-.4 1 1.3 1.3 0 01-1 .4 1.4 1.4 0 01-1-.4 1.4 1.4 0 01-.4-1zm10-5v6.2h-2v-6.3l-5.9-9.4h2l.4.1.2.4 3.7 6a7.6 7.6 0 01.6 1.5 13 13 0 01.6-1.4l3.6-6.1c0-.2.2-.3.3-.4l.4-.1h2l-5.9 9.4zm5.2 5a1.4 1.4 0 01.4-1 1.3 1.3 0 011-.4l.5.1.4.3a1.3 1.3 0 01.4 1 1.4 1.4 0 01-.4 1 1.3 1.3 0 01-1 .4 1.4 1.4 0 01-1-.4 1.4 1.4 0 01-.3-1zm8.4-14.5v6.2a27.8 27.8 0 01-.3 4h-1.3a66.4 66.4 0 01-.3-4v-6.2h2zm-2.3 14.5a1.4 1.4 0 01.4-1 1.3 1.3 0 011-.4l.5.1.4.3a1.3 1.3 0 01.4 1 1.4 1.4 0 01-.4 1 1.3 1.3 0 01-1 .4 1.4 1.4 0 01-1-.4 1.4 1.4 0 01-.3-1zm8.1-15.4a4.1 4.1 0 01.6 2.1 4 4 0 01-.5 2 6 6 0 01-1.3 1.6l-.6-.4h-.2v-.2l.1-.3a5.1 5.1 0 00.7-1l.2-.5a2.5 2.5 0 000-1.4l-.4-.8v-.2c0-.2 0-.3.2-.4l1.2-.5zm3.3 0a4.1 4.1 0 01.7 2.1 4 4 0 01-.5 2 6 6 0 01-1.4 1.6l-.6-.4h-.1v-.2-.3a5.1 5.1 0 00.7-1l.2-.5a2.5 2.5 0 000-1.4l-.4-.8v-.2c0-.2 0-.3.2-.4l1.2-.5z"/>
<path id="o" d="M127 135l.6 1.2a4.3 4.3 0 01-.4 3.3c-.4.7-.9 1.3-1.6 1.9l-.8-.5-.2-.2v-.2l.1-.3a6.5 6.5 0 00.6-.9 2.9 2.9 0 00.3-1.2l-.1-.8c0-.3-.2-.6-.4-.9l-.1-.3c0-.2.1-.4.4-.5l1.6-.6zm9.7 7.7l-.2.3h-.3a29 29 0 00-1.6-.6h-1a2 2 0 00-1.2.3 1 1 0 00-.5.9l.3.6.6.4.9.3a29 29 0 012 .8l.9.5.6.9c.2.3.3.7.3 1.1 0 .6-.1 1-.3 1.5-.2.5-.5.9-.9 1.2a4 4 0 01-1.4.8 6.1 6.1 0 01-3 .2 6.7 6.7 0 01-2.1-.7l-.8-.6.7-1c0-.2.1-.2.3-.3l.4-.1.4.1a10.6 10.6 0 001.3.6l1 .2.8-.1.6-.3.3-.5.1-.5c0-.2 0-.5-.2-.6a2 2 0 00-.6-.5 29.4 29.4 0 01-3-1l-.9-.6-.6-1c-.2-.3-.2-.7-.2-1.2a3.3 3.3 0 011-2.4 4 4 0 011.4-.8c.5-.2 1.1-.2 1.8-.2a4.8 4.8 0 013.7 1.4l-.6 1z"/>
<path id="p" d="M13.5 208.7c-.5-.8-.8-1.6-.8-2.5 0-.7.2-1.4.6-2 .3-.7.9-1.3 1.6-2l.8.6.2.2V203.3l-.1.2a5.7 5.7 0 00-.6.9l-.2.5-.1.7.1.8c0 .3.2.6.4.9l.1.3c0 .2-.1.4-.4.5l-1.6.6zm3.6 0c-.5-.8-.7-1.6-.7-2.5 0-.7.2-1.4.5-2 .4-.7 1-1.3 1.6-2l.9.6.1.2v.5a5.7 5.7 0 00-.7.9l-.2.5v1.5l.5.9v.3c0 .2 0 .4-.3.5l-1.7.6z"/>
<path id="q" d="M60 217h6.3v2.5h-9.4v-16h3V217zm12.4-9c.7 0 1.4.1 2 .3a4.3 4.3 0 012.5 2.6 6 6 0 01.4 2.7l-.1.3-.2.2h-7.3c0 1.2.4 2 1 2.6a3 3 0 002 .8l1.2-.1.9-.4.6-.3.5-.2h.3l.2.3.8 1-1 .9a5.7 5.7 0 01-2.4.8l-1.3.1a6 6 0 01-2.1-.4c-.7-.2-1.3-.6-1.8-1.1-.5-.5-.9-1.2-1.2-2a7.3 7.3 0 010-4.7c.2-.7.6-1.3 1-1.8a5 5 0 011.7-1.2c.7-.3 1.5-.4 2.3-.4zm0 2c-.7 0-1.3.2-1.8.6-.4.5-.7 1-.8 2h5v-1l-.5-.9a2 2 0 00-.8-.6l-1-.2zm10.8 9.6c-1 0-1.7-.2-2.2-.8-.6-.6-.8-1.3-.8-2.3v-6.3H79l-.4-.1-.1-.5v-1l1.8-.3.6-3.1c0-.2 0-.3.2-.4H82.9v3.5h3v1.9h-3v6.1c0 .4 0 .6.2.8.2.2.5.3.8.3h.4a2.3 2.3 0 00.5-.3h.4l.2.2.8 1.3c-.4.4-.9.6-1.4.8a5 5 0 01-1.6.2z"/>
<path id="r" d="M185.5 208l1.3.1 1.2.5h3.2v1l-.1.4-.5.2-1 .1a3.5 3.5 0 01.3 1.3 3.4 3.4 0 01-1.3 2.7l-1.4.7a5.7 5.7 0 01-3 .2c-.3.2-.5.4-.5.7 0 .2 0 .4.3.5l.8.2h1.2a24.2 24.2 0 012.6.3l1.2.5c.3.2.6.4.8.8.2.3.3.8.3 1.3s0 1-.3 1.5l-1 1.2c-.6.4-1.1.7-1.8.9a8.9 8.9 0 01-4.5 0c-.7 0-1.2-.3-1.6-.6l-1-1-.2-1c0-.6.1-1 .4-1.4.4-.3.8-.6 1.4-.8l-.7-.7c-.2-.2-.2-.6-.2-1v-.5l.3-.5.5-.5.6-.4c-.6-.3-1-.8-1.3-1.3-.4-.5-.5-1-.5-1.8a3.4 3.4 0 011.2-2.6c.4-.4 1-.6 1.5-.8a6 6 0 011.8-.2zm3 12c0-.3-.1-.4-.3-.6l-.5-.3a14.7 14.7 0 00-2.8-.3l-1-.1-.8.6c-.2.2-.3.5-.3.8 0 .2 0 .4.2.5 0 .2.2.4.4.5l.9.3h2.5l1-.3.5-.5.1-.6zm-3-6.5l.8-.1c.3 0 .5-.2.7-.4l.3-.6.2-.7c0-.6-.2-1-.5-1.3-.4-.4-.9-.5-1.5-.5-.7 0-1.2.1-1.5.5-.4.3-.5.7-.5 1.3v.7a1.6 1.6 0 001 1l1 .1zm12.3-5.5c.8 0 1.6.1 2.2.4a5 5 0 013 3c.2.7.3 1.5.3 2.4a7 7 0 01-.4 2.4 4.9 4.9 0 01-2.9 3c-.6.3-1.4.4-2.2.4-.9 0-1.7-.1-2.3-.4a5 5 0 01-3-3c-.2-.7-.3-1.5-.3-2.4 0-.9 0-1.7.4-2.4.2-.7.6-1.3 1-1.8a5 5 0 011.9-1.2c.6-.3 1.4-.4 2.3-.4zm0 9.5c.9 0 1.6-.3 2-1 .5-.5.7-1.5.7-2.7 0-1.2-.2-2.1-.7-2.8-.4-.6-1.1-1-2-1-1 0-1.7.4-2.2 1-.4.7-.6 1.6-.6 2.8 0 1.2.2 2.1.6 2.8.5.6 1.2 1 2.2 1z"/>
<path id="s" d="M252.8 219.6c-1 0-1.7-.2-2.2-.8-.6-.6-.8-1.3-.8-2.3v-6.3h-1.2l-.3-.1-.2-.5v-1l1.8-.3.6-3.1c0-.2 0-.3.2-.4H252.5v3.5h3v1.9h-3v6.1c0 .4 0 .6.3.8.1.2.4.3.7.3h.4a2.3 2.3 0 00.5-.3h.4l.2.2.8 1.3c-.4.4-.9.6-1.4.8a5 5 0 01-1.6.2zm9.7-11.6c.8 0 1.6.1 2.2.4a5 5 0 013 3c.2.7.3 1.5.3 2.4a7 7 0 01-.4 2.4 4.9 4.9 0 01-2.9 3c-.6.3-1.4.4-2.2.4-.9 0-1.7-.1-2.3-.4a5 5 0 01-3-3c-.2-.7-.3-1.5-.3-2.4 0-.9 0-1.7.4-2.4.2-.7.6-1.3 1-1.8a5 5 0 011.9-1.2c.6-.3 1.4-.4 2.3-.4zm0 9.5c.9 0 1.6-.3 2-1 .5-.5.7-1.5.7-2.7 0-1.2-.2-2.1-.7-2.8-.4-.6-1.1-1-2-1-1 0-1.7.4-2.2 1-.4.7-.6 1.6-.6 2.8 0 1.2.2 2.1.6 2.8.5.6 1.2 1 2.2 1z"/>
<path id="t" d="M331.8 203.7h.4l.3.4 9.2 11.8a7.5 7.5 0 01-.1-1v-11.2h1.9v15.8h-1.1a1 1 0 01-.4-.1 1 1 0 01-.4-.3l-9-11.9a14.1 14.1 0 010 1v11.3h-2v-15.8h1.2zm14.6 14.5a1.4 1.4 0 01.4-1 1.3 1.3 0 011-.3h.5l.4.4a1.3 1.3 0 01.4 1 1.4 1.4 0 01-.4 1 1.3 1.3 0 01-1 .3 1.4 1.4 0 01-1-.4 1.4 1.4 0 01-.3-1zm10-5v6.3h-2.1v-6.3l-5.8-9.5h1.9l.4.1.3.4 3.6 6.1a7.6 7.6 0 01.6 1.4 13 13 0 01.7-1.4l3.6-6.1.2-.3c.1-.2.3-.2.5-.2h1.9l-5.8 9.5zm5.1 5a1.4 1.4 0 01.4-1 1.3 1.3 0 011-.3h.5l.5.4a1.3 1.3 0 01.4 1 1.4 1.4 0 01-.4 1 1.3 1.3 0 01-1 .3 1.4 1.4 0 01-1-.4 1.4 1.4 0 01-.4-1zm8.5-14.5v6.3a27.8 27.8 0 01-.3 4h-1.3a66.4 66.4 0 01-.3-4v-6.3h1.9zm-2.4 14.5a1.4 1.4 0 01.4-1 1.3 1.3 0 011-.3h.5l.5.4a1.3 1.3 0 01.4 1 1.4 1.4 0 01-.4 1 1.3 1.3 0 01-1 .3 1.4 1.4 0 01-1-.4 1.4 1.4 0 01-.4-1z"/>
<path id="u" d="M127 202.6l.6 1.2a4.3 4.3 0 01-.4 3.4c-.4.6-.9 1.3-1.6 1.8l-.8-.5-.2-.2v-.1l.1-.4a6.5 6.5 0 00.6-.8 2.9 2.9 0 00.3-1.3l-.1-.8c0-.3-.2-.6-.4-.9l-.1-.3c0-.2.1-.4.4-.5l1.6-.6zm9.7 7.8l-.2.2h-.3a29 29 0 00-1.6-.6h-1a2 2 0 00-1.2.3 1 1 0 00-.5.9c0 .2.1.5.3.6.1.2.3.3.6.4l.9.4a29 29 0 012 .7l.9.6c.3.2.5.5.6.8.2.3.3.7.3 1.2l-.3 1.5-.9 1.2a4 4 0 01-1.4.8 6.1 6.1 0 01-3 .1 6.7 6.7 0 01-2.1-.7l-.8-.6.7-1 .3-.3h.8a10.6 10.6 0 001.3.7l1 .1h.8l.6-.4.3-.4.1-.5c0-.3 0-.5-.2-.7a2 2 0 00-.6-.4 29.4 29.4 0 01-3-1l-.9-.7-.6-.9c-.2-.3-.2-.8-.2-1.3a3.3 3.3 0 011-2.4 4 4 0 011.4-.7 5.6 5.6 0 014 .1c.6.2 1.1.6 1.5 1l-.6 1z"/>
<path id="v" d="M429.6 202.6l.5 1.2a4.3 4.3 0 01-.3 3.4c-.4.6-1 1.3-1.6 1.8l-.9-.5-.1-.2v-.1-.4a7.8 7.8 0 00.7-.8l.2-.6a2.5 2.5 0 000-1.5l-.5-.9v-.3c0-.2 0-.4.3-.5l1.7-.6zm3.6 0l.6 1.2a4.3 4.3 0 01-.4 3.4c-.3.6-.9 1.3-1.6 1.8l-.8-.5-.2-.2v-.1l.1-.4a7.8 7.8 0 00.6-.8l.2-.6a2.5 2.5 0 000-1.5c0-.3-.2-.6-.4-.9l-.1-.3c0-.2.1-.4.4-.5l1.6-.6z"/>
<path id="w" d="M13.5 275.3c-.5-.9-.8-1.7-.8-2.5 0-.7.2-1.4.6-2.1.3-.7.9-1.3 1.6-1.9l.8.6.2.1v.4l-.1.1a5.7 5.7 0 00-.6.9l-.2.6-.1.6.1.8c0 .3.2.6.4 1l.1.2c0 .3-.1.4-.4.5l-1.6.7zm3.6 0c-.5-.9-.7-1.7-.7-2.5 0-.7.2-1.4.5-2.1.4-.7 1-1.3 1.6-1.9l.9.6.1.1v.5a5.7 5.7 0 00-.7.9l-.2.6v1.4l.5 1v.2c0 .3 0 .4-.3.5l-1.7.7z"/>
<path id="x" d="M58.4 283.6h6.4v2.4h-9.4v-16h3v13.6zm12.5-9c.7 0 1.4 0 2 .3a4.3 4.3 0 012.5 2.6 6 6 0 01.4 2.7l-.1.3-.2.1-.3.1h-7c0 1.2.4 2 1 2.6a3 3 0 002 .8c.5 0 1 0 1.2-.2l.9-.3.6-.4.5-.1h.3l.2.2.8 1c-.3.4-.6.7-1 .9a5.7 5.7 0 01-2.4.9H71a6 6 0 01-2.1-.3c-.7-.3-1.3-.7-1.8-1.2s-.9-1.1-1.2-1.9a7.3 7.3 0 010-4.8c.2-.6.6-1.2 1-1.7a5 5 0 011.7-1.2c.7-.3 1.5-.5 2.3-.5zm0 1.9c-.7 0-1.3.2-1.8.7-.4.4-.7 1-.8 1.9h5v-1l-.5-.9a2 2 0 00-.8-.5l-1-.2zm10.8 9.7c-1 0-1.7-.3-2.2-.9-.6-.5-.8-1.3-.8-2.2v-6.4H77l-.1-.5V275l1.8-.3.6-3c0-.2 0-.3.2-.4l.4-.1h1.4v3.5h3v2h-3v6c0 .4 0 .7.2.9.2.2.5.3.8.3h.4a2.3 2.3 0 00.5-.3h.4l.2.2.8 1.3c-.4.3-.9.6-1.4.7a5 5 0 01-1.6.3z"/>
<path id="y" d="M183 274.5c.4 0 .9 0 1.3.2.4 0 .8.2 1.2.4h3.2v1l-.1.4c0 .1-.2.2-.5.2l-1 .2a3.5 3.5 0 01.3 1.3 3.4 3.4 0 01-1.3 2.6l-1.4.8a5.7 5.7 0 01-3 .1c-.3.2-.5.4-.5.7 0 .2 0 .4.3.5l.8.2 1.2.1a24.2 24.2 0 012.6.3c.5 0 .9.2 1.2.4l.8.8c.2.4.3.8.3 1.3s0 1-.3 1.5l-1 1.3c-.6.3-1.1.6-1.8.8-.7.3-1.4.4-2.3.4-.9 0-1.6-.1-2.2-.3-.7-.1-1.2-.4-1.6-.6l-1-1-.2-1.1c0-.5.1-1 .4-1.3.4-.4.8-.7 1.4-.9l-.7-.6c-.2-.3-.2-.6-.2-1v-.5l.3-.6.5-.5.6-.4a3.3 3.3 0 01-1.8-3 3.4 3.4 0 011.2-2.7 6 6 0 013.2-1zm3 12c0-.2-.1-.4-.3-.5 0-.2-.3-.3-.5-.4a14.7 14.7 0 00-2.8-.3h-1c-.4.1-.6.3-.8.5-.2.3-.3.5-.3.8 0 .2 0 .4.2.6 0 .2.2.3.4.4.2.2.5.3.9.3l1.2.1h1.3l1-.4.5-.5.1-.6zm-3-6.4c.3 0 .6 0 .8-.2.3 0 .5-.2.7-.3l.3-.6.2-.8c0-.5-.2-1-.5-1.3-.4-.3-.9-.5-1.5-.5-.7 0-1.2.2-1.5.5-.4.3-.5.8-.5 1.3v.8a1.6 1.6 0 001 1h1zm12.3-5.6c.8 0 1.6.2 2.2.4a5 5 0 013 3c.2.7.3 1.5.3 2.4a7 7 0 01-.4 2.5 4.9 4.9 0 01-2.9 3 6.2 6.2 0 01-4.6 0 5 5 0 01-2.8-3c-.3-.8-.4-1.6-.4-2.5 0-.9 0-1.7.4-2.4.2-.7.6-1.3 1-1.8a5 5 0 011.9-1.2c.6-.2 1.4-.4 2.3-.4zm0 9.6c.9 0 1.6-.3 2-1 .5-.6.7-1.5.7-2.7 0-1.3-.2-2.2-.7-2.8-.4-.7-1.1-1-2-1-1 0-1.7.3-2.2 1-.4.6-.6 1.5-.6 2.8 0 1.2.2 2 .6 2.7.5.7 1.2 1 2.2 1z"/>
<path id="z" d="M251.3 286.2c-1 0-1.7-.3-2.2-.9-.6-.5-.8-1.3-.8-2.2v-6.4h-1.5l-.2-.5V275l1.8-.3.6-3c0-.2 0-.3.2-.4l.4-.1h1.4v3.5h3v2h-3v6c0 .4 0 .7.3.9.1.2.4.3.7.3h.4a2.3 2.3 0 00.5-.3h.4l.2.2.8 1.3c-.4.3-.9.6-1.4.7a5 5 0 01-1.6.3zm9.7-11.7c.8 0 1.6.2 2.2.4a5 5 0 013 3c.2.7.3 1.5.3 2.4a7 7 0 01-.4 2.5 4.9 4.9 0 01-2.9 3 6.2 6.2 0 01-4.6 0 5 5 0 01-2.8-3c-.3-.8-.4-1.6-.4-2.5 0-.9 0-1.7.4-2.4.2-.7.6-1.3 1-1.8a5 5 0 011.9-1.2c.6-.2 1.4-.4 2.3-.4zm0 9.6c.9 0 1.6-.3 2-1 .5-.6.7-1.5.7-2.7 0-1.3-.2-2.2-.7-2.8-.4-.7-1.1-1-2-1-1 0-1.7.3-2.2 1-.4.6-.6 1.5-.6 2.8 0 1.2.2 2 .6 2.7.5.7 1.2 1 2.2 1z"/>
<path id="A" d="M310.4 270.2l.4.1.4.3 9 11.9a7.5 7.5 0 010-1.1v-11.2h2V286H321a1 1 0 01-.4 0 1 1 0 01-.3-.4l-9.2-11.9a14.1 14.1 0 010 1V286h-1.8v-15.8h1.1zm14.6 14.6a1.4 1.4 0 01.4-1 1.3 1.3 0 011-.4l.5.1c.2 0 .3.2.5.3a1.3 1.3 0 01.3 1 1.4 1.4 0 01-.3 1 1.3 1.3 0 01-1 .4 1.4 1.4 0 01-1-.4 1.4 1.4 0 01-.4-1zm10-5v6.2H333v-6.3l-5.8-9.5h1.9c.2 0 .3 0 .4.2.2 0 .2.2.3.3l3.6 6.2a7.6 7.6 0 01.7 1.4 13 13 0 01.6-1.4l3.6-6.2.3-.3.4-.2h2l-5.9 9.5zm5.2 5a1.4 1.4 0 01.4-1 1.3 1.3 0 011-.4l.5.1.4.3a1.3 1.3 0 01.4 1 1.4 1.4 0 01-.4 1 1.3 1.3 0 01-1 .4 1.4 1.4 0 01-1-.4 1.4 1.4 0 01-.3-1z"/>
<path id="B" d="M126.5 269.1l.6 1.3a4.3 4.3 0 01-.4 3.3c-.4.7-.9 1.3-1.6 1.9l-.8-.6-.2-.1v-.2l.1-.3a6.5 6.5 0 00.6-.9 2.9 2.9 0 00.3-1.2l-.1-.8c0-.3-.2-.6-.4-1l-.1-.2c0-.3.1-.4.4-.5l1.6-.7zm9.7 7.8l-.2.3h-.3a29 29 0 00-1.6-.6h-1a2 2 0 00-1.2.3 1 1 0 00-.5.9l.3.6.6.4.9.3a29 29 0 012 .8l.9.5c.3.3.5.5.6.9.2.3.3.7.3 1.1 0 .6-.1 1-.3 1.5-.2.5-.5.9-.9 1.2a4 4 0 01-1.4.8 6.1 6.1 0 01-3 .2 6.7 6.7 0 01-2.1-.8l-.8-.5.7-1c0-.2.1-.3.3-.3l.4-.1.4.1a10.6 10.6 0 001.3.6l1 .2.8-.1.6-.3.3-.5.1-.5c0-.3 0-.5-.2-.6a2 2 0 00-.6-.5 29.4 29.4 0 01-3-1l-.9-.7-.6-.8c-.2-.4-.2-.8-.2-1.3a3.3 3.3 0 011-2.4 4 4 0 011.4-.8 5.6 5.6 0 014 .1c.6.3 1.1.6 1.5 1l-.6 1z"/>
<path id="C" d="M429.6 269.1l.5 1.3a4.3 4.3 0 01-.3 3.3c-.4.7-1 1.3-1.6 1.9l-.9-.6-.1-.1v-.2-.3a7.8 7.8 0 00.7-.9l.2-.6a2.5 2.5 0 000-1.4l-.5-1v-.2c0-.3 0-.4.3-.5l1.7-.7zm3.6 0l.6 1.3a4.3 4.3 0 01-.4 3.3c-.3.7-.9 1.3-1.6 1.9l-.8-.6-.2-.1v-.2l.1-.3a7.8 7.8 0 00.6-.9l.2-.6a2.5 2.5 0 000-1.4c0-.3-.2-.6-.4-1l-.1-.2c0-.3.1-.4.4-.5l1.6-.7z"/>
<path id="D" d="M387.8 270v6.4a19.2 19.2 0 01-.3 4h-1.9a41.8 41.8 0 01-.3-4V270h2.5zm-3 14.5a1.7 1.7 0 01.5-1.2 1.7 1.7 0 011.2-.5 1.6 1.6 0 011.2.5 1.7 1.7 0 01.3 1.9c0 .2-.2.3-.3.5l-.5.3-.7.2a1.7 1.7 0 01-1.2-.5l-.3-.5-.2-.7z"/>
<path id="E" d="M13.5 341.8c-.5-.8-.8-1.6-.8-2.5 0-.7.2-1.4.6-2 .3-.7.9-1.4 1.6-2l.8.6.2.2V336.4l-.1.2a5.7 5.7 0 00-.6.8l-.2.6-.1.7.1.8c0 .3.2.5.4.9l.1.3c0 .2-.1.4-.4.5l-1.6.6zm3.6 0c-.5-.8-.7-1.6-.7-2.5 0-.7.2-1.4.5-2 .4-.7 1-1.4 1.6-2l.9.6.1.2v.5a5.7 5.7 0 00-.7.8l-.2.6v1.5l.5.9v.3c0 .2 0 .4-.3.5l-1.7.6z"/>
<path id="F" d="M60 350.1h6.3v2.4h-9.4v-15.9h3v13.5zm12.4-9c.7 0 1.4.1 2 .3a4.3 4.3 0 012.5 2.6 6 6 0 01.4 2.7l-.1.3-.2.2h-7.3c0 1.2.4 2 1 2.6a3 3 0 002 .8l1.2-.1.9-.4.6-.3.5-.2h.3l.2.3.8 1-1 .8a5.7 5.7 0 01-2.4 1h-1.3a6 6 0 01-2.1-.4c-.7-.2-1.3-.6-1.8-1.1-.5-.5-.9-1.2-1.2-2a7.3 7.3 0 010-4.7c.2-.7.6-1.3 1-1.8a5 5 0 011.7-1.2c.7-.3 1.5-.4 2.3-.4zm0 2c-.7 0-1.3.2-1.8.6-.4.4-.7 1-.8 1.9h5v-1l-.5-.8a2 2 0 00-.8-.6l-1-.2zm10.8 9.6c-1 0-1.7-.3-2.2-.8-.6-.6-.8-1.3-.8-2.3v-6.3H79l-.4-.1-.1-.5v-1l1.8-.4.6-3c0-.2 0-.3.2-.4H82.9v3.5h3v1.9h-3v6.1c0 .4 0 .6.2.8.2.2.5.3.8.3h.4a2.3 2.3 0 00.5-.3h.4l.2.2.8 1.3c-.4.4-.9.6-1.4.8a5 5 0 01-1.6.2z"/>
<path id="G" d="M184 341l1.3.2 1.2.4h3.2v1l-.1.5-.5.2-1 .1a3.5 3.5 0 01.3 1.3 3.4 3.4 0 01-1.3 2.7l-1.4.7a5.7 5.7 0 01-3 .1c-.3.3-.5.5-.5.8 0 .2 0 .4.3.5l.8.2h1.2a24.2 24.2 0 012.6.3l1.2.5c.3.2.6.4.8.8.2.3.3.8.3 1.3s0 1-.3 1.4c-.3.5-.6 1-1 1.3-.6.4-1.1.7-1.8.9a8.9 8.9 0 01-4.5 0c-.7-.1-1.2-.3-1.6-.6l-1-1-.2-1c0-.6.1-1 .4-1.4.4-.4.8-.6 1.4-.9-.3-.1-.5-.3-.7-.6-.2-.2-.2-.6-.2-1v-.5l.3-.5.5-.5.6-.5a3.3 3.3 0 01-1.8-3 3.4 3.4 0 011.2-2.7c.4-.3 1-.5 1.5-.7a6 6 0 011.8-.2zm3 12c0-.2-.1-.3-.3-.5l-.5-.3a14.7 14.7 0 00-2.8-.3l-1-.1-.8.6c-.2.2-.3.5-.3.8 0 .2 0 .4.2.5 0 .2.2.4.4.5l.9.3h2.5l1-.4c.2 0 .4-.3.5-.4l.1-.6zm-3-6.4l.8-.1.7-.4.3-.6.2-.7c0-.6-.2-1-.5-1.3-.4-.4-.9-.5-1.5-.5-.7 0-1.2.1-1.5.5-.4.3-.5.7-.5 1.3v.7a1.6 1.6 0 001 1l1 .1zm12.3-5.5c.8 0 1.6.1 2.2.4a5 5 0 013 3c.2.7.3 1.5.3 2.4a7 7 0 01-.4 2.4 4.9 4.9 0 01-2.9 3c-.6.3-1.4.4-2.2.4-.9 0-1.7-.1-2.3-.4a5 5 0 01-3-3c-.2-.7-.3-1.5-.3-2.4 0-1 0-1.7.4-2.4.2-.7.6-1.4 1-1.9a5 5 0 011.9-1.1c.6-.3 1.4-.4 2.3-.4zm0 9.5c.9 0 1.6-.3 2-1 .5-.6.7-1.5.7-2.7 0-1.2-.2-2.1-.7-2.8-.4-.6-1.1-1-2-1-1 0-1.7.4-2.2 1-.4.7-.6 1.6-.6 2.8 0 1.2.2 2.1.6 2.8.5.6 1.2 1 2.2 1z"/>
<path id="H" d="M249.8 352.7c-1 0-1.7-.3-2.2-.8-.6-.6-.8-1.3-.8-2.3v-6.3h-1.2l-.3-.1-.2-.5v-1l1.8-.4.6-3c0-.2 0-.3.2-.4H249.5v3.5h3v1.9h-3v6.1c0 .4 0 .6.3.8.1.2.4.3.7.3h.4a2.3 2.3 0 00.5-.3h.4l.2.2.8 1.3c-.4.4-.9.6-1.4.8a5 5 0 01-1.6.2zm9.7-11.6c.8 0 1.6.1 2.2.4a5 5 0 013 3c.2.7.3 1.5.3 2.4a7 7 0 01-.4 2.4 4.9 4.9 0 01-2.9 3c-.6.3-1.4.4-2.2.4-.9 0-1.7-.1-2.3-.4a5 5 0 01-3-3c-.2-.7-.3-1.5-.3-2.4 0-1 0-1.7.4-2.4.2-.7.6-1.4 1-1.9a5 5 0 011.9-1.1c.6-.3 1.4-.4 2.3-.4zm0 9.5c.9 0 1.6-.3 2-1 .5-.6.7-1.5.7-2.7 0-1.2-.2-2.1-.7-2.8-.4-.6-1.1-1-2-1-1 0-1.7.4-2.2 1-.4.7-.6 1.6-.6 2.8 0 1.2.2 2.1.6 2.8.5.6 1.2 1 2.2 1z"/>
<path id="I" d="M312.3 336.6h.4l.2.1.2.2.2.2 8.4 10.6a11 11 0 010-1.4v-9.7h2.5v16h-1.5c-.2 0-.4 0-.6-.2-.2 0-.3-.2-.4-.4l-8.4-10.6a15.3 15.3 0 01.1 1.4v9.7h-2.6v-15.9h1.5zm14.3 14.4a1.7 1.7 0 01.5-1.1 1.7 1.7 0 011.2-.5 1.6 1.6 0 011.2.5 1.7 1.7 0 01.3 1.8c0 .2-.2.4-.3.5l-.6.4-.6.1a1.7 1.7 0 01-1.2-.5c-.1-.1-.3-.3-.3-.5l-.2-.7zm11-4.6v6.1h-3v-6.1l-5.7-9.8h2.6l.6.2.4.5 2.9 5.3a13.3 13.3 0 01.8 1.7 12 12 0 01.7-1.7l2.9-5.3c0-.2.2-.3.4-.5l.6-.2h2.6l-5.8 9.8zm4.7 4.6a1.7 1.7 0 01.5-1.1 1.7 1.7 0 011.2-.5 1.6 1.6 0 011.1.5 1.7 1.7 0 01.4 1.8c0 .2-.2.4-.4.5l-.5.4-.6.1a1.7 1.7 0 01-1.2-.5l-.4-.5-.1-.7z"/>
<path id="J" d="M128 335.7l.6 1.2a4.3 4.3 0 01-.4 3.4c-.4.6-.9 1.2-1.6 1.8l-.8-.5-.2-.2v-.2l.1-.3a6.5 6.5 0 00.6-.9 2.9 2.9 0 00.3-1.2l-.1-.8c0-.3-.2-.6-.4-.9l-.1-.3c0-.2.1-.4.4-.5l1.6-.6zm9.7 7.8l-.2.2h-.3a29 29 0 00-1.6-.6h-1a2 2 0 00-1.2.3 1 1 0 00-.5.9l.3.6c.1.2.3.3.6.4l.9.4a29 29 0 012 .7l.9.6c.3.2.5.5.6.8.2.3.3.7.3 1.2l-.3 1.5-.9 1.2a4 4 0 01-1.4.7 6.1 6.1 0 01-3 .2 6.7 6.7 0 01-2.1-.7l-.8-.6.7-1 .3-.3h.8a10.6 10.6 0 001.3.7l1 .1.8-.1.6-.3.3-.4.1-.5c0-.3 0-.5-.2-.7a2 2 0 00-.6-.4 29.4 29.4 0 01-3-1l-.9-.7-.6-.9c-.2-.3-.2-.8-.2-1.3a3.3 3.3 0 011-2.4 4 4 0 011.4-.7 5.6 5.6 0 014 .1c.6.2 1.1.6 1.5 1l-.6 1z"/>
<path id="K" d="M429.6 335.7l.5 1.2a4.3 4.3 0 01-.3 3.4l-1.6 1.8-.9-.5-.1-.2v-.2-.3a7.8 7.8 0 00.7-.9l.2-.5a2.5 2.5 0 000-1.5l-.5-.9v-.3c0-.2 0-.4.3-.5l1.7-.6zm3.6 0l.6 1.2a4.3 4.3 0 01-.4 3.4c-.3.6-.9 1.2-1.6 1.8l-.8-.5-.2-.2v-.2l.1-.3a7.8 7.8 0 00.6-.9l.2-.5a2.5 2.5 0 000-1.5c0-.3-.2-.6-.4-.9l-.1-.3c0-.2.1-.4.4-.5l1.6-.6z"/>
<path id="L" d="M387.8 336.6v6.3a19.2 19.2 0 01-.3 4h-1.9a41.8 41.8 0 01-.3-4v-6.3h2.5zm-3 14.4a1.7 1.7 0 01.5-1.1 1.7 1.7 0 011.2-.5 1.6 1.6 0 011.2.5 1.7 1.7 0 01.3 1.8c0 .2-.2.4-.3.5l-.5.4-.7.1a1.7 1.7 0 01-1.2-.5l-.3-.5-.2-.7z"/>
<path id="M" d="M16.4 11.3V15H14V4h3.9c.7 0 1.4.2 2 .3l1.3.8c.4.3.6.7.8 1.1l.3 1.4c0 .6-.1 1-.3 1.5a3 3 0 01-.8 1.2c-.4.3-.8.6-1.4.8-.5.2-1.2.2-2 .2h-1.3zm0-1.9h1.4c.6 0 1-.1 1.4-.4.3-.4.4-.8.4-1.4l-.1-.6a1.4 1.4 0 00-1-1H16.5v3.4zM26 11v4h-2.5V4H27c.8 0 1.5.2 2 .3l1.4.7c.4.3.6.6.8 1l.2 1.3-.1 1a3 3 0 01-1.1 1.6l-1 .5.5.3.4.5 2.3 3.8h-2.3c-.4 0-.7-.2-.9-.5l-1.8-3.2-.3-.3a1 1 0 00-.5 0H26zm0-1.8h1l1-.1.5-.4c.2-.1.3-.3.3-.5l.1-.7c0-.5-.1-.9-.4-1.1-.3-.3-.8-.4-1.5-.4h-1v3.2zm14.5-5.1v2H36v2.5h3.4v1.8H36v2.7h4.5V15h-7V4h7zM49 4v2h-4.5v2.7h3.7v2h-3.7V15h-2.6V4h7zM53 15h-2.5V4H53v11zm4.8-5.6L54.4 4h2.9l.2.3L59.7 8v-.2l.2-.1 1.9-3.3c0-.2.3-.3.5-.3h2.4l-3.4 5.2 3.5 5.7h-2.6l-.4-.1a1 1 0 01-.2-.3l-2.2-3.8-.1.3-2 3.5-.3.3-.4.1h-2.4l3.6-5.6z"/>
<path id="N" d="M20 136.3l-.2.3h-.4-.3a67.9 67.9 0 00-1-.5l-.8-.2c-.5 0-.8.1-1 .4a1 1 0 00-.4.8c0 .2 0 .4.2.5.1.2.3.3.6.4l.7.3a19.6 19.6 0 011.8.7l.8.5a2.6 2.6 0 01.8 2c0 .5-.1 1-.3 1.4a3.3 3.3 0 01-2 2l-1.6.2a5.3 5.3 0 01-2.1-.4 6 6 0 01-1-.4 4 4 0 01-.7-.6l.8-1.2s0-.2.2-.2l.3-.1.5.1a32.8 32.8 0 001.1.7l1 .1c.4 0 .7 0 1-.3.3-.2.4-.5.4-1 0-.2 0-.4-.2-.6l-.6-.4a23.2 23.2 0 01-2.6-.9l-.7-.5-.6-1a3.5 3.5 0 010-2.4l.8-1c.3-.3.7-.6 1.2-.8.4-.2 1-.2 1.6-.2a6 6 0 011.9.3 5 5 0 011.4.8l-.6 1.2zm6.6 6.7c.3 0 .6 0 .9-.2.3 0 .5-.2.7-.5l.4-.7.1-1V134h2.6v6.4a5 5 0 01-.4 1.9 4.1 4.1 0 01-2.4 2.4c-.5.2-1.2.3-2 .3-.7 0-1.3 0-1.9-.3-.6-.2-1-.6-1.5-1-.4-.4-.7-.8-.9-1.4-.2-.6-.3-1.2-.3-1.9v-6.4h2.5v6.4c0 .4 0 .8.2 1 0 .4.2.6.4.8.2.3.4.4.7.5l.9.2zm13.4-9v2h-4.5v2.8h3.7v2h-3.7v4.2h-2.6v-11h7zm8.3 0v2h-4.5v2.8h3.8v2h-3.8v4.2h-2.5v-11h7zm4.1 11H50v-11h2.5v11zm4.7-5.6l-3.4-5.3h2.9l.2.3L59 138l.1-.2.1-.1 2-3.3c0-.2.2-.3.4-.3h2.5l-3.5 5.2 3.5 5.7h-2.5l-.4-.1a1 1 0 01-.2-.3l-2.2-3.8-.2.3-2 3.5-.3.3-.3.1h-2.4l3.5-5.6z"/>
<path id="O" d="M20 201.3l-.2.3h-.4-.3a67.9 67.9 0 00-1-.5l-.8-.2c-.5 0-.8.1-1 .4a1 1 0 00-.4.8c0 .2 0 .4.2.5.1.2.3.3.6.4l.7.3a19.6 19.6 0 011.8.7l.8.5a2.6 2.6 0 01.8 2c0 .5-.1 1-.3 1.4a3.3 3.3 0 01-2 2l-1.6.2a5.3 5.3 0 01-2.1-.4 6 6 0 01-1-.4 4 4 0 01-.7-.6l.8-1.2s0-.2.2-.2l.3-.1.5.1a32.8 32.8 0 001.1.7l1 .1c.4 0 .7 0 1-.3.3-.2.4-.5.4-1 0-.2 0-.4-.2-.6l-.6-.4a23.2 23.2 0 01-2.6-.9l-.7-.5-.6-1a3.5 3.5 0 010-2.4l.8-1c.3-.3.7-.6 1.2-.8.4-.2 1-.2 1.6-.2a6 6 0 011.9.3 5 5 0 011.4.8l-.6 1.2zm6.6 6.7c.3 0 .6 0 .9-.2.3 0 .5-.2.7-.5l.4-.7.1-1V199h2.6v6.4a5 5 0 01-.4 1.9 4.1 4.1 0 01-2.4 2.4c-.5.2-1.2.3-2 .3-.7 0-1.3 0-1.9-.3-.6-.2-1-.6-1.5-1-.4-.4-.7-.8-.9-1.4-.2-.6-.3-1.2-.3-1.9v-6.4h2.5v6.4c0 .4 0 .8.2 1 0 .4.2.6.4.8.2.3.4.4.7.5l.9.2zm13.4-9v2h-4.5v2.8h3.7v2h-3.7v4.2h-2.6v-11h7zm8.3 0v2h-4.5v2.8h3.8v2h-3.8v4.2h-2.5v-11h7zm4.1 11H50v-11h2.5v11zm4.7-5.6l-3.4-5.3h2.9l.2.3L59 203l.1-.2.1-.1 2-3.3c0-.2.2-.3.4-.3h2.5l-3.5 5.2 3.5 5.7h-2.5l-.4-.1a1 1 0 01-.2-.3l-2.2-3.8-.2.3-2 3.5-.3.3-.3.1h-2.4l3.5-5.6z"/>
<path id="P" d="M8 264v2H3.4v2.6h3.4v1.8H3.4v2.7H8v1.9H1v-11h7zm4 5.4L8.8 264h2.9l.2.3L14 268v-.2l.2-.1 1.9-3.3c0-.2.3-.3.5-.3H19l-3.4 5.2L19 275h-2.6l-.4-.1a1 1 0 01-.2-.3l-2.2-3.8-.1.3-2 3.5-.3.3-.4.1H8.6l3.5-5.6zm15.2 2.8h.2l.1.1 1 1c-.4.7-1 1-1.6 1.4-.7.3-1.5.4-2.4.4-.8 0-1.5-.1-2.2-.4a4.8 4.8 0 01-2.7-3 6.5 6.5 0 010-4.4 5.2 5.2 0 013-3 6.3 6.3 0 014.5 0 4.8 4.8 0 011.6 1.1l-.9 1.2-.2.1-.3.1H27l-.2-.2a45 45 0 00-1.2-.5 3.5 3.5 0 00-2 .2l-1 .7-.6 1-.2 1.5c0 .6 0 1 .2 1.5l.6 1.1a2.6 2.6 0 002 1l.7-.1a2.6 2.6 0 001.5-.7h.2l.2-.1zm9.5-8.1v2h-4.5v2.5h3.5v1.8h-3.5v2.7h4.5v1.9h-7v-11h7zm4 7.2v3.7h-2.5v-11H42c.8 0 1.4.2 2 .3.6.2 1 .5 1.4.8l.8 1.1.2 1.4c0 .6 0 1-.3 1.5a3 3 0 01-.8 1.2l-1.3.8c-.6.2-1.2.2-2 .2h-1.3zm0-1.9H42c.7 0 1.1-.1 1.4-.4.3-.4.5-.8.5-1.4l-.1-.6a1.4 1.4 0 00-1-1h-2.1v3.4zm15-5.3v2h-3.1v8.9H50v-9H47v-2h8.7zm3.8 10.9h-2.6v-11h2.6v11zm12.8-5.5c0 .8-.1 1.6-.4 2.2a5.3 5.3 0 01-5.3 3.4c-.8 0-1.6-.1-2.3-.4a5.3 5.3 0 01-3.4-5.2c0-.8.2-1.5.5-2.2a5.2 5.2 0 013-3c.6-.2 1.4-.3 2.2-.3a6 6 0 012.4.4 5.4 5.4 0 013.3 5.1zm-2.6 0c0-.5 0-1-.2-1.4a3 3 0 00-.6-1.1c-.3-.3-.6-.6-1-.7a3.4 3.4 0 00-2.6 0c-.4.1-.7.4-1 .7a3 3 0 00-.5 1l-.3 1.5c0 .6.1 1 .3 1.5 0 .4.3.8.6 1.1.2.3.5.5 1 .7l1.2.2c.5 0 1 0 1.3-.2l1-.7c.3-.3.5-.7.6-1.1l.2-1.5zm5.1-5.4h.5l.2.2.2.2 5.2 6.5a13.8 13.8 0 010-1.1V264H83V275h-1.9a1 1 0 01-.4-.4l-5.1-6.5a23.3 23.3 0 010 1v5.9h-2.2v-11h1.3z"/>
<path id="Q" d="M31.8 334.5c0 .8-.1 1.6-.4 2.2a5.1 5.1 0 01-3 2.9c-.6.3-1.4.4-2.3.4H22v-11h4.2c.9 0 1.7.2 2.4.5s1.3.6 1.8 1.1c.5.5.8 1 1.1 1.8.3.6.4 1.3.4 2.1zm-2.6 0c0-.5 0-1-.2-1.4-.1-.5-.3-.8-.6-1.1-.3-.3-.6-.6-1-.7-.3-.2-.8-.3-1.3-.3h-1.7v7h1.7c.5 0 1 0 1.3-.2l1-.7c.3-.3.5-.7.6-1.1.2-.4.2-1 .2-1.5zm14.6 0c0 .8-.1 1.6-.4 2.2a5.3 5.3 0 01-5.3 3.4c-.8 0-1.6-.1-2.3-.4a5.3 5.3 0 01-3.3-5.2c0-.8.1-1.5.4-2.2a5.2 5.2 0 013-3c.6-.2 1.4-.3 2.2-.3a6 6 0 012.4.4 5.4 5.4 0 013.3 5.1zm-2.6 0c0-.5 0-1-.2-1.4a3 3 0 00-.6-1.1c-.3-.3-.6-.6-1-.7a3.4 3.4 0 00-2.6 0l-1 .7a3 3 0 00-.5 1c-.2.5-.2 1-.2 1.5 0 .6 0 1 .2 1.5.1.4.3.8.6 1.1.2.3.6.5 1 .7l1.2.2c.5 0 1 0 1.3-.2l1-.7c.3-.3.5-.7.6-1.1.2-.4.2-1 .2-1.5zm5.2-5.4h.4l.2.2.2.2 5.2 6.5a13.8 13.8 0 010-1.1V329h2.2V340H52.8a1 1 0 01-.4-.4l-5.2-6.5a23.3 23.3 0 010 1v5.9H45v-11h1.4zm17 0v2H59v2.5h3.5v1.8h-3.5v2.7h4.5v1.9h-7v-11h7z"/>
<path id="R" d="M8 69v2H3.4v2.6h3.4v1.8H3.4V78H8v2H1V69h7zm4 5.4L8.8 69h2.9l.2.3L14 73v-.2l.2-.1 1.9-3.3c0-.2.3-.3.5-.3H19l-3.4 5.2L19 80h-2.6l-.4-.1a1 1 0 01-.2-.3l-2.2-3.8-.1.3-2 3.5-.3.3-.4.1H8.6l3.5-5.6zm15.2 2.8h.2l.1.1 1 1c-.4.7-1 1-1.6 1.4-.7.3-1.5.4-2.4.4-.8 0-1.5-.1-2.2-.4a4.8 4.8 0 01-2.7-3 6.5 6.5 0 010-4.4 5.2 5.2 0 013-3 6.3 6.3 0 014.5 0 4.8 4.8 0 011.6 1.1l-.9 1.2-.2.1-.3.1H27l-.2-.2a45 45 0 00-1.2-.5 3.5 3.5 0 00-2 .2l-1 .7-.6 1-.2 1.5c0 .6 0 1 .2 1.5l.6 1.1a2.6 2.6 0 002 1l.7-.1a2.6 2.6 0 001.5-.7h.2l.2-.1zm9.5-8.1v2h-4.5v2.5h3.5v1.8h-3.5V78h4.5v2h-7V69h7zm4 7.2V80h-2.5V69H42c.8 0 1.4.2 2 .3.6.2 1 .5 1.4.8l.8 1.1.2 1.4c0 .6 0 1-.3 1.5a3 3 0 01-.8 1.2l-1.3.8c-.6.2-1.2.2-2 .2h-1.3zm0-1.9H42c.7 0 1.1-.1 1.4-.4.3-.4.5-.8.5-1.4l-.1-.6a1.4 1.4 0 00-1-1h-2.1v3.4zm15-5.3v2h-3.1V80H50v-9H47v-2h8.7zM59.5 80h-2.6V69h2.6v11zm12.8-5.5c0 .8-.1 1.6-.4 2.2a5.3 5.3 0 01-5.3 3.4c-.8 0-1.6-.1-2.3-.4a5.3 5.3 0 01-3.4-5.2c0-.8.2-1.5.5-2.2a5.2 5.2 0 013-3c.6-.2 1.4-.3 2.2-.3a6 6 0 012.4.4 5.4 5.4 0 013.3 5.1zm-2.6 0c0-.5 0-1-.2-1.4a3 3 0 00-.6-1.1c-.3-.3-.6-.6-1-.7a3.4 3.4 0 00-2.6 0c-.4.1-.7.4-1 .7a3 3 0 00-.5 1l-.3 1.5c0 .6.1 1 .3 1.5 0 .4.3.8.6 1.1.2.3.5.5 1 .7l1.2.2c.5 0 1 0 1.3-.2l1-.7c.3-.3.5-.7.6-1.1l.2-1.5zm5.1-5.4h.5l.2.2.2.2 5.2 6.5a13.8 13.8 0 010-1.1V69H83V80h-1.9a1 1 0 01-.4-.4L75.8 73a23.3 23.3 0 010 1V80h-2.2V69h1.3z"/>
</defs>
<g fill="none" fill-rule="evenodd">
<g stroke-linejoin="round" stroke-width="3.8">
<path stroke="#3AC" d="M82.4 46.5v13h-60v12m60-25v13h21.8v12"/>
<path fill="#C3E7F1" stroke="#3AC" d="M6 5h152.7v41.7H6z"/>
<path fill="#F5F5F5" stroke="#B7B7B7" d="M195.8 46.5v25"/>
<path fill="#F5F5F5" stroke="#B7B7B7" d="M168.5 5h54.6v41.7h-54.6z"/>
<path fill="#F5F5F5" stroke="#B7B7B7" d="M261.3 46.5v25"/>
<path fill="#F5F5F5" stroke="#B7B7B7" d="M234 5h54.5v41.7H234z"/>
<path fill="#F5F5F5" stroke="#B7B7B7" d="M377 46.5v25"/>
<path fill="#F5F5F5" stroke="#B7B7B7" d="M299.5 5h153.8v41.7H299.5z"/>
<path fill="#B5F3D4" stroke="#3AD787" d="M22.4 113v21.8"/>
<path fill="#B5F3D4" stroke="#3AD787" d="M6 71.5h32.7v41.7H6z"/>
<path stroke="#3AC" d="M104.2 113v12H76.9v9.8m27.3-21.8v12h31.6v9.8"/>
<path fill="#C3E7F1" stroke="#3AC" d="M49.6 71.5h109.1v41.7H49.6z"/>
<path fill="#F5F5F5" stroke="#B7B7B7" d="M195.8 113v21.8"/>
<path fill="#F5F5F5" stroke="#B7B7B7" d="M168.5 71.5h54.6v41.7h-54.6z"/>
<path fill="#F5F5F5" stroke="#B7B7B7" d="M261.3 113v21.8"/>
<path fill="#F5F5F5" stroke="#B7B7B7" d="M234 71.5h54.5v41.7H234z"/>
<path fill="#F5F5F5" stroke="#B7B7B7" d="M377 113v21.8"/>
<path fill="#F5F5F5" stroke="#B7B7B7" d="M299.5 71.5h153.8v41.7H299.5z"/>
<path fill="#B5F3D4" stroke="#3AD787" d="M6 134.8h32.7v41.5H6z"/>
<path fill="#B5F3D4" stroke="#3AD787" d="M77 176.3v26.2"/>
<path fill="#B5F3D4" stroke="#3AD787" d="M49.6 134.8h54.6v41.5H49.6z"/>
<path fill="#B5F3D4" stroke="#3AD787" d="M195.8 176.3v26.2"/>
<path fill="#B5F3D4" stroke="#3AD787" d="M168.5 134.8h54.6v41.5h-54.6z"/>
<path fill="#B5F3D4" stroke="#3AD787" d="M261.3 176.3v26.2"/>
<path fill="#B5F3D4" stroke="#3AD787" d="M234 134.8h54.5v41.5H234z"/>
<path stroke="#3AC" d="M377 176.3v14.2h-22v12m22-26.2v14.2h60v12"/>
<path fill="#C3E7F1" stroke="#3AC" d="M299.5 134.8h153.8v41.5H299.5z"/>
<path fill="#B5F3D4" stroke="#3AD787" d="M135.8 176.3v26.2"/>
<path fill="#B5F3D4" stroke="#3AD787" d="M114 134.8h43.6v41.5H114z"/>
<path fill="#B5F3D4" stroke="#3AD787" d="M22.4 244v25"/>
<path fill="#B5F3D4" stroke="#3AD787" d="M6 202.2h32.7v41.7H6z"/>
<path fill="#B5F3D4" stroke="#3AD787" d="M77 244v25"/>
<path fill="#B5F3D4" stroke="#3AD787" d="M49.6 202.2h54.6v41.7H49.6z"/>
<path fill="#B5F3D4" stroke="#3AD787" d="M195.8 244v25"/>
<path fill="#B5F3D4" stroke="#3AD787" d="M168.5 202.2h54.6v41.7h-54.6z"/>
<path fill="#B5F3D4" stroke="#3AD787" d="M261.3 244v25"/>
<path fill="#B5F3D4" stroke="#3AD787" d="M234 202.2h54.5v41.7H234z"/>
<path stroke="#3AC" d="M355 244v12h-21.7v13m21.8-25v12h37v13"/>
<path fill="#C3E7F1" stroke="#3AC" d="M299.5 202.2h110.1v41.7H299.5z"/>
<path fill="#B5F3D4" stroke="#3AD787" d="M135.8 244v25"/>
<path fill="#B5F3D4" stroke="#3AD787" d="M114 202.2h43.6v41.7H114z"/>
<path fill="#B5F3D4" stroke="#3AD787" d="M437 244v25"/>
<path fill="#B5F3D4" stroke="#3AD787" d="M420.5 202.2h32.8v41.7h-32.8z"/>
<path fill="#B5F3D4" stroke="#3AD787" d="M22.4 310.5v25"/>
<path fill="#B5F3D4" stroke="#3AD787" d="M6 268.7h32.7v41.8H6z"/>
<path fill="#B5F3D4" stroke="#3AD787" d="M77 310.5v25"/>
<path fill="#B5F3D4" stroke="#3AD787" d="M49.6 268.7h54.6v41.8H49.6z"/>
<path fill="#B5F3D4" stroke="#3AD787" d="M195.8 310.5v21.8-18.6 21.8"/>
<path fill="#B5F3D4" stroke="#3AD787" d="M168.5 268.7h54.6v41.8h-54.6z"/>
<path fill="#B5F3D4" stroke="#3AD787" d="M234 268.7h54.5v41.8H234z"/>
<path fill="#B5F3D4" stroke="#3AD787" d="M333.3 310.5v25"/>
<path fill="#C3E7F1" stroke="#3AC" d="M299.5 268.7H366v41.8h-66.5z"/>
<path fill="#B5F3D4" stroke="#3AD787" d="M135.8 310.5v25"/>
<path fill="#B5F3D4" stroke="#3AD787" d="M114 268.7h43.6v41.8H114z"/>
<path fill="#B5F3D4" stroke="#3AD787" d="M437 310.5v25"/>
<path fill="#B5F3D4" stroke="#3AD787" d="M420.5 268.7h32.8v41.8h-32.8z"/>
<path fill="#B5F3D4" stroke="#3AD787" d="M392.2 310.5v25"/>
<path fill="#B5F3D4" stroke="#3AD787" d="M375.8 268.7h32.7v41.8h-32.7z"/>
<path fill="#B5F3D4" stroke="#3AD787" d="M8 335.5h28.7a2 2 0 012 2V375a2 2 0 01-2 2H8a2 2 0 01-2-2v-37.5c0-1 .9-2 2-2z"/>
<path fill="#B5F3D4" stroke="#3AD787" d="M49.6 335.5h54.6V377H49.6z"/>
<path fill="#B5F3D4" stroke="#3AD787" d="M168.5 335.5h54.6V377h-54.6z"/>
<path fill="#B5F3D4" stroke="#3AD787" d="M234 335.5h54.5V377H234z"/>
<path fill="#B5F3D4" stroke="#3AD787" d="M299.5 335.5H366V377h-66.5z"/>
<path fill="#B5F3D4" stroke="#3AD787" d="M114 335.5h43.6V377H114z"/>
<path fill="#B5F3D4" stroke="#3AD787" d="M420.5 335.5h32.8V377h-32.8z"/>
<path fill="#B5F3D4" stroke="#3AD787" d="M375.8 335.5h32.7V377h-32.7z"/>
<path fill="#B5F3D4" stroke="#3AD787" d="M261.3 310.5v25"/>
</g>
<g fill-rule="nonzero">
<g transform="translate(6 11)">
<use fill="#3D4251" xlink:href="#a"/>
<use fill="#1A1E23" xlink:href="#a"/>
</g>
<g transform="translate(6 11)">
<use fill="#3D4251" xlink:href="#b"/>
<use fill="#1A1E23" xlink:href="#b"/>
</g>
<g transform="translate(6 11)">
<use fill="#3D4251" xlink:href="#c"/>
<use fill="#1A1E23" xlink:href="#c"/>
</g>
<g transform="translate(6 11)">
<use fill="#3D4251" xlink:href="#d"/>
<use fill="#1A1E23" xlink:href="#d"/>
</g>
<g transform="translate(6 11)">
<use fill="#3D4251" xlink:href="#e"/>
<use fill="#1A1E23" xlink:href="#e"/>
</g>
<g transform="translate(6 11)">
<use fill="#3D4251" xlink:href="#f"/>
<use fill="#1A1E23" xlink:href="#f"/>
</g>
<g transform="translate(6 11)">
<use fill="#3D4251" xlink:href="#g"/>
<use fill="#1A1E23" xlink:href="#g"/>
</g>
<g transform="translate(6 11)">
<use fill="#3D4251" xlink:href="#h"/>
<use fill="#1A1E23" xlink:href="#h"/>
</g>
<g transform="translate(6 11)">
<use fill="#3D4251" xlink:href="#i"/>
<use fill="#1A1E23" xlink:href="#i"/>
</g>
<g transform="translate(6 11)">
<use fill="#3D4251" xlink:href="#j"/>
<use fill="#1A1E23" xlink:href="#j"/>
</g>
<g transform="translate(6 11)">
<use fill="#3D4251" xlink:href="#k"/>
<use fill="#1A1E23" xlink:href="#k"/>
</g>
<g transform="translate(6 11)">
<use fill="#3D4251" xlink:href="#l"/>
<use fill="#1A1E23" xlink:href="#l"/>
</g>
<g transform="translate(6 11)">
<use fill="#3D4251" xlink:href="#m"/>
<use fill="#1A1E23" xlink:href="#m"/>
</g>
<g transform="translate(6 11)">
<use fill="#3D4251" xlink:href="#n"/>
<use fill="#1A1E23" xlink:href="#n"/>
</g>
<g transform="translate(6 11)">
<use fill="#3D4251" xlink:href="#o"/>
<use fill="#1A1E23" xlink:href="#o"/>
</g>
<g transform="translate(6 11)">
<use fill="#3D4251" xlink:href="#p"/>
<use fill="#1A1E23" xlink:href="#p"/>
</g>
<g transform="translate(6 11)">
<use fill="#3D4251" xlink:href="#q"/>
<use fill="#1A1E23" xlink:href="#q"/>
</g>
<g transform="translate(6 11)">
<use fill="#3D4251" xlink:href="#r"/>
<use fill="#1A1E23" xlink:href="#r"/>
</g>
<g transform="translate(6 11)">
<use fill="#3D4251" xlink:href="#s"/>
<use fill="#1A1E23" xlink:href="#s"/>
</g>
<g transform="translate(6 11)">
<use fill="#3D4251" xlink:href="#t"/>
<use fill="#1A1E23" xlink:href="#t"/>
</g>
<g transform="translate(6 11)">
<use fill="#3D4251" xlink:href="#u"/>
<use fill="#1A1E23" xlink:href="#u"/>
</g>
<g transform="translate(6 11)">
<use fill="#3D4251" xlink:href="#v"/>
<use fill="#1A1E23" xlink:href="#v"/>
</g>
<g transform="translate(6 11)">
<use fill="#3D4251" xlink:href="#w"/>
<use fill="#1A1E23" xlink:href="#w"/>
</g>
<g transform="translate(6 11)">
<use fill="#3D4251" xlink:href="#x"/>
<use fill="#1A1E23" xlink:href="#x"/>
</g>
<g transform="translate(6 11)">
<use fill="#3D4251" xlink:href="#y"/>
<use fill="#1A1E23" xlink:href="#y"/>
</g>
<g transform="translate(6 11)">
<use fill="#3D4251" xlink:href="#z"/>
<use fill="#1A1E23" xlink:href="#z"/>
</g>
<g transform="translate(6 11)">
<use fill="#3D4251" xlink:href="#A"/>
<use fill="#1A1E23" xlink:href="#A"/>
</g>
<g transform="translate(6 11)">
<use fill="#3D4251" xlink:href="#B"/>
<use fill="#1A1E23" xlink:href="#B"/>
</g>
<g transform="translate(6 11)">
<use fill="#3D4251" xlink:href="#C"/>
<use fill="#1A1E23" xlink:href="#C"/>
</g>
<g transform="translate(6 11)">
<use fill="#3D4251" xlink:href="#D"/>
<use fill="#1A1E23" xlink:href="#D"/>
</g>
<g transform="translate(6 11)">
<use fill="#3D4251" xlink:href="#E"/>
<use fill="#1A1E23" xlink:href="#E"/>
</g>
<g transform="translate(6 11)">
<use fill="#3D4251" xlink:href="#F"/>
<use fill="#1A1E23" xlink:href="#F"/>
</g>
<g transform="translate(6 11)">
<use fill="#3D4251" xlink:href="#G"/>
<use fill="#1A1E23" xlink:href="#G"/>
</g>
<g transform="translate(6 11)">
<use fill="#3D4251" xlink:href="#H"/>
<use fill="#1A1E23" xlink:href="#H"/>
</g>
<g transform="translate(6 11)">
<use fill="#3D4251" xlink:href="#I"/>
<use fill="#1A1E23" xlink:href="#I"/>
</g>
<g transform="translate(6 11)">
<use fill="#3D4251" xlink:href="#J"/>
<use fill="#1A1E23" xlink:href="#J"/>
</g>
<g transform="translate(6 11)">
<use fill="#3D4251" xlink:href="#K"/>
<use fill="#1A1E23" xlink:href="#K"/>
</g>
<g transform="translate(6 11)">
<use fill="#3D4251" xlink:href="#L"/>
<use fill="#1A1E23" xlink:href="#L"/>
</g>
</g>
<rect width="101" height="20" x="483" y="16" fill="#3AC" fill-rule="nonzero" stroke="#3AC" stroke-width="2.2" rx="10"/>
<rect width="101" height="20" x="483" y="146" fill="#3AC" fill-rule="nonzero" stroke="#3AC" stroke-width="2.2" rx="10"/>
<rect width="101" height="20" x="483" y="211" fill="#3AC" fill-rule="nonzero" stroke="#3AC" stroke-width="2.2" rx="10"/>
<rect width="101" height="20" x="483" y="276" fill="#3AC" fill-rule="nonzero" stroke="#3AC" stroke-width="2.2" rx="10"/>
<rect width="101" height="20" x="483" y="341" fill="#3AD787" fill-rule="nonzero" stroke="#3AD787" stroke-width="2.2" rx="10"/>
<rect width="101" height="20" x="483" y="81" fill="#3AC" fill-rule="nonzero" stroke="#3AC" stroke-width="2.2" rx="10"/>
<g fill-rule="nonzero">
<g transform="translate(493 16)">
<use fill="#000" xlink:href="#M"/>
<use fill="#FFF" xlink:href="#M"/>
</g>
<g transform="translate(493 16)">
<use fill="#000" xlink:href="#N"/>
<use fill="#FFF" xlink:href="#N"/>
</g>
<g transform="translate(493 16)">
<use fill="#000" xlink:href="#O"/>
<use fill="#FFF" xlink:href="#O"/>
</g>
<g transform="translate(493 16)">
<use fill="#000" xlink:href="#P"/>
<use fill="#FFF" xlink:href="#P"/>
</g>
<g transform="translate(493 16)">
<use fill="#000" xlink:href="#Q"/>
<use fill="#FFF" xlink:href="#Q"/>
</g>
<g transform="translate(493 16)">
<use fill="#000" xlink:href="#R"/>
<use fill="#FFF" xlink:href="#R"/>
</g>
</g>
</g>
</svg>

After

Width:  |  Height:  |  Size: 45 KiB


@@ -74,7 +74,6 @@
"* [The Python tutorial](https://docs.python.org/3/tutorial/)\n",
"* [Object-Oriented Programming in Python](http://python-textbok.readthedocs.org/en/latest/index.html)\n",
"* [Python3 tutorial](http://www.python-course.eu/python3_course.php)\n",
"* [Python for the Busy Java Developer, Deepak Sarda, 2014](http://antrix.net/static/pages/python-for-java/online/)\n",
"* [Style Guide for Python Code (PEP-0008)](https://www.python.org/dev/peps/pep-0008/)\n",
"* [Python Slides](http://tdc-www.harvard.edu/Python.pdf)\n",
"* [Python for Programmers - 1 day course](http://www.ucs.cam.ac.uk/docs/course-notes/unix-courses/archived/archived-python-courses/PythonProgIntro/files/notes.pdf)\n",


@@ -85,7 +85,7 @@
"In Python3, there are the following [numeric types](https://docs.python.org/3/library/stdtypes.html#typesnumeric):\n",
"* integers (int): 1, -1, ...\n",
"* floating point numbers (float): 0.1, 1E2\n",
"* complex numbers (complex): 2 + 3j\n",
"* complex numbers (complex): 2 + 3j\n.",
"Let's play a bit"
]
},
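For quick reference, a minimal Python session exercising the three numeric types listed in this cell (illustrative only, not part of the committed notebook):

    x, y, z = 1, 1e2, 2 + 3j          # int, float, and complex literals
    print(type(x), type(y), type(z))  # <class 'int'> <class 'float'> <class 'complex'>
    print(z.real, z.imag)             # the parts of a complex are floats: 2.0 3.0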


@@ -377,7 +377,7 @@
"\n",
"Tuples are faster than lists. Its main usage is when the collection is constant, or you do not want it can be changed (write protected). \n",
"\n",
"Tuples can be converted into lists and vice-versa, with the methods list() and tuple()."
"Tuples can be converted into lists and vice-versa, with the methods *list()* and *tuple()*."
]
},
{
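A quick sketch of the conversion this hunk describes (illustrative only, not part of the committed notebook):

    t = (1, 2, 3)        # tuples are write-protected
    lst = list(t)        # tuple -> list
    lst.append(4)        # lists are mutable
    t = tuple(lst)       # list -> tuple
    print(t)             # (1, 2, 3, 4)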


@@ -37,7 +37,7 @@
"\n",
"A set object is an unordered collection of distinct objects. There are two built-in set types: **set** (mutable) and **frozenset** (inmutable).\n",
"\n",
"A mapping object maps hashable values to arbitrary objects. Mappings are mutable objects. There is only one bultin mapping type: **dictionary**."
"A mapping object maps hashable values to arbitrary objects. Mappings are mutable objects. There is only one builtin mapping type: **dictionary**."
]
},
{
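A short sketch of the set, frozenset, and dictionary types this cell describes (illustrative only, not part of the diff):

    s = {1, 2, 2, 3}          # mutable set: duplicates are dropped
    fs = frozenset(s)         # immutable set, so it is hashable
    d = {fs: "frozensets can be dict keys", "kind": "mapping"}
    print(s, fs, d["kind"])   # {1, 2, 3} frozenset({1, 2, 3}) mapping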


@@ -65,7 +65,7 @@
"Python is a **strongly typed** language and **dynamically typed** language.\n",
"\n",
"This means:\n",
"* ** dynamically typed**: variables do not declare a static type (as in Java int a = 2;). Variables have no type themselves, they are just names that hold a reference to some object. The type of the variable is changed dynamically when you change the type of the assigned data object. \n",
"* **dynamically typed**: variables do not declare a static type (as in Java int a = 2;). Variables have no type themselves, they are just names that hold a reference to some object. The type of the variable is changed dynamically when you change the type of the assigned data object. \n",
"* **strongly typed**: the interpreter tracks variable types. There is no implicit type conversion. This means that all the type variables should be converted manually, preventing from unexpected behaviour. "
]
},
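For example, a minimal demonstration of both properties (a sketch, not part of the committed notebook):

    a = 2                 # the name a references an int object
    a = "two"             # the same name may later reference a str: dynamic typing
    # "2" + 2 raises TypeError: there is no implicit conversion (strong typing)
    print(int("2") + 2)   # explicit conversion is required: prints 4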


@@ -41,7 +41,7 @@
"The first argument of instance class method is self, that refers to the current instance of the class.\n",
"There is a special method, __init__ that initializes the object. It is like a constructor, but the object is already created when __init__ is called.\n",
"\n",
"Instance attributes are define as self.variables. (self is the same than this in Java)."
"Instance attributes are define as *self.variables*. (self is the same than this in Java)."
]
},
{
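A minimal class sketch illustrating self and __init__ as described above (illustrative only, not part of the committed notebook):

    class Greeter:
        def __init__(self, name):    # runs on an already-created instance
            self.name = name         # instance attribute defined through self
        def greet(self):
            return "Hello, " + self.name

    print(Greeter("SITC").greet())   # Hello, SITC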


@@ -0,0 +1,154 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "skip"
}
},
"source": [
"![](images/EscUpmPolit_p.gif \"UPM\")"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "skip"
}
},
"source": [
"# Course Notes for Learning Intelligent Systems"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "skip"
}
},
"source": [
"Department of Telematic Engineering Systems, Universidad Politécnica de Madrid, © Carlos A. Iglesias"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"# Introduction to Network Analysis\n",
" \n",
"In this session, we are going to get more insight regarding how to analyze and visualize social networks.\n"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"# Objectives\n",
"\n",
"The main objectives of this session are:\n",
"* Understanding why networks are important in data science\n",
"* Experimenting with network analysis with networkx."
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"# Table of Contents"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"1. [Home](0_Intro_Network_Analysis.ipynb)\n",
"2. [First Steps](1_First_Steps.ipynb)\n",
"3. [Working_with_Graphs](2_Working_with_Graphs.ipynb)\n",
"4. [Network Analysis](3_Network_Analysis.ipynb)\n",
"5. [Social Networks](4_Social_Networks.ipynb)\n",
"6. [Pandas integration](5_Pandas.ipynb)\n"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "skip"
}
},
"source": [
"## Licence\n",
"The notebook is freely licensed under under the [Creative Commons Attribution Share-Alike license](https://creativecommons.org/licenses/by/2.0/). \n",
"\n",
"© Carlos A. Iglesias, Universidad Politécnica de Madrid."
]
}
],
"metadata": {
"celltoolbar": "Slideshow",
"datacleaner": {
"position": {
"top": "50px"
},
"python": {
"varRefreshCmd": "try:\n print(_datacleaner.dataframe_metadata())\nexcept:\n print([])"
},
"window_display": false
},
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.7"
},
"latex_envs": {
"LaTeX_envs_menu_present": true,
"autocomplete": true,
"bibliofile": "biblio.bib",
"cite_by": "apalike",
"current_citInitial": 1,
"eqLabelWithNumbers": true,
"eqNumInitial": 1,
"hotkeys": {
"equation": "Ctrl-E",
"itemize": "Ctrl-I"
},
"labels_anchors": false,
"latex_user_defs": false,
"report_style_numbering": false,
"user_envs_cfg": false
}
},
"nbformat": 4,
"nbformat_minor": 4
}
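As a taste of the networkx experiments announced in the notebook above, a minimal session might look like this (a sketch assuming networkx is installed; not part of the committed notebook):

    import networkx as nx

    G = nx.Graph()
    G.add_edges_from([("a", "b"), ("b", "c"), ("a", "c")])
    print(G.number_of_nodes(), G.number_of_edges())  # 3 3
    print(nx.degree_centrality(G))                   # {'a': 1.0, 'b': 1.0, 'c': 1.0}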

1301  sna/1_First_Steps.ipynb  Normal file

File diff suppressed because one or more lines are too long

File diff suppressed because one or more lines are too long


@@ -0,0 +1,374 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "skip"
}
},
"source": [
"![](images/EscUpmPolit_p.gif \"UPM\")"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "skip"
}
},
"source": [
"# Course Notes for Learning Intelligent Systems"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "skip"
}
},
"source": [
"Department of Telematic Engineering Systems, Universidad Politécnica de Madrid, © Carlos A. Iglesias"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "skip"
}
},
"source": [
"## [Introduction to Network Analysis](0_Intro_Network_Analysis.ipynb)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Exercise: Florentine families"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [],
"source": [
"import networkx as nx\n",
"import warnings\n",
"warnings.simplefilter(action='ignore', category=FutureWarning)"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [],
"source": [
"G_florentine = nx.florentine_families_graph()"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"# Exercise: Star Wars"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [],
"source": [
"import networkx as nx\n",
"\n",
"# Taken from https://gist.github.com/codingthat/be03565bd97e789a3835b50235ad562f\n",
"# The original dataset is from:\n",
"# Gabasova, E. (2016). Star Wars social network. DOI: https://doi.org/10.5281/zenodo.1411479\n",
"# \n",
"# Simplified by Federico Albanese.\n",
"\n",
"characters = [\"R2-D2\",\n",
" \"CHEWBACCA\",\n",
" \"C-3PO\",\n",
" \"LUKE\",\n",
" \"DARTH VADER\",\n",
" \"CAMIE\",\n",
" \"BIGGS\",\n",
" \"LEIA\",\n",
" \"BERU\",\n",
" \"OWEN\",\n",
" \"OBI-WAN\",\n",
" \"MOTTI\",\n",
" \"TARKIN\",\n",
" \"HAN\",\n",
" \"DODONNA\",\n",
" \"GOLD LEADER\",\n",
" \"WEDGE\",\n",
" \"RED LEADER\",\n",
" \"RED TEN\"]\n",
"\n",
"\n",
"edges = [(\"CHEWBACCA\", \"R2-D2\"),\n",
" (\"C-3PO\",\"R2-D2\"),\n",
" (\"BERU\", \"R2-D2\"),\n",
" (\"LUKE\", \"R2-D2\"),\n",
" (\"OWEN\", \"R2-D2\"),\n",
" (\"OBI-WAN\", \"R2-D2\"),\n",
" (\"LEIA\", \"R2-D2\"),\n",
" (\"BIGGS\", \"R2-D2\"),\n",
" (\"HAN\", \"R2-D2\"),\n",
" (\"CHEWBACCA\", \"OBI-WAN\"),\n",
" (\"C-3PO\", \"CHEWBACCA\"),\n",
" (\"CHEWBACCA\", \"LUKE\"),\n",
" (\"CHEWBACCA\", \"HAN\"),\n",
" (\"CHEWBACCA\", \"LEIA\"),\n",
" (\"CAMIE\", \"LUKE\"),\n",
" (\"BIGGS\", \"CAMIE\"),\n",
" (\"BIGGS\", \"LUKE\"),\n",
" (\"DARTH VADER\", \"LEIA\"),\n",
" (\"BERU\", \"LUKE\"),\n",
" (\"BERU\", \"OWEN\"),\n",
" (\"BERU\", \"C-3PO\"),\n",
" (\"LUKE\", \"OWEN\"),\n",
" (\"C-3PO\", \"LUKE\"),\n",
" (\"C-3PO\", \"OWEN\"),\n",
" (\"C-3PO\", \"LEIA\"),\n",
" (\"LEIA\", \"LUKE\"),\n",
" (\"BERU\", \"LEIA\"),\n",
" (\"LUKE\", \"OBI-WAN\"),\n",
" (\"C-3PO\", \"OBI-WAN\"),\n",
" (\"LEIA\", \"OBI-WAN\"),\n",
" (\"MOTTI\", \"TARKIN\"),\n",
" (\"DARTH VADER\", \"MOTTI\"),\n",
" (\"DARTH VADER\", \"TARKIN\"),\n",
" (\"HAN\", \"OBI-WAN\"),\n",
" (\"HAN\", \"LUKE\"),\n",
" (\"C-3PO\", \"HAN\"),\n",
" (\"LEIA\", \"MOTT\"),\n",
" (\"LEIA\", \"TARKIN\"),\n",
" (\"HAN\", \"LEIA\"),\n",
" (\"DARTH VADER\", \"OBI-WAN\"),\n",
" (\"DODONNA\", \"GOLD LEADER\"),\n",
" (\"DODONNA\", \"WEDGE\"),\n",
" (\"DODONNA\", \"LUKE\"),\n",
" (\"GOLD LEADER\", \"WEDGE\"),\n",
" (\"GOLD LEADER\", \"LUKE\"),\n",
" (\"LUKE\", \"WEDGE\"),\n",
" (\"BIGGS\", \"LEIA\"),\n",
" (\"LEIA\", \"RED LEADER\"),\n",
" (\"LUKE\", \"RED LEADER\"),\n",
" (\"BIGGS\", \"RED LEADER\"),\n",
" (\"BIGGS\", \"C-3PO\"),\n",
" (\"C-3PO\", \"RED LEADER\"),\n",
" (\"RED LEADER\", \"WEDGE\"),\n",
" (\"GOLD LEADER\", \"RED LEADER\"),\n",
" (\"BIGGS\", \"WEDGE\"),\n",
" (\"RED LEADER\", \"RED TEN\"),\n",
" (\"BIGGS\", \"GOLD LEADER\"),\n",
" (\"LUKE\", \"RED TEN\")]\n",
"\n",
"G_starWars = nx.Graph()\n",
"\n",
"\n",
"G_starWars.add_nodes_from(characters)\n",
"G_starWars.add_edges_from(edges)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Exercise\n",
"In this exercise we are going to practice some of the concepts of the session.\n",
"\n",
"Answer the following questions using the object *G_starWars* and *G_florentine*."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Number of nodes"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Number of edges"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Get the list of nodes"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Get the list of edges"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Draw the graph\n",
"\n",
"Hint. Use different layouts (circular, spring, ...)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Think of interesting micro, meso and macro metrics"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Analyze ego networks of interesting nodes."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Analyze communities"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "skip"
}
},
"source": [
"## Licence"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "skip"
}
},
"source": [
"The notebook is freely licensed under under the [Creative Commons Attribution Share-Alike license](https://creativecommons.org/licenses/by/2.0/). \n",
"\n",
"© Carlos A. Iglesias, Universidad Politécnica de Madrid."
]
}
],
"metadata": {
"celltoolbar": "Slideshow",
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.7"
},
"latex_envs": {
"LaTeX_envs_menu_present": true,
"autocomplete": true,
"bibliofile": "biblio.bib",
"cite_by": "apalike",
"current_citInitial": 1,
"eqLabelWithNumbers": true,
"eqNumInitial": 1,
"hotkeys": {
"equation": "Ctrl-E",
"itemize": "Ctrl-I"
},
"labels_anchors": false,
"latex_user_defs": false,
"report_style_numbering": false,
"user_envs_cfg": false
}
},
"nbformat": 4,
"nbformat_minor": 4
}
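One possible way to start on the blank exercise cells above (a sketch, not an official solution; it reuses the G_starWars graph built in the notebook and assumes matplotlib is installed alongside networkx):

    import networkx as nx
    import matplotlib.pyplot as plt
    from networkx.algorithms.community import greedy_modularity_communities

    # Basic counts and listings
    print(G_starWars.number_of_nodes(), G_starWars.number_of_edges())
    print(list(G_starWars.nodes())[:5], list(G_starWars.edges())[:5])

    # Draw with a spring layout (try nx.circular_layout for comparison)
    nx.draw(G_starWars, pos=nx.spring_layout(G_starWars, seed=42),
            with_labels=True, node_size=300)
    plt.show()

    # Ego network of a central node
    ego = nx.ego_graph(G_starWars, "LUKE")
    print(sorted(ego.nodes()))

    # Community detection by greedy modularity maximization
    for community in greedy_modularity_communities(G_starWars):
        print(sorted(community))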

2230  sna/3_Network_Analysis.ipynb  Normal file

File diff suppressed because one or more lines are too long

Some files were not shown because too many files have changed in this diff.