Mirror of https://github.com/gsi-upm/sitc, synced 2025-06-12 11:22:20 +00:00.

Compare commits: cae7d8cbb2 ... b58370a19a (32 commits)
Commits (newest first): b58370a19a, 5c203b0884, 5bf815f60f, 90a3ff098b, 945a8a7fb6, 6532ef1b27, 3a73b2b286, 2e4ec3cfdc, 21e7ae2f57, 7b4d16964d, c5967746ea, ed7f0f3e1c, 9324516c19, 6fc5565ea0, 1113485833, 0c3f317a85, 0b550c837b, d7ce6df7fe, e2edae6049, 4ea0146def, e7b2cee795, 9e1d0e5534, f82203f371, b9ecccdeab, 44a555ac2d, ec11ff2d5e, ec02125396, b5f1a7dd22, 1cc1e45673, a2ad2c0e92, 1add6a4c8e, af78e6480d
BIN images/iris-classes.png  (new file; binary not shown; 1.4 MiB)
BIN images/iris-features.png (new file; binary not shown; 944 KiB)
@@ -4,7 +4,7 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-""
+""
 ]
 },
 {
@@ -79,7 +79,7 @@
 "metadata": {},
 "source": [
 "## Licence\n",
-"The notebook is freely licensed under under the [Creative Commons Attribution Share-Alike license](https://creativecommons.org/licenses/by/2.0/). \n",
+"The notebook is freely licensed under the [Creative Commons Attribution Share-Alike license](https://creativecommons.org/licenses/by/2.0/). \n",
 "\n",
 "© Carlos A. Iglesias, Universidad Politécnica de Madrid."
 ]
@@ -4,7 +4,7 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-""
+""
 ]
 },
 {
@@ -40,10 +40,10 @@
 "\n",
 "* Learn to use scikit-learn\n",
 "* Learn the basic steps to apply machine learning techniques: dataset analysis, load, preprocessing, training, validation, optimization and persistence.\n",
-"* Learn how to do a exploratory data analysis\n",
+"* Learn how to do an exploratory data analysis\n",
 "* Learn how to visualise a dataset\n",
 "* Learn how to load a bundled dataset\n",
-"* Learn how to separate the dataset into traning and testing datasets\n",
+"* Learn how to separate the dataset into training and testing datasets\n",
 "* Learn how to train a classifier\n",
 "* Learn how to predict with a trained classifier\n",
 "* Learn how to evaluate the predictions\n",
@@ -71,7 +71,7 @@
 "metadata": {},
 "source": [
 "## Licence\n",
-"The notebook is freely licensed under under the [Creative Commons Attribution Share-Alike license](https://creativecommons.org/licenses/by/2.0/). \n",
+"The notebook is freely licensed under the [Creative Commons Attribution Share-Alike license](https://creativecommons.org/licenses/by/2.0/). \n",
 "\n",
 "© Carlos A. Iglesias, Universidad Politécnica de Madrid."
 ]
@@ -4,7 +4,7 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-""
+""
 ]
 },
 {
@@ -87,7 +87,7 @@
 "metadata": {},
 "source": [
 "Scikit-learn provides algorithms for solving the following problems:\n",
-"* **Classification**: Identifying to which category an object belongs to. Some of the available [classification algorithms](http://scikit-learn.org/stable/supervised_learning.html#supervised-learning) are decision trees (ID3, C4.5, ...), kNN, SVM, Random forest, Perceptron, etc. \n",
+"* **Classification**: Identifying to which category an object belongs. Some of the available [classification algorithms](http://scikit-learn.org/stable/supervised_learning.html#supervised-learning) are decision trees (ID3, C4.5, ...), kNN, SVM, Random forest, Perceptron, etc. \n",
 "* **Clustering**: Automatic grouping of similar objects into sets. Some of the available [clustering algorithms](http://scikit-learn.org/stable/modules/clustering.html#clustering) are k-Means, Affinity propagation, etc.\n",
 "* **Regression**: Predicting a continuous-valued attribute associated with an object. Some of the available [regression algorithms](http://scikit-learn.org/stable/supervised_learning.html#supervised-learning) are linear regression, logistic regression, etc.\n",
 "* **Dimensionality reduction**: Reducing the number of random variables to consider. Some of the available [dimensionality reduction algorithms](http://scikit-learn.org/stable/modules/decomposition.html#decompositions) are SVD, PCA, etc."
@@ -105,7 +105,7 @@
 "metadata": {},
 "source": [
 "In addition, scikit-learn helps in several tasks:\n",
-"* **Model selection**: Comparing, validating, choosing parameters and models, and persisting models. Some of the [available functionalities](http://scikit-learn.org/stable/model_selection.html#model-selection) are cross-validation or grid search for optimizing the parameters. \n",
+"* **Model selection**: Comparing, validating, choosing parameters and models, and persisting models. Some [available functionalities](http://scikit-learn.org/stable/model_selection.html#model-selection) are cross-validation or grid search for optimizing the parameters. \n",
 "* **Preprocessing**: Several common utility functions and transformer classes to change raw feature vectors into a representation that is more suitable for the downstream estimators. Some of the available [preprocessing functions](http://scikit-learn.org/stable/modules/preprocessing.html#preprocessing) are scaling and normalizing data, or imputing missing values."
 ]
 },
@@ -128,9 +128,9 @@
 "\n",
 "If it is not installed, install it with conda: `conda install scikit-learn`.\n",
 "\n",
-"If you have installed scipy and numpy, you can also installed using pip: `pip install -U scikit-learn`.\n",
+"If you have installed scipy and numpy, you can also install using pip: `pip install -U scikit-learn`.\n",
 "\n",
-"It is not recommended to use pip for installing scipy and numpy. Instead, use conda or install the linux package *python-sklearn*."
+"It is not recommended to use pip to install scipy and numpy. Instead, use conda or install the Linux package *python-sklearn*."
 ]
 },
 {
@@ -156,7 +156,7 @@
 "metadata": {},
 "source": [
 "## Licence\n",
-"The notebook is freely licensed under under the [Creative Commons Attribution Share-Alike license](https://creativecommons.org/licenses/by/2.0/). \n",
+"The notebook is freely licensed under the [Creative Commons Attribution Share-Alike license](https://creativecommons.org/licenses/by/2.0/). \n",
 "\n",
 "© Carlos A. Iglesias, Universidad Politécnica de Madrid."
 ]
@@ -4,7 +4,7 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"\n",
+"\n",
 "\n",
 "# Course Notes for Learning Intelligent Systems\n",
 "\n",
@@ -34,11 +34,11 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"The goal of this notebook is to learn how to read and load a sample dataset.\n",
+"This notebook aims to learn how to read and load a sample dataset.\n",
 "\n",
 "Scikit-learn comes with some bundled [datasets](https://scikit-learn.org/stable/datasets.html): iris, digits, boston, etc.\n",
 "\n",
-"In this notebook we are going to use the Iris dataset."
+"In this notebook, we will use the Iris dataset."
 ]
 },
 {
@@ -54,16 +54,25 @@
 "source": [
 "The [Iris flower dataset](https://en.wikipedia.org/wiki/Iris_flower_data_set), available at [UCI dataset repository](https://archive.ics.uci.edu/ml/datasets/Iris), is a classic dataset for classification.\n",
 "\n",
-"The dataset consists of 50 samples from each of three species of Iris (Iris setosa, Iris virginica and Iris versicolor). Four features were measured from each sample: the length and the width of the sepals and petals, in centimetres. Based on the combination of these four features, a machine learning model will learn to differentiate the species of Iris.\n",
+"The dataset consists of 50 samples from each of three species of Iris (Iris setosa, Iris virginica, and Iris versicolor). Four features were measured from each sample: the length and the width of the sepals and petals, in centimetres. Based on the combination of these four features, a machine learning model will learn to differentiate the species of Iris.\n",
 "\n",
-""
+""
 ]
 },
 {
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"In order to read the dataset, we import the datasets bundle and then load the Iris dataset. "
+"Here you can see the species and the features.\n",
+"\n",
+""
+]
+},
+{
+"cell_type": "markdown",
+"metadata": {},
+"source": [
+"To read the dataset, we import the datasets bundle and then load the Iris dataset. "
 ]
 },
 {
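For reference, a minimal sketch of that loading step (not necessarily the notebook's exact cell):

```python
# Load the bundled Iris dataset from scikit-learn's datasets bundle
from sklearn import datasets

iris = datasets.load_iris()
print(iris.target_names)   # the three species
print(iris.feature_names)  # the four measured features
print(iris.data.shape)     # (150, 4): 50 samples per species
```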
@@ -180,7 +189,7 @@
 "metadata": {},
 "outputs": [],
 "source": [
-"#Using numpy, I can print the dimensions (here we are working with 2D matriz)\n",
+"#Using numpy, I can print the dimensions (here we are working with a 2D matrix)\n",
 "print(iris.data.ndim)"
 ]
 },
@@ -218,7 +227,7 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"In following sessions we will learn how to load a dataset from a file (csv, excel, ...) using the pandas library."
+"In the following sessions, we will learn how to load a dataset from a file (CSV, Excel, ...) using the pandas library."
 ]
 },
 {
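As a preview of that pandas-based loading, a minimal sketch (the file name 'iris.csv' is only illustrative):

```python
# Load a dataset from a CSV file into a pandas DataFrame (hypothetical file)
import pandas as pd

df = pd.read_csv('iris.csv')   # pd.read_excel() works similarly for Excel files
print(df.head())               # inspect the first rows
```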
@@ -246,7 +255,7 @@
 "source": [
 "## Licence\n",
 "\n",
-"The notebook is freely licensed under under the [Creative Commons Attribution Share-Alike license](https://creativecommons.org/licenses/by/2.0/). \n",
+"The notebook is freely licensed under the [Creative Commons Attribution Share-Alike license](https://creativecommons.org/licenses/by/2.0/). \n",
 "\n",
 "© Carlos A. Iglesias, Universidad Politécnica de Madrid."
 ]
@@ -4,7 +4,7 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-""
+""
 ]
 },
 {
@@ -49,7 +49,7 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"The goal of this notebook is to learn how to analyse a dataset. We will cover other tasks such as cleaning or munging (changing the format) the dataset in other sessions."
+"This notebook aims to learn how to analyse a dataset. We will cover other tasks such as cleaning or munging (changing the format) the dataset in other sessions."
 ]
 },
 {
@@ -65,13 +65,13 @@
 "source": [
 "This section covers different ways to inspect the distribution of samples per feature.\n",
 "\n",
-"First of all, let's see how many samples of each class we have, using a [histogram](https://en.wikipedia.org/wiki/Histogram). \n",
+"First of all, let's see how many samples we have in each class using a [histogram](https://en.wikipedia.org/wiki/Histogram). \n",
 "\n",
-"A histogram is a graphical representation of the distribution of numerical data. It is an estimation of the probability distribution of a continuous variable (quantitative variable). \n",
+"A histogram is a graphical representation of the distribution of numerical data. It estimates the probability distribution of a continuous variable (quantitative variable). \n",
 "\n",
-"For building a histogram, we need first to 'bin' the range of values—that is, divide the entire range of values into a series of intervals—and then count how many values fall into each interval. \n",
+"For building a histogram, we need to 'bin' the range of values—that is, divide the entire range of values into a series of intervals—and then count how many values fall into each interval. \n",
 "\n",
-"In our case, since the values are not continuous and we have only three values, we do not need to bin them."
+"Since the values are not continuous and we have only three values, we do not need to bin them."
 ]
 },
 {
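A minimal sketch of that class histogram (not necessarily the notebook's exact cell):

```python
# Show how many samples fall in each of the three classes
import matplotlib.pyplot as plt
from sklearn import datasets

iris = datasets.load_iris()
plt.hist(iris.target)
plt.xticks([0, 1, 2], iris.target_names)
plt.ylabel('number of samples')
plt.show()
```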
@@ -115,7 +115,7 @@
 "metadata": {},
 "source": [
 "As can be seen, we have the same distribution of samples for every class.\n",
-"The next step is to see the distribution of the features"
+"The next step is to see the distribution of the features."
 ]
 },
 {
@@ -184,7 +184,7 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"As we can see, the Setosa class seems to be linearly separable with these two features.\n",
+"As we can see, the Setosa class seems linearly separable with these two features.\n",
 "\n",
 "Another nice visualisation is given below."
 ]
@@ -241,7 +241,7 @@
 "source": [
 "## Licence\n",
 "\n",
-"The notebook is freely licensed under under the [Creative Commons Attribution Share-Alike license](https://creativecommons.org/licenses/by/2.0/). \n",
+"The notebook is freely licensed under the [Creative Commons Attribution Share-Alike license](https://creativecommons.org/licenses/by/2.0/). \n",
 "\n",
 "© Carlos A. Iglesias, Universidad Politécnica de Madrid."
 ]
@@ -4,7 +4,7 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-""
+""
 ]
 },
 {
@@ -52,11 +52,11 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"In the previous notebook we developed plots with the [matplotlib](http://matplotlib.org/) plotting library.\n",
+"In the previous notebook, we developed plots with the [matplotlib](http://matplotlib.org/) plotting library.\n",
 "\n",
 "This notebook introduces another plotting library, [**seaborn**](https://stanford.edu/~mwaskom/software/seaborn/), which provides advanced facilities for data visualization.\n",
 "\n",
-"*Seaborn* is a library for making attractive and informative statistical graphics in Python. It is built on top of *matplotlib* and tightly integrated with the *PyData* stack, including support for *numpy* and *pandas* data structures and statistical routines from *scipy* and *statsmodels*.\n",
+"*Seaborn* is a library that makes attractive and informative statistical graphics in Python. It is built on top of *matplotlib* and tightly integrated with the *PyData* stack, including support for *numpy* and *pandas* data structures and statistical routines from *scipy* and *statsmodels*.\n",
 "\n",
 "*Seaborn* requires its input to be *DataFrames* (a structure created with the library *pandas*)."
 ]
@@ -197,9 +197,9 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"A very common way to use this plot colors the observations by a separate categorical variable. For example, the iris dataset has four measurements for each of the three different species of iris flowers.\n",
+"A widespread way to use this plot colors the observations by a separate categorical variable. For example, the iris dataset has four measurements for each of the three different species of iris flowers.\n",
 "\n",
-"We are going to color each class, so that we can easily identify **clustering** and **linear relationships**."
+"We are going to color each class, so we can easily identify **clustering** and **linear relationships**."
 ]
 },
 {
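A minimal sketch of that colored scatter matrix, assuming the notebook's iris_df DataFrame with its 'species' column:

```python
# Color each observation by species to reveal clusters and linear relationships
import seaborn as sns

sns.pairplot(iris_df, hue='species')
```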
@@ -220,7 +220,7 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"By default every numeric column in the dataset is used, but you can focus on particular relationships if you want."
+"By default, every numeric column in the dataset is used, but you can focus on particular relationships if you want."
 ]
 },
 {
@@ -321,7 +321,7 @@
 "metadata": {},
 "outputs": [],
 "source": [
-"# One way we can extend this plot is adding a layer of individual points on top of\n",
+"# One way we can extend this plot is by adding a layer of individual points on top of\n",
 "# it through Seaborn's stripplot\n",
 "# \n",
 "# We'll use jitter=True so that all the points don't fall in single vertical lines\n",
@@ -347,7 +347,7 @@
 "outputs": [],
 "source": [
 "# A violin plot combines the benefits of the previous two plots and simplifies them\n",
-"# Denser regions of the data are fatter, and sparser thiner in a violin plot\n",
+"# Denser regions of the data are fatter, and sparser thinner in a violin plot\n",
 "sns.violinplot(x=\"species\", y=\"petal length (cm)\", data=iris_df, size=6)"
 ]
 },
@@ -389,10 +389,10 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"Depending on the data, we can choose which visualisation suits better. the following [diagram](http://www.labnol.org/software/find-right-chart-type-for-your-data/6523/) guides this selection.\n",
+"Depending on the data, we can choose which visualisation suits us better. The following [diagram](http://www.labnol.org/software/find-right-chart-type-for-your-data/6523/) guides this selection.\n",
 "\n",
 "\n",
-""
+""
 ]
 },
 {
@@ -421,7 +421,7 @@
 "metadata": {},
 "source": [
 "## Licence\n",
-"The notebook is freely licensed under under the [Creative Commons Attribution Share-Alike license](https://creativecommons.org/licenses/by/2.0/). \n",
+"The notebook is freely licensed under the [Creative Commons Attribution Share-Alike license](https://creativecommons.org/licenses/by/2.0/). \n",
 "\n",
 "© Carlos A. Iglesias, Universidad Politécnica de Madrid."
 ]
@@ -4,7 +4,7 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-""
+""
 ]
 },
 {
@@ -76,7 +76,7 @@
 "source": [
 "A common practice in machine learning to evaluate an algorithm is to split the data at hand into two sets, one that we call the **training set** on which we learn data properties and one that we call the **testing set** on which we test these properties. \n",
 "\n",
-"We are going to use *scikit-learn* to split the data into random training and testing sets. We follow the ratio 75% for training and 25% for testing. We use `random_state` to ensure that the result is always the same and it is reproducible. (Otherwise, we would get different training and testing sets every time)."
+"We will use *scikit-learn* to split the data into random training and testing sets. We follow the ratio 75% for training and 25% for testing. We use `random_state` to ensure that the result is always the same and it is reproducible. (Otherwise, we would get different training and testing sets every time)."
 ]
 },
 {
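A minimal sketch of that split (the random_state value is only illustrative):

```python
# Split the data into random training (75%) and testing (25%) sets
from sklearn.model_selection import train_test_split

x_train, x_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.25, random_state=33)
```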
@@ -122,9 +122,9 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"Standardization of datasets is a common requirement for many machine learning estimators implemented in the scikit; they might behave badly if the individual features do not more or less look like standard normally distributed data: Gaussian with zero mean and unit variance.\n",
+"Standardization of datasets is a common requirement for many machine learning estimators implemented in the scikit; they might misbehave if the individual features do not more or less look like standard normally distributed data: Gaussian with zero mean and unit variance.\n",
 "\n",
-"The preprocessing module further provides a utility class `StandardScaler` to compute the mean and standard deviation on a training set. Later, the same transformation will be applied on the testing set."
+"The preprocessing module further provides a utility class `StandardScaler` to compute a training set's mean and standard deviation. Later, the same transformation will be applied on the testing set."
 ]
 },
 {
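A minimal sketch of that pattern, fitting the scaler on the training set only:

```python
# Standardize features: learn mean/std on the training set, reuse on the test set
from sklearn.preprocessing import StandardScaler

scaler = StandardScaler().fit(x_train)
x_train = scaler.transform(x_train)
x_test = scaler.transform(x_test)   # same transformation, no refitting
```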
@@ -173,7 +173,7 @@
 "metadata": {},
 "source": [
 "### Licences\n",
-"The notebook is freely licensed under under the [Creative Commons Attribution Share-Alike license](https://creativecommons.org/licenses/by/2.0/). \n",
+"The notebook is freely licensed under the [Creative Commons Attribution Share-Alike license](https://creativecommons.org/licenses/by/2.0/). \n",
 "\n",
 "© Carlos A. Iglesias, Universidad Politécnica de Madrid."
 ]
@@ -4,7 +4,7 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-""
+""
 ]
 },
 {
@@ -53,9 +53,9 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"This is an introduction of general ideas about machine learning and the interface of scikit-learn, taken from the [scikit-learn tutorial](http://www.astroml.org/sklearn_tutorial/general_concepts.html). \n",
+"This is an introduction to general ideas about machine learning and the interface of scikit-learn, taken from the [scikit-learn tutorial](http://www.astroml.org/sklearn_tutorial/general_concepts.html). \n",
 "\n",
-"You can skip it during the lab session and read it later,"
+"You can skip it during the lab session and read it later."
 ]
 },
 {
@@ -69,21 +69,21 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"Machine learning algorithms are programs that learn a model from a dataset with the aim of making predictions or learning structures to organize the data.\n",
+"Machine learning algorithms are programs that learn a model from a dataset to make predictions or learn structures to organize the data.\n",
 "\n",
-"In scikit-learn, machine learning algorithms take as an input a *numpy* array (n_samples, n_features), where\n",
-"* **n_samples**: number of samples. Each sample is an item to process (i.e. classify). A sample can be a document, a picture, a sound, a video, a row in database or CSV file, or whatever you can describe with a fixed set of quantitative traits.\n",
-"* **n_features**: The number of features or distinct traits that can be used to describe each item in a quantitative manner.\n",
+"In scikit-learn, machine learning algorithms take as input a *numpy* array (n_samples, n_features), where\n",
+"* **n_samples**: number of samples. Each sample is an item to process (i.e., classify). A sample can be a document, a picture, a sound, a video, a row in a database or CSV file, or whatever you can describe with a fixed set of quantitative traits.\n",
+"* **n_features**: The number of features or distinct traits that can be used to describe each item quantitatively.\n",
 "\n",
-"The number of features should be defined in advance. There is a specific type of feature sets that are high dimensional (e.g. millions of features), but most of the values are zero for a given sample. Using (numpy) arrays, all those values that are zero would also take up memory. For this reason, these feature sets are often represented with sparse matrices (scipy.sparse) instead of (numpy) arrays.\n",
+"The number of features should be defined in advance. A specific type of feature set is high-dimensional (e.g., millions of features), but most values are zero for a given sample. Using (numpy) arrays, all those zero values would also take up memory. For this reason, these feature sets are often represented with sparse matrices (scipy.sparse) instead of (numpy) arrays.\n",
 "\n",
 "The first step in machine learning is **identifying the relevant features** from the input data, and the second step is **extracting the features** from the input data. \n",
 "\n",
 "[Machine learning algorithms](http://machinelearningmastery.com/a-tour-of-machine-learning-algorithms/) can be classified according to learning style into:\n",
 "* **Supervised learning**: input data (training dataset) has a known label or result. Example problems are classification and regression. A model is prepared through a training process where it is required to make predictions and is corrected when those predictions are wrong. The training process continues until the model achieves a desired level of accuracy on the training data.\n",
-"* **Unsupervised learning**: input data is not labeled. A model is prepared by deducing structures present in the input data. This may be to extract general rules. Example problems are clustering, dimensionality reduction and association rule learning.\n",
-"* **Semi-supervised learning**:i nput data is a mixture of labeled and unlabeled examples. There is a desired prediction problem but the model must learn the structures to organize the data as well as make predictions. Example problems are classification and regression."
+"* **Unsupervised learning**: input data is not labeled. A model is prepared by deducing structures present in the input data. This may be to extract general rules. Example problems are clustering, dimensionality reduction, and association rule learning.\n",
+"* **Semi-supervised learning**: input data is a mixture of labeled and unlabeled examples. There is a desired prediction problem, but the model must learn the structures to organize the data and make predictions. Example problems are classification and regression."
 ]
 },
 {
 "cell_type": "markdown",
@@ -96,8 +96,8 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"In *supervised machine learning models*, the machine learning algorithm takes as an input a training dataset, composed of feature vectors and labels, and produces a predictive model which is used for make prediction on new data.\n",
-""
+"In *supervised machine learning models*, the machine learning algorithm takes as input a training dataset, composed of feature vectors and labels, and produces a predictive model used to predict new data.\n",
+""
 ]
 },
 {
@@ -111,7 +111,7 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"In *unsupervised machine learning models*, the machine learning model algorithm takes as an input the feature vectors and produces a predictive model that is used to fit its parameters so as to best summarize regularities found in the data.\n",
+"In *unsupervised machine learning models*, the machine learning model algorithm takes as input the feature vectors. It produces a predictive model that is used to fit its parameters to summarize the best regularities found in the data.\n",
 ""
 ]
 },
@@ -129,15 +129,15 @@
 "scikit-learn has a uniform interface for all the estimators; some methods are only available if the estimator is supervised or unsupervised:\n",
 "\n",
 "* Available in *all estimators*:\n",
-" * **model.fit()**: fit training data. For supervised learning applications, this accepts two arguments: the data X and the labels y (e.g. model.fit(X, y)). For unsupervised learning applications, this accepts only a single argument, the data X (e.g. model.fit(X)).\n",
+" * **model.fit()**: fit training data. For supervised learning applications, this accepts two arguments: the data X and the labels y (e.g., model.fit(X, y)). For unsupervised learning applications, this accepts only a single argument, the data X (e.g. model.fit(X)).\n",
 "\n",
 "* Available in *supervised estimators*:\n",
-" * **model.predict()**: given a trained model, predict the label of a new set of data. This method accepts one argument, the new data X_new (e.g. model.predict(X_new)), and returns the learned label for each object in the array.\n",
+" * **model.predict()**: given a trained model, predict the label of a new dataset. This method accepts one argument, the new data X_new (e.g., model.predict(X_new)), and returns the learned label for each object in the array.\n",
 " * **model.predict_proba()**: For classification problems, some estimators also provide this method, which returns the probability that a new observation has each categorical label. In this case, the label with the highest probability is returned by model.predict().\n",
 "\n",
 "* Available in *unsupervised estimators*:\n",
 " * **model.transform()**: given an unsupervised model, transform new data into the new basis. This also accepts one argument X_new, and returns the new representation of the data based on the unsupervised model.\n",
-" * **model.fit_transform()**: some estimators implement this method, which performs a fit and a transform on the same input data.\n",
+" * **model.fit_transform()**: Some estimators implement this method, which performs a fit and a transform on the same input data.\n",
 "\n",
 "\n",
 ""
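A minimal sketch of that uniform interface, assuming a feature matrix X, labels y, and new data X_new (the two estimator choices are only illustrative):

```python
# The same fit/predict/transform vocabulary across scikit-learn estimators
from sklearn.neighbors import KNeighborsClassifier
from sklearn.cluster import KMeans

model = KNeighborsClassifier()      # a supervised estimator
model.fit(X, y)                     # fit takes data and labels
labels = model.predict(X_new)       # predict labels for new data
probs = model.predict_proba(X_new)  # per-class probabilities

km = KMeans(n_clusters=3)           # an unsupervised estimator
km.fit(X)                           # fit takes only the data
X_basis = km.transform(X_new)       # represent data in the new basis
```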
@@ -169,7 +169,7 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"The notebook is freely licensed under under the [Creative Commons Attribution Share-Alike license](https://creativecommons.org/licenses/by/2.0/). \n",
+"The notebook is freely licensed under the [Creative Commons Attribution Share-Alike license](https://creativecommons.org/licenses/by/2.0/). \n",
 "\n",
 "© Carlos A. Iglesias, Universidad Politécnica de Madrid."
 ]
@@ -4,7 +4,7 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-""
+""
 ]
 },
 {
@@ -55,7 +55,7 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"The goal of this notebook is to learn how to train a model, make predictions with that model and evaluate these predictions.\n",
+"The goal of this notebook is to learn how to train a model, make predictions with that model, and evaluate these predictions.\n",
 "\n",
 "The notebook uses the [kNN (k nearest neighbors) algorithm](https://en.wikipedia.org/wiki/K-nearest_neighbors_algorithm)."
 ]
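A minimal sketch of that train/predict/evaluate cycle, following the notebook's x_train/x_test naming (the k value is illustrative):

```python
# Train a kNN classifier, predict on the test set, and measure accuracy
from sklearn.neighbors import KNeighborsClassifier
from sklearn import metrics

model = KNeighborsClassifier(n_neighbors=3)
model.fit(x_train, y_train)
y_pred = model.predict(x_test)
print(metrics.accuracy_score(y_test, y_pred))
```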
@@ -212,14 +212,14 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"### Precision, recall and f-score"
+"### Precision, recall, and f-score"
 ]
 },
 {
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"For evaluating classification algorithms, we usually calculate three metrics: precision, recall and F1-score\n",
+"For evaluating classification algorithms, we usually calculate three metrics: precision, recall, and F1-score\n",
 "\n",
 "* **Precision**: This computes the proportion of instances predicted as positives that were correctly evaluated (it measures how right our classifier is when it says that an instance is positive).\n",
 "* **Recall**: This counts the proportion of positive instances that were correctly evaluated (measuring how right our classifier is when faced with a positive instance).\n",
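A minimal sketch that computes the three metrics at once, assuming y_test and y_pred from the previous step:

```python
# Per-class precision, recall, and F1-score in one report
from sklearn import metrics

print(metrics.classification_report(y_test, y_pred,
                                    target_names=iris.target_names))
```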
@@ -246,7 +246,7 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"Another useful metric is the confusion matrix"
+"Another useful metric is the confusion matrix."
 ]
 },
 {
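A minimal sketch, again assuming y_test and y_pred; rows are true classes and columns are predicted classes:

```python
# Count correct and misclassified samples per pair of classes
from sklearn import metrics

print(metrics.confusion_matrix(y_test, y_pred))
```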
@@ -262,7 +262,7 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"We see we classify well all the 'setosa' and 'versicolor' samples. "
+"We classify all the 'setosa' and 'versicolor' samples well. "
 ]
 },
 {
@@ -276,7 +276,7 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"In order to avoid bias in the training and testing dataset partition, it is recommended to use **k-fold validation**."
+"To avoid bias in the training and testing dataset partition, it is recommended to use **k-fold validation**."
 ]
 },
 {
@@ -298,7 +298,7 @@
 "# create a k-fold cross validation iterator of k=10 folds\n",
 "cv = KFold(10, shuffle=True, random_state=33)\n",
 "\n",
-"# by default the score used is the one returned by score method of the estimator (accuracy)\n",
+"# by default the score used is the one returned by the score method of the estimator (accuracy)\n",
 "scores = cross_val_score(model, x_iris, y_iris, cv=cv)\n",
 "print(scores)"
 ]
@@ -307,7 +307,7 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"We get an array of k scores. We can calculate the mean and the standard error to obtain a final figure"
+"We get an array of k scores. We can calculate the mean and the standard error to obtain a final figure."
 ]
 },
 {
@@ -340,7 +340,7 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"We are going to tune the algorithm, and calculate which is the best value for the k hyperparameter."
+"We will tune the algorithm and calculate the best value for the k hyperparameter."
 ]
 },
 {
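A minimal sketch of that tuning idea, scoring each candidate k with cross-validation (the range of k values is illustrative):

```python
# Evaluate the cross-validated accuracy for each candidate k and keep the best
import numpy as np
from sklearn.model_selection import KFold, cross_val_score
from sklearn.neighbors import KNeighborsClassifier

cv = KFold(10, shuffle=True, random_state=33)
k_values = range(1, 26)
mean_scores = [cross_val_score(KNeighborsClassifier(n_neighbors=k),
                               x_iris, y_iris, cv=cv).mean()
               for k in k_values]
best_k = k_values[int(np.argmax(mean_scores))]
print(best_k, max(mean_scores))
```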
@@ -365,7 +365,7 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"The result is very dependent of the input data. Execute again the train_test_split and test again how the result changes with k."
+"The result is very dependent on the input data. Execute the train_test_split again and test how the result changes with k."
 ]
 },
 {
@@ -387,7 +387,7 @@
 "metadata": {},
 "source": [
 "## Licence\n",
-"The notebook is freely licensed under under the [Creative Commons Attribution Share-Alike license](https://creativecommons.org/licenses/by/2.0/). \n",
+"The notebook is freely licensed under the [Creative Commons Attribution Share-Alike license](https://creativecommons.org/licenses/by/2.0/). \n",
 "\n",
 "© Carlos A. Iglesias, Universidad Politécnica de Madrid."
 ]
@@ -4,7 +4,7 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-""
+""
 ]
 },
 {
@@ -56,9 +56,9 @@
 "source": [
 "The goal of this notebook is to learn how to create a classification object using a [decision tree learning algorithm](https://en.wikipedia.org/wiki/Decision_tree_learning). \n",
 "\n",
-"There are a number of well known machine learning algorithms for decision tree learning, such as ID3, C4.5, C5.0 and CART. The scikit-learn uses an optimised version of the [CART (Classification and Regression Trees) algorithm](https://en.wikipedia.org/wiki/Predictive_analytics#Classification_and_regression_trees).\n",
+"There are several well-known machine learning algorithms for decision tree learning, such as ID3, C4.5, C5.0, and CART. Scikit-learn uses an optimised version of the [CART (Classification and Regression Trees) algorithm](https://en.wikipedia.org/wiki/Predictive_analytics#Classification_and_regression_trees).\n",
 "\n",
-"This notebook will follow the same steps that the previous notebook for learning using the [kNN Model](2_5_1_kNN_Model.ipynb), and details some peculiarities of the decision tree algorithms.\n",
+"This notebook will follow the same steps as the previous notebook for learning using the [kNN Model](2_5_1_kNN_Model.ipynb), and details some peculiarities of the decision tree algorithms.\n",
 "\n",
 "You need to install pydotplus: `conda install pydotplus` for the visualization."
 ]
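A minimal sketch of training that CART-based classifier (max_depth and random_state are illustrative):

```python
# Train a decision tree (CART) classifier on the training set
from sklearn.tree import DecisionTreeClassifier

model = DecisionTreeClassifier(max_depth=3, random_state=1)
model.fit(x_train, y_train)
print(model.score(x_test, y_test))   # accuracy on the test set
```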
@@ -69,7 +69,7 @@
 "source": [
 "## Load data and preprocessing\n",
 "\n",
-"Here we repeat the same operations for loading data and preprocessing than in the previous notebooks."
+"Here we repeat the same operations for loading data and preprocessing as in the previous notebooks."
 ]
 },
 {
@@ -262,8 +262,8 @@
 "The current version of pydot does not work well in Python 3.\n",
 "For obtaining an image, you need to install `pip install pydotplus` and then `conda install graphviz`.\n",
 "\n",
-"You can skip this example. Since it can require installing additional packages, we include here the result.\n",
-""
+"You can skip this example. Since it can require installing additional packages, we have included the result here.\n",
+""
 ]
 },
 {
@@ -330,7 +330,7 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"Next we are going to export the pseudocode of the the learnt decision tree."
+"Next, we will export the pseudocode of the learnt decision tree."
 ]
 },
 {
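One way to get such a textual listing in recent scikit-learn versions is export_text; the notebook itself may use a hand-written helper instead:

```python
# Print the learnt tree as if/else pseudocode
from sklearn.tree import export_text

print(export_text(model, feature_names=list(iris.feature_names)))
```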
@@ -378,14 +378,14 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"### Precision, recall and f-score"
+"### Precision, recall, and f-score"
 ]
 },
 {
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"For evaluating classification algorithms, we usually calculate three metrics: precision, recall and F1-score\n",
+"For evaluating classification algorithms, we usually calculate three metrics: precision, recall, and F1-score\n",
 "\n",
 "* **Precision**: This computes the proportion of instances predicted as positives that were correctly evaluated (it measures how right our classifier is when it says that an instance is positive).\n",
 "* **Recall**: This counts the proportion of positive instances that were correctly evaluated (measuring how right our classifier is when faced with a positive instance).\n",
@@ -412,7 +412,7 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"Another useful metric is the confusion matrix"
+"Another useful metric is the confusion matrix."
 ]
 },
 {
@@ -428,7 +428,7 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"We see we classify well all the 'setosa' and 'versicolor' samples. "
+"We classify all the 'setosa' and 'versicolor' samples well. "
 ]
 },
 {
@@ -442,7 +442,7 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"In order to avoid bias in the training and testing dataset partition, it is recommended to use **k-fold validation**.\n",
+"To avoid bias in the training and testing dataset partition, it is recommended to use **k-fold validation**.\n",
 "\n",
 "Sklearn comes with other strategies for [cross validation](http://scikit-learn.org/stable/modules/cross_validation.html#cross-validation), such as stratified K-fold, label k-fold, Leave-One-Out, Leave-P-Out, Leave-One-Label-Out, Leave-P-Label-Out or Shuffle & Split."
 ]
@@ -466,7 +466,7 @@
 "# create a k-fold cross validation iterator of k=10 folds\n",
 "cv = KFold(10, shuffle=True, random_state=33)\n",
 "\n",
-"# by default the score used is the one returned by score method of the estimator (accuracy)\n",
+"# by default the score used is the one returned by the score method of the estimator (accuracy)\n",
 "scores = cross_val_score(model, x_iris, y_iris, cv=cv)\n",
 "print(scores)"
 ]
@@ -475,7 +475,7 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"We get an array of k scores. We can calculate the mean and the standard error to obtain a final figure"
+"We get an array of k scores. We can calculate the mean and the standard error to obtain a final figure."
 ]
 },
 {
@@ -518,7 +518,7 @@
 "metadata": {},
 "source": [
 "## Licence\n",
-"The notebook is freely licensed under under the [Creative Commons Attribution Share-Alike license](https://creativecommons.org/licenses/by/2.0/). \n",
+"The notebook is freely licensed under the [Creative Commons Attribution Share-Alike license](https://creativecommons.org/licenses/by/2.0/). \n",
 "\n",
 "© Carlos A. Iglesias, Universidad Politécnica de Madrid."
 ]
@@ -4,7 +4,7 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-""
+""
 ]
 },
 {
@@ -58,7 +58,7 @@
 "source": [
 "In the previous [notebook](2_5_2_Decision_Tree_Model.ipynb), we got an accuracy of 0.947. Could we get a better accuracy if we tune the hyperparameters of the estimator?\n",
 "\n",
-"The goal of this notebook is to learn how to tune an algorithm by opimizing its hyperparameters using grid search."
+"This notebook aims to learn how to tune an algorithm by optimizing its hyperparameters using grid search."
 ]
 },
 {
@@ -137,7 +137,7 @@
 "# create a k-fold cross validation iterator of k=10 folds\n",
 "cv = KFold(10, shuffle=True, random_state=33)\n",
 "\n",
-"# by default the score used is the one returned by score method of the estimator (accuracy)\n",
+"# by default the score used is the one returned by the score method of the estimator (accuracy)\n",
 "scores = cross_val_score(model, x_iris, y_iris, cv=cv)\n",
 "\n",
 "from scipy.stats import sem\n",
@@ -189,7 +189,7 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"We can get the list of parameters of the model. As you will observe, the parameters of the estimators in the pipeline can be accessed using the <estimator>__<parameter> syntax. We will use this for tuning the parameters."
+"We can get the list of model parameters. As you will observe, the parameters of the estimators in the pipeline can be accessed using the <estimator>__<parameter> syntax. We will use this for tuning the parameters."
 ]
 },
 {
@@ -205,7 +205,7 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"Let's see what happens if we change a parameter"
+"Let's see what happens if we change a parameter."
 ]
 },
 {
@@ -284,7 +284,7 @@
 "\n",
 "Look at the [API](http://scikit-learn.org/stable/modules/generated/sklearn.tree.DecisionTreeClassifier.html) of *scikit-learn* to understand better the algorithm, as well as which parameters can be tuned. As you see, we can change several ones, such as *criterion*, *splitter*, *max_features*, *max_depth*, *min_samples_split*, *class_weight*, etc.\n",
 "\n",
-"We can get the full list parameters of an estimator with the method *get_params()*. "
+"We can get an estimator's full list of parameters with the method *get_params()*. "
 ]
 },
 {
@@ -314,16 +314,16 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"Changing manually the hyperparameters to find their optimal values is not practical. Instead, we can consider to find the optimal value of the hyperparameters as an *optimization problem*. \n",
+"Changing manually the hyperparameters to find their optimal values is not practical. Instead, we can consider finding the optimal value of the hyperparameters as an *optimization problem*. \n",
 "\n",
-"The sklearn comes with several optimization techniques for this purpose, such as **grid search** and **randomized search**. In this notebook we are going to introduce the former one."
+"Sklearn has several optimization techniques, such as **grid search** and **randomized search**. In this notebook, we are going to introduce the former one."
 ]
 },
 {
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"The sklearn provides an object that, given data, computes the score during the fit of an estimator on a hyperparameter grid and chooses the hyperparameters to maximize the cross-validation score. "
+"Sklearn provides an object that, given data, computes the score during the fit of an estimator on a hyperparameter grid and chooses the hyperparameters to maximize the cross-validation score. "
 ]
 },
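That object is scikit-learn's GridSearchCV; a minimal sketch on the decision tree (the parameter grid values are illustrative):

```python
# Search a hyperparameter grid with cross-validation and keep the best setting
from sklearn.model_selection import GridSearchCV, KFold
from sklearn.tree import DecisionTreeClassifier

param_grid = {'max_depth': [2, 3, 4, 5],
              'criterion': ['gini', 'entropy']}
cv = KFold(10, shuffle=True, random_state=33)
gs = GridSearchCV(DecisionTreeClassifier(), param_grid, cv=cv)
gs.fit(x_iris, y_iris)
print(gs.best_params_, gs.best_score_)
```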
{
|
||||
@ -351,7 +351,7 @@
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Now we are going to show the results of grid search"
|
||||
"Now we are going to show the results of the grid search"
|
||||
]
|
||||
},
|
||||
{
|
||||
@ -392,7 +392,7 @@
|
||||
"# create a k-fold cross validation iterator of k=10 folds\n",
|
||||
"cv = KFold(10, shuffle=True, random_state=33)\n",
|
||||
"\n",
|
||||
"# by default the score used is the one returned by score method of the estimator (accuracy)\n",
|
||||
"# by default the score used is the one returned by the score method of the estimator (accuracy)\n",
|
||||
"scores = cross_val_score(model, x_iris, y_iris, cv=cv)\n",
|
||||
"def mean_score(scores):\n",
|
||||
" return (\"Mean score: {0:.3f} (+/- {1:.3f})\").format(np.mean(scores), sem(scores))\n",
|
||||
@ -405,7 +405,7 @@
|
||||
"source": [
|
||||
"We have got an *improvement* from 0.947 to 0.953 with k-fold.\n",
|
||||
"\n",
|
||||
"We are now to try to fit the best combination of the hyperparameters of the algorithm. It can take some time to compute it."
|
||||
"We are now trying to fit the best combination of the hyperparameters of the algorithm. It can take some time to compute it."
|
||||
]
|
||||
},
|
||||
{
|
||||
@ -492,7 +492,7 @@
|
||||
"# create a k-fold cross validation iterator of k=10 folds\n",
|
||||
"cv = KFold(10, shuffle=True, random_state=33)\n",
|
||||
"\n",
|
||||
"# by default the score used is the one returned by score method of the estimator (accuracy)\n",
|
||||
"# by default the score used is the one returned by the score method of the estimator (accuracy)\n",
|
||||
"scores = cross_val_score(model, x_iris, y_iris, cv=cv)\n",
|
||||
"def mean_score(scores):\n",
|
||||
" return (\"Mean score: {0:.3f} (+/- {1:.3f})\").format(np.mean(scores), sem(scores))\n",
|
||||
@ -533,7 +533,7 @@
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"The notebook is freely licensed under under the [Creative Commons Attribution Share-Alike license](https://creativecommons.org/licenses/by/2.0/). \n",
|
||||
"The notebook is freely licensed under the [Creative Commons Attribution Share-Alike license](https://creativecommons.org/licenses/by/2.0/). \n",
|
||||
"\n",
|
||||
"© Carlos A. Iglesias, Universidad Politécnica de Madrid."
|
||||
]
|
||||
|
@@ -4,7 +4,7 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-""
+""
 ]
 },
 {
@@ -48,9 +48,9 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"The goal of this notebook is to learn how to save a model in the the scikit by using Python’s built-in persistence model, namely pickle\n",
+"The goal of this notebook is to learn how to save a model in the scikit by using Python’s built-in persistence model, namely pickle\n",
 "\n",
-"First we recap the previous tasks: load data, preprocess and train the model."
+"First, we recap the previous tasks: load data, preprocess, and train the model."
 ]
 },
 {
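A minimal sketch of that pickle-based persistence:

```python
# Serialize the trained model to a byte string and restore it later
import pickle

saved = pickle.dumps(model)
restored = pickle.loads(saved)
print(restored.predict(x_test[:5]))   # the restored model predicts as before
```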
@@ -107,7 +107,7 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"A more efficient alternative to pickle is joblib, especially for big data problems. In this case the model can only be saved to a file and not to a string."
+"A more efficient alternative to pickle is joblib, especially for big data problems. In this case, the model can only be saved to a file and not to a string."
 ]
 },
 {
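A minimal sketch with joblib (the file name is illustrative; older scikit-learn releases shipped it as sklearn.externals.joblib):

```python
# Persist the model to a file with joblib and load it back
import joblib

joblib.dump(model, 'model.pkl')
model = joblib.load('model.pkl')
```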
@@ -146,7 +146,7 @@
 "metadata": {},
 "source": [
 "## Licence\n",
-"The notebook is freely licensed under under the [Creative Commons Attribution Share-Alike license](https://creativecommons.org/licenses/by/2.0/). \n",
+"The notebook is freely licensed under the [Creative Commons Attribution Share-Alike license](https://creativecommons.org/licenses/by/2.0/). \n",
 "\n",
 "© Carlos A. Iglesias, Universidad Politécnica de Madrid."
 ]
@@ -4,7 +4,7 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-""
+""
 ]
 },
 {
@@ -52,7 +52,7 @@
 "\n",
 "Particularly in high-dimensional spaces, data can more easily be separated linearly and the simplicity of classifiers such as naive Bayes and linear SVMs might lead to better generalization than is achieved by other classifiers.\n",
 "\n",
-"The plots show training points in solid colors and testing points semi-transparent. The lower right shows the classification accuracy on the test set.\n",
+"The plots show training points in solid colors and testing points in semi-transparent colors. The lower right shows the classification accuracy on the test set.\n",
 "\n",
 "The [DummyClassifier](http://scikit-learn.org/stable/modules/generated/sklearn.dummy.DummyClassifier.html#sklearn.dummy.DummyClassifier) is a classifier that makes predictions using simple rules. It is useful as a simple baseline to compare with other (real) classifiers. \n",
 "\n",
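A minimal sketch of such a baseline (the strategy value is an illustrative choice):

```python
# A trivial baseline that always predicts the most frequent class
from sklearn.dummy import DummyClassifier

baseline = DummyClassifier(strategy='most_frequent')
baseline.fit(x_train, y_train)
print(baseline.score(x_test, y_test))   # real classifiers should beat this
```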
@@ -94,7 +94,7 @@
 "metadata": {},
 "source": [
 "## Licence\n",
-"The notebook is freely licensed under under the [Creative Commons Attribution Share-Alike license](https://creativecommons.org/licenses/by/2.0/). \n",
+"The notebook is freely licensed under the [Creative Commons Attribution Share-Alike license](https://creativecommons.org/licenses/by/2.0/). \n",
 "\n",
 "© Carlos A. Iglesias, Universidad Politécnica de Madrid."
 ]
BIN ml1/images/iris-classes.png  (new file; binary not shown; 1.4 MiB)
BIN ml1/images/iris-features.png (new file; binary not shown; 944 KiB)
BIN ml2/images/iris-classes.png  (new file; binary not shown; 1.4 MiB)
BIN ml2/images/iris-features.png (new file; binary not shown; 944 KiB)
@@ -4,7 +4,7 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-""
+""
 ]
 },
 {
@@ -27,14 +27,14 @@
 "source": [
 "# Introduction to Neural Networks\n",
 " \n",
-"In this lab session, we are going to learn how to train a neural network.\n",
+"In this lab session, we will learn how to train a neural network.\n",
 "\n",
 "# Objectives\n",
 "\n",
 "The main objectives of this session are:\n",
 "* Put in practice the notions learnt in class about neural computing\n",
 "* Understand what an MLP is\n",
-"* Learn to use some libraries, such as scikit-learn "
+"* Learn to use some libraries, such as Scikit-learn."
 ]
 },
 {
@@ -58,7 +58,7 @@
 "metadata": {},
 "source": [
 "## Licence\n",
-"The notebook is freely licensed under under the [Creative Commons Attribution Share-Alike license](https://creativecommons.org/licenses/by/2.0/). \n",
+"The notebook is freely licensed under the [Creative Commons Attribution Share-Alike license](https://creativecommons.org/licenses/by/2.0/). \n",
 "\n",
 "© Carlos A. Iglesias, Universidad Politécnica de Madrid."
 ]
@@ -4,7 +4,7 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-""
+""
 ]
 },
 {
@@ -39,7 +39,7 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"Multilayer perceptrons, also called feedforward neural networks or deep feedforward networks, are the most basic deep learning models."
+"Multilayer perceptrons, called feedforward neural networks or deep feedforward networks, are the most basic deep learning models."
 ]
 },
 {
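A minimal sketch of such a model with scikit-learn (the layer sizes and iteration cap are illustrative):

```python
# Train a small multilayer perceptron classifier
from sklearn.neural_network import MLPClassifier

mlp = MLPClassifier(hidden_layer_sizes=(20, 20), max_iter=2000, random_state=1)
mlp.fit(x_train, y_train)
print(mlp.score(x_test, y_test))
```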
@@ -58,7 +58,7 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"In this notebook we are going to try the spiral dataset with different algorthms. In particular, we are going to focus our attention on the MLP classifier.\n",
+"In this notebook, we will try the spiral dataset with different algorithms. In particular, we are going to focus our attention on the MLP classifier.\n",
 "\n",
 "\n",
 "Answer directly in your copy of the exercise and submit it as a moodle task."
@@ -4,7 +4,7 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-""
+""
 ]
 },
 {
@@ -39,10 +39,10 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"In this notebook we are going to apply a MLP to a simple regression task: learning the Fresnel functions.\n",
+"In this notebook, we are going to apply an MLP to a simple regression task: learning the Fresnel functions.\n",
 "\n",
 "\n",
-"Answer directly in your copy of the exercise and submit it as a moodle task."
+"Answer directly in your copy of the exercise and submit it as a Moodle task."
 ]
 },
 {
@@ -92,7 +92,7 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"Change this variables to change the train and test dataset."
+"Change these variables to change the train and test dataset."
 ]
 },
 {
@@ -15,7 +15,7 @@ def gen_spiral_dataset(n_examples=500, n_classes=2, a=None, b=None, pi_space=3):
     theta = np.linspace(0,pi_space*pi, num=n_examples)
     xy = np.zeros((n_examples,2))

-    # logaritmic spirals
+    # logarithmic spirals
     x_golden_parametric = lambda a, b, theta: a**(theta*b) * cos(theta)
     y_golden_parametric = lambda a, b, theta: a**(theta*b) * sin(theta)
     x_golden_parametric = np.vectorize(x_golden_parametric)