mirror of
https://github.com/gsi-upm/sitc
synced 2024-11-14 10:32:29 +00:00
202 lines
7.9 KiB
Plaintext
202 lines
7.9 KiB
Plaintext
|
{
|
||
|
"cells": [
|
||
|
{
|
||
|
"cell_type": "markdown",
|
||
|
"metadata": {},
|
||
|
"source": [
|
||
|
"![](files/images/EscUpmPolit_p.gif \"UPM\")"
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "markdown",
|
||
|
"metadata": {},
|
||
|
"source": [
|
||
|
"# Course Notes for Learning Intelligent Systems"
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "markdown",
|
||
|
"metadata": {},
|
||
|
"source": [
|
||
|
"Department of Telematic Engineering Systems, Universidad Politécnica de Madrid, © 2016 Carlos A. Iglesias"
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "markdown",
|
||
|
"metadata": {},
|
||
|
"source": [
|
||
|
"## [Introduction to Machine Learning](2_0_0_Intro_ML.ipynb)"
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "markdown",
|
||
|
"metadata": {},
|
||
|
"source": [
|
||
|
"# Table of Contents\n",
|
||
|
"\n",
|
||
|
"* [Machine Learning](#Machine-Learning)\n",
|
||
|
"* [Machine learning algorithms](#Machine-learning-algorithms)\n",
|
||
|
"\t\t* [Supervised machine learning model](#Supervised-machine-learning-model)\n",
|
||
|
"\t\t* [Unsupervised machine learning model](#Unsupervised-machine-learning-model)\n",
|
||
|
"* [sklearn interface](#sklearn-interface)\n",
|
||
|
"* [References](#References)"
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "markdown",
|
||
|
"metadata": {},
|
||
|
"source": [
|
||
|
"# Machine Learning"
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "markdown",
|
||
|
"metadata": {},
|
||
|
"source": [
|
||
|
"This is an introduction of general ideas about machine learning and the general interface of scikit-learn, taken from the [scikit-learn tutorial](http://www.astroml.org/sklearn_tutorial/general_concepts.html). \n",
|
||
|
"\n",
|
||
|
"You can skip it during the lab session and read it later,"
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "markdown",
|
||
|
"metadata": {},
|
||
|
"source": [
|
||
|
"## Machine learning algorithms"
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "markdown",
|
||
|
"metadata": {},
|
||
|
"source": [
|
||
|
"Machine learning algorithms are programs that learn a model from a dataset with the aim of making predictions or learning structures to organize the data.\n",
|
||
|
"\n",
|
||
|
"In scikit-learn, machine learning algorithms take as an input a *numpy* array (n_samples, n_features), where\n",
|
||
|
"* **n_samples**: number of samples. Each sample is an item to process (i.e. classify). A sample can be a document, a picture, a sound, a video, a row in database or CSV file, or whatever you can describe with a fixed set of quantitative traits.\n",
|
||
|
"* **n_features**: The number of features or distinct traits that can be used to describe each item in a quantitative manner.\n",
|
||
|
"\n",
|
||
|
"The number of features should be defined in advanced and it can be very high dimensional (e.g. millions of features) with most of them being zeros for a given sample. In this case we may use (scipy.sparse) sparse matrices instead of (numpy) arrays so as to make the data fit in memory.\n",
|
||
|
"\n",
|
||
|
"The first step in machine learning is **identifying the relevant features** from the input data, and the second step is **extracting the features** from the input data. \n",
|
||
|
"\n",
|
||
|
"[Machine learning algorithms](http://machinelearningmastery.com/a-tour-of-machine-learning-algorithms/) can be classified according to learning style into:\n",
|
||
|
"* **Supervised learning**: input data (training dataset) has a known label or result. Example problems are classification and regression. A model is prepared through a training process where it is required to make predictions and is corrected when those predictions are wrong. The training process continues until the model achieves a desired level of accuracy on the training data.\n",
|
||
|
"* **Unsupervised learning**: input data is not labeled. A model is prepared by deducing structures present in the input data. This may be to extract general rules. Example problems are clustering, dimensionality reduction and association rule learning.\n",
|
||
|
"* **Semi-supervised learning**:i nput data is a mixture of labeled and unlabeled examples. There is a desired prediction problem but the model must learn the structures to organize the data as well as make predictions. Example problems are classification and regression."
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "markdown",
|
||
|
"metadata": {},
|
||
|
"source": [
|
||
|
"### Supervised machine learning model"
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "markdown",
|
||
|
"metadata": {},
|
||
|
"source": [
|
||
|
"In *supervised machine learning models*, the machine learning algorithm takes as an input a training dataset, composed of feature vectors and labels, and produces a predictive model which is used for make prediction on new data.\n",
|
||
|
"![](files/images/plot_ML_flow_chart_1.png)"
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "markdown",
|
||
|
"metadata": {},
|
||
|
"source": [
|
||
|
"### Unsupervised machine learning model"
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "markdown",
|
||
|
"metadata": {},
|
||
|
"source": [
|
||
|
"In *unsupervised machine learning models*, the machine learning model algorithm takes as an input the feature vectors and produces a predictive model that is used to fit its parameters so as to best summarize regularities found in the data.\n",
|
||
|
"![](files/images/plot_ML_flow_chart_3.png)"
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "markdown",
|
||
|
"metadata": {},
|
||
|
"source": [
|
||
|
"## sklearn interface"
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "markdown",
|
||
|
"metadata": {},
|
||
|
"source": [
|
||
|
"scikit-learn has a uniform interface for all the estimators, some methods are only available is the estimator is supervised or unsupervised:\n",
|
||
|
"\n",
|
||
|
"* Available in *all estimators*:\n",
|
||
|
" * **model.fit()**: fit training data. For supervised learning applications, this accepts two arguments: the data X and the labels y (e.g. model.fit(X, y)). For unsupervised learning applications, this accepts only a single argument, the data X (e.g. model.fit(X)).\n",
|
||
|
"\n",
|
||
|
"* Available in *supervised estimators*:\n",
|
||
|
" * **model.predict()**: given a trained model, predict the label of a new set of data. This method accepts one argument, the new data X_new (e.g. model.predict(X_new)), and returns the learned label for each object in the array.\n",
|
||
|
" * **model.predict_proba()**: For classification problems, some estimators also provide this method, which returns the probability that a new observation has each categorical label. In this case, the label with the highest probability is returned by model.predict().\n",
|
||
|
"\n",
|
||
|
"* Available in *unsupervised estimators*:\n",
|
||
|
" * **model.transform()**: given an unsupervised model, transform new data into the new basis. This also accepts one argument X_new, and returns the new representation of the data based on the unsupervised model.\n",
|
||
|
" * **model.fit_transform()**: some estimators implement this method, which performs a fit and a transform on the same input data.\n",
|
||
|
"\n",
|
||
|
"\n",
|
||
|
"![](files/images/plot_ML_flow_chart_2.png)"
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "markdown",
|
||
|
"metadata": {
|
||
|
"collapsed": false
|
||
|
},
|
||
|
"source": [
|
||
|
"## References"
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "markdown",
|
||
|
"metadata": {},
|
||
|
"source": [
|
||
|
"* [General concepts of machine learning with scikit-learn](http://www.astroml.org/sklearn_tutorial/general_concepts.html)\n",
|
||
|
"* [A Tour of Machine Learning Algorithms](http://machinelearningmastery.com/a-tour-of-machine-learning-algorithms/)"
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "markdown",
|
||
|
"metadata": {},
|
||
|
"source": [
|
||
|
"### Licence"
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "markdown",
|
||
|
"metadata": {},
|
||
|
"source": [
|
||
|
"The notebook is freely licensed under under the [Creative Commons Attribution Share-Alike license](https://creativecommons.org/licenses/by/2.0/). \n",
|
||
|
"\n",
|
||
|
"© 2016 Carlos A. Iglesias, Universidad Politécnica de Madrid."
|
||
|
]
|
||
|
}
|
||
|
],
|
||
|
"metadata": {
|
||
|
"kernelspec": {
|
||
|
"display_name": "Python 3",
|
||
|
"language": "python",
|
||
|
"name": "python3"
|
||
|
},
|
||
|
"language_info": {
|
||
|
"codemirror_mode": {
|
||
|
"name": "ipython",
|
||
|
"version": 3
|
||
|
},
|
||
|
"file_extension": ".py",
|
||
|
"mimetype": "text/x-python",
|
||
|
"name": "python",
|
||
|
"nbconvert_exporter": "python",
|
||
|
"pygments_lexer": "ipython3",
|
||
|
"version": "3.5.1"
|
||
|
}
|
||
|
},
|
||
|
"nbformat": 4,
|
||
|
"nbformat_minor": 0
|
||
|
}
|