sitc/ml1/2_1_Intro_ScikitLearn.ipynb

{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "![](files/images/EscUpmPolit_p.gif \"UPM\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Course Notes for Learning Intelligent Systems"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Department of Telematic Engineering Systems, Universidad Politécnica de Madrid, © Carlos A. Iglesias"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## [Introduction to Machine Learning](2_0_0_Intro_ML.ipynb)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Table of Contents\n",
    "* [Introduction to scikit-learn](#Introduction-to-scikit-learn)\n",
    "* [What is scikit-learn?](#What-is-scikit-learn?)\n",
    "* [Problems that scikit-learn can solve](#Problems-that-scikit-learn-can-solve)\n",
    "* [Helpers for Machine Learning](#Helpers-for-Machine-Learning)\n",
    "* [How to install scikit-learn](#How-to-install-scikit-learn)\n",
    "* [References](#References)\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Introduction to scikit-learn"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "This lecture provides a quick introduction to [scikit-learn](http://scikit-learn.org/stable/), a Python library for machine learning."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## What is scikit-learn?"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Scikit-learn is a Python library that provides a wealth of machine learning algorithms.\n",
    "\n",
    "The library is built on SciPy (Scientific Python), which must be installed before using scikit-learn.\n",
    "\n",
    "In particular, scikit-learn works with:\n",
    "* **NumPy**: package for managing n-dimensional arrays (http://www.numpy.org/); a required dependency\n",
    "* **pandas**: data analysis toolkit (http://pandas.pydata.org/pandas-docs/stable/index.html); optional, but commonly used to load and prepare data"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Problems that scikit-learn can solve"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Scikit-learn provides algorithms for solving the following problems:\n",
    "* **Classification**: Identifying the category to which an object belongs. Some of the available [classification algorithms](http://scikit-learn.org/stable/supervised_learning.html#supervised-learning) are decision trees (CART), k-nearest neighbors (kNN), SVM, random forest, Perceptron, logistic regression, etc.\n",
    "* **Clustering**: Automatic grouping of similar objects into sets. Some of the available [clustering algorithms](http://scikit-learn.org/stable/modules/clustering.html#clustering) are k-Means, Affinity propagation, etc.\n",
    "* **Regression**: Predicting a continuous-valued attribute associated with an object. Some of the available [regression algorithms](http://scikit-learn.org/stable/supervised_learning.html#supervised-learning) are linear regression, ridge regression, Lasso, etc. Despite its name, logistic regression is a classification algorithm.\n",
    "* **Dimensionality reduction**: Reducing the number of random variables to consider. Some of the available [dimensionality reduction algorithms](http://scikit-learn.org/stable/modules/decomposition.html#decompositions) are SVD, PCA, etc."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Helpers for Machine Learning"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "In addition, scikit-learn helps with several tasks:\n",
    "* **Model selection**: Comparing, validating, and choosing parameters and models, and persisting models. The [available functionalities](http://scikit-learn.org/stable/model_selection.html#model-selection) include cross-validation and grid search for tuning parameters.\n",
    "* **Preprocessing**: Common utility functions and transformer classes that change raw feature vectors into a representation more suitable for the downstream estimators. The available [preprocessing functions](http://scikit-learn.org/stable/modules/preprocessing.html#preprocessing) include scaling and normalizing data, and imputing missing values."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## How to install scikit-learn"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "If you installed the Anaconda (conda) distribution, scikit-learn is already installed. This is the best option.\n",
    "\n",
    "In any case, before starting, update all packages: `conda update --all`.\n",
    "\n",
    "If it is an old installation, you can update scikit-learn alone using conda: `conda update scikit-learn`.\n",
    "\n",
    "If it is not installed, install it with conda: `conda install scikit-learn`.\n",
    "\n",
    "If you have already installed SciPy and NumPy, you can also install it using pip: `pip install -U scikit-learn`.\n",
    "\n",
    "Installing SciPy and NumPy with pip is not recommended. Instead, use conda or install the Linux package *python-sklearn*."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## References\n",
    "\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "* [Scikit-learn site](http://scikit-learn.org/stable/index.html)\n",
    "* [How to install Scikit-learn](http://scikit-learn.org/stable/install.html)\n",
    "* [An introduction to NumPy and SciPy](http://www.engr.ucsb.edu/~shell/che210d/numpy.pdf)\n",
    "* [NumPy tutorial](https://docs.scipy.org/doc/numpy-dev/user/quickstart.html)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Licence\n",
    "The notebook is freely licensed under the [Creative Commons Attribution Share-Alike license](https://creativecommons.org/licenses/by/2.0/).  \n",
    "\n",
    "© Carlos A. Iglesias, Universidad Politécnica de Madrid."
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.7"
  },
  "latex_envs": {
   "LaTeX_envs_menu_present": true,
   "autocomplete": true,
   "bibliofile": "biblio.bib",
   "cite_by": "apalike",
   "current_citInitial": 1,
   "eqLabelWithNumbers": true,
   "eqNumInitial": 1,
   "hotkeys": {
    "equation": "Ctrl-E",
    "itemize": "Ctrl-I"
   },
   "labels_anchors": false,
   "latex_user_defs": false,
   "report_style_numbering": false,
   "user_envs_cfg": false
  }
 },
 "nbformat": 4,
 "nbformat_minor": 1
}