"## [Introduction to Machine Learning](2_0_0_Intro_ML.ipynb)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Table of Contents\n",
"* [Introduction to scikit-learn](#Introduction-to-scikit-learn)\n",
"* [What is scikit-learn?](#What-is-scikit-learn?)\n",
"* [Problems that scikit-learn can solve](#Problems-that-scikit-learn-can-solve)\n",
"* [Helpers for Machine Learning](#Helpers-for-Machine-Learning)\n",
"* [How to install scikit-learn](#How-to-install-scikit-learn)\n",
"* [References](#References)\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Introduction to scikit-learn"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"This lecture provides a quick introduction to [scikit-learn](http://scikit-learn.org/stable/), a Python library for machine learning."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## What is scikit-learn?"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"scikit-learn is a Python library that provides a wide range of machine learning algorithms.\n",
"\n",
"The library is built on SciPy (Scientific Python), which must be installed before using scikit-learn.\n",
"\n",
"In particular, scikit-learn relies on or works well with:\n",
"* **NumPy**: package for managing n-dimensional arrays, a required dependency (http://www.numpy.org/)\n",
"* **pandas**: data analysis toolkit, optional but commonly used to prepare data (http://pandas.pydata.org/pandas-docs/stable/index.html)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Problems that scikit-learn can solve"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Scikit-learn provides algorithms for solving the following problems:\n",
"* **Classification**: Identifying the category to which an object belongs. Some of the available [classification algorithms](http://scikit-learn.org/stable/supervised_learning.html#supervised-learning) are decision trees, k-nearest neighbors (kNN), SVM, random forest, perceptron, logistic regression, etc.\n",
"* **Clustering**: Automatic grouping of similar objects into sets. Some of the available [clustering algorithms](http://scikit-learn.org/stable/modules/clustering.html#clustering) are k-means, affinity propagation, etc.\n",
"* **Regression**: Predicting a continuous-valued attribute associated with an object. Some of the available [regression algorithms](http://scikit-learn.org/stable/supervised_learning.html#supervised-learning) are linear regression, ridge regression, etc.\n",
"* **Dimensionality reduction**: Reducing the number of random variables to consider. Some of the available [dimensionality reduction algorithms](http://scikit-learn.org/stable/modules/decomposition.html#decompositions) are SVD, PCA, etc."
]
},
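{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a minimal sketch of the classification case, the snippet below trains a decision tree on the iris dataset bundled with scikit-learn. The dataset, the choice of estimator and the parameter values are illustrative assumptions, not part of the original lecture.\n",
"\n",
"```python\n",
"from sklearn.datasets import load_iris\n",
"from sklearn.model_selection import train_test_split\n",
"from sklearn.tree import DecisionTreeClassifier\n",
"\n",
"# Load a small labelled dataset (150 flowers, 4 features, 3 classes)\n",
"X, y = load_iris(return_X_y=True)\n",
"\n",
"# Hold out 25% of the samples to estimate accuracy on unseen data\n",
"X_train, X_test, y_train, y_test = train_test_split(\n",
"    X, y, test_size=0.25, random_state=0\n",
")\n",
"\n",
"# Every scikit-learn estimator follows the same fit / predict pattern\n",
"clf = DecisionTreeClassifier(max_depth=3, random_state=0)\n",
"clf.fit(X_train, y_train)\n",
"\n",
"print('Test accuracy:', clf.score(X_test, y_test))\n",
"```\n",
"\n",
"Swapping `DecisionTreeClassifier` for another estimator, such as `KNeighborsClassifier` from `sklearn.neighbors`, leaves the rest of the code unchanged."
]
},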
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Helpers for Machine Learning"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"In addition, scikit-learn helps in several tasks:\n",
"* **Model selection**: Comparing, validating, and choosing parameters and models, and persisting models. Some of the [available functionalities](http://scikit-learn.org/stable/model_selection.html#model-selection) are cross-validation and grid search for tuning hyperparameters.\n",
"* **Preprocessing**: Utility functions and transformer classes that change raw feature vectors into a representation more suitable for the downstream estimators. Some of the available [preprocessing functions](http://scikit-learn.org/stable/modules/preprocessing.html#preprocessing) are scaling and normalizing data, and imputing missing values."
]
},
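{
"cell_type": "markdown",
"metadata": {},
"source": [
"The two helpers above are often combined. The following sketch, which assumes the iris dataset and a logistic regression classifier purely for illustration, chains a scaling step with an estimator and uses grid search with cross-validation to choose one hyperparameter.\n",
"\n",
"```python\n",
"from sklearn.datasets import load_iris\n",
"from sklearn.linear_model import LogisticRegression\n",
"from sklearn.model_selection import GridSearchCV\n",
"from sklearn.pipeline import make_pipeline\n",
"from sklearn.preprocessing import StandardScaler\n",
"\n",
"X, y = load_iris(return_X_y=True)\n",
"\n",
"# Chain preprocessing and estimator so that scaling is refitted on each training fold\n",
"model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))\n",
"\n",
"# Candidate values for the regularization strength C (illustrative choices)\n",
"param_grid = {'logisticregression__C': [0.1, 1.0, 10.0]}\n",
"\n",
"# Grid search evaluates every candidate with 5-fold cross-validation\n",
"search = GridSearchCV(model, param_grid, cv=5)\n",
"search.fit(X, y)\n",
"\n",
"print('Best parameters:', search.best_params_)\n",
"print('Best cross-validated accuracy:', search.best_score_)\n",
"```\n",
"\n",
"Placing the scaler inside the pipeline keeps information from the validation folds out of the preprocessing step."
]
},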
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## How to install scikit-learn"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"If you installed the Anaconda distribution, scikit-learn is already included. This is the recommended option.\n",
"The notebook is freely licensed under the [Creative Commons Attribution Share-Alike license](https://creativecommons.org/licenses/by/2.0/).\n",