mirror of https://github.com/gsi-upm/sitc synced 2024-12-22 11:48:12 +00:00
sitc/ml1/2_1_Intro_ScikitLearn.ipynb
Carlos A. Iglesias 5febbc21a4
Update 2_1_Intro_ScikitLearn.ipynb
Fix typo in dimensionality.
2022-02-21 12:22:15 +01:00

{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"![](files/images/EscUpmPolit_p.gif \"UPM\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Course Notes for Learning Intelligent Systems"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Department of Telematic Systems Engineering, Universidad Politécnica de Madrid, © Carlos A. Iglesias"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## [Introduction to Machine Learning](2_0_0_Intro_ML.ipynb)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Table of Contents\n",
"* [Introduction to scikit-learn](#Introduction-to-scikit-learn)\n",
"* [What is scikit-learn?](#What-is-scikit-learn?)\n",
"* [Problems that scikit-learn can solve](#Problems-that-scikit-learn-can-solve)\n",
"* [Helpers for Machine Learning](#Helpers-for-Machine-Learning)\n",
"* [How to install scikit-learn](#How-to-install-scikit-learn)\n",
"* [References](#References)\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Introduction to scikit-learn"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"This lecture provides a quick introduction to [scikit-learn](http://scikit-learn.org/stable/), a Python library for machine learning."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## What is scikit-learn?"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Scikit-learn is a Python library that provides a wealth of machine learning algorithms. \n",
"\n",
"The library is built on SciPy (Scientific Python), which must be installed before using scikit-learn.\n",
"\n",
"In particular, scikit-learn works with:\n",
"* **NumPy**: package for managing n-dimensional arrays (http://www.numpy.org/)\n",
"* **pandas**: data analysis toolkit (http://pandas.pydata.org/pandas-docs/stable/index.html); it is not a hard requirement, but estimators accept pandas DataFrames as input"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Problems that scikit-learn can solve"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Scikit-learn provides algorithms for solving the following problems:\n",
"* **Classification**: Identifying the category to which an object belongs. Some of the available [classification algorithms](http://scikit-learn.org/stable/supervised_learning.html#supervised-learning) are decision trees (an optimized version of CART), kNN, SVM, random forests, Perceptron, etc. \n",
"* **Clustering**: Automatic grouping of similar objects into sets. Some of the available [clustering algorithms](http://scikit-learn.org/stable/modules/clustering.html#clustering) are k-Means, Affinity propagation, etc.\n",
"* **Regression**: Predicting a continuous-valued attribute associated with an object. Some of the available [regression algorithms](http://scikit-learn.org/stable/supervised_learning.html#supervised-learning) are linear regression, ridge regression, Lasso, etc. (Despite its name, logistic regression is a classification algorithm.)\n",
"* **Dimensionality reduction**: Reducing the number of random variables to consider. Some of the available [dimensionality reduction algorithms](http://scikit-learn.org/stable/modules/decomposition.html#decompositions) are SVD, PCA, etc."
]
},
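{
"cell_type": "markdown",
"metadata": {},
"source": [
"The four problem types listed above share the same estimator interface (`fit`, then `predict` or `transform`). The following is a minimal sketch, assuming the Iris dataset bundled with scikit-learn and arbitrarily chosen estimators and parameters (`n_neighbors=5`, `n_clusters=3`, `n_components=2`); it illustrates the interface and is not a recommended configuration.\n",
"\n",
"```python\n",
"from sklearn.cluster import KMeans\n",
"from sklearn.datasets import load_iris\n",
"from sklearn.decomposition import PCA\n",
"from sklearn.linear_model import LinearRegression\n",
"from sklearn.neighbors import KNeighborsClassifier\n",
"\n",
"# Iris: 150 samples, 4 numeric features, 3 classes\n",
"X, y = load_iris(return_X_y=True)\n",
"\n",
"# Classification: predict the species from the four measurements\n",
"classifier = KNeighborsClassifier(n_neighbors=5).fit(X, y)\n",
"predicted_classes = classifier.predict(X[:3])\n",
"\n",
"# Clustering: group the samples without using the labels\n",
"kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)\n",
"cluster_ids = kmeans.labels_\n",
"\n",
"# Regression: predict petal width (column 3) from the other three features\n",
"regressor = LinearRegression().fit(X[:, :3], X[:, 3])\n",
"predicted_widths = regressor.predict(X[:3, :3])\n",
"\n",
"# Dimensionality reduction: project the four features onto two components\n",
"X_2d = PCA(n_components=2).fit_transform(X)\n",
"\n",
"print(predicted_classes.shape, cluster_ids.shape, predicted_widths.shape, X_2d.shape)\n",
"```\n",
"\n",
"For brevity, the models here are fitted and applied to the same data; a real evaluation would hold out a test set."
]
},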
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Helpers for Machine Learning"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"In addition, scikit-learn helps in several tasks:\n",
"* **Model selection**: Comparing, validating, choosing parameters and models, and persisting models. Some of the [available functionalities](http://scikit-learn.org/stable/model_selection.html#model-selection) are cross-validation and grid search for tuning hyperparameters. \n",
"* **Preprocessing**: Several common utility functions and transformer classes to change raw feature vectors into a representation that is more suitable for the downstream estimators. Some of the available [preprocessing functions](http://scikit-learn.org/stable/modules/preprocessing.html#preprocessing) are scaling and normalizing data, and imputing missing values."
]
},
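{
"cell_type": "markdown",
"metadata": {},
"source": [
"The two helpers above are often combined. The following is a minimal sketch, assuming the Iris dataset, an `SVC` classifier, and an illustrative grid of values for its `C` parameter; none of these choices come from the course material.\n",
"\n",
"```python\n",
"from sklearn.datasets import load_iris\n",
"from sklearn.model_selection import GridSearchCV, cross_val_score\n",
"from sklearn.pipeline import make_pipeline\n",
"from sklearn.preprocessing import StandardScaler\n",
"from sklearn.svm import SVC\n",
"\n",
"X, y = load_iris(return_X_y=True)\n",
"\n",
"# Preprocessing: standardize each feature before it reaches the classifier\n",
"model = make_pipeline(StandardScaler(), SVC())\n",
"\n",
"# Model selection: 5-fold cross-validation returns one accuracy score per fold\n",
"scores = cross_val_score(model, X, y, cv=5)\n",
"\n",
"# Model selection: grid search tries every value of C with cross-validation\n",
"# (make_pipeline names the step 'svc', hence the 'svc__C' parameter key)\n",
"search = GridSearchCV(model, param_grid={'svc__C': [0.1, 1, 10]}, cv=5)\n",
"search.fit(X, y)\n",
"\n",
"print(scores.mean(), search.best_params_)\n",
"```\n",
"\n",
"Placing the scaler inside the pipeline means it is refitted on each training fold, so no information from the validation fold leaks into the preprocessing step."
]
},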
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## How to install scikit-learn"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"If you installed the Anaconda distribution, scikit-learn is already installed. This is the best option.\n",
"\n",
"Anyway, before starting, update all the packages: `conda update --all`. \n",
"\n",
"In case it is an old installation, you can update it using conda: `conda update scikit-learn`.\n",
"\n",
"If it is not installed, install it with conda: `conda install scikit-learn`.\n",
"\n",
"If you have already installed SciPy and NumPy, you can also install it using pip: `pip install -U scikit-learn`.\n",
"\n",
"Using pip to install SciPy and NumPy is not recommended. Instead, use conda or install the Linux package *python3-sklearn*."
]
},
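{
"cell_type": "markdown",
"metadata": {},
"source": [
"Whichever installation route is used, a quick check is to import the package and print its version. This is a minimal sketch; the version printed depends on the environment.\n",
"\n",
"```python\n",
"import sklearn\n",
"\n",
"# The version string confirms which release the current environment provides\n",
"print(sklearn.__version__)\n",
"```\n",
"\n",
"An `ImportError` at this point means scikit-learn is not installed in the environment that the notebook kernel is using."
]
},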
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## References\n",
"\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"* [Scikit-learn site](http://scikit-learn.org/stable/index.html)\n",
"* [How to install Scikit-learn](http://scikit-learn.org/stable/install.html)\n",
"* [An introduction to NumPy and Scipy](http://www.engr.ucsb.edu/~shell/che210d/numpy.pdf)\n",
"* [NumPy tutorial](https://docs.scipy.org/doc/numpy-dev/user/quickstart.html)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Licence\n",
"The notebook is freely licensed under the [Creative Commons Attribution Share-Alike license](https://creativecommons.org/licenses/by/2.0/). \n",
"\n",
"© Carlos A. Iglesias, Universidad Politécnica de Madrid."
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.7"
},
"latex_envs": {
"LaTeX_envs_menu_present": true,
"autocomplete": true,
"bibliofile": "biblio.bib",
"cite_by": "apalike",
"current_citInitial": 1,
"eqLabelWithNumbers": true,
"eqNumInitial": 1,
"hotkeys": {
"equation": "Ctrl-E",
"itemize": "Ctrl-I"
},
"labels_anchors": false,
"latex_user_defs": false,
"report_style_numbering": false,
"user_envs_cfg": false
}
},
"nbformat": 4,
"nbformat_minor": 1
}