mirror of
https://github.com/gsi-upm/sitc
synced 2024-11-24 23:42:29 +00:00
123 lines
3.6 KiB
Plaintext
{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "![](images/EscUpmPolit_p.gif \"UPM\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Course Notes for Learning Intelligent Systems"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Department of Telematic Engineering Systems, Universidad Politécnica de Madrid, © 2016 Carlos A. Iglesias"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## [Introduction to Machine Learning II](3_0_0_Intro_ML_2.ipynb)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Machine Learning"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "In the previous session, we learnt how to apply machine learning algorithms to the Iris dataset.\n",
    "\n",
    "We will now review the full process. As you have probably noticed, data preparation, cleaning and transformation take more than 90% of the data mining effort.\n",
    "\n",
    "The phases are:\n",
    "\n",
    "* **Data ingestion**: read the data from the data lake\n",
    "* **Preprocessing**:\n",
    "  * **Data cleaning (munging)**: fill missing values, smooth noisy data (binning methods), identify or remove outliers, and resolve inconsistencies\n",
    "  * **Data integration**: integrate multiple datasets\n",
    "  * **Data transformation**: normalization (rescale numeric values between 0 and 1), standardisation (rescale values to have a mean of 0 and a standard deviation of 1), transformations for smoothing a variable (e.g. square root), aggregation of data from several datasets\n",
    "  * **Data reduction**: dimensionality reduction, clustering and sampling\n",
    "  * **Data discretization**: convert numerical values into discrete bins for algorithms that do not accept continuous variables\n",
    "  * **Feature engineering**: selection of the most relevant features, creation of new features and removal of non-relevant features\n",
    "  * Apply sampling to divide the dataset into training and test datasets\n",
    "* **Machine learning**: apply machine learning algorithms and obtain an estimator, tuning its parameters\n",
    "* **Evaluation** of the model\n",
    "* **Prediction**: use the model for new data"
   ]
},
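  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The transformation and sampling steps listed above can be sketched in plain Python. This is only a minimal illustration, not the course's implementation: the helper names `min_max_normalize`, `standardize` and `split_train_test`, the 25 % test ratio and the sample values are assumptions made for this example.\n",
    "\n",
    "```python\n",
    "import random\n",
    "import statistics\n",
    "\n",
    "\n",
    "def min_max_normalize(values):\n",
    "    # Normalization: rescale values linearly to the range [0, 1].\n",
    "    lo, hi = min(values), max(values)\n",
    "    return [(v - lo) / (hi - lo) for v in values]\n",
    "\n",
    "\n",
    "def standardize(values):\n",
    "    # Standardisation: rescale to mean 0 and population standard deviation 1.\n",
    "    mean = statistics.mean(values)\n",
    "    std = statistics.pstdev(values)\n",
    "    return [(v - mean) / std for v in values]\n",
    "\n",
    "\n",
    "def split_train_test(rows, test_ratio=0.25, seed=0):\n",
    "    # Sampling: shuffle a copy of the rows, then split into (train, test).\n",
    "    shuffled = list(rows)\n",
    "    random.Random(seed).shuffle(shuffled)\n",
    "    n_test = round(len(shuffled) * test_ratio)\n",
    "    return shuffled[n_test:], shuffled[:n_test]\n",
    "\n",
    "\n",
    "# Hypothetical sepal lengths in cm (illustrative values, not from the notebook).\n",
    "sepal_length = [5.1, 4.9, 6.2, 5.9, 7.0, 6.4, 5.0, 6.7]\n",
    "\n",
    "normalized = min_max_normalize(sepal_length)\n",
    "standardized = standardize(sepal_length)\n",
    "train, test = split_train_test(sepal_length)\n",
    "```\n",
    "\n",
    "In practice a library would normally be used instead of hand-written helpers; for instance, scikit-learn provides `MinMaxScaler`, `StandardScaler` and `train_test_split` for these steps."
   ]
  },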
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "\n",
    "![Machine Learning Process from *Python Machine Learning* book](images/machine-learning-process.jpg)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## References"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "* [Python Machine Learning](http://proquest.safaribooksonline.com/book/programming/python/9781783555130), Sebastian Raschka, Packt Publishing, 2015."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Licence"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The notebook is freely licensed under the [Creative Commons Attribution Share-Alike license](https://creativecommons.org/licenses/by/2.0/).\n",
    "\n",
    "© 2016 Carlos A. Iglesias, Universidad Politécnica de Madrid."
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.5.2"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 0
}