sitc/ml2/3_6_Machine_Learning.ipynb

{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "![](images/EscUpmPolit_p.gif \"UPM\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Course Notes for Learning Intelligent Systems"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Department of Telematic Engineering Systems, Universidad Politécnica de Madrid, © 2016 Carlos A. Iglesias"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## [Introduction to Machine Learning II](3_0_0_Intro_ML_2.ipynb)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Machine Learning"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "In the previous session, we learnt how to apply machine learning algorithms to the Iris dataset.\n",
    "\n",
    "We are going now to review the full process. As probably you have notice, data preparation, cleaning and transformation takes more than 90 % of data mining effort.\n",
    "\n",
    "The phases are:\n",
    "\n",
    "* **Data ingestion**: reading the data from the data lake\n",
    "* **Preprocessing**: \n",
    "    * **Data cleaning (munging)**:  fill missing values, smooth noisy data (binning methods), identify or remove outlier, and resolve inconsistencies \n",
    "    * **Data integration**: Integrate multiple datasets\n",
    "    * **Data transformation**: normalization (rescale numeric values between 0 and 1), standardisation (rescale values to have mean of 0 and std of 1), transformation for smoothing a variable (e.g. square toot, ...), aggregation of data from several datasets\n",
    "    * **Data reduction**: dimensionality reduction, clustering and sampling. \n",
    "    * **Data discretization**: for numerical values and algorithms that do not accept continuous variables\n",
    "    * **Feature engineering**: selection of most relevant features, creation of new features and delete non relevant features\n",
    "    * Apply  Sampling for dividing the dataset into training and test datasets.\n",
    "* **Machine learning**: apply machine learning algorithms and obtain an estimator, tuning its parameters.\n",
    "* **Evaluation** of the model\n",
    "* **Prediction**: use the model for new data."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "\n",
    "![Machine Learning Process from *Python Machine Learning* book](images/machine-learning-process.jpg)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Licence"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "* [Python Machine Learning](http://proquest.safaribooksonline.com/book/programming/python/9781783555130), Sebastian Raschka, Packt Publishing, 2015."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Licence"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The notebook is freely licensed under under the [Creative Commons Attribution Share-Alike license](https://creativecommons.org/licenses/by/2.0/).  \n",
    "\n",
    "© 2016 Carlos A. Iglesias, Universidad Politécnica de Madrid."
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.5.2"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 0
}