{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"![](files/images/EscUpmPolit_p.gif \"UPM\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Course Notes for Learning Intelligent Systems"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Department of Telematic Engineering Systems, Universidad Politécnica de Madrid, © 2016 Carlos A. Iglesias"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## [Introduction to Machine Learning](2_0_0_Intro_ML.ipynb)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Table of Contents\n",
"* [Visualisation](#Visualisation)\n",
"* [Exploratory visualisation](#Exploratory-visualisation)\n",
"* [References](#References)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Visualisation"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The goal of this notebook is to learn how to analyse a dataset. We will cover other tasks such as cleaning or munging (changing the format) the dataset in other sessions."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Exploratory visualisation"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"First, we are going to inspect the distribution of the samples per feature."
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"from sklearn import datasets\n",
"iris = datasets.load_iris()"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"# library for displaying plots\n",
"import matplotlib.pyplot as plt\n",
"# display plots in the notebook\n",
"# if this is not set, you will not see the graphic here\n",
"%matplotlib inline"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"First we are going to analyse the [histogram](https://en.wikipedia.org/wiki/Histogram). \n",
"\n",
"A histogram is a graphical representation of the distribution of numerical data. It is an estimate of the probability distribution of a continuous variable (quantitative variable). \n",
"\n",
"For building a histogram, we need first to 'bin' the range of values—that is, divide the entire range of values into a series of intervals—and then count how many values fall into each interval. \n",
"\n",
"In our case, since the values are not continuous and we have only three values, we do not need to bin them."
]
},
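{
"cell_type": "markdown",
"metadata": {},
"source": [
"The binning step described above can be sketched in plain Python. This is only an illustration under stated assumptions: the `values` list is made up for the example (it is not taken from the iris dataset), and `n_bins`, `width` and `counts` are names introduced here. Equal-width bins are used, and the largest value is placed in the last bin, which is also how NumPy's `histogram` treats the right edge."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"# Illustrative values only (assumption: not taken from the iris dataset)\n",
"values = [4.9, 5.1, 5.8, 6.3, 6.4, 7.0, 7.7]\n",
"n_bins = 3\n",
"\n",
"# Divide the range [min, max] into n_bins intervals of equal width\n",
"low, high = min(values), max(values)\n",
"width = (high - low) / n_bins\n",
"\n",
"# Count how many values fall into each interval\n",
"counts = [0] * n_bins\n",
"for v in values:\n",
"    # The maximum value would land just past the last bin, so clamp the index\n",
"    index = min(int((v - low) / width), n_bins - 1)\n",
"    counts[index] += 1\n",
"\n",
"print(counts)"
]
},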
{
"cell_type": "code",
"execution_count": 4,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"<matplotlib.text.Text at 0x7f5f8c1589b0>"
]
},
"execution_count": 4,
"metadata": {},
"output_type": "execute_result"
},
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAYQAAAEPCAYAAABCyrPIAAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\nAAALEgAACxIB0t1+/AAAE8NJREFUeJzt3X+UZ3V93/HnawGVQYTVlsUTBKMoNlaLNkETOLKNGDVW\n11hBrVo1Nk1y0so59ngEq4HWNkis+WHtyYmGNKuJUeNJAAMWJGRNMRWIQMFfW4iKlcBoXCXLYBTc\nd//4fpYdxp2Z+92d+713d56Pc+6Ze+/3fu99z/d8vvOa++tzU1VIkrRh6AIkSeNgIEiSAANBktQY\nCJIkwECQJDUGgiQJgEP73kCSrwB3A7uA+6rqlCQbgQ8DJwBfAc6qqrv7rkWStLxZ7CHsAjZX1dOq\n6pQ27xzgqqo6CbgaOHcGdUiSVjCLQMhetrMF2NrGtwIvnkEdkqQVzCIQCvhEkuuT/Os2b1NVzQNU\n1V3AMTOoQ5K0gt7PIQCnVtWdSf4hcGWS7UxCYjH7z5CkgfUeCFV1Z/v5jSQXA6cA80k2VdV8kmOB\nr+/tvUkMCknaB1WVad/T6yGjJHNJHt7GjwB+CrgFuBR4bVvsNcAly6+lBhy+ydzcRqrqoBjOO++8\nwWs4WIbh2+bugcE/i7UYDpa2OaZ2sS/63kPYBPxJ+0//UOAPqurKJH8FfCTJzwK3A2f1XIckaRW9\nBkJVfRk4eS/zdwBn9LltSdJ0vFN5Hdm8efPQJUh7Zdsch+w57jU+k0NNQ9a3g7m5E1lY2DFgDRqj\nJIzj4rgw5u/wejOydjGuk8qSpAOHgSBJAgwESVJjIEiSAANBktQYCJIkwECQJDUGgiQJMBAkSY2B\nIEkCDARJUmMgSJIAA0GS1BgIkiTAQJAkNQaCJAkwECRJjYEgSQIMBElSYyBIkgADQZLUGAiSJMBA\nkCQ1BoIkCTAQJEmNgSBJAgwESVJjIEiSAANBktQYCJIkwECQJDUGgiQJMBAkSY2BIEkCZhQISTYk\nuSHJpW16Y5Irk2xPckWSo2ZRhyRpebPaQzgb+Pyi6XOAq6rqJOBq4NwZ1SFJWkbvgZDkOOCngd9Z\nNHsLsLWNbwVe3HcdkqSVzWIP4deBNwG1aN6mqpoHqKq7gGNmUIckaQWH9rnyJC8A5qvqpiSbV1i0\nln/p/EXjm9sgSdpjWxv2T6pW+Fu8vytPfgV4FXA/cDhwJPAnwI8Cm6tqPsmxwJ9X1T/ay/trxazo\n3Q7m5k5kYWHHgDVojJIwbNvcLfT5HdZ0RtYuMu27ej1kVFVvqarjq+pxwMuBq6vq1cDHgNe2xV4D\nXNJnHZKk1Q11H8I7gOck2Q48u01LkgbU6yGj/eUhI43VyA4NDF2EmpG1i3EdMpIkHTgMBEkSYCBI\nkhoDQZIEGAiSpMZAkCQBBoIkqTEQJEmAgSBJagwESRJgIEiSGgNBkgQYCJKkxkCQJAEGgiSpMRAk\nSYCBIElqVg2EJGcmObKNvzXJHyd5ev+lSZJmqcsewtuqameS04AzgIuA3+q3LEnSrHUJhO+3ny8A\n3ltVlwEP6a8kSdIQugTCHUl+G3gZcHmSh3Z8nyTpANLlD/tZwBXAc6vq28AjgTf1WpUkaeZWDYSq\nuhf4OnBam3U/cGufRUmSZq/LVUbnAW8Gzm2zDgN+v8+iJEmz1+WQ0c8ALwIWAKrqb4Aj+yxKkjR7\nXQLhe1VVQAEkOaLfkiRJQ+gSCB9pVxkdneTngKuA9/VbliRp1jL553+VhZLnAD8FBLiiqj7Rd2Ft\nu9V2TAayg7m5E1lY2DFgDRqjJAzbNncLXb7Dmo2RtYtM/a4xNyYDQWM1si/+0EWoGVm7mDoQDl12\ndck1VXVakp08+DcMUFX1iH2oUpI0UssGQlWd1n56RZEkrQNd7kN45u7eTtv0kUme0W9ZkqRZ63KV\n0W8B9yyaXsDeTiXpoNMlEFKLzlpV1S5WONQk
STowdQmELyV5Q5LD2nA28KW+C5MkzVaXQPgF4CeA\nO4CvAc8A/k2XlSd5aJJrk9yY5JbWLxJJNia5Msn2JFckOWpffwFJ0tro/T6EJHNVdW+SQ4BPAW8A\n/gXwzar61SRvBjZW1Tl7ea/3IWiURna9+dBFqBlZu5j6PoQuVxk9McmfJflsm35qkrd23UDrPhvg\noUzOPRSwBdja5m8FXjxV1ZKkNdflkNH7mHR9fR9AVd0MvLzrBpJsSHIjcBfwiaq6HthUVfNtfXcB\nx0xbuCRpbXUJhLmqum7JvPu7bqCqdlXV04DjgFOSPJkf3Kcawz6WJK1rXS4f/dskj2dP99cvBe6c\ndkNV9XdJtgHPA+aTbKqq+STHMnki2zLOXzS+uQ2SpD22tWH/rHpSOcnjgPcyudLoW8CXgVdW1e2r\nrjz5B8B9VXV3ksOZPJv5HcDpwI6qutCTyjoQjezk4dBFqBlZu1i7zu12q6ovAWe0B+NsqKqdU6z/\n0cDWJBuYHJ76cFVdnuTTTJ6z8LPA7cBZ0xYuSVpbXfYQHgWcB5zGJPquAf5TVX2z9+LcQ9BIjew/\nwaGLUDOydrH2l50CHwK+weTegZe28Q9PuyFJ0rh12UP4bFX94yXzbqmqp/RaGe4haLxG9p/g0EWo\nGVm76GUP4cokL2/3E2xIchaTk8OSpINIlz2EncARwPeZPC1tA5MusKHnJ6e5h6CxGtl/gkMXoWZk\n7aKXq4x8YpokrQNd+jI6tV1ySpJXJfm1JMf3X5okaZa6PjHt3iT/BPj3wF8DH+i1KknSzHUJhPvb\nE9O2AO+pqv8OeBhJkg4yXfoy2pnkXOBVwLPaXceH9VuWJGnWuuwhvAz4LvD61lX1ccA7e61KkjRz\nvT8xbX942anGamSXFw5dhJqRtYtebkyTJK0DBoIkCVghEJL8Wft54ezKkSQNZaWrjB6d5CeAFyX5\nEJNuKx5QVTf0WpkkaaZWCoRfBt7G5KqiX1vyWgE/2VdRkqTZ69K53duq6u0zqmfptr3KSKM0sqtJ\nhi5CzcjaRS+d2709yYuAZ7VZ26rqT6fdkCRp3Lp0bncBcDbw+TacneRX+i5MkjRbXQ4Z3QycXFW7\n2vQhwI1V9dTei/OQkUZqZIcGhi5CzcjaRW83ph29aPyoaTciSRq/Lp3bXQDcmOTPmVx6+izgnF6r\nkiTNXKe+jJI8GvixNnld6+Sudx4y0liN7NDA0EWoGVm7WPurjACq6k7g0qlrkiQdMOzLSJIEGAiS\npGbFQEhySJIvzqoYSdJwVgyEqvo+sD3J8TOqR5I0kC4nlTcCn0tyHbCwe2ZVvai3qiRJM9clEN7W\nexWSpMF16dzuk0lOAJ5QVVclmQMO6b80SdIsdenc7ueAjwK/3Wb9EHBxn0VJkmavy2WnvwScCvwd\nQFXdChzTZ1GSpNnrEgjfrarv7Z5IcijjuDdbkrSGugTCJ5O8BTg8yXOAPwI+1m9ZkqRZ6xII5wDf\nAG4Bfh64HHhrn0VJkmavy1VGu5JsBa5lcqhoe3XsXjHJccD7gU3ALuB9VfXuJBuBDwMnAF8Bzqqq\nu/ftV5AkrYUuVxm9APhr4N3Ae4Dbkjy/4/rvB95YVU8Gfhz4pSRPYrLXcVVVnQRcDZy7L8VLktZO\nlxvT3gX8s6q6DSDJ44HLgI+v9sb23IS72vg9Sb4AHAdsAU5vi20FtuFDdyRpUF3OIezcHQbNl4Cd\n024oyWOBk4FPA5uqah4eCA0vY5WkgS27h5DkJW30r5JcDnyEyTmEM4Hrp9lIkoczubnt7LansPQc\nhJexStLAVjpk9MJF4/PsOcTzDeDwrhto9y18FPhAVV2ye31JNlXVfJJjga8vv4bzF41vboMkaY9t\nbdg/nZ6pvF8bSN4P/G1VvXHRvAuBHVV1YZI3Axur6gfOIfhMZY3VyJ6dO3QRakbWLqZ+pvKqgZDk\nh4F/BzyW
RXsUXbq/TnIq8BdM7mGoNrwFuI7JIajHALczuez023t5v4GgURrZF3/oItSMrF1MHQhd\nrjK6GLiIyd3Ju6ZZeVV
"text/plain": [
"<matplotlib.figure.Figure at 0x7f5f8c1a9668>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"# Plot histogram, the default is 10 bins\n",
"plt.hist(iris.target, bins=10)\n",
"plt.xlabel('iris class')\n",
"plt.ylabel('Number of species')\n"
]
},
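{
"cell_type": "markdown",
"metadata": {},
"source": [
"The class balance that the histogram shows can also be checked numerically. The following is a minimal sketch, assuming a stand-in list `labels` that mimics the layout of `iris.target` (50 samples for each of the classes 0, 1 and 2); the names `labels`, `samples_per_class` and `is_balanced` are introduced here for illustration. In this notebook, `iris.target` itself could be counted in the same way."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"from collections import Counter\n",
"\n",
"# Stand-in for iris.target (assumption: 50 samples per class, labelled 0, 1 and 2)\n",
"labels = [0] * 50 + [1] * 50 + [2] * 50\n",
"\n",
"# Count how many samples carry each label\n",
"samples_per_class = Counter(labels)\n",
"print(samples_per_class)\n",
"\n",
"# The dataset is balanced if every class has the same number of samples\n",
"is_balanced = len(set(samples_per_class.values())) == 1\n",
"print(is_balanced)"
]
},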
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We see we have the same distribution of samples for each class.\n",
"Now we are going to see the distribution of the features"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"['sepal length (cm)', 'sepal width (cm)', 'petal length (cm)', 'petal width (cm)']\n"
]
}
],
"source": [
"# We remember the name of the features to see its index\n",
"print(iris.feature_names)"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"['setosa' 'versicolor' 'virginica']\n"
]
}
],
"source": [
"# We remember the name of target names\n",
"print(iris.target_names)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"A [**scatter plot**](https://en.wikipedia.org/wiki/Scatter_plot) (*gráfico de dispersión*) display values for typically two variables for a set of data."
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"<matplotlib.text.Text at 0x7f5f8c093b70>"
]
},
"execution_count": 7,
"metadata": {},
"output_type": "execute_result"
},
{
"data": {
"text/plain": [
"<matplotlib.figure.Figure at 0x7f5f8c110c18>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"# scatter makes a plot of x vs y\n",
"plt.scatter(iris.data[:,0], iris.target)\n",
"plt.xlabel(iris.feature_names[0])\n",
"plt.ylabel('species')"
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {
"collapsed": false,
"scrolled": true
},
"outputs": [
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAXkAAAEACAYAAABWLgY0AAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\nAAALEgAACxIB0t1+/AAAIABJREFUeJzt3X14VOWZP/DvkxcIBMIEIi8JkBmCWnG7Vn7XirwIARaq\ndksrogUmEnxrUQsIWC+rO5I0trVoUfFllcWaKBRQrC77U9yySDKAK/62aNW6ajeZRJsgihACkRBI\n7t8fGZLJ5IRzZubMmTMz3891zcXknDPPuc9D8uTkOec+txIREBFRYkqJdQBERBQ9HOSJiBIYB3ki\nogTGQZ6IKIFxkCciSmAc5ImIEpjhQV4plaKUOqCU2q6xbppSqtG//oBS6p/NDZOIiMKRFsK2ywF8\nBCCrl/VeEZkTeUhERGQWQ2fySqmRAK4GsOFcm5kSERERmcbodM0jAH4G4FzpsROVUu8ppV5TSo2L\nPDQiIoqU7iCvlPoegEMi8h46zta1ztj/BGC0iHwHwBMAXjU1SiIiCovSe3aNUupXAIoAnAHQD8BA\nAH8QkUXn+IwPwP8RkSNBy/mgHCKiMIhIWFPiumfyInKviIwWkTEA5gN4M3iAV0oNC3h/GTp+eRyB\nBhGx/Wv16tUxj4FxMs54jZFxmv+KRCh313SjlPpJx5gt6wHMU0rdBuA0gJMAfhRRVEREZIqQBnkR\nqQJQ5X//TMDyJwE8aW5oREQUKWa8aigsLIx1CIYwTnPFQ5zxECPAOO1E98KrqTtTSqzcHxFRIlBK\nQcK88Br2nDxRtPl8dfB4ylFf3468vBSUlS2Gy5Uf67ASgtPpRF1dXazDoCD5+fmora01tU2eyZMt\n+Xx1mDXrcVRXlwLIBNCMgoLV2LlzKQd6E/jPDGMdBgXp7f8lkjN5zsmTLXk85QEDPABkorq6FB5P\neQyjIoo/HOTJlurr29E1wJ+ViYaG9liEQxS3OMiTLeXlpQBoDlrajNxcfssShYI/MWRLZWWLUVCw\nGl0DfcecfFnZ4pjFRBSPeOGVbOvs3TUNDe3IzeXdNWZKtAuvdXV1cLlcOHPmDFJS4vfcNRoXXjnI\nEyWhSAZ5O97aWltbi4KCArS2tiI1NTWmsUSCd9cQUUydvbV106a7UFlZik2b7sKsWY/D5zP3nvvf\n/OY3GDlyJLKysnDRRRdh9+7dEBE8+OCDGDt2LM477zzMnz8fjY2NAIBp06YBABwOB7KysrB//36I\nCB544AE4nU4MHz4cixcvRlNTEwDg1KlTuOGGG5CTk4Ps7GxMmDABX331FQCgvLwc48aNQ1ZWFsaO\nHYv169ebemyWs/hJakJEsRfuz6LbXSLACQEk4HVC3O4S02L75JNPZNSoUfLFF1+IiEhdXZ3U1NTI\no48+KhMnTpSGhgZpbW2VJUuWyIIFC0REpLa2VlJSUqS9vb2znWeffVbOP/98qa2tlebmZpk7d64s\nWrRIRESeeeYZmTNnjrS0tEh7e7scOHBAjh8/LiIir7/+uvh8PhER8Xq90r9/f3n33XdNO75z6e3/\nxb88rHGXZ/JEZJgVt7ampqaitbUVH374Ic6cOYPRo0fD5XLhmWeewS9/+UuMGDEC6enpuP/++7Ft\n2za0t7d3TnGc/RcAfv/732PlypXIz89H//798etf/xpbtmxBe3s70tPT8fXXX+PTTz+FUgqXXnop\nBgwYAAC46qqr4HQ6AQBXXHEFZs+ejT179ph2fFbjIE9Ehllxa2tBQQEeffRRlJSUYOjQoVi4cCEO\nHjyIuro6XHPNNRg8eDAGDx6McePGIT09HYcOHYJSPaerGxoakJ/fda0gPz8fp0+fxqFDh3DDDTfg\nu9/9LubPn4+RI0finnvuQVtbGwBgx44dmDhxIoYMGYLs7Gzs2LEDhw8fNu34LBfunwDhvMDpGiJb\nCPdnsaamVgoKVgVM2ZyQgoJVUlNTa3KEHY4f
Py4LFiyQG264Qb71rW/JW2+9pbldXV2dpKSkSFtb\nW+eymTNnyr/8y790fv3JJ59Inz59um1z9rPjxo2T3/3ud3Lq1Cnp37+//OEPf+jc7oc//KF4PJ4o\nHF1Pvf2/gNM1RGQFlysfO3cuhdv9MKZPXw23+2HTnyf06aefYvfu3WhtbUWfPn3Qr18/pKamYsmS\nJbj33nvx2WefAQC++uorbN++HQBw3nnnISUlBdXV1Z3tLFiwAI888ghqa2tx4sQJ3HfffZg/fz5S\nUlJQWVmJDz/8EO3t7RgwYADS09M7p4laW1uRk5ODlJQU7NixA3/84x9NO7ZY4FMoiSgkLlc+Nm5c\nHbX2T506hXvuuQcff/wx0tPTMWnSJKxfvx7Dhg2DiGD27Nk4ePAghg4dih/96EeYM2cO+vXrh/vu\nuw+TJ0/GmTNn8MYbb+Cmm27CwYMHMXXqVJw6dQpXXnkl1q1bBwD44osvsGTJEtTX12PAgAGYP38+\nioqKkJKSgnXr1uG6665Da2srvv/97+MHP/hB1I7VCrxPnigJJVoyVKLgffJERBQSTtdQ1NgxM5Io\n2XC6hqKCRT/sjdM19sTpGoobLPpBZA8c5CkqWPSDyB44yFNUsOgHkT3wJ46igkU/iOyBF14palj0\nw7544dWeWDSEiEzBQR4YOHAgPvjgg84nTobD5XLh2WefxYwZM0yJKRqDPO+TJ6KkdPz48ViHYAkO\n8kmKiUoULl+tD561HtQ31SMvKw9lK8vgcrpiHVYPbW1tti0FaGVsvPCahKwq4UaJx1frw6yfzsKm\ngZtQ6arEpoGbMOuns+Cr9Zm2jzVr1uC6667rtmz58uW488470dTUhJtvvhm5ubkYNWoUPB5P5/RG\nRUUFpkyZgpUrVyInJwelpaWorq5GYWEhHA4Hhg4digULFnS2mZKSgpqaGgBAS0sLVq1aBafTiezs\n7M6HmgHA9u3b8Xd/93cYPHgwZsyYgY8//lgz7tbWVtx5553Iy8vDyJEjsWLFCpw+fRoAUFVVhVGj\nRmHNmjUYMWIEbrrpJtP6S1e4zygO5wU+T94WrCjhRvYW7s+ie6lbcC8EJQGveyHupW7TYqurq5PM\nzEw5ceKEiIi0tbXJiBEjZP/+/XLNNdfIbbfdJidPnpSvvvpKJkyYIOvXrxcRkfLycklLS5Mnn3xS\n2tra5OTJk7JgwQL51a9+JSIip06dkn379nXuJyUlRaqrq0VE5Pbbb5fp06fLwYMHpb29Xf7rv/5L\nWltb5ZNPPpHMzEzZtWuXnDlzRtasWSNjx46V06dPi4iI0+mUXbt2iYiIx+ORiRMnyuHDh+Xw4cMy\nadIkuf/++0VEpLKyUtLS0uTnP/+5tLa2SktLi+ax9/b/Aj5PnkLBRCUKV31TPdAnaGEfoKGpwbR9\njB49GuPHj8crr7wCANi1axcyMzPhdDrx+uuv45FHHkFGRgZycnJw5513YvPmzZ2fzcvLw+23346U\nlBRkZGQgPT0ddXV1qK+vR58+fTBp0qTObSWgZOBzzz2HdevWYfjw4VBK4fLLL0d6ejpefPFF/NM/\n/RNmzJiB1NRU3HXXXTh58iTeeuutHnH//ve/x+rVqzFkyBAMGTIEq1evxgsvvNC5PjU1FaWlpUhP\nT0ffvn1N6y89HOSTEBOVKFx5WXlAa9DCViA3K9fU/SxYsKBz8N68eTMWLlyIuro6nD59GiNGjMDg\nwYORnZ2NJUuWdCvNN2rUqG7tPPTQQ2hvb8dll12Gb3/723juued67Ovw4cM4deoUxowZ02NdcAlB\npRRGjRqF+vp6zW1Hjx7d+XV+fj4aGrp++Z133nlIT08PoRfMwZ/qJMREJQpX2coyFPy5oGugbwUK\n/lyAspVlpu7nuuuuQ2VlJerr6/HKK6/A7XZj1KhRyMjIwNdff40jR47g6NGjaGxsxPvvv9/5ueBa\nr0OHDsX6
9etRX1+Pp59+GrfffnvnPPxZOTk5yMjI6FZV6qzc3FzU1XW/VvX5559j5MiRutvW1dUh\nN7frl59WHVorcJBPQla
"text/plain": [
"<matplotlib.figure.Figure at 0x7f5f8c0a8748>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"# Plot the distribution of the dataset\n",
"names = set(iris.target)\n",
"\n",
"# x and y are all the samples from column 0 (sepal_length) and 1 (sepal_width) respectively\n",
"x,y = iris.data[:,0], iris.data[:,1]\n",
"\n",
"for name in names:\n",
" cond = iris.target == name\n",
" plt.plot(x[cond], y[cond], linestyle='none', marker='o', label=iris.target_names[name])\n",
"\n",
"plt.legend(numpoints=1)\n",
"plt.show()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"As we can see, the Setosa class seems to be linear separable with these two features.\n",
"\n",
"Another nice visualisation is given below."
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"<matplotlib.text.Text at 0x7f5f5e1f1320>"
]
},
"execution_count": 9,
"metadata": {},
"output_type": "execute_result"
},
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAY0AAAEPCAYAAAC+35gCAAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\nAAALEgAACxIB0t1+/AAAIABJREFUeJzsnXd4VFX6+D9nZjItjZIQCL33DqEKAVFEVlQsYENxdwEV\nu/5s64plZfW7dhQsqOi6InZBpBN67whCKAlNegnJTNrM+/vjDjDJJDAhk4SQ83me++TOue89970l\n973nPe95jxIRNBqNRqMJBlNZK6DRaDSa8oM2GhqNRqMJGm00NBqNRhM02mhoNBqNJmi00dBoNBpN\n0GijodFoNJqgKXGjoZRKUUptUEqtU0qtLETmXaVUslJqvVKqXUnrpNFoNKWNUqqGUmrKRez3kVKq\n2QVkRiql7rx47YqgT0mP01BK7QI6isiJQrYPAEaLyEClVBfgHRHpWqJKaTQazSWCUsosIp6y1iNY\nSsM9pS5wnOuBLwBEZAUQrZSKKwW9NBqNpkRQSo1VSt3v9/sFpdTjSqlNvt93K6V+VkrNBeYogw+U\nUluUUjOVUr8qpQb7ZOcrpTr41k8rpV7xeWWWKqVi/ep/zLfeUCk12yezWilVXykVrpSa4/u9QSk1\n6GLPrTSMhgCzlVKrlFJ/L2B7TWCv3+/9vjKNRqMpr3wD3Or3+1ZgOcb78AztgcEi0gcYDNQRkRbA\nMKBbIfWGA0tFpB2wCCjonfoV8J5PpjvwJ+AGbhCRTkBf4I2LPTHLxe5YBHqIyJ8+izhbKbVVRBaX\nwnE1Go2mTBCR9UqpWKVUdaAacBzYl09stoic8q33BL717XtIKTW/kKqzRGS6b30N0M9/o1IqAogX\nkV98dWX7yi3AWKVUL8ALxCulqonI4aKeW4kbDRH50/f3iFLqRyAB8Dca+4Hafr9r+cryoJTSSbI0\nGk3QiIgqzv6VlDr7Rg+CQyJSPV/Zt8AtQHWMlkd+Mi5CrRy/dQ8Fv8MLOu87gBigvYh4lVK7AftF\nHL9k3VNKKafP8qGUCgeuBjbnE/sFozmGUqorcFJEDhVUn4iU+vLCCy9UmONWpHPVx718jykSmu/L\nU8ArQS5AQf2wU4ChwE34WhHnYQlwk69vIw5ILETuvIZQRNKBvUqp6wGUUlallAOIBg6LYTD6AHUv\noE+hlHRLIw740ddKsABficgspdRIQETkIxGZrpS6Vim1A8PyDi9hnTQajSYowoqxr4hsUUpFAvvE\ncDmd70X9PUZfw+8YfbxrMOwW5O0HCcYiDgM+VEq9BGRjtHa+AqYqpTYAq4GtRToZP0rUaIjIbiBg\n3IWIfJjv9+iS1EOj0WguhuK+IEWkjd96KtDGtz4JmOS3TZRST4pIhlKqCrAC2OTb1tdPLspv/XsM\nY4OIvOhXvgO4sgB1uhfzdIDS6Qgv1yQmJlaY41akc9XHvXyPGUocpXu4aUqpShgNnJfkIjqpS4MS\nH9wXKpRSUl501Wg0ZYtSCilmR7hSSj4OUvbvFL/jvbygWxoajUZTCPoFGYi+JhqNRlMIxekIv1zR\nRkOj0WgKQb8gA9HXRKPRaApBtzQC0UZDo9FoCkEbjUC00dBoNJpCKOWQ23KBNhoajUZTCPoFGYi+\nJhqNRlMI2j0ViDYaGo1GUwj6BRmIviYajUZTCLqlEYg2GhqNRlMI+gUZiL4mGo1GUwi6pRGINhoa\njUZTCDrkNhBtNDQajaYQdEsjEG00NBqNphD0CzIQfU00Go2mEMKCfUPmlqgalxTaaGg0Gk0hWLTR\nCEAbDY1GoymEMHNZa3DpoY2GRqPRFELQLY0KhL4kGo1GUwhhtrLW4NJDGw2NRqMpDP2GDMBUGgdR\nSpmUUmuVUr8UsK23Uuqkb/tapdQ/SkMnjUajuSCWIJcKRGmd7sPAFiCqkO0LRWRQKemi0Wg0wVHB\nDEIwlHhLQylVC7gW+OR8YiWth0aj0RQZc5BL
BaI03FNvAU8Cch6Zbkqp9UqpX5VSLUpBJ41Go7kw\n2j0VQImerlJqIHBIRNYrpRIpuEWxBqgjIi6l1ADgJ6BJQfWNGTPm7HpiYiKJiYmhVlmj0ZRDkpKS\nSEpKCn3FOnoqACVyvgZAMStX6lXgTozxkg4gEvhBRIadZ5/dQEcROZ6vXEpSV41Gc/mglEJEiuX2\nVkqJdApSdjXFPl55oUTdUyLyrIjUEZEGwFBgXn6DoZSK81tPwDBkx9FoNJqyRrunAiiVkNv8KKVG\nKqVG+H7erJTarJRaB7wNDCkLnTTBM3/+fK65ZhDNmrXl7rv/yrZt28paJY2mZNAd4QGUqHsqlGj3\n1KXBhAkf8vjj/8Dl6gZUw2zejd2+lvnzZ9G5c+eyVk+jAULonuodpOyCiuOe0kZDEzQZGRlUqxaP\ny3UXEOu3ZR2dOx9j5cpFZaWaRpOHkBmNK4OUnVtxjEaZuKc05ZOVK1disVQjr8EAaM2aNcvJysoq\nC7U0mpJD92kEUMFOV1McwsPD8XrdGENu/D+qsjCbLVh0SlDN5YYOuQ1AtzQ0QdOpUyeiosIwMsKc\nQQgLW8KNNw7GbK5gPYKayx/d0giggp2upjiYTCZ+/vlbrrzyGjye7WRkVCIycj9xcVbGjXu7rNXT\naEKP/g4KQHeEa4pMWloa33zzDXv27KF9+/YMGjRIu6Y0lxQh6wi/LUjZrytOR7g2GhqN5rIjZEbj\nriBlv6w4RkN/Hmo0Gk1haPdUANpoaDQaTWHoN2QA+pJoNBpNYdjLWoFLD200NACICFlZWdhsNpSq\nEK5ZjebCaPdUAHqcRgVHRPjPf94kNjaeiIgo4uJq895749BBBxoNepxGAVSw09Xk54UXXuKNNz7D\n5boBqM6RI/t5+unXcLncPPXUk2WtnkZTtug3ZAA65LYC43K5iI2tgct1L1DJb8sRIiO/5ujRP7Fa\nrWWlnkZz0YQs5PbZIGVfrTght9o9VYHZvXs3ZnMkeQ0GQCxer4kDBw6UhVoazaWDdk8FUMFOV+NP\n9erVyck5BWSSN0wkA48nk5iYmDLSTKO5RNBvyAB0S6MCU7VqVQYO/As222wgx1eajd0+myFDhhAR\nEVGW6mk0ZY8tyKUCofs0yjHHjh3j448/YdGi5TRqVJ/77x9J06ZNi1TH6dOnufXWO0hKWojNVoOs\nrAP079+f//3vc5xOZwlprtGULCHr03gjSNnHK06fhjYa5ZSdO3fSpUsPXK5auN11sViOYbWu5+uv\nv2DQoEFFrm/37t3s3LmTJk2aUKdOnRLQWKMpPUJmNN4JUvZhbTQuObTRyEv//tcxZ04WXm8Pv9K9\nREf/zOHD+3XUk6ZCEzKjMS5I2dEVx2joPo1ySG5uLnPnzsTr7ZRvS21EIlm+fHmZ6KXRXHbo6KkA\nKtjpajQaTRHQb8gAdEujHGKxWOjX7xpMplX5tuzBZEqnW7duZaKXRnPZYQ5yqUCUih1VSpmA1cA+\nEQnopVVKvQsMADKAe0RkfWnoVZ754IO36dKlBxkZx3G76xAWdoywsI18+eV/CQsLKzO9Nm3axIED\nB2jXrh1xcXFlpodGExJ0ltsASqvx9TCwBYjKv0EpNQBoKCKNlVJdgAlA11LSq9zSoEEDtm37nYkT\nP2Xx4hU0atSF++77lEaNGpWJPvv27WPgwBvZsSOFsLBYMjP3cs89d/P+++9gNlewTzHN5YN2TwVQ\n4tFTSqlawGfAv4DH8rc0lFITgPki8o3v91YgUUQO5ZPT0VOXKCJC69Yd+OOPKng8PTG8nm6czu95\n/vm/8/TT/6+sVdRUMEIWPfVDkLKDdfRUKHkLeBIo7I1fE9jr93u/r0xTTlizZg0pKQf9DAaAA5fr\nSt58892yVE2jKR46eiqAEj1dpdRA4JCIrFdKJQLFssRjxow5u56YmEhiYmJxqtOEiH379mE2VyPw\nGySWY8cO
loVKmgpGUlISSUlJoa+4ghmEYChR95RS6lXgTiAXcACRwA8iMsxPJr976g+gt3ZPlR9S\nUlJo3rwtmZmjAf9Bhdt
"text/plain": [
"<matplotlib.figure.Figure at 0x7f5f5e2412b0>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"x_index = 0\n",
"y_index = 1\n",
"formatter = plt.FuncFormatter(lambda i, *args: iris.target_names[int(i)])\n",
"plt.scatter(iris.data[:, x_index], iris.data[:, y_index], s=40,\n",
"c=iris.target)\n",
"plt.colorbar(ticks=[0, 1, 2], format=formatter)\n",
"plt.xlabel(iris.feature_names[x_index])\n",
"plt.ylabel(iris.feature_names[y_index])"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Here we can also check that the Setosa class seems to be linear separable.\n",
"\n",
"Students interested in practicing advanced visualisations can check [Advanced visualisation notebook](2_3_1_Advanced_Visualisation.ipynb).\n",
"\n"
]
},
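{
"cell_type": "markdown",
"metadata": {},
"source": [
"The notion of linear separability used above can be made concrete with a small check: two classes are linearly separable in two dimensions if some straight line leaves all points of one class on one side and all points of the other class on the other side. The sketch below is hedged accordingly: the points are hypothetical (sepal length, sepal width) pairs chosen for illustration, not rows of the iris dataset, and `separates` is a helper introduced here, not a library function. It only tests one candidate line; it does not search for a separating line."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"def separates(w, b, class_a, class_b):\n",
"    # True if the line w[0]*x + w[1]*y + b = 0 leaves every point of class_a\n",
"    # strictly on the positive side and every point of class_b on the negative side\n",
"    def side(point):\n",
"        return w[0] * point[0] + w[1] * point[1] + b\n",
"    return all(side(p) > 0 for p in class_a) and all(side(p) < 0 for p in class_b)\n",
"\n",
"# Hypothetical (sepal length, sepal width) points, for illustration only\n",
"short_wide = [(4.8, 3.4), (5.0, 3.6), (5.2, 3.5)]\n",
"long_narrow = [(6.0, 2.7), (6.4, 2.9), (7.0, 3.2)]\n",
"\n",
"# Candidate line: y = x - 2, written as -x + y + 2 = 0\n",
"result = separates((-1.0, 1.0), 2.0, short_wide, long_narrow)\n",
"print(result)"
]
},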
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# References"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"* [Feature selection](http://scikit-learn.org/stable/modules/feature_selection.html)\n",
"* [Classification probability](http://scikit-learn.org/stable/auto_examples/classification/plot_classification_probability.html)\n",
"* [Mastering Pandas](http://proquest.safaribooksonline.com/book/programming/python/9781783981960), Femi Anthony, Packt Publishing, 2015.\n",
"* [Matplotlib web page](http://matplotlib.org/index.html)\n",
"* [Using matlibplot in IPython](http://ipython.readthedocs.org/en/stable/interactive/plotting.html)\n",
"* [Seaborn Tutorial](https://stanford.edu/~mwaskom/software/seaborn/tutorial.html)\n",
"* [Iris dataset visualisation notebook](https://www.kaggle.com/benhamner/d/uciml/iris/python-data-visualizations/notebook)\n",
"* [Tutorial plotting with Seaborn](https://stanford.edu/~mwaskom/software/seaborn/tutorial/axis_grids.html)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Licence\n",
"\n",
"The notebook is freely licensed under under the [Creative Commons Attribution Share-Alike license](https://creativecommons.org/licenses/by/2.0/). \n",
"\n",
"© 2016 Carlos A. Iglesias, Universidad Politécnica de Madrid."
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.5.1"
}
},
"nbformat": 4,
"nbformat_minor": 0
}