{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"![](files/images/EscUpmPolit_p.gif \"UPM\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Course Notes for Learning Intelligent Systems"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Department of Telematic Engineering Systems, Universidad Politécnica de Madrid, © 2016 Carlos A. Iglesias"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## [Introduction to Machine Learning](2_0_0_Intro_ML.ipynb)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Table of Contents\n",
"* [Visualisation](#Visualisation)\n",
"* [Exploratory visualisation](#Exploratory-visualisation)\n",
"* [References](#References)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Visualisation"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The goal of this notebook is to learn how to analyse a dataset visually. Other tasks, such as cleaning or munging (changing the format of) the dataset, will be covered in other sessions."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Exploratory visualisation"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"First, we are going to inspect how the samples are distributed across the classes and the features."
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"from sklearn import datasets\n",
"# Load the Iris dataset bundled with scikit-learn (150 samples, 4 features, 3 classes)\n",
"iris = datasets.load_iris()"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"# library for displaying plots\n",
"import matplotlib.pyplot as plt\n",
"# display plots in the notebook\n",
"# if this is not set, you will not see the graphic here\n",
"%matplotlib inline"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We start by analysing the [histogram](https://en.wikipedia.org/wiki/Histogram).\n",
"\n",
"A histogram is a graphical representation of the distribution of numerical data. It is an estimate of the probability distribution of a continuous (quantitative) variable.\n",
"\n",
"To build a histogram, we first 'bin' the range of values (that is, divide the entire range of values into a series of intervals) and then count how many values fall into each interval.\n",
"\n",
"In our case we plot the class label (`iris.target`), which is not continuous and takes only three values (0, 1 and 2), so we do not really need to bin it: with the default 10 bins, each class falls into its own bin and the remaining bins stay empty."
]
},
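{
"cell_type": "markdown",
"metadata": {},
"source": [
"The binning step described above can be sketched with NumPy's `np.histogram`, which returns the number of values in each bin together with the bin edges. The values below are invented for illustration and are not taken from the Iris dataset. The second call uses three discrete labels (0, 1, 2) with 10 bins, as in the histogram of `iris.target` that follows, and shows that only three of the ten bins end up non-empty."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"import numpy as np\n",
"\n",
"# Hypothetical sepal-length-like values, invented for this illustration\n",
"values = np.array([4.3, 4.9, 5.0, 5.1, 5.8, 6.3, 7.9])\n",
"\n",
"# Split the range [min, max] into 3 equal-width bins and count the values in each\n",
"counts, edges = np.histogram(values, bins=3)\n",
"print('edges :', edges)   # 4 edges delimit 3 bins\n",
"print('counts:', counts)  # number of values falling into each bin\n",
"\n",
"# Discrete labels (0, 1, 2) with 10 bins: each label lands in its own bin\n",
"labels = np.array([0, 0, 1, 1, 1, 2])\n",
"label_counts, label_edges = np.histogram(labels, bins=10)\n",
"print('non-empty bins:', np.count_nonzero(label_counts))"
]
},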
{
"cell_type": "code",
"execution_count": 4,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"<matplotlib.text.Text at 0x7f5f8c1589b0>"
]
},
"execution_count": 4,
"metadata": {},
"output_type": "execute_result"
},
{
"data": {
"text/plain": [
"<matplotlib.figure.Figure at 0x7f5f8c1a9668>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"# Plot histogram, the default is 10 bins\n",
"plt.hist(iris.target, bins=10)\n",
"plt.xlabel('iris class')\n",
"plt.ylabel('Number of species')\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The three classes are balanced: each one has the same number of samples (50 out of 150).\n",
"Next, we look at the distribution of the features."
]
},
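{
"cell_type": "markdown",
"metadata": {},
"source": [
"Both observations can be checked with a short sketch, assuming scikit-learn, NumPy and Matplotlib are installed: `np.bincount` counts the samples of each class (confirming that the classes are balanced), and a 2 x 2 grid shows one histogram per feature. The grid layout and the 15 bins are arbitrary choices made for this sketch."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"import numpy as np\n",
"import matplotlib.pyplot as plt\n",
"from sklearn import datasets\n",
"\n",
"iris = datasets.load_iris()\n",
"\n",
"# Number of samples per class (0 = setosa, 1 = versicolor, 2 = virginica)\n",
"class_counts = np.bincount(iris.target)\n",
"print(dict(zip(iris.target_names, class_counts)))\n",
"\n",
"# One histogram per feature on a 2 x 2 grid; 15 bins is an arbitrary choice\n",
"fig, axes = plt.subplots(2, 2, figsize=(10, 8))\n",
"for i, ax in enumerate(axes.ravel()):\n",
"    ax.hist(iris.data[:, i], bins=15)\n",
"    ax.set_xlabel(iris.feature_names[i])\n",
"    ax.set_ylabel('Number of samples')\n",
"fig.tight_layout()\n",
"plt.show()"
]
},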
{
"cell_type": "code",
"execution_count": 5,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"['sepal length (cm)', 'sepal width (cm)', 'petal length (cm)', 'petal width (cm)']\n"
]
}
],
"source": [
"# Print the feature names to recall the column index of each feature\n",
"print(iris.feature_names)"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"['setosa' 'versicolor' 'virginica']\n"
]
}
],
"source": [
"# Print the class (target) names\n",
"print(iris.target_names)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"A [**scatter plot**](https://en.wikipedia.org/wiki/Scatter_plot) (in Spanish, *gráfico de dispersión*) displays the values of (typically) two variables for a set of data, drawing one point per sample."
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"<matplotlib.text.Text at 0x7f5f8c093b70>"
]
},
"execution_count": 7,
"metadata": {},
"output_type": "execute_result"
},
{
"data": {
"text/plain": [
"<matplotlib.figure.Figure at 0x7f5f8c110c18>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"# scatter makes a plot of x vs y\n",
"plt.scatter(iris.data[:,0], iris.target)\n",
"plt.xlabel(iris.feature_names[0])\n",
"plt.ylabel('species')"
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {
"collapsed": false,
"scrolled": true
},
"outputs": [
{
"data": {
"text/plain": [
"<matplotlib.figure.Figure at 0x7f5f8c0a8748>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"# Plot sepal length against sepal width, one colour per class\n",
"labels = set(iris.target)  # class labels: 0, 1, 2\n",
"\n",
"# x and y are all the samples from column 0 (sepal length) and column 1 (sepal width), respectively\n",
"x, y = iris.data[:, 0], iris.data[:, 1]\n",
"\n",
"for label in labels:\n",
"    cond = iris.target == label  # boolean mask selecting the samples of this class\n",
"    plt.plot(x[cond], y[cond], linestyle='none', marker='o', label=iris.target_names[label])\n",
"\n",
"plt.xlabel(iris.feature_names[0])\n",
"plt.ylabel(iris.feature_names[1])\n",
"plt.legend(numpoints=1)\n",
"plt.show()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
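"As we can see, with these two features the Setosa class seems to be linearly separable from the other two classes.\n",
"\n",
"Another way to display the same data is to colour each point by its class and add a colour bar labelled with the class names."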
]
},
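{
"cell_type": "markdown",
"metadata": {},
"source": [
"The separability claim can be probed numerically. As a sketch under our own assumptions (not part of the original analysis), we train a linear classifier to tell setosa apart from the other two classes using only sepal length and sepal width, and look at its accuracy on the training data. The choice of scikit-learn's `LogisticRegression` with default settings is ours. A training accuracy of 1.0 would be consistent with linear separability, while a lower value does not by itself rule it out, since a regularised classifier is not guaranteed to find a separating line."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"from sklearn import datasets\n",
"from sklearn.linear_model import LogisticRegression\n",
"\n",
"iris = datasets.load_iris()\n",
"\n",
"# Keep only sepal length (column 0) and sepal width (column 1)\n",
"X = iris.data[:, :2]\n",
"# Binary target: 1 for setosa (class 0), 0 for versicolor and virginica\n",
"y = (iris.target == 0).astype(int)\n",
"\n",
"clf = LogisticRegression()\n",
"clf.fit(X, y)\n",
"accuracy = clf.score(X, y)  # accuracy measured on the training data itself\n",
"print('Training accuracy (setosa vs rest): {:.3f}'.format(accuracy))"
]
},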
{
"cell_type": "code",
"execution_count": 9,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"<matplotlib.text.Text at 0x7f5f5e1f1320>"
]
},
"execution_count": 9,
"metadata": {},
"output_type": "execute_result"
},
{
"data": {
"text/plain": [
"<matplotlib.figure.Figure at 0x7f5f5e2412b0>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
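"# Indices of the features plotted on the x and y axes (0 = sepal length, 1 = sepal width)\n",
"x_index = 0\n",
"y_index = 1\n",
"\n",
"# Map the numeric class label on the colour bar (0, 1, 2) to its species name\n",
"formatter = plt.FuncFormatter(lambda i, *args: iris.target_names[int(i)])\n",
"\n",
"# s sets the marker size; c colours each point by its class label\n",
"plt.scatter(iris.data[:, x_index], iris.data[:, y_index], s=40,\n",
"            c=iris.target)\n",
"plt.colorbar(ticks=[0, 1, 2], format=formatter)\n",
"plt.xlabel(iris.feature_names[x_index])\n",
"plt.ylabel(iris.feature_names[y_index])"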
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Here we can also see that the Setosa class seems to be linearly separable from the other two.\n",
"\n",
"Students interested in practising advanced visualisations can check the [Advanced visualisation notebook](2_3_1_Advanced_Visualisation.ipynb)."
]
},
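{
"cell_type": "markdown",
"metadata": {},
"source": [
"The plots above only use sepal length and sepal width. A common extension, sketched here with plain Matplotlib, is a scatter-plot matrix that draws every pair of features, with a histogram of each feature on the diagonal; the layout, marker size and number of bins are our own choices. Ready-made alternatives include `pandas.plotting.scatter_matrix` and Seaborn's `pairplot`."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"import matplotlib.pyplot as plt\n",
"from sklearn import datasets\n",
"\n",
"iris = datasets.load_iris()\n",
"n_features = iris.data.shape[1]\n",
"\n",
"fig, axes = plt.subplots(n_features, n_features, figsize=(12, 12))\n",
"for row in range(n_features):\n",
"    for col in range(n_features):\n",
"        ax = axes[row, col]\n",
"        if row == col:\n",
"            # Diagonal: distribution of a single feature\n",
"            ax.hist(iris.data[:, col], bins=15)\n",
"        else:\n",
"            # Off-diagonal: feature col (x axis) against feature row (y axis), coloured by class\n",
"            ax.scatter(iris.data[:, col], iris.data[:, row], c=iris.target, s=10)\n",
"        # Label only the outer axes to keep the grid readable\n",
"        if row == n_features - 1:\n",
"            ax.set_xlabel(iris.feature_names[col])\n",
"        if col == 0:\n",
"            ax.set_ylabel(iris.feature_names[row])\n",
"fig.tight_layout()\n",
"plt.show()"
]
},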
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# References"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"* [Feature selection](http://scikit-learn.org/stable/modules/feature_selection.html)\n",
"* [Classification probability](http://scikit-learn.org/stable/auto_examples/classification/plot_classification_probability.html)\n",
"* [Mastering Pandas](http://proquest.safaribooksonline.com/book/programming/python/9781783981960), Femi Anthony, Packt Publishing, 2015.\n",
"* [Matplotlib web page](http://matplotlib.org/index.html)\n",
"* [Using matplotlib in IPython](http://ipython.readthedocs.org/en/stable/interactive/plotting.html)\n",
"* [Seaborn Tutorial](https://stanford.edu/~mwaskom/software/seaborn/tutorial.html)\n",
"* [Iris dataset visualisation notebook](https://www.kaggle.com/benhamner/d/uciml/iris/python-data-visualizations/notebook)\n",
"* [Tutorial plotting with Seaborn](https://stanford.edu/~mwaskom/software/seaborn/tutorial/axis_grids.html)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Licence\n",
"\n",
"The notebook is freely licensed under the [Creative Commons Attribution Share-Alike license](https://creativecommons.org/licenses/by/2.0/).\n",
"\n",
"© 2016 Carlos A. Iglesias, Universidad Politécnica de Madrid."
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.5.1"
}
},
"nbformat": 4,
"nbformat_minor": 0
}