mirror of
https://github.com/gsi-upm/sitc
synced 2024-11-05 07:31:41 +00:00
345 lines
74 KiB
Plaintext
{
|
|
"cells": [
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"![](files/images/EscUpmPolit_p.gif \"UPM\")"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"# Course Notes for Learning Intelligent Systems"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"Department of Telematic Engineering Systems, Universidad Politécnica de Madrid, © 2016 Carlos A. Iglesias"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"## [Introduction to Machine Learning](2_0_0_Intro_ML.ipynb)"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"# Table of Contents\n",
|
|
"* [Visualisation](#Visualisation)\n",
|
|
"* [Exploratory visualisation](#Exploratory-visualisation)\n",
|
|
"* [References](#References)"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"# Visualisation"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"The goal of this notebook is to learn how to analyse a dataset visually. Other tasks, such as cleaning or munging (changing the format of) the dataset, are covered in other sessions."
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"## Exploratory visualisation"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"This section covers different ways to inspect the distribution of samples per feature.\n",
|
|
"\n",
|
|
"First of all, let's see how many samples of each class we have, using a [histogram](https://en.wikipedia.org/wiki/Histogram). \n",
|
|
"\n",
|
|
"A histogram is a graphical representation of the distribution of numerical data. It is an estimate of the probability distribution of a continuous (quantitative) variable. \n",
"\n",
"To build a histogram, we first need to 'bin' the range of values, that is, divide the entire range of values into a series of intervals, and then count how many values fall into each interval. \n",
"\n",
"In our case, the class labels are discrete and take only three values, so we do not need to bin them."
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 1,
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"from sklearn import datasets\n",
|
|
"iris = datasets.load_iris()"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 2,
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"# library for displaying plots\n",
|
|
"import matplotlib.pyplot as plt\n",
|
|
"# display plots in the notebook\n",
|
|
"# if this is not set, you will not see the graphic here\n",
|
|
"%matplotlib inline"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 3,
|
|
"metadata": {},
|
|
"outputs": [
|
|
{
|
|
"data": {
|
|
"image/png": "\n",
|
|
"text/plain": [
|
|
"<matplotlib.figure.Figure at 0x7fd9e04f14e0>"
|
|
]
|
|
},
|
|
"metadata": {},
|
|
"output_type": "display_data"
|
|
}
|
|
],
|
|
"source": [
|
|
"# Plot histogram; with the default of 10 bins, each of the three class values falls into its own bin\n",
|
|
"plt.hist(iris.target)\n",
|
|
"plt.ylabel('Number of instances')\n",
|
|
"plt.xlabel('iris class')\n",
|
|
"plt.xticks(range(len(iris.target_names)), iris.target_names);"
|
|
]
|
|
},
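{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a rough cross-check of the histogram above, the per-class counts can also be computed directly. The cell below is a minimal sketch and not part of the original notebook: it assumes NumPy is installed alongside scikit-learn and uses `np.bincount`, which counts how often each non-negative integer label occurs, so no binning is involved."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Sketch (assumption: NumPy is available alongside scikit-learn)\n",
"import numpy as np\n",
"from sklearn import datasets\n",
"\n",
"iris = datasets.load_iris()\n",
"\n",
"# iris.target holds the integer class labels 0, 1 and 2;\n",
"# np.bincount returns the number of occurrences of each label\n",
"counts = np.bincount(iris.target)\n",
"\n",
"# Pair each class name with its count\n",
"for name, count in zip(iris.target_names, counts):\n",
"    print(name, count)"
]
},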
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"As can be seen, we have the same number of samples (50) for every class.\n",
"The next step is to look at the distribution of the features."
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 4,
|
|
"metadata": {},
|
|
"outputs": [
|
|
{
|
|
"name": "stdout",
|
|
"output_type": "stream",
|
|
"text": [
|
|
"['sepal length (cm)', 'sepal width (cm)', 'petal length (cm)', 'petal width (cm)']\n"
|
|
]
|
|
}
|
|
],
|
|
"source": [
|
|
"# This is a reminder of the name and index of each feature\n",
|
|
"print(iris.feature_names)"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 5,
|
|
"metadata": {},
|
|
"outputs": [
|
|
{
|
|
"name": "stdout",
|
|
"output_type": "stream",
|
|
"text": [
|
|
"['setosa' 'versicolor' 'virginica']\n"
|
|
]
|
|
}
|
|
],
|
|
"source": [
|
|
"# A reminder of the class (target) names and their indexes\n",
|
|
"print(iris.target_names)"
|
|
]
|
|
},
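{
"cell_type": "markdown",
"metadata": {},
"source": [
"One possible way to inspect the distribution of each feature on its own is to draw one histogram per feature. The cell below is only a sketch under stated assumptions and not part of the original notebook: the 2x2 grid, the figure size and the choice of 10 bins are illustrative. Unlike the class labels, the feature values are continuous measurements, so here binning is actually needed."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Sketch: one histogram per feature (grid layout and bin count are assumptions)\n",
"import matplotlib.pyplot as plt\n",
"from sklearn import datasets\n",
"\n",
"iris = datasets.load_iris()\n",
"\n",
"# The dataset has four features, so a 2x2 grid gives one panel per feature\n",
"fig, axes = plt.subplots(2, 2, figsize=(8, 6))\n",
"for index, ax in enumerate(axes.ravel()):\n",
"    # Column 'index' of iris.data holds all samples of one feature\n",
"    ax.hist(iris.data[:, index], bins=10)\n",
"    ax.set_xlabel(iris.feature_names[index])\n",
"    ax.set_ylabel('Number of instances')\n",
"fig.tight_layout()"
]
},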
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"A [**scatter plot**](https://en.wikipedia.org/wiki/Scatter_plot) displays the values of (typically) two variables for each sample in a dataset, one variable per axis."
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 6,
|
|
"metadata": {},
|
|
"outputs": [
|
|
{
|
|
"data": {
|
|
"text/plain": [
|
|
"Text(0,0.5,'iris class')"
|
|
]
|
|
},
|
|
"execution_count": 6,
|
|
"metadata": {},
|
|
"output_type": "execute_result"
|
|
},
|
|
{
|
|
"data": {
|
|
"image/png": "\n",
|
|
"text/plain": [
|
|
"<matplotlib.figure.Figure at 0x7fd9e04f1470>"
|
|
]
|
|
},
|
|
"metadata": {},
|
|
"output_type": "display_data"
|
|
}
|
|
],
|
|
"source": [
|
|
"# scatter makes a plot of x vs y\n",
|
|
"plt.scatter(iris.data[:,0], iris.target)\n",
|
|
"plt.yticks(range(len(iris.target_names)), iris.target_names);\n",
|
|
"plt.xlabel(iris.feature_names[0])\n",
|
|
"plt.ylabel('iris class')"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 7,
|
|
"metadata": {
|
|
"scrolled": true
|
|
},
|
|
"outputs": [
|
|
{
|
|
"data": {
|
|
"image/png": "\n",
|
|
"text/plain": [
|
|
"<matplotlib.figure.Figure at 0x7fd9de43e400>"
|
|
]
|
|
},
|
|
"metadata": {},
|
|
"output_type": "display_data"
|
|
}
|
|
],
|
|
"source": [
|
|
"# Plot the distribution of the dataset\n",
|
|
"names = set(iris.target)\n",
|
|
"\n",
|
|
"# x and y are all the samples from column 0 (sepal_length) and 1 (sepal_width) respectively\n",
|
|
"x,y = iris.data[:,0], iris.data[:,1]\n",
|
|
"\n",
|
|
"for name in names:\n",
|
|
"    cond = iris.target == name\n",
"    plt.plot(x[cond], y[cond], linestyle='none', marker='o', label=iris.target_names[name])\n",
|
|
"\n",
|
|
"plt.legend(numpoints=1)\n",
|
|
"plt.show()"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"As we can see, the Setosa class appears to be linearly separable from the other two classes using these two features.\n",
|
|
"\n",
|
|
"Another nice visualisation is given below."
|
|
]
|
|
},
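{
"cell_type": "markdown",
"metadata": {},
"source": [
"The visual impression of linear separability can be probed numerically. The cell below is a hedged sketch and not part of the original notebook: it fits a linear support vector classifier (`sklearn.svm.SVC` with `kernel='linear'`) to separate Setosa from the other two classes using only the two sepal features. The large value of `C` is an assumption chosen to penalise misclassified training points heavily. A training accuracy of 1.0 would be consistent with linear separability; a lower value would not by itself disprove it."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Sketch: probe linear separability of Setosa with a linear classifier\n",
"from sklearn import datasets\n",
"from sklearn.svm import SVC\n",
"\n",
"iris = datasets.load_iris()\n",
"\n",
"# Keep only sepal length and sepal width (columns 0 and 1)\n",
"X = iris.data[:, :2]\n",
"\n",
"# Binary target: 1 for Setosa (class index 0), 0 for the other two classes\n",
"is_setosa = (iris.target == 0).astype(int)\n",
"\n",
"# C=1000.0 is an illustrative assumption, not a tuned value\n",
"clf = SVC(kernel='linear', C=1000.0)\n",
"clf.fit(X, is_setosa)\n",
"\n",
"# Fraction of training samples on the correct side of the fitted line\n",
"print('Training accuracy:', clf.score(X, is_setosa))"
]
},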
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 8,
|
|
"metadata": {
|
|
"scrolled": true
|
|
},
|
|
"outputs": [
|
|
{
|
|
"data": {
|
|
"image/png": "\n",
|
|
"text/plain": [
|
|
"<matplotlib.figure.Figure at 0x7fd9de38a278>"
|
|
]
|
|
},
|
|
"metadata": {},
|
|
"output_type": "display_data"
|
|
}
|
|
],
|
|
"source": [
|
|
"x_index = 0\n",
|
|
"y_index = 1\n",
|
|
"formatter = plt.FuncFormatter(lambda i, *args: iris.target_names[int(i)])\n",
|
|
"plt.scatter(iris.data[:, x_index], iris.data[:, y_index], s=40,\n",
"            c=iris.target)\n",
|
|
"plt.colorbar(ticks=[0, 1, 2], format=formatter)\n",
|
|
"plt.xlabel(iris.feature_names[x_index])\n",
|
|
"plt.ylabel(iris.feature_names[y_index]);"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"This alternative visualisation also suggests that the Setosa class is linearly separable from the other two classes.\n",
"\n",
"Students interested in practising advanced visualisations can check the [Advanced visualisation notebook](2_3_1_Advanced_Visualisation.ipynb).\n",
|
|
"\n"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"# References"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"* [Feature selection](http://scikit-learn.org/stable/modules/feature_selection.html)\n",
|
|
"* [Classification probability](http://scikit-learn.org/stable/auto_examples/classification/plot_classification_probability.html)\n",
|
|
"* [Mastering Pandas](http://proquest.safaribooksonline.com/book/programming/python/9781783981960), Femi Anthony, Packt Publishing, 2015.\n",
|
|
"* [Matplotlib web page](http://matplotlib.org/index.html)\n",
|
|
"* [Using matplotlib in IPython](http://ipython.readthedocs.org/en/stable/interactive/plotting.html)\n",
|
|
"* [Seaborn Tutorial](https://stanford.edu/~mwaskom/software/seaborn/tutorial.html)\n",
|
|
"* [Iris dataset visualisation notebook](https://www.kaggle.com/benhamner/d/uciml/iris/python-data-visualizations/notebook)\n",
|
|
"* [Tutorial plotting with Seaborn](https://stanford.edu/~mwaskom/software/seaborn/tutorial/axis_grids.html)"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"## Licence\n",
|
|
"\n",
|
|
"The notebook is freely licensed under the [Creative Commons Attribution Share-Alike license](https://creativecommons.org/licenses/by/2.0/). \n",
|
|
"\n",
|
|
"© 2016 Carlos A. Iglesias, Universidad Politécnica de Madrid."
|
|
]
|
|
}
|
|
],
|
|
"metadata": {
|
|
"kernelspec": {
|
|
"display_name": "Python 3",
|
|
"language": "python",
|
|
"name": "python3"
|
|
},
|
|
"language_info": {
|
|
"codemirror_mode": {
|
|
"name": "ipython",
|
|
"version": 3
|
|
},
|
|
"file_extension": ".py",
|
|
"mimetype": "text/x-python",
|
|
"name": "python",
|
|
"nbconvert_exporter": "python",
|
|
"pygments_lexer": "ipython3",
|
|
"version": "3.6.4"
|
|
}
|
|
},
|
|
"nbformat": 4,
|
|
"nbformat_minor": 1
|
|
}
|