{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"![](files/images/EscUpmPolit_p.gif \"UPM\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Course Notes for Learning Intelligent Systems"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Department of Telematic Systems Engineering, Universidad Politécnica de Madrid, © 2016 Carlos A. Iglesias"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## [Introduction to Machine Learning](2_0_0_Intro_ML.ipynb)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Table of Contents\n",
"* [Visualisation](#Visualisation)\n",
"* [Exploratory visualisation](#Exploratory-visualisation)\n",
"* [References](#References)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Visualisation"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The goal of this notebook is to learn how to analyse a dataset by visualising it. Other tasks, such as cleaning or munging (changing the format of) the dataset, are covered in other sessions."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Exploratory visualisation"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"This section covers different ways to inspect the distribution of samples per feature.\n",
"\n",
"First of all, let's see how many samples of each class we have, using a [histogram](https://en.wikipedia.org/wiki/Histogram).\n",
"\n",
"A histogram is a graphical representation of the distribution of numerical data. It can be read as an estimate of the probability distribution of a continuous (quantitative) variable.\n",
"\n",
"To build a histogram, we first 'bin' the range of values (that is, divide the entire range into a series of intervals) and then count how many values fall into each interval.\n",
"\n",
"In our case the class labels are discrete and take only three values, so there is no need to bin them: each label is its own bin."
]
},
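{
"cell_type": "markdown",
"metadata": {},
"source": [
"The binning step described above can be sketched in plain Python. This is only an illustration under simple assumptions (equal-width bins, the maximum value counted in the last bin, and at least two distinct values). The helper name `bin_counts` and the sample values are made up for this example; `plt.hist` does the equivalent work when it draws a histogram."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from collections import Counter\n",
"\n",
"\n",
"def bin_counts(values, n_bins):\n",
"    # Hypothetical helper: count the values falling into n_bins equal-width intervals\n",
"    low, high = min(values), max(values)\n",
"    width = (high - low) / n_bins\n",
"    counts = [0] * n_bins\n",
"    for v in values:\n",
"        # The maximum value would land one past the end, so clamp it to the last bin\n",
"        index = min(int((v - low) / width), n_bins - 1)\n",
"        counts[index] += 1\n",
"    return counts\n",
"\n",
"\n",
"# Made-up continuous measurements, binned into 3 intervals\n",
"lengths = [4.3, 4.9, 5.0, 5.1, 5.8, 6.4, 7.0, 7.9]\n",
"print(bin_counts(lengths, 3))\n",
"\n",
"# Discrete class labels need no binning: each distinct value is its own bin\n",
"labels = [0, 0, 1, 1, 1, 2]\n",
"print(Counter(labels))"
]
},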
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [],
"source": [
"from sklearn import datasets\n",
"iris = datasets.load_iris()"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [],
"source": [
"# library for displaying plots\n",
"import matplotlib.pyplot as plt\n",
"# display plots in the notebook\n",
"# if this is not set, you will not see the graphic here\n",
"%matplotlib inline"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"<matplotlib.figure.Figure at 0x7fd9e04f14e0>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"# Plot histogram, the default is 10 bins\n",
"plt.hist(iris.target)\n",
"plt.ylabel('Number of instances')\n",
"plt.xlabel('iris class')\n",
"plt.xticks(range(len(iris.target_names)), iris.target_names);"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"As can be seen, every class has the same number of samples.\n",
"The next step is to look at the distribution of the features."
]
},
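{
"cell_type": "markdown",
"metadata": {},
"source": [
"The balance seen in the histogram can also be checked numerically, together with a first numeric look at each feature. This is a minimal sketch that reloads the dataset so the cell can run on its own; `np.bincount` is used here as a convenience and is an addition to the original notebook, which only uses plots for this step."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"from sklearn import datasets\n",
"\n",
"iris = datasets.load_iris()\n",
"\n",
"# Number of samples per class label (0, 1, 2)\n",
"counts = np.bincount(iris.target)\n",
"for name, count in zip(iris.target_names, counts):\n",
"    print(name, count)\n",
"\n",
"# Minimum, mean and maximum of each feature, as a numeric complement to the plots\n",
"for index, feature in enumerate(iris.feature_names):\n",
"    column = iris.data[:, index]\n",
"    print(feature, column.min(), column.mean(), column.max())"
]
},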
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"['sepal length (cm)', 'sepal width (cm)', 'petal length (cm)', 'petal width (cm)']\n"
]
}
],
"source": [
"# This is a reminder of the name and index of each feature\n",
"print(iris.feature_names)"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"['setosa' 'versicolor' 'virginica']\n"
]
}
],
"source": [
"# A reminder of the class (target) names and their indexes\n",
"print(iris.target_names)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"A [**scatter plot**](https://en.wikipedia.org/wiki/Scatter_plot) displays the values of (typically) two variables for a set of data, drawing one point per sample."
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"Text(0,0.5,'iris class')"
]
},
"execution_count": 6,
"metadata": {},
"output_type": "execute_result"
},
{
"data": {
"text/plain": [
"<matplotlib.figure.Figure at 0x7fd9e04f1470>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"# scatter makes a plot of x vs y\n",
"plt.scatter(iris.data[:,0], iris.target)\n",
"plt.yticks(range(len(iris.target_names)), iris.target_names);\n",
"plt.xlabel(iris.feature_names[0])\n",
"plt.ylabel('iris class')"
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {
"scrolled": true
},
"outputs": [
{
"data": {
"text/plain": [
"<matplotlib.figure.Figure at 0x7fd9de43e400>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"# Plot the distribution of the dataset\n",
"names = set(iris.target)\n",
"\n",
"# x and y are all the samples from column 0 (sepal_length) and 1 (sepal_width) respectively\n",
"x, y = iris.data[:, 0], iris.data[:, 1]\n",
"\n",
"for name in names:\n",
"    cond = iris.target == name\n",
"    plt.plot(x[cond], y[cond], linestyle='none', marker='o', label=iris.target_names[name])\n",
"\n",
"plt.legend(numpoints=1)\n",
"plt.show()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"As we can see, the Setosa class seems to be linearly separable with these two features.\n",
"\n",
"Another nice visualisation is given below."
]
},
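{
"cell_type": "markdown",
"metadata": {},
"source": [
"The visual impression can be checked numerically. The sketch below is one possible check and is not part of the original notebook: it fits a linear classifier (scikit-learn's `Perceptron`, chosen here only because it searches for a separating line) on the two sepal features, with Setosa against the other two classes. A training accuracy of 1.0 would mean that a separating line was found; a lower value does not by itself prove that none exists."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from sklearn import datasets\n",
"from sklearn.linear_model import Perceptron\n",
"\n",
"iris = datasets.load_iris()\n",
"\n",
"# Keep only the two sepal features used in the plot above\n",
"X = iris.data[:, :2]\n",
"# Binary target: True for Setosa (class 0), False for the other two classes\n",
"is_setosa = iris.target == 0\n",
"\n",
"# tol=None disables early stopping, so training runs for all max_iter epochs\n",
"clf = Perceptron(max_iter=1000, tol=None, random_state=0)\n",
"clf.fit(X, is_setosa)\n",
"\n",
"accuracy = clf.score(X, is_setosa)\n",
"print('Training accuracy (Setosa vs rest):', accuracy)"
]
},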
{
"cell_type": "code",
"execution_count": 8,
"metadata": {
"scrolled": true
},
"outputs": [
{
"data": {
"text/plain": [
"<matplotlib.figure.Figure at 0x7fd9de38a278>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"x_index = 0\n",
"y_index = 1\n",
"formatter = plt.FuncFormatter(lambda i, *args: iris.target_names[int(i)])\n",
"plt.scatter(iris.data[:, x_index], iris.data[:, y_index], s=40,\n",
"c=iris.target)\n",
"plt.colorbar(ticks=[0, 1, 2], format=formatter)\n",
"plt.xlabel(iris.feature_names[x_index])\n",
"plt.ylabel(iris.feature_names[y_index]);"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"This alternative visualisation also suggests that the Setosa class is linearly separable from the other two.\n",
"\n",
"Students interested in practising advanced visualisations can check the [Advanced visualisation notebook](2_3_1_Advanced_Visualisation.ipynb)."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# References"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"* [Feature selection](http://scikit-learn.org/stable/modules/feature_selection.html)\n",
"* [Classification probability](http://scikit-learn.org/stable/auto_examples/classification/plot_classification_probability.html)\n",
"* [Mastering Pandas](http://proquest.safaribooksonline.com/book/programming/python/9781783981960), Femi Anthony, Packt Publishing, 2015.\n",
"* [Matplotlib web page](http://matplotlib.org/index.html)\n",
"* [Using matplotlib in IPython](http://ipython.readthedocs.org/en/stable/interactive/plotting.html)\n",
"* [Seaborn Tutorial](https://stanford.edu/~mwaskom/software/seaborn/tutorial.html)\n",
"* [Iris dataset visualisation notebook](https://www.kaggle.com/benhamner/d/uciml/iris/python-data-visualizations/notebook)\n",
"* [Tutorial plotting with Seaborn](https://stanford.edu/~mwaskom/software/seaborn/tutorial/axis_grids.html)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Licence\n",
"\n",
"The notebook is freely licensed under the [Creative Commons Attribution Share-Alike license](https://creativecommons.org/licenses/by/2.0/).\n",
"\n",
"© 2016 Carlos A. Iglesias, Universidad Politécnica de Madrid."
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.4"
}
},
"nbformat": 4,
"nbformat_minor": 1
}