{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"![](files/images/EscUpmPolit_p.gif \"UPM\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Course Notes for Learning Intelligent Systems"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Department of Telematic Engineering Systems, Universidad Politécnica de Madrid, © 2016 Carlos A. Iglesias"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## [Introduction to Machine Learning](2_0_0_Intro_ML.ipynb)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Table of Contents\n",
"* [Visualisation](#Visualisation)\n",
"* [Exploratory visualisation](#Exploratory-visualisation)\n",
"* [References](#References)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Visualisation"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The goal of this notebook is to learn how to explore a dataset visually. Other tasks, such as cleaning or munging (changing the format of) the dataset, will be covered in other sessions."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Exploratory visualisation"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"This section covers different ways to inspect how the samples are distributed for each feature.\n",
"\n",
"First of all, let's see how many samples of each class we have, using a [histogram](https://en.wikipedia.org/wiki/Histogram).\n",
"\n",
"A histogram is a graphical representation of the distribution of numerical data. It is an estimate of the probability distribution of a continuous (quantitative) variable.\n",
"\n",
"To build a histogram, we first need to 'bin' the range of values, that is, divide the entire range of values into a series of intervals, and then count how many values fall into each interval.\n",
"\n",
"In our case, the class labels are discrete and take only three values (0, 1 and 2), so we do not need to bin them."
]
},
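{
"cell_type": "markdown",
"metadata": {},
"source": [
"The binning step can be made explicit with NumPy. The next cell is a minimal sketch, assuming NumPy is available (scikit-learn depends on it): `np.histogram` splits the sepal length, a continuous feature, into 10 equal-width intervals and returns the number of samples in each interval together with the 11 interval edges. `plt.hist` performs essentially this computation before drawing the bars."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"import numpy as np\n",
"from sklearn import datasets\n",
"\n",
"iris = datasets.load_iris()\n",
"sepal_length = iris.data[:, 0]  # continuous feature, in cm\n",
"\n",
"# Bin the values into 10 equal-width intervals and count the samples in each one\n",
"counts, edges = np.histogram(sepal_length, bins=10)\n",
"print('samples per bin:', counts)\n",
"print('bin edges:', edges)"
]
},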
{
"cell_type": "code",
"execution_count": 1,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"from sklearn import datasets\n",
"iris = datasets.load_iris()"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"/usr/lib/python3/dist-packages/matplotlib/font_manager.py:273: UserWarning: Matplotlib is building the font cache using fc-list. This may take a moment.\n",
" warnings.warn('Matplotlib is building the font cache using fc-list. This may take a moment.')\n",
"/usr/lib/python3/dist-packages/matplotlib/font_manager.py:273: UserWarning: Matplotlib is building the font cache using fc-list. This may take a moment.\n",
" warnings.warn('Matplotlib is building the font cache using fc-list. This may take a moment.')\n"
]
}
],
"source": [
"# library for displaying plots\n",
"import matplotlib.pyplot as plt\n",
"# display plots in the notebook\n",
"# if this is not set, you will not see the graphic here\n",
"%matplotlib inline"
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"<matplotlib.text.Text at 0x7f04f62a1a90>"
]
},
"execution_count": 7,
"metadata": {},
"output_type": "execute_result"
},
{
"data": {
"text/plain": [
"<matplotlib.figure.Figure at 0x7f04f626b710>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"# Plot a histogram of the class labels (10 bins, which is also matplotlib's default)\n",
"plt.hist(iris.target, bins=10)\n",
"plt.xlabel('iris class')\n",
"plt.ylabel('Number of samples')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"All three classes have the same number of samples (50 each).\n",
"The next step is to look at the distribution of the features."
]
},
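{
"cell_type": "markdown",
"metadata": {},
"source": [
"The equal class sizes can also be checked numerically. A small sketch, using `np.bincount` as one possible way (not part of the original notebook) of counting how many times each integer class label appears; with this dataset every class should appear 50 times."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"import numpy as np\n",
"from sklearn import datasets\n",
"\n",
"iris = datasets.load_iris()\n",
"\n",
"# For each label 0, 1, 2, count the samples that carry that label\n",
"class_counts = np.bincount(iris.target)\n",
"for name, count in zip(iris.target_names, class_counts):\n",
"    print(name, count)"
]
},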
{
"cell_type": "code",
"execution_count": 8,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"['sepal length (cm)', 'sepal width (cm)', 'petal length (cm)', 'petal width (cm)']\n"
]
}
],
"source": [
"# A reminder of the feature names; the position of each name is its column index in iris.data\n",
"print(iris.feature_names)"
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"['setosa' 'versicolor' 'virginica']\n"
]
}
],
"source": [
"# A reminder of the class names; the position of each name is its label value in iris.target\n",
"print(iris.target_names)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"A [**scatter plot**](https://en.wikipedia.org/wiki/Scatter_plot) (*gráfico de dispersión*) displays the values of, typically, two variables for a set of data, one variable on each axis."
]
},
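{
"cell_type": "markdown",
"metadata": {},
"source": [
"With four features there are six distinct pairs of features, so a single scatter plot shows only part of the picture. The next cell is a minimal sketch of a scatter-plot matrix built with `plt.subplots` only (seaborn's `pairplot` is a ready-made alternative, not used here): every off-diagonal panel plots one feature against another with points coloured by class, and every diagonal panel shows the histogram of a single feature."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"import matplotlib.pyplot as plt\n",
"from sklearn import datasets\n",
"\n",
"iris = datasets.load_iris()\n",
"n_features = iris.data.shape[1]\n",
"\n",
"# Panel (i, j) shows feature j on the x axis and feature i on the y axis\n",
"fig, axes = plt.subplots(n_features, n_features, figsize=(10, 10))\n",
"for i in range(n_features):\n",
"    for j in range(n_features):\n",
"        ax = axes[i, j]\n",
"        if i == j:\n",
"            # Diagonal: distribution of a single feature\n",
"            ax.hist(iris.data[:, i], bins=10)\n",
"        else:\n",
"            # Off-diagonal: one feature against another, coloured by class\n",
"            ax.scatter(iris.data[:, j], iris.data[:, i], c=iris.target, s=10)\n",
"        # Label only the outer panels to keep the grid readable\n",
"        if i == n_features - 1:\n",
"            ax.set_xlabel(iris.feature_names[j])\n",
"        if j == 0:\n",
"            ax.set_ylabel(iris.feature_names[i])\n",
"plt.tight_layout()"
]
},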
{
"cell_type": "code",
"execution_count": 10,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"<matplotlib.text.Text at 0x7f04f62a1630>"
]
},
"execution_count": 10,
"metadata": {},
"output_type": "execute_result"
},
{
"data": {
"text/plain": [
"<matplotlib.figure.Figure at 0x7f05193cde10>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"# scatter makes a plot of x vs y: sepal length (x) against the class label (y)\n",
"plt.scatter(iris.data[:,0], iris.target)\n",
"plt.xlabel(iris.feature_names[0])\n",
"plt.ylabel('species')"
]
},
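{
"cell_type": "markdown",
"metadata": {},
"source": [
"The overlap along the horizontal axis of the plot above can also be quantified. A minimal sketch (not part of the original notebook) that prints, for each class, the minimum, maximum and mean sepal length; where the min-max ranges of two classes overlap, this feature alone cannot separate them."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"from sklearn import datasets\n",
"\n",
"iris = datasets.load_iris()\n",
"sepal_length = iris.data[:, 0]\n",
"\n",
"# For each class label, select its samples and summarise their sepal length\n",
"summary = {}\n",
"for label, name in enumerate(iris.target_names):\n",
"    values = sepal_length[iris.target == label]\n",
"    summary[name] = (values.min(), values.max(), values.mean())\n",
"    print('{}: min={:.1f} max={:.1f} mean={:.2f}'.format(name, *summary[name]))"
]
},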
{
"cell_type": "code",
"execution_count": 11,
"metadata": {
"collapsed": false,
"scrolled": true
},
"outputs": [
{
"data": {
"text/plain": [
"<matplotlib.figure.Figure at 0x7f05193a9c50>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"# Plot sepal length against sepal width, one colour per class\n",
"names = set(iris.target)\n",
"\n",
"# x and y are all the samples from column 0 (sepal_length) and 1 (sepal_width) respectively\n",
"x,y = iris.data[:,0], iris.data[:,1]\n",
"\n",
"for name in names:\n",
" cond = iris.target == name\n",
" plt.plot(x[cond], y[cond], linestyle='none', marker='o', label=iris.target_names[name])\n",
"\n",
"plt.legend(numpoints=1)\n",
"plt.show()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"As we can see, the Setosa class seems to be linearly separable from the other two classes with these two features.\n",
"\n",
"Another useful visualisation is given below."
]
},
{
"cell_type": "code",
"execution_count": 13,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"<matplotlib.text.Text at 0x7f04f5a176a0>"
]
},
"execution_count": 13,
"metadata": {},
"output_type": "execute_result"
},
{
"data": {
"text/plain": [
"<matplotlib.figure.Figure at 0x7f04f5a9c4a8>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"# Indexes of the features shown on the x and y axes\n",
"x_index = 0\n",
"y_index = 1\n",
"# Label the colour bar ticks with the class names instead of 0, 1, 2\n",
"formatter = plt.FuncFormatter(lambda i, *args: iris.target_names[int(i)])\n",
"plt.scatter(iris.data[:, x_index], iris.data[:, y_index], s=40,\n",
"            c=iris.target)\n",
"plt.colorbar(ticks=[0, 1, 2], format=formatter)\n",
"plt.xlabel(iris.feature_names[x_index])\n",
"plt.ylabel(iris.feature_names[y_index])"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"This alternative visualisation also suggests that the Setosa class is linearly separable from the other two classes.\n",
"\n",
"Students interested in practising advanced visualisations can check the [Advanced visualisation notebook](2_3_1_Advanced_Visualisation.ipynb).\n",
"\n"
]
},
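{
"cell_type": "markdown",
"metadata": {},
"source": [
"The linear separability suggested by both plots can be checked roughly by fitting a linear classifier. The next cell is a minimal sketch, assuming scikit-learn's `LogisticRegression` (an illustrative choice, not the notebook's own method; any linear model would do): it is trained to distinguish Setosa from the other two classes using only sepal length and sepal width, and its accuracy is measured on the same training data. An accuracy of 1.0 would confirm that a straight line separates Setosa in this plane; a slightly lower value does not rule it out, because a regularised classifier is not guaranteed to find such a line."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"from sklearn import datasets\n",
"from sklearn.linear_model import LogisticRegression\n",
"\n",
"iris = datasets.load_iris()\n",
"X = iris.data[:, :2]                  # sepal length and sepal width only\n",
"y = (iris.target == 0).astype(int)    # 1 for Setosa, 0 for the other two classes\n",
"\n",
"# Fit a linear decision boundary and measure its accuracy on the training data\n",
"clf = LogisticRegression()\n",
"clf.fit(X, y)\n",
"accuracy = clf.score(X, y)\n",
"print('training accuracy: {:.3f}'.format(accuracy))"
]
},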
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# References"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"* [Feature selection](http://scikit-learn.org/stable/modules/feature_selection.html)\n",
"* [Classification probability](http://scikit-learn.org/stable/auto_examples/classification/plot_classification_probability.html)\n",
"* [Mastering Pandas](http://proquest.safaribooksonline.com/book/programming/python/9781783981960), Femi Anthony, Packt Publishing, 2015.\n",
"* [Matplotlib web page](http://matplotlib.org/index.html)\n",
"* [Using matplotlib in IPython](http://ipython.readthedocs.org/en/stable/interactive/plotting.html)\n",
"* [Seaborn Tutorial](https://stanford.edu/~mwaskom/software/seaborn/tutorial.html)\n",
"* [Iris dataset visualisation notebook](https://www.kaggle.com/benhamner/d/uciml/iris/python-data-visualizations/notebook)\n",
"* [Tutorial plotting with Seaborn](https://stanford.edu/~mwaskom/software/seaborn/tutorial/axis_grids.html)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Licence\n",
"\n",
"The notebook is freely licensed under the [Creative Commons Attribution Share-Alike license](https://creativecommons.org/licenses/by/2.0/).\n",
"\n",
"© 2016 Carlos A. Iglesias, Universidad Politécnica de Madrid."
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.5.2"
}
},
"nbformat": 4,
"nbformat_minor": 0
}