You cannot select more than 25 topics Topics must start with a letter or number, can include dashes ('-') and can be up to 35 characters long.
sitc/ml1/2_5_1_kNN_Model.ipynb

563 lines
36 KiB
Plaintext

8 years ago
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"![](files/images/EscUpmPolit_p.gif \"UPM\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Course Notes for Learning Intelligent Systems"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Department of Telematic Engineering Systems, Universidad Politécnica de Madrid, © Carlos A. Iglesias"
8 years ago
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## [Introduction to Machine Learning](2_0_0_Intro_ML.ipynb)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Table of Contents\n",
"* [kNN Model](#kNN-Model)\n",
"* [Load data and preprocessing](#Load-data-and-preprocessing)\n",
"* [Train classifier](#Train-classifier)\n",
"* [Evaluating the algorithm](#Evaluating-the-algorithm)\n",
" * [Precision, recall and f-score](#Precision,-recall-and-f-score)\n",
"\t* [Confusion matrix](#Confusion-matrix)\n",
"\t* [K-Fold validation](#K-Fold-validation)\n",
"* [Tuning the algorithm](#Tuning-the-algorithm)\n",
"* [References](#References)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# kNN Model"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
8 years ago
"The goal of this notebook is to learn how to train a model, make predictions with that model and evaluate these predictions.\n",
8 years ago
"\n",
"The notebook uses the [kNN (k nearest neighbors) algorithm](https://en.wikipedia.org/wiki/K-nearest_neighbors_algorithm)."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
8 years ago
"## Loading data and preprocessing\n",
8 years ago
"\n",
"The first step is loading and preprocessing the data as explained in the previous notebooks."
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
8 years ago
"outputs": [],
"source": [
"# library for displaying plots\n",
"import matplotlib.pyplot as plt\n",
"\n",
"# display plots in the notebook \n",
8 years ago
"%matplotlib inline"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
8 years ago
"outputs": [],
"source": [
8 years ago
"## First, we repeat the load and preprocessing steps\n",
"\n",
"# Load data\n",
"from sklearn import datasets\n",
"iris = datasets.load_iris()\n",
"\n",
"# Training and test spliting\n",
"from sklearn.model_selection import train_test_split\n",
8 years ago
"\n",
"x_iris, y_iris = iris.data, iris.target\n",
"\n",
"# Test set will be the 25% taken randomly\n",
"x_train, x_test, y_train, y_test = train_test_split(x_iris, y_iris, test_size=0.25, random_state=33)\n",
"\n",
"# Preprocess: normalize\n",
"from sklearn import preprocessing\n",
"scaler = preprocessing.StandardScaler().fit(x_train)\n",
"x_train = scaler.transform(x_train)\n",
"x_test = scaler.transform(x_test)"
]
},
{
"cell_type": "markdown",
"metadata": {
"collapsed": true
},
"source": [
"## Train classifier"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The usual steps for creating a classifier are:\n",
"1. Create classifier object\n",
"2. Call *fit* to train the classifier\n",
"3. Call *predict* to obtain predictions\n",
"\n",
"Once the model is created, the most relevant methods are:\n",
"* model.fit(x_train, y_train): train the model\n",
"* model.predict(x): predict\n",
"* model.score(x, y): evaluate the prediction"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
8 years ago
"outputs": [
{
"data": {
"text/plain": [
"KNeighborsClassifier(algorithm='auto', leaf_size=30, metric='minkowski',\n",
" metric_params=None, n_jobs=1, n_neighbors=15, p=2,\n",
" weights='uniform')"
]
},
"execution_count": 3,
8 years ago
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"from sklearn.neighbors import KNeighborsClassifier\n",
"import numpy as np\n",
"\n",
"# Create kNN model\n",
"model = KNeighborsClassifier(n_neighbors=15)\n",
"\n",
"# Train the model using the training sets\n",
"model.fit(x_train, y_train) "
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
8 years ago
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Prediction [1 0 1 1 1 0 0 1 0 2 0 0 1 2 0 1 2 2 1 1 0 0 1 0 0 2 1 1 2 2 2 2 0 0 1 1 0\n",
" 1 2 1 2 0 2 0 1 0 2 1 0 2 2 0 0 2 0 0 0 2 2 0 1 0 1 0 1 1 1 1 1 0 1 0 1 2\n",
" 0 0 0 0 2 2 0 1 1 2 1 0 0 2 1 1 0 1 1 0 2 1 2 1 2 0 2 0 0 0 2 1 2 1 2 1 2\n",
" 0]\n",
"Expected [1 0 1 1 1 0 0 1 0 2 0 0 1 2 0 1 2 2 1 1 0 0 2 0 0 2 1 1 2 2 2 2 0 0 1 1 0\n",
" 1 2 1 2 0 2 0 1 0 2 1 0 2 2 0 0 2 0 0 0 2 2 0 1 0 1 0 1 1 1 1 1 0 1 0 1 2\n",
" 0 0 0 0 2 2 0 1 1 2 1 0 0 1 1 1 0 1 1 0 2 2 2 1 2 0 1 0 0 0 2 1 2 1 2 1 2\n",
" 0]\n"
]
}
],
"source": [
"print(\"Prediction \", model.predict(x_train))\n",
"print(\"Expected \", y_train)"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
8 years ago
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Accuracy in training 0.964285714286\n"
]
}
],
"source": [
"# Evaluate Accuracy in training\n",
"\n",
"from sklearn import metrics\n",
"y_train_pred = model.predict(x_train)\n",
"print(\"Accuracy in training\", metrics.accuracy_score(y_train, y_train_pred))"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
8 years ago
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Accuracy in testing 0.921052631579\n"
]
}
],
"source": [
"# Now we evaluate error in testing\n",
"y_test_pred = model.predict(x_test)\n",
"print(\"Accuracy in testing \", metrics.accuracy_score(y_test, y_test_pred))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Now we are going to visualize the Nearest Neighbors classification. It will plot the decision boundaries for each class.\n",
"\n",
"We are going to import a function defined in the file [util_knn.py](files/util_knn.py) using the *magic command* **%run**."
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {},
8 years ago
"outputs": [
{
"ename": "AttributeError",
"evalue": "module 'matplotlib.colors' has no attribute 'to_rgba'",
"output_type": "error",
"traceback": [
"\u001b[0;31m---------------------------------------------------------------------------\u001b[0m",
"\u001b[0;31mAttributeError\u001b[0m Traceback (most recent call last)",
"\u001b[0;32m<ipython-input-7-626d889b2bd3>\u001b[0m in \u001b[0;36m<module>\u001b[0;34m()\u001b[0m\n\u001b[1;32m 1\u001b[0m \u001b[0mget_ipython\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mrun_line_magic\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m'run'\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;34m'util_knn.py'\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m----> 2\u001b[0;31m \u001b[0mplot_classification_iris\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m",
"\u001b[0;32m~/GoogleDrive/cursos/sitc/sitc/ml1/util_knn.py\u001b[0m in \u001b[0;36mplot_classification_iris\u001b[0;34m()\u001b[0m\n\u001b[1;32m 49\u001b[0m % (n_neighbors, weights))\n\u001b[1;32m 50\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m---> 51\u001b[0;31m \u001b[0mplt\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mshow\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m",
"\u001b[0;32m~/anaconda3/lib/python3.5/site-packages/matplotlib/pyplot.py\u001b[0m in \u001b[0;36mshow\u001b[0;34m(*args, **kw)\u001b[0m\n\u001b[1;32m 242\u001b[0m \"\"\"\n\u001b[1;32m 243\u001b[0m \u001b[0;32mglobal\u001b[0m \u001b[0m_show\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m--> 244\u001b[0;31m \u001b[0;32mreturn\u001b[0m \u001b[0m_show\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m*\u001b[0m\u001b[0margs\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;34m**\u001b[0m\u001b[0mkw\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 245\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 246\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n",
"\u001b[0;32m~/anaconda3/lib/python3.5/site-packages/ipykernel/pylab/backend_inline.py\u001b[0m in \u001b[0;36mshow\u001b[0;34m(close, block)\u001b[0m\n\u001b[1;32m 37\u001b[0m display(\n\u001b[1;32m 38\u001b[0m \u001b[0mfigure_manager\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mcanvas\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mfigure\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m---> 39\u001b[0;31m \u001b[0mmetadata\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0m_fetch_figure_metadata\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mfigure_manager\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mcanvas\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mfigure\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 40\u001b[0m )\n\u001b[1;32m 41\u001b[0m \u001b[0;32mfinally\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n",
"\u001b[0;32m~/anaconda3/lib/python3.5/site-packages/ipykernel/pylab/backend_inline.py\u001b[0m in \u001b[0;36m_fetch_figure_metadata\u001b[0;34m(fig)\u001b[0m\n\u001b[1;32m 172\u001b[0m \u001b[0;34m\"\"\"Get some metadata to help with displaying a figure.\"\"\"\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 173\u001b[0m \u001b[0;31m# determine if a background is needed for legibility\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m--> 174\u001b[0;31m \u001b[0;32mif\u001b[0m \u001b[0m_is_transparent\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mfig\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mget_facecolor\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 175\u001b[0m \u001b[0;31m# the background is transparent\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 176\u001b[0m ticksLight = _is_light([label.get_color()\n",
"\u001b[0;32m~/anaconda3/lib/python3.5/site-packages/ipykernel/pylab/backend_inline.py\u001b[0m in \u001b[0;36m_is_transparent\u001b[0;34m(color)\u001b[0m\n\u001b[1;32m 193\u001b[0m \u001b[0;32mdef\u001b[0m \u001b[0m_is_transparent\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mcolor\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 194\u001b[0m \u001b[0;34m\"\"\"Determine transparency from alpha.\"\"\"\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m--> 195\u001b[0;31m \u001b[0mrgba\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mcolors\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mto_rgba\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mcolor\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 196\u001b[0m \u001b[0;32mreturn\u001b[0m \u001b[0mrgba\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0;36m3\u001b[0m\u001b[0;34m]\u001b[0m \u001b[0;34m<\u001b[0m \u001b[0;36m.5\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n",
"\u001b[0;31mAttributeError\u001b[0m: module 'matplotlib.colors' has no attribute 'to_rgba'"
]
8 years ago
}
],
"source": [
"%run util_knn.py\n",
"plot_classification_iris()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Evaluating the algorithm"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Precision, recall and f-score"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"For evaluating classification algorithms, we usually calculate three metrics: precision, recall and F1-score\n",
"\n",
"* **Precision**: This computes the proportion of instances predicted as positives that were correctly evaluated (it measures how right our classifier is when it says that an instance is positive).\n",
"* **Recall**: This counts the proportion of positive instances that were correctly evaluated (measuring how right our classifier is when faced with a positive instance).\n",
"* **F1-score**: This is the harmonic mean of precision and recall, and tries to combine both in a single number."
]
},
{
"cell_type": "code",
8 years ago
"execution_count": 14,
"metadata": {},
8 years ago
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
" precision recall f1-score support\n",
"\n",
" setosa 1.00 1.00 1.00 8\n",
" versicolor 0.79 1.00 0.88 11\n",
" virginica 1.00 0.84 0.91 19\n",
"\n",
"avg / total 0.94 0.92 0.92 38\n",
"\n"
]
}
],
"source": [
8 years ago
"print(metrics.classification_report(y_test, y_test_pred, target_names=iris.target_names))"
8 years ago
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Confusion matrix"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Another useful metric is the confusion matrix"
]
},
{
"cell_type": "code",
8 years ago
"execution_count": 15,
"metadata": {},
8 years ago
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[[ 8 0 0]\n",
" [ 0 11 0]\n",
" [ 0 3 16]]\n"
]
}
],
"source": [
"print(metrics.confusion_matrix(y_test, y_test_pred))"
]
},
{
"cell_type": "markdown",
"metadata": {},
8 years ago
"source": [
"We see we classify well all the 'setosa' and 'versicolor' samples. "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### K-Fold validation"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"In order to avoid bias in the training and testing dataset partition, it is recommended to use **k-fold validation**."
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {},
8 years ago
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[ 0.93333333 0.8 1. 0.93333333 0.93333333 0.93333333\n",
" 1. 1. 0.86666667 1. ]\n"
]
}
],
"source": [
"from sklearn.model_selection import cross_val_score, KFold\n",
8 years ago
"from sklearn.pipeline import Pipeline\n",
"from sklearn.preprocessing import StandardScaler\n",
"\n",
"# create a composite estimator made by a pipeline of preprocessing and the KNN model\n",
"model = Pipeline([\n",
" ('scaler', StandardScaler()),\n",
" ('kNN', KNeighborsClassifier())\n",
"])\n",
"\n",
"# create a k-fold cross validation iterator of k=10 folds\n",
"cv = KFold(10, shuffle=True, random_state=33)\n",
8 years ago
"\n",
"# by default the score used is the one returned by score method of the estimator (accuracy)\n",
"scores = cross_val_score(model, x_iris, y_iris, cv=cv)\n",
"print(scores)"
]
},
{
"cell_type": "markdown",
"metadata": {
"collapsed": true
},
"source": [
"We get an array of k scores. We can calculate the mean and the standard error to obtain a final figure"
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {},
8 years ago
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Mean score: 0.940 (+/- 0.021)\n"
]
}
],
"source": [
"from scipy.stats import sem\n",
"def mean_score(scores):\n",
" return (\"Mean score: {0:.3f} (+/- {1:.3f})\").format(np.mean(scores), sem(scores))\n",
"print(mean_score(scores))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"So, we get an average accuracy of 0.940."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Tuning the algorithm"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We are going to tune the algorithm, and calculate which is the best value for the k parameter."
]
},
{
"cell_type": "code",
8 years ago
"execution_count": 18,
"metadata": {},
8 years ago
"outputs": [
{
"data": {
"text/plain": [
8 years ago
"<matplotlib.text.Text at 0x7fc7cb526160>"
8 years ago
]
},
8 years ago
"execution_count": 18,
8 years ago
"metadata": {},
"output_type": "execute_result"
},
{
"data": {
8 years ago
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAYwAAAEPCAYAAABRHfM8AAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\nAAALEgAACxIB0t1+/AAAIABJREFUeJzt3Xm4HNV55/Hv72pDC0JCQqDtSmKRWMxiMGIXLWwHOWFC\nRjxJYBJjMomTzBhnJjNmgAwJV3HsQB4ncWJiJ3G84OCJ7NiTmJCxjbF0EQZsZJBkI0sIY+lqRUhC\nQqAFLfedP06VbqvVfbu6u6rX9/M8/ai7urrr3KPq81adVWaGc845V05XoxPgnHOuNXjAcM45l4gH\nDOecc4l4wHDOOZeIBwznnHOJeMBwzjmXSOYBQ9ICSWslrZN0d5H3uyU9IWmVpCWSpuS9d1TSC5JW\nSPrXrNPqnHOuNGU5DkNSF7AOeDewFVgO3Gpma/P2+SrwqJk9IikH/Gczuz16b6+Zjc0sgc455xLL\n+g5jLvCymfWZ2WFgMXBzwT7nA0sBzKy34H1lnD7nnHMJZR0wpgKb8l5vjrblWwksBJC0EBgjaXz0\n3ghJz0l6RlJhoHHOOVdHzdDofReQk/Q8cB2wBTgavTfDzOYCvwZ8UtKsBqXROec63tCMv38L0J33\nelq07Rgz2wbcAiBpNHCLme3New8zWy+pF3gnsD7/85J8MiznnKuCmVVU7Z/1HcZy4GxJMyQNB24F\nHs3fQdIESXGi7wU+H20fF30GSROBq4GfFDuImfkjpcf999/f8DS008Pz0/OzWR/VyDRgmNlR4E7g\ncWA1sNjM1khaJOmmaLcc8JKktcAk4GPR9vOAH0paAXwX+FPL613lnHOuvrKuksLMvgXMKdh2f97z\nrwNfL/K5Z4GLsk6fc865ZJqh0ds1kVwu1+gktBXPz3R5fjZWpgP36kGS1fI3bNwIP/0p3HBDioly\nLW/XLnj9dTjnnEanpPFeew3eegvOPLPRKXFpkoQ1WaN303vpJfiTP2l0KlyzefhhuO++RqeiOfz9\n38NHP9roVLhm0PEBY+ZM2LCh0alwzWb9+vBwnhduQMcHjO5u2LIFjhxpdEpcM/FCcoDnhYt1fMAY\nMQImTYLNmxudEtdM1q+HnTtD3X2nW78+/D4OH250SlyjdXzAAJg1y6+g3ACzUE05daqfF0eOhDvw\n008PHURcZ/OAgQcMd7wdO+Ckk+Cii7x9a/PmECxmz/a8cB4wAA8Y7njr14dzws8Lzwt3PA8YeE8p\ndzwvJAd4Xrh8HjDwH4M7nheSAzwvXD4PGPiPwR3PC8kBnhcunwcMQm+YnTvh4MFGp8Q1g8JCssVn\nz6mJBwyXzwMGMGQITJ8OfX2NTolrBhs2hHatceNAgt27G52ixonzYvJk2LMHDhxodIpcI3nAiPgV\nlAM4ehQ2bQqFpNTZ58XBg+HOe+pU6OoKsyJ455DO5gEj4j2lHMDWrXDqqWEcBnR2wOjrC3feQ4aE\n152cFy7IPGBIWiBpraR1ku4u8n63pCckrZK0RNKUgvdPlrRJ0l9nmU7/MTgYqLOPdfJ54XnhCmUa\nMCR1AQ8BNwIXALdJOrdgt08AXzSzi4E/Bh4oeP+jwJNZphP8x+ACLyQHeF64QlnfYcwFXjazPjM7\nDCwGbi7Y53xgKYCZ9ea/L+kywjrfj2ecTv8xOMALyXyeF65Q1gFjKrAp7/XmaFu+lcBCAEkLgTGS\nxksS4e7jI0BFq0JVw38MDgZ6BcVmzuzc82L9+hPzwtv5OtvQRicAuAt4SNIdwDJgC3AU+K/Av5vZ\n1hA7SgeNnp6eY89zuVxV6/5OmgT798Obb8LJJ1f8cdcm1q+H228feD1zZmj8NQu9pjrJhg1+h9FO\nent76e3trek7Ml3TW9KVQI+ZLYhe3wOYmT1YYv/RwBoz65b0CHAt0A+cDAwDPm1mf1DwmZrW9M53\n/vnwla/AhRem8nWuBXV3w5NPHl9QTpoEq1aFsQidZOJEWL06zFYLIWiefHKY7vyUUxqbNle7ZlzT\nezlwtqQZkoYDtwKP5u8gaUJU/QRwL/B5ADP7dTObaWZnEqqlvlQYLNLmV1Cd7dAh2L49dCXN14nn\nxZtvhkF6kyYNbOv0cSku44BhZkeBOwmN1quBxWa2RtIiSTdFu+WAlyStJTRwfyzLNA3GfwydbeNG\nmDIFhhZU1HbieRG3XxRWw3ViXrgBmbdhmNm3gDkF2+7Pe/514OtlvuNh4OFMEpjHfwydrbCRN9aJ\n50VhD6lYJ3cCcD7S+zidWDC4AYWNvLFOLCQHC57eU6pzecDI4wGjs5W6qu7EQrJU8PTfSGfzgJEn\n7mfeydNZd7LBAkanFZKeF64YDxh5xo8Ps3K+/nqjU+IaoVQh2d0dupIeOVL/NDVKubstv6jqTB4w\nCvgVVOcqVUiOGBG6l27eXP80NYJZ6bwYOzbkx44d9U+XazwPGAU8YHSm/fvhjTfgjDOKv99J58Wu\nXaFr8bhxxd/3KUI6lweMAp1UMLgBGzbAjBmhSrKYTuopVaqHVMx/I53LA0YB/zF0plJVMLFO6ilV\nqodUzH8jncsDRoFOupJ0A5IEjE45LzwvXCkeMAp00pWkG+CF5ADPC1eKB4wC8XTW/f2NTomrJy8k\nB3heuFI8YBQYPTp0HXz11UanxNVT4cJJhaZOhZ074eDBeqWoccoFjBkzYNMmv6jqRB4wivArqM5T\nrpAcMiRMe97XV780NUJ/f/gbBwueI0eGQa5bt9YtWa5JeMAowgNGZ9mzJ4zinjBh8P06YfzBq6+G\nxZFGjRp8P/+NdCYPGEV4T6nOEt9dlFuCtRMKyXJ3WrFOyAt3Ig8YRXhPqc7iheQAzws3mMwDhqQF\nktZKWifp7iLvd0t6QtIqSUskTcnb/rykFyT9WNLvZJ3WmP8YOosXkgM8L9xgMg0YkrqAh4AbgQuA\n2ySdW7DbJ4AvmtnFwB8DD0TbtwFXmtmlwBXAPZJKzPSTLv8xdJZyPaRinXBeJA0YndCe406U9R3G\nXOBlM+szs8PAYuDmgn3OB5YCmFlv/L6ZHY4+AzASKFPDnJ7u7tADpJOms+5kflU9wPPCDSbrgDEV\n2JT3enO0Ld9KYCGApIXAGEnjo9fTJK0C+oAHzawuoyOGD4fTTw99zV37S1pITpoUZrV9883s09Qo\nSe+2pk8PPaoOHy6/r2sfQxudAOAu4CFJdwDLgC3AUQAz2wxcHFVFfUPS18zshJn4e3p6jj3P5XLk\ncrmaExX3lEpSkLjWZVZ+sr2YNFAVc+GFGSesAY4cCXfW3d3l9x02DCZPho0b4ayzsk+bq11vby+9\nvb01fUfWAWMLkH/6TYu2HWNm24BbACSNBm4xs70F+7wq6UXgOuD/Fh4kP2CkxXtKdYbXXgsD0U4+\nOdn+cVVMOwaMTZvCnfXw4cn2j/PCA0ZrKLyYXrRoUcXfkXWV1HLgbEkzJA0HbgUezd9B0gTpWA/4\ne4HPR9unSjopej4euBZ4KeP0HuN1tJ2h0rvIdj4vPC9cOZkGDDM7CtwJPA6sBhab2RpJiyTdFO2W\nA16StBaYBHws2n4e8ANJKwiN4n9mZquzTG8+/zF0hqR19rF2Pi8qDRjeU6rzZN6GYWbfAuYUbLs/\n7/nXga8X+dwTwMVZp6+Udi4Y3IBqrqqXLcsuPY1UTV5885vZpcc1Hx/pXYIHjM7gV9UD/G7LleMB\no4QpU2DXLjhwoNEpcVmqtt7eLLs0NYq3YbhyPGCUEE9nvXFjo1PislRpITl+PHR1weuvZ5emRqk0\nLyZPhjfeCGNTXGfwgDEIv4Jqb0ePhq6kM2ZU9rl2PC8OHAhBcMqU5J/p6gpjNtq1is6dyAPGINqx\nYHADtm4Na2CcdFJln2vH86KvL9xRDxlS2efauU3HncgDxiDasWBwA6odyd+O54XnhUvCA8Yg/MfQ\n3qotJNvxqrrSHlIx/410Fg8Yg/CV99qbX1UP8LxwSXjAGITPJ9XevJAc4HnhkvCAMYhJk0LvkXae\nzrqT1VIl1dcH/f2pJ6l
8 years ago
"text/plain": [
8 years ago
"<matplotlib.figure.Figure at 0x7fc7cb726ac8>"
8 years ago
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"k_range = range(1, 21)\n",
"accuracy = []\n",
"for k in k_range:\n",
" m = KNeighborsClassifier(k)\n",
" m.fit(x_train, y_train)\n",
" y_test_pred = m.predict(x_test)\n",
" accuracy.append(metrics.accuracy_score(y_test, y_test_pred))\n",
"plt.plot(k_range, accuracy)\n",
"plt.xlabel('k value')\n",
"plt.ylabel('Accuracy')\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The result is very dependent of the input data. Execute again the train_test_split and test again how the result changes with k."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## References"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"* [KNeighborsClassifier API scikit-learn](http://scikit-learn.org/stable/modules/generated/sklearn.neighbors.KNeighborsClassifier.html)\n",
"* [Learning scikit-learn: Machine Learning in Python](http://proquest.safaribooksonline.com/book/programming/python/9781783281930/1dot-machine-learning-a-gentle-introduction/ch01s02_html), Raúl Garreta; Guillermo Moncecchi, Packt Publishing, 2013.\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Licence\n",
"The notebook is freely licensed under under the [Creative Commons Attribution Share-Alike license](https://creativecommons.org/licenses/by/2.0/). \n",
"\n",
"© Carlos A. Iglesias, Universidad Politécnica de Madrid."
8 years ago
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.5.6"
},
"latex_envs": {
"LaTeX_envs_menu_present": true,
"autocomplete": true,
"bibliofile": "biblio.bib",
"cite_by": "apalike",
"current_citInitial": 1,
"eqLabelWithNumbers": true,
"eqNumInitial": 1,
"hotkeys": {
"equation": "Ctrl-E",
"itemize": "Ctrl-I"
},
"labels_anchors": false,
"latex_user_defs": false,
"report_style_numbering": false,
"user_envs_cfg": false
8 years ago
}
},
"nbformat": 4,
"nbformat_minor": 1
8 years ago
}