mirror of https://github.com/gsi-upm/sitc synced 2024-11-17 20:12:28 +00:00
sitc/ml2/3_7_SVM.ipynb

{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"![](images/EscUpmPolit_p.gif \"UPM\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Course Notes for Learning Intelligent Systems"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Department of Telematic Engineering Systems, Universidad Politécnica de Madrid, © 2016 Carlos A. Iglesias"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## [Introduction to Machine Learning II](3_0_0_Intro_ML_2.ipynb)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Introduction to SVM"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"In this notebook we are going to train a classifier with the preprocessed Titanic dataset. \n",
"\n",
"We are going to use the dataset we obtained in the [pandas munging notebook](3_3_Data_Munging_with_Pandas.ipynb) for simplicity. You can try some of the techniques learnt in the previous notebook."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Load and clean"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"# General import and load data\n",
"import pandas as pd\n",
"import numpy as np\n",
"\n",
"from pandas import Series, DataFrame\n",
"\n",
"# Training and test splitting\n",
"from sklearn.cross_validation import train_test_split\n",
"from sklearn import preprocessing\n",
"\n",
"# Estimators\n",
"from sklearn.svm import SVC\n",
"\n",
"# Evaluation\n",
"from sklearn import metrics\n",
"from sklearn.cross_validation import cross_val_score, KFold, StratifiedKFold\n",
"from sklearn.metrics import classification_report\n",
"from sklearn.metrics import roc_curve\n",
"from sklearn.metrics import roc_auc_score\n",
"\n",
"# Optimization\n",
"from sklearn.grid_search import GridSearchCV\n",
"\n",
"# Visualisation\n",
"import seaborn as sns\n",
"import matplotlib.pyplot as plt\n",
"sns.set(color_codes=True)\n",
"\n",
"\n",
"# if matplotlib is not set inline, you will not see plots\n",
"#alternatives auto gtk gtk2 inline osx qt qt5 wx tk\n",
"#%matplotlib auto\n",
"#%matplotlib qt\n",
"%matplotlib inline\n",
"%run plot_learning_curve"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>PassengerId</th>\n",
" <th>Survived</th>\n",
" <th>Pclass</th>\n",
" <th>Sex</th>\n",
" <th>Age</th>\n",
" <th>SibSp</th>\n",
" <th>Parch</th>\n",
" <th>Fare</th>\n",
" <th>Embarked</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" <td>3</td>\n",
" <td>0</td>\n",
" <td>22.0</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" <td>7.2500</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>2</td>\n",
" <td>1</td>\n",
" <td>1</td>\n",
" <td>1</td>\n",
" <td>38.0</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" <td>71.2833</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>3</td>\n",
" <td>1</td>\n",
" <td>3</td>\n",
" <td>1</td>\n",
" <td>26.0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>7.9250</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>4</td>\n",
" <td>1</td>\n",
" <td>1</td>\n",
" <td>1</td>\n",
" <td>35.0</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" <td>53.1000</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>5</td>\n",
" <td>0</td>\n",
" <td>3</td>\n",
" <td>0</td>\n",
" <td>35.0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>8.0500</td>\n",
" <td>0</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" PassengerId Survived Pclass Sex Age SibSp Parch Fare Embarked\n",
"0 1 0 3 0 22.0 1 0 7.2500 0\n",
"1 2 1 1 1 38.0 1 0 71.2833 1\n",
"2 3 1 3 1 26.0 0 0 7.9250 0\n",
"3 4 1 1 1 35.0 1 0 53.1000 0\n",
"4 5 0 3 0 35.0 0 0 8.0500 0"
]
},
"execution_count": 3,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"#We get a URL with raw content (not HTML one)\n",
"url=\"https://raw.githubusercontent.com/gsi-upm/sitc/master/ml2/data-titanic/train.csv\"\n",
"df = pd.read_csv(url)\n",
"\n",
"# Fill missing values\n",
"df['Age'] = df['Age'].fillna(df['Age'].mean())\n",
"df['Sex'] = df['Sex'].fillna('male')\n",
"df['Embarked'] = df['Embarked'].fillna('S')\n",
"\n",
"# Encode categorical variables as integers\n",
"df.loc[df[\"Sex\"] == \"male\", \"Sex\"] = 0\n",
"df.loc[df[\"Sex\"] == \"female\", \"Sex\"] = 1\n",
"df.loc[df[\"Embarked\"] == \"S\", \"Embarked\"] = 0\n",
"df.loc[df[\"Embarked\"] == \"C\", \"Embarked\"] = 1\n",
"df.loc[df[\"Embarked\"] == \"Q\", \"Embarked\"] = 2\n",
"\n",
"# Drop columns that are not used as features\n",
"df.drop(['Cabin', 'Ticket', 'Name'], axis=1, inplace=True)\n",
"\n",
"# Show preprocessed df\n",
"df.head()"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"PassengerId int64\n",
"Survived int64\n",
"Pclass int64\n",
"Sex object\n",
"Age float64\n",
"SibSp int64\n",
"Parch int64\n",
"Fare float64\n",
"Embarked object\n",
"dtype: object"
]
},
"execution_count": 4,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"#Check types are numeric\n",
"df.dtypes"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We still have two columns of type object (Sex and Embarked), so we cast them to integers."
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"PassengerId int64\n",
"Survived int64\n",
"Pclass int64\n",
"Sex int64\n",
"Age float64\n",
"SibSp int64\n",
"Parch int64\n",
"Fare float64\n",
"Embarked int64\n",
"dtype: object"
]
},
"execution_count": 5,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df['Sex'] = df['Sex'].astype(np.int64)\n",
"df['Embarked'] = df['Embarked'].astype(np.int64)\n",
"df.dtypes"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"PassengerId False\n",
"Survived False\n",
"Pclass False\n",
"Sex False\n",
"Age False\n",
"SibSp False\n",
"Parch False\n",
"Fare False\n",
"Embarked False\n",
"dtype: bool"
]
},
"execution_count": 6,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Check that there are no missing values\n",
"df.isnull().any()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Train and test splitting"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We use the same techniques we applied to the Iris dataset.\n",
"\n",
"However, the target column 'Survived' must not be included among the features."
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"# Features of the model\n",
"features = ['Pclass', 'Sex', 'Age', 'SibSp', 'Parch', 'Fare', 'Embarked']\n",
"# Transform the dataframe into numpy arrays\n",
"X = df[features].values\n",
"y = df['Survived'].values\n",
"\n",
"\n",
"\n",
"# Test set will be the 25% taken randomly\n",
"X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=33)\n",
"\n",
"# Preprocess: standardize the features (disabled here; SVMs are usually sensitive to feature scale)\n",
"#scaler = preprocessing.StandardScaler().fit(X_train)\n",
"#X_train = scaler.transform(X_train)\n",
"#X_test = scaler.transform(X_test)"
]
},
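{
"cell_type": "markdown",
"metadata": {},
"source": [
"The commented-out `StandardScaler` lines above would standardize each feature to zero mean and unit variance, using statistics computed on the training set only and then applied unchanged to the test set. The following NumPy sketch shows that computation on a few toy values; `standardize` is a hypothetical helper for illustration, not part of scikit-learn."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"import numpy as np\n",
"\n",
"def standardize(train, test):\n",
"    # Hypothetical helper: mean and std are estimated on the training set only\n",
"    mean = train.mean(axis=0)\n",
"    std = train.std(axis=0)\n",
"    std[std == 0] = 1.0  # leave constant features unscaled\n",
"    return (train - mean) / std, (test - mean) / std\n",
"\n",
"# Toy data loosely based on the first rows above: Age and Fare have very different scales\n",
"train_toy = np.array([[22.0, 7.25], [38.0, 71.28], [26.0, 7.93], [35.0, 53.10]])\n",
"test_toy = np.array([[35.0, 8.05]])\n",
"train_std, test_std = standardize(train_toy, test_toy)\n",
"print(train_std.mean(axis=0))  # approximately [0, 0]\n",
"print(train_std.std(axis=0))   # approximately [1, 1]\n",
"print(test_std)"
]
},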
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Define model"
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"\n",
"types_of_kernels = ['linear', 'rbf', 'poly']\n",
"\n",
"kernel = types_of_kernels[0]\n",
"gamma = 3.0\n",
"\n",
"# Create SVM model (gamma is ignored by the linear kernel; it is used by 'rbf' and 'poly')\n",
"model = SVC(kernel=kernel, probability=True, gamma=gamma)"
]
},
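{
"cell_type": "markdown",
"metadata": {},
"source": [
"`SVC` compares samples through a kernel function. Following the scikit-learn documentation, the kernels listed above are: linear, `<x, z>`; rbf, `exp(-gamma * ||x - z||^2)`; poly, `(gamma * <x, z> + coef0)^degree`. Therefore `gamma` only matters for the rbf and poly kernels. The NumPy sketch below evaluates these formulas for two toy vectors; it illustrates the math only and is not how scikit-learn computes them internally."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"import numpy as np\n",
"\n",
"def linear_kernel(x, z):\n",
"    return np.dot(x, z)\n",
"\n",
"def rbf_kernel(x, z, gamma):\n",
"    # Similarity decays with the squared distance; larger gamma means faster decay\n",
"    return np.exp(-gamma * np.sum((x - z) ** 2))\n",
"\n",
"def poly_kernel(x, z, gamma, coef0=0.0, degree=3):\n",
"    # Defaults follow the SVC defaults coef0=0.0 and degree=3\n",
"    return (gamma * np.dot(x, z) + coef0) ** degree\n",
"\n",
"x = np.array([1.0, 2.0])\n",
"z = np.array([2.0, 0.0])\n",
"print(linear_kernel(x, z))                                # 2.0\n",
"print(rbf_kernel(x, z, gamma=0.5))                        # exp(-0.5 * 5), about 0.082\n",
"print(poly_kernel(x, z, gamma=1.0, coef0=1.0, degree=2))  # (2 + 1) ** 2 = 9.0"
]
},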
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Train and evaluate"
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"#This step will take some time \n",
"# Train - This is not needed if you use K-Fold\n",
"\n",
"model.fit(X_train, y_train)\n",
"\n",
"predicted = model.predict(X_test)\n",
"expected = y_test"
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"0.81165919282511212"
]
},
"execution_count": 10,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Accuracy\n",
"metrics.accuracy_score(expected, predicted)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We get around 81% accuracy (the exact value depends on the train/test split)."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Null accuracy"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We can compare this with the *null accuracy*: the accuracy of a model that always predicts the most frequent class, following this [reference](http://blog.kaggle.com/2015/10/23/scikit-learn-video-9-better-evaluation-of-classification-models/)."
]
},
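{
"cell_type": "markdown",
"metadata": {},
"source": [
"The same idea works for any number of classes: count the samples of each class and divide the largest count by the total. A minimal NumPy sketch, assuming non-negative integer class labels; the labels below are made up to reproduce the 134/89 split of `y_test`. scikit-learn also offers this baseline as `DummyClassifier(strategy='most_frequent')` in `sklearn.dummy`."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"import numpy as np\n",
"\n",
"def null_accuracy(labels):\n",
"    # Accuracy of a classifier that always predicts the most frequent class\n",
"    counts = np.bincount(labels)  # assumes non-negative integer labels\n",
"    return counts.max() / counts.sum()\n",
"\n",
"# Made-up labels with the same class counts as y_test (134 zeros, 89 ones)\n",
"labels_toy = np.array([0] * 134 + [1] * 89)\n",
"print(null_accuracy(labels_toy))  # 134 / 223, about 0.601"
]
},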
{
"cell_type": "code",
"execution_count": 11,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"0 134\n",
"1 89\n",
"dtype: int64"
]
},
"execution_count": 11,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Count number of samples per class\n",
"s_y_test = Series(y_test)\n",
"s_y_test.value_counts()"
]
},
{
"cell_type": "code",
"execution_count": 12,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"0.3991031390134529"
]
},
"execution_count": 12,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Proportion of ones (survivors)\n",
"y_test.mean()"
]
},
{
"cell_type": "code",
"execution_count": 13,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"0.60089686098654704"
]
},
"execution_count": 13,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Proportion of zeros (non-survivors)\n",
"1 - y_test.mean() \n"
]
},
{
"cell_type": "code",
"execution_count": 14,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"0.60089686098654704"
]
},
"execution_count": 14,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Calculate null accuracy (binary classification coded as 0/1)\n",
"max(y_test.mean(), 1 - y_test.mean())"
]
},
{
"cell_type": "code",
"execution_count": 15,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"0 0.600897\n",
"dtype: float64"
]
},
"execution_count": 15,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Calculate null accuracy (multiclass classification)\n",
"s_y_test.value_counts().head(1) / len(y_test)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Since our accuracy (about 0.81) is higher than the null accuracy (about 0.60), the model does better than always predicting the most frequent class."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Confusion matrix and F-score"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We can obtain more information from the confusion matrix and the F1-score.\n",
"In a confusion matrix, we can see:\n",
"\n",
"| | **Predicted: 0** | **Predicted: 1** |\n",
"|---|---|---|\n",
"| **Actual: 0** | TN | FP |\n",
"| **Actual: 1** | FN | TP |\n",
"\n",
"* **True negatives (TN)**: actual negatives that were predicted as negatives\n",
"* **False positives (FP)**: actual negatives that were predicted as positives\n",
"* **False negatives (FN)**: actual positives that were predicted as negatives\n",
"* **True positives (TP)**: actual positives that were predicted as positives\n",
"\n",
"We can calculate several metrics from the confusion matrix:\n",
"\n",
"* **Recall** (also called *sensitivity*): when the actual value is positive, how often is the prediction correct? (TP / (TP + FN))\n",
"* **Specificity**: when the actual value is negative, how often is the prediction correct? (TN / (TN + FP))\n",
"* **False Positive Rate**: when the actual value is negative, how often is the prediction incorrect? (FP / (TN + FP))\n",
"* **Precision**: when a positive value is predicted, how often is the prediction correct? (TP / (TP + FP))\n",
"\n",
"A good summary metric is the F1-score, the harmonic mean of precision and recall: 2TP / (2TP + FP + FN)"
]
},
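{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a worked example, the metrics above can be computed by hand from the confusion matrix obtained below for this model, `[[115 19] [23 66]]`, that is TN = 115, FP = 19, FN = 23 and TP = 66 (class 1, survived, is the positive class). The results agree with the class 1 row of the classification report below (precision 0.78, recall 0.74, F1-score 0.76)."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"# Values taken from the confusion matrix obtained for this model\n",
"tn, fp, fn, tp = 115, 19, 23, 66\n",
"\n",
"recall = tp / (tp + fn)              # sensitivity, true positive rate\n",
"specificity = tn / (tn + fp)\n",
"false_positive_rate = fp / (tn + fp)\n",
"precision = tp / (tp + fp)\n",
"f1 = 2 * tp / (2 * tp + fp + fn)     # harmonic mean of precision and recall\n",
"\n",
"print('Recall:', round(recall, 2))               # 0.74\n",
"print('Specificity:', round(specificity, 2))     # 0.86\n",
"print('FPR:', round(false_positive_rate, 2))     # 0.14\n",
"print('Precision:', round(precision, 2))         # 0.78\n",
"print('F1:', round(f1, 2))                       # 0.76"
]
},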
{
"cell_type": "code",
"execution_count": 19,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[[115 19]\n",
" [ 23 66]]\n"
]
}
],
"source": [
"# Confusion matrix\n",
"print(metrics.confusion_matrix(expected, predicted))"
]
},
{
"cell_type": "code",
"execution_count": 20,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
" precision recall f1-score support\n",
"\n",
" 0 0.83 0.86 0.85 134\n",
" 1 0.78 0.74 0.76 89\n",
"\n",
"avg / total 0.81 0.81 0.81 223\n",
"\n"
]
}
],
"source": [
"# Report\n",
"print(classification_report(expected, predicted))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## ROC (Receiver Operating Characteristic) and AUC (Area Under the Curve)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The [ROC](https://en.wikipedia.org/wiki/Receiver_operating_characteristic) curve illustrates the performance of a binary classifier system as its discrimination threshold is varied."
]
},
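{
"cell_type": "markdown",
"metadata": {},
"source": [
"A minimal NumPy sketch of how ROC points are obtained: each distinct score is used as a threshold, samples scoring at or above it are predicted positive, and the resulting false positive rate and true positive rate are recorded. `roc_points` is a hypothetical helper and the labels and scores are toy values; `roc_curve`, used below, follows the same idea but its output may differ in details (for example, it can add an extra starting point and drop some intermediate points)."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"import numpy as np\n",
"\n",
"def roc_points(y_true, scores):\n",
"    # Hypothetical helper: sweep the threshold over the distinct scores, highest first\n",
"    y_true = np.asarray(y_true)\n",
"    scores = np.asarray(scores, dtype=float)\n",
"    thresholds = np.unique(scores)[::-1]\n",
"    positives = np.sum(y_true == 1)\n",
"    negatives = np.sum(y_true == 0)\n",
"    fpr, tpr = [], []\n",
"    for t in thresholds:\n",
"        predicted_positive = scores >= t\n",
"        tpr.append(np.sum(predicted_positive & (y_true == 1)) / positives)\n",
"        fpr.append(np.sum(predicted_positive & (y_true == 0)) / negatives)\n",
"    return np.array(fpr), np.array(tpr), thresholds\n",
"\n",
"fpr_toy, tpr_toy, thr_toy = roc_points([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])\n",
"print(thr_toy)  # 0.8, 0.4, 0.35, 0.1\n",
"print(fpr_toy)  # 0.0, 0.5, 0.5, 1.0\n",
"print(tpr_toy)  # 0.5, 0.5, 1.0, 1.0"
]
},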
{
"cell_type": "code",
"execution_count": 25,
"metadata": {
"collapsed": false,
"scrolled": true
},
"outputs": [
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAXkAAAEZCAYAAABy91VnAAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\nAAALEgAACxIB0t1+/AAAH+pJREFUeJzt3XmYXFWd//F39ZJOp9PZSAMJSwhLvoJAZJMkLAFZxpHF\nCDqK4IKCDijo4DKGHw7qOAOI8hNRNICDAoo6IIqoCEgCmEDEkEBA/bIEWQJkT6fTW9Lpmj/OrXSl\n6a6u7q6u6j79eT0Pz1N3qXtPHTqfe+65956bSqfTiIhInMpKXQARERk4CnkRkYgp5EVEIqaQFxGJ\nmEJeRCRiCnkRkYhVlLoAMnSYWTuwHGgH0sAooB640N2XJOuMAr4KnAa0Juv9Bvgvd2/J2tZHgE8C\nI4ERwJ+Af3f3+qL9oF4ws+8DJwM/dfcv9+H7JwDfJNTHJKAceDVZfAVwIPCcu99mZl8Glrn7b/pY\n1tOAE9z9s335vsQlpfvkJV9mtg2Y6O4bsuZ9DjjT3WeZWTnwCLAIuMzdW8xsJHAlcAhwvLu3m9ml\nwD8l31ubfO9a4CB3n13s35WP5Lfv4e6vFWBblwM7ufvF3SyfD1zn7r/s775E1JKX3kgl/wGQhPOe\nwLpk1r8AKXf/fGadpPX+WTNbCrzHzH4PzAWmu/vaZJ1tZvb5ZHmFu7dl79TMTgX+M9l3I3AB4Qzi\naXevTdaZkplOzhI+TjjT2ARUAd/KhKaZXZHsd66ZfTzZXir5HRe5u3fa/8PJx9+b2YXABuC7wE6E\ns5pr3P1WM5tNOFg1Jvt+u7tvzadizexm4GmgGTgcuDo5sPwV+B5QA0wGlgHvd/ctZtZMOICeRDg7\nuNbdv5P8/ve6+2lmtgvwA+AtwDZgnrtfl0+ZJA7qk5femm9my8xsJfAsofvh3GTZTODhbr73R+Bo\nQtg0uvuK7IXu3uLut3cR8DsDtwIfdve3Ebo8rkgWdz4NzZ4+AJjt7u8AbsyU0czKgHOAG83sWODD\nwNHufhhwNfCm1rO7H0s4CBwHPAbcTQjU6cC7gP82syOT1d9KCOFD8g347PK7+/XAX4DPu/uvgfOB\nH7n7UcB+wN7AKcn6VcBqdz8aeB9wlZmN6FQX3w8/wfcHZgHnm9nevSyXDGEKeemt45KwPQWoBhZl\nWuSJym6+V0UInnZ693d3FLDc3ZcDuPtd7n5KD98BeMrdG5PPvwBmJAeMdxL6vlckv2EfYFFypvEN\nYJyZjetmmylgGlCVBDDu/jpwZ7JdgFfc/dVuvt8bmTOmfwfWmtkXCIE9CRidtd7dSTmeIFzbqOm0\nnROAG5J1Nrn7wZ0PsBI3hbz0VgrA3ZcBlwA/NLM9k2ULgWM7f8HMUsn8hYTuh8rOrUkzqzKz35rZ\nrp2+3kanFruZHZTMy/77HcGONmc+uHsT8L/A2cBHCS17CBc/b3X3Q5OW9yGELpaNXfzuTBm6+jdT\nRsfBbXMXy/vjZ4TW/D+Aa4ClZHWZEbp3sqU6Te9Qf2Y21cxqC1xGGcQU8tJn7v4zwkXWa5NZdwCN\nZvbt5IIrZlYNXAc0AL9y9y3AVcD/JC1rzKwK+DYwyt3f6LSbxcD+ZrZ/su4cQvfNRsLB4i3Jemf0\nUNybCAE/k9DyBrgPOCtzYEn62x/o5vuZ8HRgS1IOzGwycCZwfw/77402Og4aJwFfc/f/TcpwJOHg\nlKuM2e6no6tqLKHbbN8CllUGOYW89EZXt2JdBLzTzE5y922E2wwbgSVm9hShf7kByCzH3a8kBO0f\nzOwJQus0Dby788bdfTWhBX5Lsu5nCX3em4AvAvea2WLCRcVuJd0ZW4E7kgMN7n4f4YBzv5ktAz4A\nvCfXb0+uGcwhXEx+knCg+Iq7P5Rr/3nIrtvfAN80sw8BlwK/MrM/A9cDC+gI6VzXJDIuAg5IyvoI\n4VbWpf0sqwwhuoVSRCRiebXkzezI5N7dzvNP
M7M/m9lCMzuv8MUTEZH+6DHkk6v6NxLujsieX0G4\nEHQi4dayT5hZ3QCUUURE+iiflvzzdN1PuT/hVrRNyf3Af6KLOytERKR0egx5d7+LcLW/szGEpw4z\nGoCxBSqXiIgUQH+GNdhECPqMWsJtbTml0+l0KtXVnV4iIv338NJXufq2JRw1fTIHTJ1Q6uIUzOnH\n7NOn4OxNyHfewd+AfZOnA5sIXTVX97iRVIo1axp6sdt41dXVqi4SqosOw6Uufr/4JR57ZlXOdSoq\nymhra+/VdptawmgSe+9ay8y37Nzn8sWiNyGfBjCzs4Aad7/JzC4h3CecAm5KHvEWEenRwuVv8Nra\nRqqruo+hshS09+Eu73GjR7DXrnqwF0pzn3x6OLRS8jFcWmz5UF10iKUu2tNpnntlI02tXV3Sg9vu\ne5atbe185zPHdLuNWOqiEOrqage8u0ZEJG9/f2kD3/zZspzrTBhTlXO59J9CXkQGRFNLaMEfst9E\npu3R9cCe++ymG/IGmkJeRPJy/+OvsOiZzuPHdS9zAXT/KeM58fA9BqpY0gOFvIjk5ZGnXufVNZup\nquxuEMw3GzOqkr0mjel5RRkwCnmRyKXTaV5YuYmm1t6+qGpHLVvaGFVVwXf/TQ+2DyUKeZHIPfdq\nPVf+5ImCbGvs6M7vZpHBTiEvErnG5tCCn77PTt1eAM3X3pPV9TLUKORFIjD/iVdZ+HTXF0UzIb//\nlPGc/PY9u1xH4qWQF4nAQ0++xsurNlNR3vWYgzUjK5iiJ0CHJYW8SAlsbt7KP97Y1OWyseubqa9v\n6tX2Wlq3MXJEOddfMrsQxZOIKORFSmDe3c/wzIvrC7rN2lGVPa8kw45CXqQEGpu3Ul6W4t1HT33T\nspqaKhobW3u9TV0Ula4o5EVKpLw8xamz9nrTfA3KJYWU14u8RURkaFJLXmQApNNpXl61efv4LZ01\nb9lW5BLJcKWQFxkAL77ewNdv+UvOdWpG6p+fDDz9lYkMgM3JA0gH7DWeabt3/ZSpLpRKMSjkZVhb\n9PTrPPJk4d9aubklE/ITeNeMKQXfvki+FPIyrM1fupIXVnb9UFJ/VVaUsefOowdk2yL5UsjLkNLc\n2sbLqwp3e2Fz6zbKy1Lc+MXjC7ZNkcFEIS9DxtqNzXzj9qWsrW8p6HarRuT/EgyRoUYhL0PC2vqO\ngD/qwF2ZMGZkwbY9VW8ukogp5GVA/flvq3ho2Ws9rjdiRDlbctw7/traRuobtzDnmKmcftSbhwIQ\nka4p5GVAzX9iJf7Kxn5vp7wsxRnH7t3lMAAi0j2FvHTrjfVNbG7q33tBm1vbALjxi8flXK9uYi1r\n1nZ/QTVFirKyVL/KIjIcKeSlS6+va+T/3bi4INsqL0tRXpZ7mKTy8rIe1xGR3lPIS5cakhb8vruN\nxfbs33tBp+yiNxKJlIpCXnawxFfz0LLXaEye2HzLlPGccezeJS6ViPSVQl528OATK/nbSxsAKEul\n2ENPbIoMaQr5yK3e2Nyri6dNyYXS718ym/LyVLcvhhaRoUEhH7FV65uYe8Njvf5eeVmKyooy3c0i\nEgGFfMQ2NW0BwpC20/bI/+LpnruMVsCLREIhPwzsP2U8Z87ep9TFEJESUIeriEjE1JKPyLr6lu1v\nJAJ4Y11TCUsjIoOBQj4Sqzc08aV5XV9kLVf/usiw1WPIm1kKuB6YDrQA57n7iqzlZwOXAG3Aze7+\ngwEqq+SwqTG04KdOqmW/rHeKlpenOHb65FIVS0RKLJ+W/Bygyt1nmdmRwDXJvIyrgf2BJuCvZna7\nu9cXvqiSjwP2mqCLrCKyXT4XXo8G7gVw98XA4Z2WPwmMB6qT6XTBSiciIv2ST0t+DJDdMm8zszJ3\nb0+mnwGWAJuBX7r7wLwVWXawoaF1h4usqzboIquIvFk+Ib8JyB5GcHvAm9lBwCnAFKAR+ImZnenu\nd+baYF2d
RiXM6EtdvLGukc9fv5B0F+dMtaNHDtn6HarlHgiqiw6qi/7JJ+QXAqcCd5jZDGB51rJ6\nQl98q7unzWw1oesmpzV
"text/plain": [
"<matplotlib.figure.Figure at 0x7f97c0ef4390>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"y_pred_prob = model.predict_proba(X_test)[:,1]\n",
"fpr, tpr, thresholds = roc_curve(y_test, y_pred_prob)\n",
"plt.plot(fpr, tpr)\n",
"plt.xlim([0.0, 1.0])\n",
"plt.ylim([0.0, 1.0])\n",
"plt.title('ROC curve for Titanic')\n",
"plt.xlabel('False Positive Rate (1 - Specificity)')\n",
"plt.ylabel('True Positive Rate (Sensitivity)')\n",
"plt.grid(True)"
]
},
{
"cell_type": "code",
"execution_count": 26,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"array([ 0.74750054, 0.74312762, 0.74298741, 0.73808718, 0.73799308,\n",
" 0.73743733, 0.73736981, 0.73735128, 0.73729214, 0.73709628,\n",
" 0.73699794, 0.73675548, 0.73659304, 0.73639721, 0.73623377,\n",
" 0.73612635, 0.73607305, 0.73572436, 0.7356707 , 0.735536 ,\n",
" 0.73544523, 0.73407999, 0.73200457, 0.7316892 , 0.73139765,\n",
" 0.73080287, 0.20382799, 0.20324215, 0.20255542, 0.202325 ,\n",
" 0.19998395, 0.19993953, 0.19986688, 0.19983705, 0.19891076,\n",
" 0.19881374, 0.19872727, 0.19868889, 0.1986448 , 0.19860251,\n",
" 0.19851757, 0.19851517, 0.19851124, 0.19850688, 0.19843776,\n",
" 0.19841942, 0.19831147, 0.19830402, 0.19816605, 0.19815391,\n",
" 0.19813555, 0.19813539, 0.19803009, 0.19801409, 0.19800118,\n",
" 0.1978783 , 0.19785132, 0.19784528, 0.19783312, 0.19782026,\n",
" 0.19780287, 0.19776301, 0.19774832, 0.19770726, 0.19759125,\n",
" 0.19756794, 0.197232 , 0.19720558, 0.1971321 , 0.197085 ,\n",
" 0.19652697, 0.19651513, 0.19193059, 0.18794571])"
]
},
"execution_count": 26,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Thresholds on the predicted probability used to compute the ROC curve (in decreasing order)\n",
"thresholds"
]
},
{
"cell_type": "code",
"execution_count": 29,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"array([<matplotlib.axes._subplots.AxesSubplot object at 0x7f97bea20710>,\n",
" <matplotlib.axes._subplots.AxesSubplot object at 0x7f97be73c8d0>], dtype=object)"
]
},
"execution_count": 29,
"metadata": {},
"output_type": "execute_result"
},
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAYQAAAEICAYAAABfz4NwAAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\nAAALEgAACxIB0t1+/AAAElpJREFUeJzt3XmQpHV9x/H3zG4vxzDsLthiGQSPynxJyiOSRCMKu1Jo\nBDREUymq1JhQ8SCuGBWNsGqqrMQr0VXR8giHaJnLYNAoBdG4kcPyCvFgDX7BUiBLqDCBHWbYhd2d\nmc4f/QOGLWD6eJ7u6eX9qtqqebqf/vZnn+nffKbPGWu1WkiSND7sAJKklcFCkCQBFoIkqbAQJEmA\nhSBJKiwESRIAq4cdQI8sIk4F3gusAX4M/HFm3j3cVNLwRMRngOsyc8uws+xvvIewgkXEY4CLgJdm\n5q8AvwA+MNxU0nBExDER8Q3g94edZX9lIaxsLwS+l5k/L9ufBF4xxDzSMG2i/QvSF4YdZH9lIaxs\nTwD+e8n2dmAyIg4ZUh5paDLzrMz8W2Bs2Fn2VxbCyvZw35+FgaaQ9KhgIaxstwCPX7J9JLAjM+8Z\nUh5J+zELYWX7GvDsiHhK2X4d8OUh5pG0H7MQVrDMnAbOAL4YET8BngqcPdxU0tD5Ec01GfPjryVJ\n0OEb0yLi2cD7M/P5EfFrwHnAPLAbeFVmTkfEa4DXAnuB92TmZXWFliRVb9mHjCLibcD5wAHlpI8A\nmzLzROBS4O0RcQRwFvAc4EXA+yKiUU9kSVIdOnkO4WfAS5dsn56Z15WvVwP3As8CrsnM+cycBW4E\nnl5pUklSrZYthMy8lPbDQ/dt/y9ARBxH+52DHwYOBe5acrG7gbWVJpUk1aqnD7eLiNOBc4FTMvOO\niJilXQr3mQRmlpvTarVaY2O+6VCVG7kblWtBNer4htV1IUTEK2k/ebwxM+/7of894C8jYg1wEHAM\nsG3ZlGNjTE/PdRthWc3mZOVzR2VmXXNHLeuoebSvhbrmmrW79dBVIUTEOPBR4Gbg0ohoAVdm5rsj\n4jzgGtpttDkz93QzW5I0XB0VQmbeDBxXNg9/mH0uBC6sKJckacB8p7IkCbAQJEmFhSBJAiwESVLR\n0/sQVpq7Zme54htXMjbW7rdDJg7g7p27H3b/8bEWL3vJKYyP24eSdJ/9ohB+cdNNfOWHe1lz0Lol\npx708Be460ZOO2XBQpCkJfyJKEkCLARJUmEhSJIAC0GSVFgIkiTAQpAkFRaCJAmwECRJhYUgSQIs\nBElSYSFIkgALQZJUWAiSJMBCkCQVFoIkCbAQJEmFhSBJAiwESVJhIUiSAAtBklRYCJIkwEKQJBWr\nO9kpIp4NvD8znx8RTwEuBhaBbZm5qezzGuC1wF7gPZl5WT2RJUl1WPYeQkS8DTgfOKCctAXYnJkb\ngPGIOC0ijgDOAp4DvAh4X0Q0asosSapBJw8Z/Qx46ZLtX8/Mq8vXlwMvAJ4FXJOZ85k5C9wIPL3S\npJKkWi1bCJl5KTC/5KSxJV/PAYcCk8BdS06/G1hbRUBJ0mB09BzCPhaXfD0JzACztIth39OX1WxO\n9hDhwdatO7ir/cdXjdFsTtJodPeoVhVZBzGzrrmjlHUUjdLxNetoZe1UL4XwnxFxQmZeBZwMbAW+\nD7wnItYABwHHANs6GTY9PddDhAebmdnV1f6LCy2mp+e6KoRmc7KSrHXPrGvuqGUdRaN0fM06Wlk7\n1UshvBU4vzxpfD1wSWa2IuI84BraDyltzsw9PcyWJA1JR4WQmTcDx5WvbwQ2PsQ+FwIXVhlOkjQ4\nvjFNkgRYCJKkwkKQJAEWgiSpsBAkSYCFIEkqLARJEmAhSJIKC0GSBFgIkqTCQpAkARaCJKmwECRJ\ngIUgSSosBEkSYCFIkgoLQZIEWAiSpMJCkCQBFoIkqbAQJEmAhSBJKiwESRJgIUiSCgtBkgRYCJKk\nwkKQJAEWgiSpWN3LhSJiNfBZ4InAPPAaYAG4
GFgEtmXmpmoiSpIGodd7CKcAqzLzucBfAO8FtgCb\nM3MDMB4Rp1WUUZI0AL0Wwg3A6ogYA9YCe4FjM/Pqcv7lwEkV5JMkDUhPDxkBdwNPAn4KHA68BDh+\nyflztItCkjQiei2ENwNXZOY7IuKXgG8Ca5acPwnMdDKo2ZzsMcID1q07uKv9x1eN0WxO0mg0urpc\nFVkHMbOuuaOUdRSN0vE162hl7VSvhXAn7YeJoP2DfzXwg4jYkJlXAicDWzsZND0912OEB8zM7Opq\n/8WFFtPTc10VQrM5WUnWumfWNXfUso6iUTq+Zh2trJ3qtRA+AlwUEVcBDeAc4FrggohoANcDl/Q4\nW5I0BD0VQmbuBE5/iLM29pVGkjQ0vjFNkgRYCJKkwkKQJAEWgiSpsBAkSYCFIEkqLARJEmAhSJIK\nC0GSBFgIkqTCQpAkARaCJKmwECRJgIUgSSosBEkSYCFIkgoLQZIEWAiSpMJCkCQBFoIkqVg97ACS\n9Gi2sLDA9u23MDs7wY4dO5fd/8gjj2LVqlW1ZLEQJGmItm+/hbO3XMaaicOX3XfPzjv40FtO5eij\nn1RLFgtBkoZszcThHHjoEcOO4XMIkqQ2C0GSBFgIkqTCQpAkARaCJKno+VVGEXEO8DtAA/gEcBVw\nMbAIbMvMTVUElCQNRk/3ECJiA/CczDwO2AgcBWwBNmfmBmA8Ik6rLKUkqXa9PmT028C2iPgS8C/A\nV4FjM/Pqcv7lwEkV5JMkDUivDxk9hva9ghcDT6ZdCkvLZQ5Y2180SdIg9VoIdwDXZ+Y8cENE3Asc\nueT8SWCmk0HN5mSPER6wbt3BXe0/vmqMZnOSRqPR1eWqyDqImXXNHaWso2iUjq9Zq5s7OzvR1f7r\n10/U9n/qtRCuAd4IfDgiHg9MAN+IiA2ZeSVwMrC1k0HT03M9RnjAzMyurvZfXGgxPT3XVSE0m5OV\nZK17Zl1zRy3rKBql42vW6uZ28oF2++7fzXV3sx56KoTMvCwijo+I7wFjwJ8ANwEXREQDuB64pJfZ\nkqTh6Pllp5l5zkOcvLH3KJKkYfKNaZIkwEKQJBUWgiQJsBAkSYWFIEkCLARJUmEhSJIAC0GSVFgI\nkiTAQpAkFRaCJAmwECRJhYUgSQIsBElSYSFIkgALQZJUWAiSJMBCkCQVFoIkCbAQJEmFhSBJAiwE\nSVJhIUiSAAtBklRYCJIkwEKQJBUWgiQJsBAkScXqfi4cEY8F/gM4CVgALgYWgW2ZuanvdJKkgen5\nHkJErAY+BewqJ20BNmfmBmA8Ik6rIJ8kaUD6ecjog8Angf8BxoBjM/Pqct7ltO81SJJGRE+FEBF/\nBNyemV+nXQb7zpoD1vYXTZI0SL0+h3AGsBgRLwCeAXwOaC45fxKY6WRQsznZY4QHrFt3cFf7j68a\no9mcpNFodHW5KrIOYmZdc0cp6ygapeNr1urmzs5OdLX/+vUTtf2feiqE8jwBABGxFTgT+OuIOCEz\nrwJOBrZ2Mmt6eq6XCA8yM7Nr+Z2WWFxoMT0911UhNJuTlWSte2Zdc0ct6ygapeNr1urm7tixs+v9\nu7nubtZDX68y2sdbgfMjogFcD1xS4WxJUs36LoTMPHHJ5sZ+50mShsM3pkmSAAtBklRYCJIkwEKQ\nJBUWgiQJsBAkSYWFIEkCLARJUmEhSJIAC0GSVFgIkiTAQpAkFRaCJAmwECRJhYUgSQIsBElSYSFI\nkgALQZJUWAiSJMBCkCQVFoIkCbAQJEmFhSBJAiwESVJhIUiSAAtBklSsHnYASdWbn5/nne/7OIes\nfyx79swvu//uXbOc+4ZXsG7d+gGk00plIUj7oVarxe27DuL2xuM62n/vru3s3r275lRa6XoqhIhY\nDVwEPBFYA7wH+C/gYmAR2JaZm6qJKEkahF6fQ3gl8H+ZeQLwIuDjwBZgc2ZuAMYj4rSKMkqSBqDX\nQvgC8K7y
9SpgHjg2M68up10OnNRnNknSAPX0kFFm7gKIiEngn4B3AB9cssscsLbvdJKkgen5SeWI\neALwz8DHM/MfIuKvlpw
"text/plain": [
"<matplotlib.figure.Figure at 0x7f97c0ef4c88>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"# Histograms of the predicted survival probability, one per actual class\n",
"dprob = pd.DataFrame(data = {'probability':y_pred_prob, 'actual':y_test})\n",
"dprob.probability.hist(by=dprob.actual, sharex=True, sharey=True)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The ROC curve helps to select a threshold that balances sensitivity (recall) and specificity."
]
},
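{
"cell_type": "markdown",
"metadata": {},
"source": [
"One common way (among several) to pick a threshold from the ROC curve is Youden's J statistic, J = TPR - FPR = sensitivity + specificity - 1: choose the threshold that maximizes it. This technique is not part of the original notebook. The sketch applies it to toy arrays shaped like the output of `roc_curve`; with the real data you could pass the `fpr`, `tpr` and `thresholds` arrays computed above."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"import numpy as np\n",
"\n",
"def youden_threshold(fpr, tpr, thresholds):\n",
"    # Youden's J = TPR - FPR; return the threshold with the largest J and that J\n",
"    j = np.asarray(tpr) - np.asarray(fpr)\n",
"    best = int(np.argmax(j))  # ties: argmax returns the first, i.e. highest, threshold\n",
"    return float(thresholds[best]), float(j[best])\n",
"\n",
"# Toy ROC points (thresholds in decreasing order, as returned by roc_curve)\n",
"fpr_toy = np.array([0.0, 0.5, 0.5, 1.0])\n",
"tpr_toy = np.array([0.5, 0.5, 1.0, 1.0])\n",
"thr_toy = np.array([0.8, 0.4, 0.35, 0.1])\n",
"print(youden_threshold(fpr_toy, tpr_toy, thr_toy))  # (0.8, 0.5)"
]
},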
{
"cell_type": "code",
"execution_count": 30,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"# Function to evaluate thresholds of the ROC curve\n",
"def evaluate_threshold(threshold):\n",
" print('Sensitivity:', tpr[thresholds > threshold][-1])\n",
" print('Specificity:', 1 - fpr[thresholds > threshold][-1])"
]
},
{
"cell_type": "code",
"execution_count": 31,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Sensitivity: 0.0786516853933\n",
"Specificity: 0.992537313433\n"
]
}
],
"source": [
"evaluate_threshold(0.74)"
]
},
{
"cell_type": "code",
"execution_count": 32,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Sensitivity: 0.741573033708\n",
"Specificity: 0.880597014925\n"
]
}
],
"source": [
"evaluate_threshold(0.5)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
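"By default, the threshold used to decide the class is 0.5 (for `SVC`, `predict` relies on the decision function, so its output can differ slightly from thresholding `predict_proba` at 0.5). If we choose a different threshold, we have to apply it ourselves to the predicted probability of the positive class, for example:\n",
"\n",
"`threshold = 0.8`\n",
"\n",
"`predicted = model.predict_proba(X_test)[:, 1] > threshold`"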
]
},
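{
"cell_type": "markdown",
"metadata": {},
"source": [
"A small sketch of how moving the threshold trades sensitivity for specificity. `predict_with_threshold` and `sensitivity_specificity` are hypothetical helpers, and the probabilities and labels are made up; with the real model you would pass `model.predict_proba(X_test)[:, 1]` and `y_test`."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"import numpy as np\n",
"\n",
"def predict_with_threshold(prob_positive, threshold):\n",
"    # Predict class 1 when the estimated probability of class 1 exceeds the threshold\n",
"    return (np.asarray(prob_positive) > threshold).astype(int)\n",
"\n",
"def sensitivity_specificity(y_true, y_pred):\n",
"    y_true = np.asarray(y_true)\n",
"    tp = np.sum((y_pred == 1) & (y_true == 1))\n",
"    tn = np.sum((y_pred == 0) & (y_true == 0))\n",
"    return tp / np.sum(y_true == 1), tn / np.sum(y_true == 0)\n",
"\n",
"prob_toy = np.array([0.2, 0.75, 0.6, 0.1, 0.9, 0.45])\n",
"y_toy = np.array([0, 1, 0, 0, 1, 1])\n",
"for threshold in (0.5, 0.8):\n",
"    sens, spec = sensitivity_specificity(y_toy, predict_with_threshold(prob_toy, threshold))\n",
"    # 0.5 -> sensitivity 0.67, specificity 0.67; 0.8 -> sensitivity 0.33, specificity 1.0\n",
"    print(threshold, round(sens, 2), round(spec, 2))"
]
},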
{
"cell_type": "markdown",
"metadata": {},
"source": [
"AUC is the fraction of the ROC plot area that lies under the curve. It represents the probability that the classifier assigns a higher predicted probability to a randomly chosen positive observation than to a randomly chosen negative one. A simple rule to evaluate a classifier based on this summary value is the following:\n",
"* .90-1 = very good (A)\n",
"* .80-.90 = good (B)\n",
"* .70-.80 = not so good (C)\n",
"* .60-.70 = poor (D)\n",
"* .50-.60 = fail (F)"
]
},
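{
"cell_type": "markdown",
"metadata": {},
"source": [
"This probabilistic interpretation can be checked directly: over all (positive, negative) pairs, count how often the positive sample gets the higher score, counting ties as one half. This pairwise computation is equivalent to the normalized Mann-Whitney U statistic and needs P x N comparisons, which is fine for a small test set. The sketch uses toy labels and scores; with the real model you would pass `y_test` and `y_pred_prob`."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"import numpy as np\n",
"\n",
"def pairwise_auc(y_true, scores):\n",
"    y_true = np.asarray(y_true)\n",
"    scores = np.asarray(scores, dtype=float)\n",
"    pos = scores[y_true == 1]\n",
"    neg = scores[y_true == 0]\n",
"    # Compare every positive score with every negative score (broadcasting)\n",
"    greater = (pos[:, None] > neg[None, :]).sum()\n",
"    ties = (pos[:, None] == neg[None, :]).sum()\n",
"    return (greater + 0.5 * ties) / (len(pos) * len(neg))\n",
"\n",
"print(pairwise_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75"
]
},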
{
"cell_type": "code",
"execution_count": 26,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"0.799890994466\n"
]
}
],
"source": [
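"# AUC computed from the hard 0/1 predictions (a single threshold);\n",
"# use roc_auc_score(y_test, y_pred_prob) to get the area under the full ROC curve\n",
"print(roc_auc_score(expected, predicted))"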
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Train and Evaluate with K-Fold"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"This is an alternative to a single train/test split. It runs about k times slower, since the model is trained k times, but it gives a more reliable estimate of performance because every sample is used for testing exactly once."
]
},
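{
"cell_type": "markdown",
"metadata": {},
"source": [
"The following NumPy sketch shows how K-fold without shuffling (as in the next cell) splits the sample indices: the data is cut into k contiguous folds and each fold is used exactly once as the test set, so the model is trained k times. `kfold_indices` is a hypothetical helper; scikit-learn's `KFold` without shuffling should produce the same contiguous folds, giving one extra sample to the first folds when the number of samples is not divisible by k."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"import numpy as np\n",
"\n",
"def kfold_indices(n_samples, k):\n",
"    # Hypothetical helper: k contiguous folds (no shuffling); each fold is the test set once\n",
"    folds = np.array_split(np.arange(n_samples), k)\n",
"    for i, test_idx in enumerate(folds):\n",
"        train_idx = np.concatenate([f for j, f in enumerate(folds) if j != i])\n",
"        yield train_idx, test_idx\n",
"\n",
"for train_idx, test_idx in kfold_indices(10, 3):\n",
"    print('train:', train_idx, 'test:', test_idx)\n",
"\n",
"# Fold sizes for the 891 Titanic passengers and k = 5\n",
"print([len(test_idx) for _, test_idx in kfold_indices(891, 5)])  # [179, 178, 178, 178, 178]"
]
},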
{
"cell_type": "code",
"execution_count": 33,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Scores in every iteration [ 0.81564246 0.80337079 0.78089888 0.73595506 0.80337079]\n",
"Accuracy: 0.79 (+/- 0.06)\n"
]
}
],
"source": [
"# This step will take some time\n",
"# Cross-validation\n",
"cv = KFold(X.shape[0], n_folds=5, shuffle=False, random_state=33)\n",
"# StratifiedKFold is a variation of k-fold which returns stratified folds:\n",
"# each set contains approximately the same percentage of samples of each target class as the complete set.\n",
"#cv = StratifiedKFold(y, n_folds=3, shuffle=False, random_state=33)\n",
"scores = cross_val_score(model, X, y, cv=cv)\n",
"print(\"Scores in every iteration\", scores)\n",
"print(\"Accuracy: %0.2f (+/- %0.2f)\" % (scores.mean(), scores.std() * 2))\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We get about 79% accuracy with K-Fold, quite good!"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We can plot the [learning curve](http://scikit-learn.org/stable/auto_examples/model_selection/plot_learning_curve.html). The training score decreases as the number of samples grows, and the cross-validation score approaches the training score at the end. It seems we will not get a better result with more samples."
]
},
{
"cell_type": "code",
"execution_count": 35,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"<module 'matplotlib.pyplot' from '/home/cif/anaconda3/lib/python3.5/site-packages/matplotlib/pyplot.py'>"
]
},
"execution_count": 35,
"metadata": {},
"output_type": "execute_result"
},
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAY8AAAEZCAYAAABvpam5AAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\nAAALEgAACxIB0t1+/AAAIABJREFUeJzs3XecXFX9//HXuXdmtmVLyqaHIO1AQhJKKCKCoIhIgKgo\n+AXpVbBhoYgK+gX9CV9EQg0dFFAEQUFAOooYekvChxJKIL3tZuuUe39/3Lu7s7uzZXZ3dmZ3P08f\nmJk79945meze95x6je/7KKWUUtlw8l0ApZRSQ4+Gh1JKqaxpeCillMqahodSSqmsaXgopZTKmoaH\nUkqprEXyXQA19FlrpwNvikh5Ht77QuAdEfnDYL93vlhrHwB+JCJvWWsfAb4pIhuste8DXxORl3s4\n/klggYjcGz6fAvwTeExEvpdh/5uBA4A14SYD+MCXRWRVN+/jAeNEZEOH7T8EdhSR43v5V1YFSMND\nDZS8TBgSkV/k433zSUTmpT09oD/nstZuCzwCXCMil3Sz62UiclmWp+/uZ0InmA1xGh4qp6y1UeD/\nAfsALvAK8F0RqbPWzgPOBaLAeOA2Efm5tXZf4PdAPVAKnA38HFgG7AjEgDNE5OnwW/EbInKZtbYR\n+A3BBXUScIWI/N5a6wCXAocAm4DngRkisl+G8p4LHAMkgHeA44GvAIeLyCHhPse2PA/ffwywFfAo\ncCKwrYisCfd9DrgAeKKrzyHtvccAHwHVItJorb0W2EFE9g1ffxs4DPgH8DXgzPDQJ621B4ePT7PW\n7gxUA38QkfO7+beZA/wdOFdE/tjVft2x1lYAVwE7AR7wcHg+j6CGgrU2AiwAvgCsJqjBbOrL+6nC\noX0eKtfOARIiMldEdgZWElxEAX4AHCMiuwOfBs4NL6AAM4EjwmOagd2BS0RkF+AmggtyR0XAGhHZ\nG/g68BtrbQw4GdgZmBG+z9Zk+OZrrT2UIDj2EJHZwPvAGeHLHfdPf14iIrNE5CzgXuDo8Hw7ABNF\n5JEePgcAwuad/wItofY5YFtrbam1dgYQF5Glafuf0LKfiHwcPm4Ukd2APYAfhk1SmewNPAks72Vw\nnGWtfdla+0r4Z8t7LwDWicgsYC4wB/hRh2PPALYBtge+CGzRi/dTBU5rHirX5gGV1tovhs+jBN8+\nAQ4F5llrjwJ2CLeVhX8uT7sgAnwoIm+Ej18Gju3i/f4GICIvh8FRBhxEUKtJAFhrrwO+k+HYzwN3\ni0hteI4fhft39V4t/p32+AbgauAy4DiCoIPuP4d09wEHWWvfAz4B3iAIkdnAPV28v0l7fEdY9tXW\n2tUENbpPMhxzFEEt5gpr7cUicl43fz/outnqS8Be4XsmwtrS94Df0hawnwfuEJEU0GCt/SMwq4f3\nUwVOw0Plmgt8L/z2jbW2FCgO/3yF4Jv6vwgusvNpuxDWdThPY9pjn/YXzK72I9wv2WH/VBfHJkmr\nUVhrK4GqDO8X63Bca1lF5FlrbcRauxvwP8Ce4UsZP4cMZfgr8AxBk9k/gY0E39Z3B07rotzptaBE\nh+1dfU7fF5F/WWu/ATxvrX1eRO6z1u5KEIAAfljT607H8zsEwdixfOn7JXs4pxoCNDzUQOnqIvUI\ncKa19gmCi/aNQC3Bt/Ny4HwRSVprjya4KLs5KNODwLestX8gaJc/jswdto8Bv7XWXhL2RVwQnuMR\nYMewJuMR9J1050aC5pzXRKTlW39Xn8Op6QeKyCfW2nXh9qMJ+gZ+BtSLyOsZ3itF5zDrjebw/d6x\n1p4K3Gqt3UNEXiJo4uutRwiapc6y1hYBpxCEHrR9/g8Dx4Q1DgMcAbzdhzKrAqLhoQZKqbW2Nnzc\nMpTz08CvCDqrXyH4Vvoq8EOCzvAHALHWbgTeBZYQtI3H+1iGrvolbgEsQXNXHUFfRkPHg0XkobCf\n4j/WWh9YTNBf0gQ8DQiwgqCvYHY35bgVuAg4
Mm3br4BL6Pw5ZPJX4CwReQXAWttAUEPL9Pe8F/i3\ntXY+3ffLdLldRP5srf0scK+1dvf0TvwezgNBE9UCa+0bBDWOh4GLOxx3HcG/65vAOoJalRrijC7J\nroY7a+0BwPiWjmFr7eUEHcvn5rdkSg1dOQ0Pa60haJ6YQ/Dt7SQRWZb2+lHAWQRtoDeLyLXh9nMI\nOlOjwNUicnPOCqmGPWvtZILax3iC2varwOkisjmf5VJqKMt1s9V8oEhE9rLW7kEwAmV+2uuXEIyy\naQCWWGvvJBgv/unwmDK6rtor1SsisoKg01kpNUByPc9jb4I2UERkEcE48HSvAaOBkvC5DxwIvGmt\nvY9g2OUDOS6jUkqpLOU6PCqAmrTnyXC2b4vFwEsEY9kfCMfXjwN2BQ4HTicct66UUqpw5LrZqpZg\nOGYLJ1y2AGvtLOBgYDrByJs/WmsPB9YDS0UkCbxtrW2y1o4TkXVdvcnatZu1118ppbJUXV3e1RD7\nHuW65vEs8GUAa+2eBDWMFjUEfR3NIuITrHdTRTBb90vhMZMJ1jZan+NyKqWUysJgjbZqGRN/PEGT\nVJmI3BBOTjqBYMLSe8DJ4YSx3wD7E8wXOFdEHuvufbTmoZRS2etPzWNYzPPQ8FBKqewVcrOVUkqp\nYUjDQymlVNY0PJRSSmVNw0MppVTWNDyUUkplTZdkV2qEqn3+v2x48AHiK1cQmzSZMQfPo2L3PXs+\nsAtXXnk5IkvZsGE9TU1NTJkylaqq0fzyl7/u8dh33nmbZ599huOOOynj64sWPceaNas55JD5GV9X\ng0+H6io1AtU+/19WLby20/aJp5zWrwABeOihB/joow859dQzet5Z5VV/hupqzUOpYWjt3Xex+cUX\nunw9uWlTxu2rbryedffcnfG18rm7Uf31IzO+1p1XXnmJa65ZQCwW49BDv0IsFuPee+8mlUphjOHi\niy/hvffe5b777uHCCy/myCO/wuzZO/HRRx8yZsxYLrrotzz88IN8+OEHzJ//NS644KdMmDCBjz/+\nmB12mMmPfnQONTWbuPDC80kkEkybtgUvv/wid93119YyxONxfv7zc6ivr6epqYlTTvk2u+22Bw88\ncB/33Xcvnuex9977cMIJp/DPfz7E3XffSSxWxNSp0/jxj8/j0Ucf5sEH/4bv+5x44qnU1GziT3+6\nA9d1mT17pxEZlBoeSo1EqS5u497V9n5KJOIsXHgLALfffguXXPJ7ioqKuOSSi1m06DnGjavGmOBL\n8MqVK7jyyoWMG1fNt799EkuXLgZoff3jjz/i8suvJhaLccQR89m4cQN/+MMt7LPP55g//3BeeGER\nL7zwfLv3/+STj6mpqeH//m8BGzduYPnyj9i4cSN/+MNt3H77n4hGo1x33VWsWrWKm25ayC233Elx\ncTELFvyO+++/l9LSUsrLK/j1ry+ltraWb3/7JG688XaKior41a9+zosvPs/cubvn5LMrVBoeSg1D\n1V8/sttawge/OJ/4Jx932h6bOo0tL/jVgJdniy2mtz4ePbqKiy66gOLiYpYv/5Add2x/R9+qqirG\njasGoLp6PPF4+7sST5kyjeLiYgDGjh1Hc3OcDz74gIMOCm4tP2dO51uwf+pTW3HooV/hggvOI5lM\ncfjhR7BixSdsvfXWRKNRAE499QzeemsJn/rU1q3nnzNnZ154YREzZsxs/Tt88slyNm3ayI9//D18\n36exsZFPPvlYw0MpNfyNOXhexj6PMV8+OCfvZ0wwsLO+vo4bb1zIvfc+iO/7/OAH/Wvuaemz3Xrr\nrXnzzdfYZpttefPN1zvtt2zZuzQ0NPDb317O+vXrOP30E7n++lv58MMPSSaTRCIRzj//bM488wd8\n8MEympubKCoq5tVXX2LatC0AcJzg7zBp0hQmTJjI7353Fa7r8tBDD7DttrZff4+hSMNDqRGopVN8\nwz8ebBtt
9eWD+91Z3pOyslHMnj2HU045jkjEpby8knXr1jJx4qS0vdr6cFuaqtKlb2t5fNRRx/Kr\nX/2cJ598nLFjxxGJuO2
"text/plain": [
"<matplotlib.figure.Figure at 0x7f4a65d42278>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"plot_learning_curve(model, \"Learning curve with K-Fold\", X, y, cv=cv)"
]
},
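{
"cell_type": "markdown",
"metadata": {},
"source": [
"Learning-curve plots like the one above are usually built on scikit-learn's `learning_curve` function. It refits the estimator on increasingly large subsets of each training fold and scores it both on that subset and on the validation fold. The `plot_learning_curve` helper is not shown in this section, so the cell below is a hypothetical sketch of the underlying computation, not that helper's code. It assumes scikit-learn >= 0.18 (where `sklearn.model_selection` exists) and uses synthetic data from `make_classification` instead of the Titanic features. The `*_demo` names are hypothetical and were chosen so they do not overwrite the notebook's own variables.\n",
"\n",
"When reading the result, a large, persistent gap between the mean training and validation scores usually indicates overfitting, while two low scores that converge suggest underfitting."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"# Sketch (assumes scikit-learn >= 0.18): the data behind a learning curve\n",
"import numpy as np\n",
"from sklearn import model_selection\n",
"from sklearn.datasets import make_classification\n",
"from sklearn.svm import SVC\n",
"\n",
"# Hypothetical synthetic stand-in for the Titanic features X, y\n",
"X_demo, y_demo = make_classification(n_samples=300, n_features=5, random_state=0)\n",
"\n",
"sizes_demo, train_demo, valid_demo = model_selection.learning_curve(\n",
"    SVC(kernel='linear'), X_demo, y_demo,\n",
"    cv=model_selection.KFold(n_splits=5, shuffle=True, random_state=0),\n",
"    train_sizes=np.linspace(0.1, 1.0, 5))\n",
"\n",
"# Each score array has one row per training size and one column per fold\n",
"print(sizes_demo)\n",
"print(train_demo.mean(axis=1))  # mean training accuracy per size\n",
"print(valid_demo.mean(axis=1))  # mean validation accuracy per size"
]
},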
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Train and Optimize"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"In this section we provide an alternative version of the previous one in which the SVM hyperparameters are tuned with a cross-validated grid search (`GridSearchCV`)."
]
},
{
"cell_type": "code",
"execution_count": 42,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"0.811659192825\n"
]
}
],
"source": [
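"# Tune gamma with a cross-validated grid search over 10 log-spaced values from 1e-6 to 1e-1\n",
"gammas = np.logspace(-6, -1, 10)\n",
"gs = GridSearchCV(model, param_grid=dict(gamma=gammas))\n",
"gs.fit(X_train, y_train)\n",
"# Accuracy of the refitted best model on the held-out test set\n",
"score = gs.score(X_test, y_test)\n",
"print(score)"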
]
},
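{
"cell_type": "markdown",
"metadata": {},
"source": [
"The search above varies only `gamma`, which the SVM uses only with non-linear kernels ('rbf', 'poly', 'sigmoid'); if `model` uses a linear kernel, every grid point yields the same classifier. The cell below is a minimal sketch of a joint search over `C` and `gamma` with the RBF kernel. It assumes scikit-learn >= 0.18, where `GridSearchCV` lives in `sklearn.model_selection` (the `sklearn.grid_search` module imported by this notebook was deprecated in 0.18 and removed in 0.20), and it runs on hypothetical synthetic data rather than the Titanic split. Note that `best_score_` is a cross-validated score computed on the training data only, while `score` on the held-out set is the more honest estimate of generalisation."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"# Sketch (assumes scikit-learn >= 0.18): tune C and gamma together\n",
"import numpy as np\n",
"from sklearn import model_selection\n",
"from sklearn.datasets import make_classification\n",
"from sklearn.svm import SVC\n",
"\n",
"# Hypothetical synthetic data standing in for the Titanic features\n",
"X_demo, y_demo = make_classification(n_samples=300, n_features=5, random_state=0)\n",
"X_tr_demo, X_te_demo, y_tr_demo, y_te_demo = model_selection.train_test_split(\n",
"    X_demo, y_demo, test_size=0.25, random_state=0)\n",
"\n",
"# gamma only matters for non-linear kernels such as 'rbf'\n",
"param_grid_demo = {'C': [0.1, 1, 10], 'gamma': np.logspace(-6, -1, 10)}\n",
"gs_demo = model_selection.GridSearchCV(SVC(kernel='rbf'), param_grid=param_grid_demo, cv=5)\n",
"gs_demo.fit(X_tr_demo, y_tr_demo)\n",
"\n",
"print(gs_demo.best_params_)                 # best C and gamma found by 5-fold CV\n",
"print(gs_demo.best_score_)                  # mean CV accuracy of that combination\n",
"print(gs_demo.score(X_te_demo, y_te_demo))  # accuracy of the refitted model on held-out data"
]
},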
{
"cell_type": "code",
"execution_count": 45,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"<module 'matplotlib.pyplot' from '/home/cif/anaconda3/lib/python3.5/site-packages/matplotlib/pyplot.py'>"
]
},
"execution_count": 45,
"metadata": {},
"output_type": "execute_result"
},
{
"data": {
"text/plain": [
"<matplotlib.figure.Figure at 0x7f4a65a12b00>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
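"# Refine model: reuse the estimator (and gamma) selected by the grid search.\n",
"# Note that gamma only affects the 'rbf', 'poly' and 'sigmoid' kernels, not 'linear'.\n",
"model = gs.best_estimator_\n",
"plot_learning_curve(model, \"optimized with GridSearch\", X, y, cv=cv)"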
]
},
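{
"cell_type": "markdown",
"metadata": {},
"source": [
"The refinement step relies on a tuned `gamma`. The scikit-learn documentation describes `gamma` as the kernel coefficient for the 'rbf', 'poly' and 'sigmoid' kernels, so a linear `SVC` ignores it. The sketch below checks this on hypothetical synthetic data by fitting models that differ only in `gamma` and comparing their decision values; `decision_demo` is a hypothetical helper defined just for this check."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"# Sketch: gamma changes an RBF SVC but not a linear one\n",
"import numpy as np\n",
"from sklearn.datasets import make_classification\n",
"from sklearn.svm import SVC\n",
"\n",
"X_demo, y_demo = make_classification(n_samples=200, n_features=4, random_state=1)\n",
"\n",
"def decision_demo(kernel, gamma):\n",
"    # Fit an SVC with the given kernel and gamma, return its decision values\n",
"    return SVC(kernel=kernel, gamma=gamma).fit(X_demo, y_demo).decision_function(X_demo)\n",
"\n",
"same_linear = np.allclose(decision_demo('linear', 0.01), decision_demo('linear', 10.0))\n",
"same_rbf = np.allclose(decision_demo('rbf', 0.01), decision_demo('rbf', 10.0))\n",
"print(same_linear)  # True: the linear kernel does not use gamma\n",
"print(same_rbf)     # False: the RBF kernel does"
]
},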
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Visualise"
]
},
{
"cell_type": "code",
"execution_count": 36,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"<matplotlib.figure.Figure at 0x7f4a65c13588>"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"text/plain": [
"<matplotlib.figure.Figure at 0x7f4a65d21e10>"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"text/plain": [
"<matplotlib.figure.Figure at 0x7f4a65bc4588>"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"text/plain": [
"<matplotlib.figure.Figure at 0x7f4a65bc4550>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"# Plot the decision regions of SVMs with the standard configuration\n",
"# (plot_svm() comes from the helper script loaded by %run below)\n",
"%run plot_svm\n",
"plot_svm(df)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Passengers in the blue region are predicted to have survived, while those in the red region are predicted not to have. Check out the graph for the linear kernel: it places its decision boundary right at 50%!"
]
},
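{
"cell_type": "markdown",
"metadata": {},
"source": [
"The `plot_svm` helper is loaded from an external script whose code is not shown here. The cell below is a hypothetical sketch, not that helper, of how such a plot can be produced. For a binary `SVC`, `decision_function` is positive exactly where the model predicts `classes_[1]`, so the coloured regions come from `predict` evaluated on a grid and the boundary is the zero contour of `decision_function`. It assumes two synthetic features so that the grid is two-dimensional; all `*_demo` names are hypothetical."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"# Sketch: the decision boundary is where decision_function == 0\n",
"import numpy as np\n",
"from sklearn.datasets import make_classification\n",
"from sklearn.svm import SVC\n",
"\n",
"X2_demo, y2_demo = make_classification(n_samples=200, n_features=2, n_informative=2,\n",
"                                       n_redundant=0, random_state=2)\n",
"clf_demo = SVC(kernel='linear').fit(X2_demo, y2_demo)\n",
"\n",
"# Regular 100 x 100 grid covering the data\n",
"gx, gy = np.meshgrid(np.linspace(X2_demo[:, 0].min() - 1, X2_demo[:, 0].max() + 1, 100),\n",
"                     np.linspace(X2_demo[:, 1].min() - 1, X2_demo[:, 1].max() + 1, 100))\n",
"grid_demo = np.c_[gx.ravel(), gy.ravel()]\n",
"dec_demo = clf_demo.decision_function(grid_demo)\n",
"pred_demo = clf_demo.predict(grid_demo)\n",
"\n",
"# To draw it: plt.contourf(gx, gy, pred_demo.reshape(gx.shape)) for the regions and\n",
"# plt.contour(gx, gy, dec_demo.reshape(gx.shape), levels=[0]) for the boundary\n",
"expected_demo = np.where(dec_demo > 0, clf_demo.classes_[1], clf_demo.classes_[0])\n",
"print(np.array_equal(pred_demo, expected_demo))  # True"
]
},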
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# References"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"* [Titanic Machine Learning from Disaster](https://www.kaggle.com/c/titanic/forums/t/5105/ipython-notebook-tutorial-for-titanic-machine-learning-from-disaster)\n",
"* [API SVC scikit-learn](http://scikit-learn.org/stable/modules/generated/sklearn.svm.SVC.html)\n",
"* [Better evaluation of classification models](http://blog.kaggle.com/2015/10/23/scikit-learn-video-9-better-evaluation-of-classification-models/)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Licence"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The notebook is freely licensed under the [Creative Commons Attribution Share-Alike license](https://creativecommons.org/licenses/by/2.0/).\n",
"\n",
"© 2016 Carlos A. Iglesias, Universidad Politécnica de Madrid."
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.5.1+"
}
},
"nbformat": 4,
"nbformat_minor": 0
}