sitc/ml2/3_4_Visualisation_Pandas.ipynb


{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"![](images/EscUpmPolit_p.gif \"UPM\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Course Notes for Learning Intelligent Systems"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Department of Telematic Engineering Systems, Universidad Politécnica de Madrid, © 2016 Carlos A. Iglesias"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## [Introduction to Machine Learning II](3_0_0_Intro_ML_2.ipynb)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Table of Contents\n",
"* [Introduction: preprocessing](#Introduction:-preprocessing)\n",
"* [Visualisation with Pandas](#Visualisation-with-Pandas)\n",
"* [Loading and Cleaning](#Loading-and-Cleaning)\n",
"* [General exploration](#General-exploration)\n",
"* [Feature Age](#Feature-Age)\n",
"* [Feature Sex](#Feature-Sex)\n",
"* [Feature Pclass](#Feature-Pclass)\n",
"* [Feature Fare](#Feature-Fare)\n",
"* [Feature Embarked](#Feature-Embarked)\n",
"* [Features SibSp](#Features-SibSp)\n",
"* [Feature ParCh](#Feature-ParCh)\n",
"* [Recap: Filling null values](#Recap:-Filling-null-values)\n",
"\t* [Feature Age: null values](#Feature-Age:-null-values)\n",
"\t* [Feature Embarking: null values](#Feature-Embarking:-null-values)\n",
"\t* [Feature Cabin: null values](#Feature-Cabin:-null-values)\n",
"* [Encoding categorical features](#Encoding-categorical-features)\n",
"\t* [Recap: encoding categorical features](#Recap:-encoding-categorical-features)\n",
"\t* [Encoding Categorical Variables as Binary ones](#Encoding-Categorical-Variables-as-Binary-ones)\n",
"* [Cleaning: dropping](#Cleaning:-dropping)\n",
"* [Feature Engineering](#Feature-Engineering)\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Introduction: preprocessing"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"In the previous session, we introduced two libraries for visualisation: *matplotlib* and *seaborn*. In this notebook we review additional functionality of both, as well as the integration of *pandas* with *matplotlib*.\n",
"\n",
"Visualisation is usually combined with munging. We covered them in separate notebooks for learning purposes. We are now going to examine the dataset again, combining both techniques and applying what we learnt in the previous notebook."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Visualisation with Pandas"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Pandas integrates well with matplotlib. DataFrames provide the following plotting method:\n",
"* **plot()**, which draws a number of chart types selected with the argument *kind*:\n",
" * 'bar' for bar plots\n",
" * 'hist' for histograms\n",
" * 'box' for boxplots\n",
" * 'kde' for density plots\n",
" * 'area' for area plots\n",
" * 'scatter' for scatter plots\n",
" * 'hexbin' for hexagonal bin plots\n",
" * 'pie' for pie charts\n",
" \n",
"Every plot kind has an equivalent method on the DataFrame.plot accessor. This means you can use **df.plot(kind='line')** or **df.plot.line()**. Check the [plot documentation](http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.plot.html#pandas.DataFrame.plot) to learn about the remaining parameters.\n",
"\n",
"In addition, the module *pandas.plotting* (named *pandas.tools.plotting* in pandas versions before 0.20) provides **scatter_matrix**.\n",
"\n",
"You can find more details in the [documentation](http://pandas.pydata.org/pandas-docs/stable/visualization.html)."
]
},
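{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a minimal sketch of the two equivalent call styles described above, the next cell builds a small made-up DataFrame named `toy` (its columns `a` and `b` are illustrative and not part of the Titanic dataset) and draws the same histogram twice. It also calls **scatter_matrix**, imported here from *pandas.plotting*; that location is an assumption about the installed pandas version, since older releases expose it under *pandas.tools.plotting*."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"# Illustrative sketch on toy data (not the Titanic dataset)\n",
"import numpy as np\n",
"import pandas as pd\n",
"import matplotlib.pyplot as plt\n",
"# Assumed import location; older pandas releases use pandas.tools.plotting\n",
"from pandas.plotting import scatter_matrix\n",
"\n",
"toy = pd.DataFrame({'a': np.random.randn(100), 'b': np.random.rand(100)})\n",
"\n",
"# Both calls draw the same kind of chart\n",
"toy.plot(kind='hist', alpha=0.5)\n",
"toy.plot.hist(alpha=0.5)\n",
"\n",
"# Pairwise scatter plots, with histograms on the diagonal\n",
"scatter_matrix(toy, diagonal='hist')\n",
"plt.show()"
]
},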
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Loading and Cleaning"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"# General import and load data\n",
"import pandas as pd\n",
"\n",
"import seaborn as sns\n",
"import matplotlib.pyplot as plt\n",
"import numpy as np\n",
"sns.set(color_codes=True)\n",
"\n",
"# if matplotlib is not set inline, you will not see plots\n",
"\n",
"#alternatives auto gtk gtk2 inline osx qt qt5 wx tk\n",
"#%matplotlib auto\n",
"#%matplotlib qt\n",
"%matplotlib inline"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>PassengerId</th>\n",
" <th>Survived</th>\n",
" <th>Pclass</th>\n",
" <th>Name</th>\n",
" <th>Sex</th>\n",
" <th>Age</th>\n",
" <th>SibSp</th>\n",
" <th>Parch</th>\n",
" <th>Ticket</th>\n",
" <th>Fare</th>\n",
" <th>Cabin</th>\n",
" <th>Embarked</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" <td>3</td>\n",
" <td>Braund, Mr. Owen Harris</td>\n",
" <td>male</td>\n",
" <td>22.0</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" <td>A/5 21171</td>\n",
" <td>7.2500</td>\n",
" <td>NaN</td>\n",
" <td>S</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>2</td>\n",
" <td>1</td>\n",
" <td>1</td>\n",
" <td>Cumings, Mrs. John Bradley (Florence Briggs Th...</td>\n",
" <td>female</td>\n",
" <td>38.0</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" <td>PC 17599</td>\n",
" <td>71.2833</td>\n",
" <td>C85</td>\n",
" <td>C</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>3</td>\n",
" <td>1</td>\n",
" <td>3</td>\n",
" <td>Heikkinen, Miss. Laina</td>\n",
" <td>female</td>\n",
" <td>26.0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>STON/O2. 3101282</td>\n",
" <td>7.9250</td>\n",
" <td>NaN</td>\n",
" <td>S</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>4</td>\n",
" <td>1</td>\n",
" <td>1</td>\n",
" <td>Futrelle, Mrs. Jacques Heath (Lily May Peel)</td>\n",
" <td>female</td>\n",
" <td>35.0</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" <td>113803</td>\n",
" <td>53.1000</td>\n",
" <td>C123</td>\n",
" <td>S</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>5</td>\n",
" <td>0</td>\n",
" <td>3</td>\n",
" <td>Allen, Mr. William Henry</td>\n",
" <td>male</td>\n",
" <td>35.0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>373450</td>\n",
" <td>8.0500</td>\n",
" <td>NaN</td>\n",
" <td>S</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" PassengerId Survived Pclass \\\n",
"0 1 0 3 \n",
"1 2 1 1 \n",
"2 3 1 3 \n",
"3 4 1 1 \n",
"4 5 0 3 \n",
"\n",
" Name Sex Age SibSp \\\n",
"0 Braund, Mr. Owen Harris male 22.0 1 \n",
"1 Cumings, Mrs. John Bradley (Florence Briggs Th... female 38.0 1 \n",
"2 Heikkinen, Miss. Laina female 26.0 0 \n",
"3 Futrelle, Mrs. Jacques Heath (Lily May Peel) female 35.0 1 \n",
"4 Allen, Mr. William Henry male 35.0 0 \n",
"\n",
" Parch Ticket Fare Cabin Embarked \n",
"0 0 A/5 21171 7.2500 NaN S \n",
"1 0 PC 17599 71.2833 C85 C \n",
"2 0 STON/O2. 3101282 7.9250 NaN S \n",
"3 0 113803 53.1000 C123 S \n",
"4 0 373450 8.0500 NaN S "
]
},
"execution_count": 2,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# We use the URL of the raw CSV content (not the HTML page)\n",
"url=\"https://raw.githubusercontent.com/cif2cif/sitc/master/ml2/data-titanic/train.csv\"\n",
"df = pd.read_csv(url)\n",
"df_original = df.copy() # Copy to have a version of df without modifications\n",
"df.head()"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>PassengerId</th>\n",
" <th>Survived</th>\n",
" <th>Pclass</th>\n",
" <th>Name</th>\n",
" <th>Sex</th>\n",
" <th>Age</th>\n",
" <th>SibSp</th>\n",
" <th>Parch</th>\n",
" <th>Fare</th>\n",
" <th>Embarked</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" <td>3</td>\n",
" <td>Braund, Mr. Owen Harris</td>\n",
" <td>0</td>\n",
" <td>22.0</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" <td>7.2500</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>2</td>\n",
" <td>1</td>\n",
" <td>1</td>\n",
" <td>Cumings, Mrs. John Bradley (Florence Briggs Th...</td>\n",
" <td>1</td>\n",
" <td>38.0</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" <td>71.2833</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>3</td>\n",
" <td>1</td>\n",
" <td>3</td>\n",
" <td>Heikkinen, Miss. Laina</td>\n",
" <td>1</td>\n",
" <td>26.0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>7.9250</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>4</td>\n",
" <td>1</td>\n",
" <td>1</td>\n",
" <td>Futrelle, Mrs. Jacques Heath (Lily May Peel)</td>\n",
" <td>1</td>\n",
" <td>35.0</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" <td>53.1000</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>5</td>\n",
" <td>0</td>\n",
" <td>3</td>\n",
" <td>Allen, Mr. William Henry</td>\n",
" <td>0</td>\n",
" <td>35.0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>8.0500</td>\n",
" <td>0</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" PassengerId Survived Pclass \\\n",
"0 1 0 3 \n",
"1 2 1 1 \n",
"2 3 1 3 \n",
"3 4 1 1 \n",
"4 5 0 3 \n",
"\n",
" Name Sex Age SibSp Parch \\\n",
"0 Braund, Mr. Owen Harris 0 22.0 1 0 \n",
"1 Cumings, Mrs. John Bradley (Florence Briggs Th... 1 38.0 1 0 \n",
"2 Heikkinen, Miss. Laina 1 26.0 0 0 \n",
"3 Futrelle, Mrs. Jacques Heath (Lily May Peel) 1 35.0 1 0 \n",
"4 Allen, Mr. William Henry 0 35.0 0 0 \n",
"\n",
" Fare Embarked \n",
"0 7.2500 0 \n",
"1 71.2833 1 \n",
"2 7.9250 0 \n",
"3 53.1000 0 \n",
"4 8.0500 0 "
]
},
"execution_count": 3,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Cleaning\n",
"df_clean = df.copy() # Work on a copy so that df keeps its original null values\n",
"df_clean['Age'] = df['Age'].fillna(df['Age'].median())\n",
"df_clean.loc[df[\"Sex\"] == \"male\", \"Sex\"] = 0\n",
"df_clean.loc[df[\"Sex\"] == \"female\", \"Sex\"] = 1\n",
"df_clean.drop(['Cabin', 'Ticket'], axis=1, inplace=True)\n",
"df_clean.loc[df[\"Embarked\"] == \"S\", \"Embarked\"] = 0\n",
"df_clean.loc[df[\"Embarked\"] == \"C\", \"Embarked\"] = 1\n",
"df_clean.loc[df[\"Embarked\"] == \"Q\", \"Embarked\"] = 2\n",
"df_clean.head()"
]
},
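{
"cell_type": "markdown",
"metadata": {},
"source": [
"The pairs of `loc` assignments above can also be written with `Series.map` and a dictionary. This is only a sketch of an alternative, run on a small made-up frame named `toy` (hypothetical) so that it does not alter `df_clean`. Values that are absent from the dictionary, including nulls, become NaN."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"# Illustrative sketch: dictionary-based encoding on toy data\n",
"import pandas as pd\n",
"\n",
"toy = pd.DataFrame({'Sex': ['male', 'female', 'female'],\n",
"                    'Embarked': ['S', 'C', None]})\n",
"\n",
"# map() replaces each value by its dictionary entry\n",
"toy['Sex'] = toy['Sex'].map({'male': 0, 'female': 1})\n",
"toy['Embarked'] = toy['Embarked'].map({'S': 0, 'C': 1, 'Q': 2})\n",
"toy"
]
},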
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# General exploration"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"In the previous session we saw that *Seaborn* provides several facilities for working with DataFrames. We are going to review some of them."
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>PassengerId</th>\n",
" <th>Survived</th>\n",
" <th>Pclass</th>\n",
" <th>Age</th>\n",
" <th>SibSp</th>\n",
" <th>Parch</th>\n",
" <th>Fare</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>count</th>\n",
" <td>891.000000</td>\n",
" <td>891.000000</td>\n",
" <td>891.000000</td>\n",
" <td>714.000000</td>\n",
" <td>891.000000</td>\n",
" <td>891.000000</td>\n",
" <td>891.000000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>mean</th>\n",
" <td>446.000000</td>\n",
" <td>0.383838</td>\n",
" <td>2.308642</td>\n",
" <td>29.699118</td>\n",
" <td>0.523008</td>\n",
" <td>0.381594</td>\n",
" <td>32.204208</td>\n",
" </tr>\n",
" <tr>\n",
" <th>std</th>\n",
" <td>257.353842</td>\n",
" <td>0.486592</td>\n",
" <td>0.836071</td>\n",
" <td>14.526497</td>\n",
" <td>1.102743</td>\n",
" <td>0.806057</td>\n",
" <td>49.693429</td>\n",
" </tr>\n",
" <tr>\n",
" <th>min</th>\n",
" <td>1.000000</td>\n",
" <td>0.000000</td>\n",
" <td>1.000000</td>\n",
" <td>0.420000</td>\n",
" <td>0.000000</td>\n",
" <td>0.000000</td>\n",
" <td>0.000000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>25%</th>\n",
" <td>223.500000</td>\n",
" <td>0.000000</td>\n",
" <td>2.000000</td>\n",
" <td>20.125000</td>\n",
" <td>0.000000</td>\n",
" <td>0.000000</td>\n",
" <td>7.910400</td>\n",
" </tr>\n",
" <tr>\n",
" <th>50%</th>\n",
" <td>446.000000</td>\n",
" <td>0.000000</td>\n",
" <td>3.000000</td>\n",
" <td>28.000000</td>\n",
" <td>0.000000</td>\n",
" <td>0.000000</td>\n",
" <td>14.454200</td>\n",
" </tr>\n",
" <tr>\n",
" <th>75%</th>\n",
" <td>668.500000</td>\n",
" <td>1.000000</td>\n",
" <td>3.000000</td>\n",
" <td>38.000000</td>\n",
" <td>1.000000</td>\n",
" <td>0.000000</td>\n",
" <td>31.000000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>max</th>\n",
" <td>891.000000</td>\n",
" <td>1.000000</td>\n",
" <td>3.000000</td>\n",
" <td>80.000000</td>\n",
" <td>8.000000</td>\n",
" <td>6.000000</td>\n",
" <td>512.329200</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" PassengerId Survived Pclass Age SibSp \\\n",
"count 891.000000 891.000000 891.000000 714.000000 891.000000 \n",
"mean 446.000000 0.383838 2.308642 29.699118 0.523008 \n",
"std 257.353842 0.486592 0.836071 14.526497 1.102743 \n",
"min 1.000000 0.000000 1.000000 0.420000 0.000000 \n",
"25% 223.500000 0.000000 2.000000 20.125000 0.000000 \n",
"50% 446.000000 0.000000 3.000000 28.000000 0.000000 \n",
"75% 668.500000 1.000000 3.000000 38.000000 1.000000 \n",
"max 891.000000 1.000000 3.000000 80.000000 8.000000 \n",
"\n",
" Parch Fare \n",
"count 891.000000 891.000000 \n",
"mean 0.381594 32.204208 \n",
"std 0.806057 49.693429 \n",
"min 0.000000 0.000000 \n",
"25% 0.000000 7.910400 \n",
"50% 0.000000 14.454200 \n",
"75% 0.000000 31.000000 \n",
"max 6.000000 512.329200 "
]
},
"execution_count": 4,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# General description of the dataset\n",
"df.describe()"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"PassengerId int64\n",
"Survived int64\n",
"Pclass int64\n",
"Name object\n",
"Sex object\n",
"Age float64\n",
"SibSp int64\n",
"Parch int64\n",
"Ticket object\n",
"Fare float64\n",
"Cabin object\n",
"Embarked object\n",
"dtype: object"
]
},
"execution_count": 5,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Column types\n",
"df.dtypes"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"Name object\n",
"Sex object\n",
"Ticket object\n",
"Cabin object\n",
"Embarked object\n",
"dtype: object"
]
},
"execution_count": 6,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Non-numeric columns\n",
"df.dtypes[df.dtypes == object]"
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"PassengerId 0\n",
"Survived 0\n",
"Pclass 0\n",
"Name 0\n",
"Sex 0\n",
"Age 177\n",
"SibSp 0\n",
"Parch 0\n",
"Ticket 0\n",
"Fare 0\n",
"Cabin 687\n",
"Embarked 2\n",
"dtype: int64"
]
},
"execution_count": 7,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Number of null values\n",
"df.isnull().sum()"
]
},
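{
"cell_type": "markdown",
"metadata": {},
"source": [
"To judge how serious the missing values are, it can help to look at the fraction of nulls per column rather than the raw count. The cell below is a minimal sketch that relies on the `df` loaded earlier in this notebook and uses only `isnull()`, `mean()` and `sort_values()`."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"# Fraction of null values per column, largest first\n",
"# (the mean of a boolean column is the share of True values)\n",
"df.isnull().mean().sort_values(ascending=False)"
]
},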
{
"cell_type": "code",
"execution_count": 8,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"array([[<matplotlib.axes._subplots.AxesSubplot object at 0x7f2f994ad470>,\n",
" <matplotlib.axes._subplots.AxesSubplot object at 0x7f2f994275f8>,\n",
" <matplotlib.axes._subplots.AxesSubplot object at 0x7f2f993ed438>],\n",
" [<matplotlib.axes._subplots.AxesSubplot object at 0x7f2f993b8668>,\n",
" <matplotlib.axes._subplots.AxesSubplot object at 0x7f2f99370fd0>,\n",
" <matplotlib.axes._subplots.AxesSubplot object at 0x7f2f9933d588>],\n",
" [<matplotlib.axes._subplots.AxesSubplot object at 0x7f2f992f7a90>,\n",
" <matplotlib.axes._subplots.AxesSubplot object at 0x7f2f99247438>,\n",
" <matplotlib.axes._subplots.AxesSubplot object at 0x7f2f99215160>]], dtype=object)"
]
},
"execution_count": 8,
"metadata": {},
"output_type": "execute_result"
},
{
"data": {
"text/plain": [
"<matplotlib.figure.Figure at 0x7f2f99496d30>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"# Analyse the distribution of the numeric columns\n",
"df.hist()"
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>PassengerId</th>\n",
" <th>Survived</th>\n",
" <th>Pclass</th>\n",
" <th>Age</th>\n",
" <th>SibSp</th>\n",
" <th>Parch</th>\n",
" <th>Fare</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>PassengerId</th>\n",
" <td>1.000000</td>\n",
" <td>-0.005007</td>\n",
" <td>-0.035144</td>\n",
" <td>0.036847</td>\n",
" <td>-0.057527</td>\n",
" <td>-0.001652</td>\n",
" <td>0.012658</td>\n",
" </tr>\n",
" <tr>\n",
" <th>Survived</th>\n",
" <td>-0.005007</td>\n",
" <td>1.000000</td>\n",
" <td>-0.338481</td>\n",
" <td>-0.077221</td>\n",
" <td>-0.035322</td>\n",
" <td>0.081629</td>\n",
" <td>0.257307</td>\n",
" </tr>\n",
" <tr>\n",
" <th>Pclass</th>\n",
" <td>-0.035144</td>\n",
" <td>-0.338481</td>\n",
" <td>1.000000</td>\n",
" <td>-0.369226</td>\n",
" <td>0.083081</td>\n",
" <td>0.018443</td>\n",
" <td>-0.549500</td>\n",
" </tr>\n",
" <tr>\n",
" <th>Age</th>\n",
" <td>0.036847</td>\n",
" <td>-0.077221</td>\n",
" <td>-0.369226</td>\n",
" <td>1.000000</td>\n",
" <td>-0.308247</td>\n",
" <td>-0.189119</td>\n",
" <td>0.096067</td>\n",
" </tr>\n",
" <tr>\n",
" <th>SibSp</th>\n",
" <td>-0.057527</td>\n",
" <td>-0.035322</td>\n",
" <td>0.083081</td>\n",
" <td>-0.308247</td>\n",
" <td>1.000000</td>\n",
" <td>0.414838</td>\n",
" <td>0.159651</td>\n",
" </tr>\n",
" <tr>\n",
" <th>Parch</th>\n",
" <td>-0.001652</td>\n",
" <td>0.081629</td>\n",
" <td>0.018443</td>\n",
" <td>-0.189119</td>\n",
" <td>0.414838</td>\n",
" <td>1.000000</td>\n",
" <td>0.216225</td>\n",
" </tr>\n",
" <tr>\n",
" <th>Fare</th>\n",
" <td>0.012658</td>\n",
" <td>0.257307</td>\n",
" <td>-0.549500</td>\n",
" <td>0.096067</td>\n",
" <td>0.159651</td>\n",
" <td>0.216225</td>\n",
" <td>1.000000</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" PassengerId Survived Pclass Age SibSp Parch \\\n",
"PassengerId 1.000000 -0.005007 -0.035144 0.036847 -0.057527 -0.001652 \n",
"Survived -0.005007 1.000000 -0.338481 -0.077221 -0.035322 0.081629 \n",
"Pclass -0.035144 -0.338481 1.000000 -0.369226 0.083081 0.018443 \n",
"Age 0.036847 -0.077221 -0.369226 1.000000 -0.308247 -0.189119 \n",
"SibSp -0.057527 -0.035322 0.083081 -0.308247 1.000000 0.414838 \n",
"Parch -0.001652 0.081629 0.018443 -0.189119 0.414838 1.000000 \n",
"Fare 0.012658 0.257307 -0.549500 0.096067 0.159651 0.216225 \n",
"\n",
" Fare \n",
"PassengerId 0.012658 \n",
"Survived 0.257307 \n",
"Pclass -0.549500 \n",
"Age 0.096067 \n",
"SibSp 0.159651 \n",
"Parch 0.216225 \n",
"Fare 1.000000 "
]
},
"execution_count": 9,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# We can see the pairwise correlation between variables. A value near 0 means low correlation\n",
"# while a value near -1 or 1 indicates strong correlation.\n",
"df.corr()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
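"The strongest correlations are between *Pclass* and *Fare* (-0.55) and between *SibSp* and *Parch* (0.41). No single variable is strongly correlated with *Survived*; the largest values are for *Pclass* (-0.34) and *Fare* (0.26). We could also represent this with a scatterplot."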
]
},
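{
"cell_type": "markdown",
"metadata": {},
"source": [
"Before moving on to scatterplots, a heatmap is a compact way to read the correlation matrix above. The cell below is a sketch that relies on the `df`, `np`, `sns` and `plt` names defined earlier in this notebook. Restricting the data to numeric columns with `select_dtypes` is an addition made here, since recent pandas versions may refuse to correlate text columns."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"# Keep only numeric columns, then plot the pairwise correlations\n",
"corr = df.select_dtypes(include=[np.number]).corr()\n",
"\n",
"# annot=True writes each coefficient inside its cell\n",
"sns.heatmap(corr, annot=True, fmt='.2f', cmap='coolwarm', vmin=-1, vmax=1)\n",
"plt.show()"
]
},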
{
"cell_type": "code",
"execution_count": 10,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"<seaborn.axisgrid.PairGrid at 0x7f2f9919fe10>"
]
},
"execution_count": 10,
"metadata": {},
"output_type": "execute_result"
},
{
"data": {
"text/plain": [
"<matplotlib.figure.Figure at 0x7f2f9919fd68>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"# General description of the relationship between variables using Seaborn PairGrid\n",
"# We use df_clean, since the null values of df would give us an error; you can check it.\n",
"g = sns.PairGrid(df_clean, hue=\"Survived\")\n",
"g.map_diag(plt.hist)\n",
"g.map_offdiag(plt.scatter)\n",
"g.add_legend()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"There are too many variables, so we are going to represent only a subset."
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"<seaborn.axisgrid.PairGrid at 0x7f2f97780cc0>"
]
},
"execution_count": 11,
"metadata": {},
"output_type": "execute_result"
},
{
"data": {
"text/plain": [
"<matplotlib.figure.Figure at 0x7f2f9602aa58>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"# PairGrid of variables\n",
"g = sns.PairGrid(df_clean, hue=\"Survived\", vars=['Pclass', 'Sex', 'Age'])\n",
"g.map_diag(plt.hist)\n",
"g.map_offdiag(plt.scatter)\n",
"g.add_legend()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We can observe, for example, that more women than men survived, and that most of the passengers who did not survive travelled in 3rd class.\n",
"\n",
"We can represent these findings."
]
},
{
"cell_type": "code",
"execution_count": 12,
"metadata": {
"collapsed": false,
"scrolled": true
},
"outputs": [
{
"data": {
"text/plain": [
"<matplotlib.axes._subplots.AxesSubplot at 0x7f2f9764eac8>"
]
},
"execution_count": 12,
"metadata": {},
"output_type": "execute_result"
},
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAYEAAAESCAYAAAAbq2nJAAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\nAAALEgAACxIB0t1+/AAAHS5JREFUeJzt3XucVXW9//HX3JBhBmQGJhUQEtCPlzTDnxwg73i8lP7g\nqCmdMk2tjqmFeSnRFC+Ql5/D5agp2qRF2fllZXoUBEvzfh5BZvwsP2VHYQboyGUDM8MMc9u/P/YG\n94zMzBrYa19mvZ+Phw/3Wmuv7/7A0v3ea33X97sK4vE4IiISTYXZLkBERLJHISAiEmEKARGRCFMI\niIhEmEJARCTCFAIiIhFWHPYHmNkngCeBand/oMu2U4E5QBuwxN3vCLseERH5UKhnAmY2CFgIPN/N\nWxYA/wIcB5xmZoeGWY+IiHQW9uWgZuBMYH3XDWZ2ELDJ3de5exx4Fpgacj0iIpIi1BBw9w5339HN\n5v2BDSnLHwAHhFmPiIh0lksdwwXZLkBEJGpC7xjuwTo6//IfmVzXrba29nhxcVGoRfUnTU1NnH32\n2cTjcQoKCnj66acpLS3Ndlkiknnd/sjOZAh0KsLdV5vZYDMbTeLL/yzgX3tqIBbbHmJ5/U99/TZ2\nThAYj8dZv34zgwcPyXJVIpJpVVWDu90WagiY2QTgXmAM0Gpm5wJPAe+5+6+By4GfAXHgcXd/N8x6\nRESks1BDwN3/AJzcw/ZXgClh1iAiIt3LpY5hEemipmYRM2ZMp6ZmUbZLkX5KISCSo5qbm1i+fAkA\ny5cvpbm5KcsVSX+UzbuDJEV7ezt1dWvS2mZjY2On5draNZSVlaX1M0aNGk1Rke7YCkNra2tKx34H\nra2tDByou7skvRQCOaKubg2zfzmH0or0fUl3tLR3Wl74yoMUDkjfF3ZTrJHZ59zImDEHpa1NEcks\nhUAOKa0oo2x497dy9VX7jja2pCwPGlZO0T465CLyIfUJiIhEmH4WiogE1NHRwbx59xCLbaKoqJj6\n+nquvHImY8eOy3Zpe0whICIS0N///jc++OAf3HXXPADq6mqpq6tl2bIlbNjwAW1trUyffh7jxh3M\nDTdcQ3X1fbz99iqWLn2GWbNuyXL1u6cQEBEJ6KCDxjFgwD5873u3cfTREzjqqKMZOrSCdevWcttt\n32PHjmauuurfWLToUS6++FIefPA+Vq9+jzvuuDvbpXdLISAiElBxcTG3334n27Zt5e23/x81NQ/h\n/g4DBgxg7txbAXbdMn3ssZN46KEHOOmkUxg0aFA2y+6RQkBEJKA331zJtm1bOfHEU5g8+dOMH38w\nn/vc/+b00z/DDTfcDMDq1e8DsGzZEo4//kRWrvw9p5/+GaqqPpbFyrunEBARCejgg43q6rtYsuQ/\nGTBgH5qbm5g3737eeOM15syZTUNDA//0T5MZNGgQzzzzNPPm3cdxx53I3XfP4Z57FmS7/N1SCPRj\nBYUps3cXdFmWtMrHEd8a7d135eXl3Hzz7R9Z/6lPHfORdQsWPADAuHHjczYAQCHQrxWWFFF+SCUN\nf91M+cGVFJbof/iw5NuIb432lp0UAv1cxcQRVEwcke0yIkEjviUfacSwiEiEKQRERCJMISAiEmEK\nARGRCFMvk4j0O2HcshvWLbVz597KySdPZfLk49LedhAKARHpd+rq1nDT/CcYWF6ZlvaaGzZzx8zz\n+uUttQoBEemXBpZXMmhIVUY/c8mS/+TNN1eydesW3n//Pb7ylct5/vnneP/997n55tv4zW+W8847\nf6alZQfTpp3LWWdN27VvR0cHd989h/Xr19HW1sall36NCRP+V+g1KwRERNJo7do67r//YZ5++kkW\nL36MH/7wJzzzzFM8++zTHHTQWK666mp27NjBBRdM7xQCy5cvZfjwKr7zne+ydesWvvGNy3nsscdD\nr1chICKSRoceehgAw4YNZ9y48RQUFFBZOYyW
lha2bt3K5ZdfQnFxCVu3bum036pVf2LVqj/ypz/9\nkXg8TmtrC21tbRQXh/s1rRAQEUmj1M7j1Nf/+Md61q1by/33P0JhYSGnnXZip/1KSkr40pcuYerU\n0zJWK+gWURGRjHjnnb+w3377UVhYyCuv/I6Ojnba2tp2bT/88CN46aUXAYjFNvPQQ/dnpC6dCYhI\nv9TcsDmn2jr22InU1tZy1VVf4/jjT2LKlOO59947d20/5ZR/5g9/WMHll19CR0ecSy756l5/ZhAK\nARHpd0aNGs0dM89Le5u9OfPMs3a9njLlOKZMOe4jr3c6//zPf2T/b3/7pr2ssu8UAiI5Ss+D2HNF\nRUX98p7+MKhPQCRH7XweBKDnQUhodCYgksP0PAgJm84EREQiTCEgIhJhCgERkQhTn4CI9DvZmEq6\nra2Nr3/9Mj7+8YOYNeuWtHzmP/6xnptu+jaPPPKjtLS3OwoBEel36urWMPuXcyitKEtLe02xRmaf\nc2OPt51u3LiRtrbWtAXATgUh3xkcegiYWTUwCegAZrr7ipRtVwBfANqAFe7+rbDrEZFoKK0oo2z4\n4Ix93n33VbN2bR1z597K9u3baWiop729nauvvo6xY8dzwQXTOfvs6bz44m8ZOXIUZofxwgvPc+CB\no7n55tt5992/UV19FyUlJRQUFHD77Xd1av+tt95k0aIHKC4uYb/99uP6629My+RyofYJmNkJwHh3\nnwJcBixM2TYYuBb4tLufABxhZhPDrEdEJCxXXnk1Bx44hpEjRzFp0hTmz3+Aa675Dv/+7/OAxPMC\nDj30cB555EesWvUWI0eO5OGHH+Ott96ksbGBWGwzV199PQsWfJ8jj/wky5Yt6dT+ggX/hzvvrGbB\nggcYOrSCF154Pi11h30mMBV4EsDd3zGzoWZW7u4NQAuwAxhiZo1AKZC+yT5ERLJg1aq32Lp1C889\n9ywALS0tu7YddtjhAFRWDmP8+EOSrytpaGigsnIY3//+Qpqbm9m0aSOnnXbmrv1isc3U1tZy443X\nEY/HaW5uZujQirTUG3YI7A+sSFnemFz3rrvvMLPbgP8GtgM/c/d3Q65HRCRUJSUDmDnzeo444hMf\n2VZUVLzb1/F44pf+hRdezLHHTuLxxxfT3Ny0a3txcQlVVVUsXPhg2uvNdMfwri6O5OWgWcB4oB54\nwcyOdPdV3e1cUTGI4uL+OXR+27b0dGBlWkVFGVVVmbvumqvy8fj152MXxvHo7e+rpWUbxcWFTJx4\nDCtWvMpJJ03m3Xff5ZVXXuHiiy+msLCA4cPLKS0tpbi4kGHDEu0VFRVSWTmIxsZ6PvEJY99992Hl\nyjc4+uijqawso7i4iLFjR1BcXMS2bR8wbtw4Fi9ezMSJEznkkEP2+s8VdgisI/HLf6cRwPrk68OA\nv7t7DMDMXgaOAboNgVhse0hlZl8s1pjtEvZILNbIhg312S4j6/Lx+PXnYxeLNdKUxmPSFGvs9e9r\n8+ZG2ts7OOOM6cyZcwvnnz+Djo4OZs68jg0b6unogI0bGxg4sI329g42bWqkpKSe9vYONm/ezrRp\n5/HVr36NUaMOZNq085g37x4mTz6JtrZ2Nmyo59prZ3HttdczYMAAhg0bztSpnw18/HoKr4J4PN7n\nv5CgzGwyMNvdTzezCcD8ZCcwZvYx4BXgyOSloWXAre7+anftbdhQH16xWbZ69Xvc9cL8jN7NsLca\nN9bz7ZNnarZG8u/49fdjl41xArmsqmpwtzeahnom4O6vm9lKM3sVaAeuMLOLgC3u/mszuwd40cxa\ngdd6CgARkaA0lXRwofcJuPusLqtWpWx7GHg47BpERGT3NHeQiEiEKQRERCJMISAiEmEKARGRCFMI\niIhEmEJARCTCFAIiIhGmEBARiTCFgIhIhCkEREQiTCEgIhJhCgERkQhTCIiIRJhCQEQkwhQCIiIR\nphAQEYkw
hYCISIQpBEREIkwhICISYQoBEZEIUwiIiESYQkBEJMIUAiIiEaYQEBGJMIWAiEiEKQRE\nRCKsOOgbzWw/YExycbW
"text/plain": [
"<matplotlib.figure.Figure at 0x7f2f96bdaba8>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"sns.barplot(x=\"Pclass\", y='Survived', hue='Sex', data=df)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We can see that more women survived in all the passenger classes."
]
},
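{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a rough numerical check of a bar plot like the one above, the survival rate per class and sex can be computed with `groupby`. The sketch below uses a small made-up table (`toy` and its values are hypothetical, not the Titanic data) so that it runs on its own; on the notebook's data the same expression would presumably be applied to `df`."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"import pandas as pd\n",
"\n",
"# Hypothetical toy data, for illustration only (not the Titanic dataset)\n",
"toy = pd.DataFrame({\n",
"    'Pclass':   [1, 1, 1, 1, 3, 3, 3, 3],\n",
"    'Sex':      ['female', 'female', 'male', 'male', 'female', 'female', 'male', 'male'],\n",
"    'Survived': [1, 1, 1, 0, 1, 0, 0, 0],\n",
"})\n",
"\n",
"# The mean of a 0/1 column is the proportion of ones, i.e. the survival rate\n",
"rates = toy.groupby(['Pclass', 'Sex'])['Survived'].mean().unstack()\n",
"print(rates)"
]
},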
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Now we are going to put into practice our knowledge of munging and visualisation. We will analyse every feature of the dataset."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Feature Age"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We saw that there are 177 missing values of Age. We are going to analyse this feature in more detail."
]
},
{
"cell_type": "code",
"execution_count": 13,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"<matplotlib.axes._subplots.AxesSubplot at 0x7f2f94de9c88>"
]
},
"execution_count": 13,
"metadata": {},
"output_type": "execute_result"
},
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAXoAAAEDCAYAAAA7jc+ZAAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\nAAALEgAACxIB0t1+/AAAFhZJREFUeJzt3X2QZXV95/F323Fm5NJTtOMNjDOxYUvzJRblHyS7JIM6\nOEsgLLA+MInZIAWiKbJFLFeyVuFu8eCY3cqanUkChmwyiCFEq0iyLMloiTgrskFQNGxFLfEbIQaX\nmZFpyNXuaZkHp3v/OHfwMg7T9+HcvqcP79c/c++5Z373U7dPf87p83DP2MLCApKk+nrJqANIkobL\nopekmrPoJanmLHpJqjmLXpJqzqKXpJr7iW5miogzgLuBbZl5S0S8EfgvwCFgH3BZZn4/It4PbAbm\ngS2Z+ekh5ZYkdWnRLfqIOAG4CdjZMXkr8M7M3AQ8BFwVEacCvwJsAC4GtkXEWOmJJUk96WbXzX7g\nAmBPx7RpoNl+PAk8DbwJ+HRmHs7Mp4F/Al5bXlRJUj8WLfrMnM/MA0dNvga4OyIeBV4P/ClwCsUK\n4IhpYG1JOSVJfer3YOzNwJsz82eAB4CrjzGPu20kqQK6Ohh7DK/LzC+2H+8ELgX+N3B6xzzrgN3H\nG2RhYWFhbMz1gST1qKfi7Lfo90TE6Zn5TeBfAv8A3AdcExHXAz8JvDIzv3HcpGNjTE/P9hlh6TSb\nE+Ys0XLIuRwygjnLtpxy9mLRoo+IMynOspkCDkXEZuA3gFsj4iDwz8CVmTkTEduBv6U4vfI3esz+\nonb48GGefPI7A40xM9Og1ZobOMv69a9ifHx84HEkVcOiRZ+Zj1CcUXO01x9j3j8E/rCEXC86Tz75\nHX5r26dY0Vgz0hwH555h6zUXMjV12khzSCpPv7tuNAQrGmtYtfrkUceQVDN+BYIk1ZxFL0k1Z9FL\nUs1Z9JJUcxa9JNWcRS9JNWfRS1LNeR69nmdhYZ5du54c6nt0cwWvV+dK5bHo9TwH51psu7PFisZw\ny/74Gbw6VyqTRa8f4xW6Ur24j16Sas6il6Sas+glqeYsekmqOYtekmquq7NuIuIM4G5gW2beEhE/\nAdwOvBqYATZn5vcj4lLgvcBhYHtm3jak3JKkLi26RR8RJwA3UdwE/IhfB/Zm5lnAncAb2vNdB2yi\nuCPV+yLipPIjS5J60c2um/3ABcCejmkXAx8HyMxbM/OTwFnAw5m5LzP3Aw8AZ5ecV5LUo27uGTsP\nHIiIzsmnAv8mIn6XYgVwNXAKMN0xzzSwtrSkkqS+9Htl7BjwaGZuiYj/DHwA+L/HmGdRzeZEnxGW\n1rBzzsw0hjr+cjM52Rj5sjHq9++WOcu1XHL2ot+i/y7wf9qPPwPcCHySYpfOEeuAhxYbaHp6ts8I\nS6fZnBh6zsW+5OvFptWaG+mysRQ/8zKYs1zLKWcv+j298tMU++0BfhZI4GHg5yJidUScCGwA/rbP\n8SVJJVl0iz4izgS2AlPAoYjYDPwacFNEvAuYBS7PzP0RcS1wLzAP3JiZ1V81SlLNdXMw9hGK0yWP\n9ivHmPcu4K4SckmSSuKVsZJUcxa9JNWcRS9JNWfRS1LNWfSSVHMWvSTVnEUvSTVn0UtSzVn0klRz\nFr0k1ZxFL0k1Z9FLUs1Z9JJUcxa9JNWcRS9JNWfRS1LNdXXP2Ig4A7gb2JaZt3RMPx/4dGa+pP38\nUuC9wGFge2beVn5kSVIvFt2ij4gTgJuAnUdNXwlcC+zumO86YBPFHaneFxEnlR1YktSbbnbd7Ke4\nEfieo6b/J+AjwMH287OAhzNzX2buBx4Azi4rqCSpP4sWfWbOZ+aBzmkR8dPA6zLzf3ZMPgWY7ng+\nDawtJaUkqW9d7aM/hm3Ae9qPx15gnhea/jzN5kSfEZbWsHPOzDSGOv5yMznZGPmyMer375Y5y7Vc\ncvai56KPiFcCAXw8IsaAtRFxH3ADcHHHrOuA
hxYbb3p6ttcIS67ZnBh6zlZrbqjjLzet1txIl42l\n+JmXwZzlWk45e9Fr0Y9l5m7gNUcmRMS3M/NNEbEKuDUiVgPzwAaKM3AkSSO0aNFHxJnAVmAKOBQR\nlwBvy8zvtWdZAMjM/RFxLXAvRdHfmJnVXzVKUs0tWvSZ+QjF6ZIv9Pq/6Hh8F3BXOdEkSWXwylhJ\nqjmLXpJqzqKXpJqz6CWp5ix6Sao5i16Sas6il6Sas+glqeYsekmqOYtekmrOopekmrPoJanmLHpJ\nqjmLXpJqzqKXpJqz6CWp5rq6lWBEnAHcDWzLzFsi4qeA24CXAgeBd2Tm3oi4lOL2gYeB7Zl525By\nS5K6tOgWfUScANwE7OyY/CHgf2TmORQrgGva810HbKK4I9X7IuKk0hNLknrSza6b/cAFwJ6Oaf+e\nH90ycBpYA5wFPJyZ+zJzP/AAcHaJWSVJfejmnrHzwIGI6Jz2LEBEvAS4GvggcApF6R8xDawtM6wk\nqXdd7aM/lnbJ3wHszMz7IuLfHTXLWDfjNJsT/UZYUsPOOTPTGOr4y83kZGPky8ao379b5izXcsnZ\ni76LHvgYkJn52+3nu3n+Fvw64KHFBpmenh0gwtJoNieGnrPVmhvq+MvJwsI8X/96jvQzmZxs0Gis\nYXx8fGQZurEUy2YZzFmuXldGfRV9++yaA5m5pWPyl4DtEbEamAc2UJyBI/Xk4FyLbXe2WNF4coQZ\nnmHrNRcyNXXayDJIZVm06CPiTGArMAUciojNwE8C+yPiPmAB+EZm/mZEXAvcS1H0N2Zm9VeNqqQV\njTWsWn3yqGNItdDNwdhHKE6XXFRm3sWPzsaRJFWAV8ZKUs1Z9JJUcxa9JNWcRS9JNWfRS1LNWfSS\nVHMWvSTVnEUvSTVn0UtSzVn0klRzFr0k1ZxFL0k1Z9FLUs1Z9JJUcxa9JNVcV3eYiogzgLuBbZl5\nS0Ssp7hf7EuAPcBlmXmofeep9wKHge2ZeduQckuSurToFn1EnADcBOzsmLwFuDkzNwKPA1e257sO\n2ERxo5L3RcRJ5UeWJPWim103+4ELKLbcjzgH2NF+vAP4ReAs4OHM3JeZ+4EHgLPLiypJ6seiRZ+Z\n85l54KjJjcw81H68F1gLnAxMd8wz3Z4uSRqhMg7GjvU4XZK0hLo6GHsMsxGxsr2lvw7YBezm+Vvw\n64CHFhuo2ZzoM8LSGnbOmZnGUMdX7yYnG8ti+VwOGcGco9Rv0e8ELgE+0f73HuBh4NaIWA3MAxso\nzsA5runp2T4jLJ1mc2LoOVutuaGOr961WnOVXz6XYtksgznL1evKaNGij4gzga3AFHAoIjYDlwK3\nR8RVwBPA7Zl5OCKuBe6lKPobM7P6n5gk1dyiRZ+Zj1CcLnm0844x713AXSXkkiSVxCtjJanmLHpJ\nqjmLXpJqzqKXpJqz6CWp5ix6Sao5i16Sas6il6Sas+glqeYsekmqOYtekmrOopekmrPoJanmLHpJ\nqjmLXpJqzqKXpJrr61aCEdEA/gyYBFYAW4DvAn9EcXepr2bm1WWFlCT1r98t+iuAb2bmJmAz8AfA\n7wHvycw3ACdFxPnlRJQkDaLfon8aWNN+vAZ4BjitfdtBgB3AuQNmkySVoK+iz8w7gamI+BbweeD9\nQKtjlr3A2oHTSZIG1lfRR8SlwBOZ+RpgE/DnR80yNmgwSVI5+joYC5wNfAYgM78WES87aqx1wO5u\nBmo2J/qMsLSGnXNmpjHU8dW7ycnGslg+l0NGMOco9Vv0jwE/D/yviJgCZoFvR8TZmfkF4G3ATd0M\nND0922eEpdNsTgw9Z6s1N9Tx1btWa67yy+dSLJtlMGe5el0Z9Vv0fwzcFhGfB8aBqyhOr/yTiBgD\nvpSZn+tzbElSifoq+sycA95+jJfeOFgcSVLZvDJWkmrOopekmrPoJanmLHpJqjmLXpJqzqKXpJqz\n6CWp5vq9
YEqqtYWFeXbtenLUMQBYv/5VjI+PjzqGljGLXjqGg3Mttt3ZYkVjtGV/cO4Ztl5zIVNT\np400h5Y3i156ASsaa1i
"text/plain": [
"<matplotlib.figure.Figure at 0x7f2fc8307438>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"# Histogram of Age\n",
"# For Series, you can use hist(), plot.hist() or plot(kind='hist')\n",
"df['Age'].hist()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We see the histogram is slightly *right skewed*, so we will replace null values with the median instead of the mean, because the median is less affected by the long tail.\n",
"\n",
"When a distribution is significantly skewed, the extreme values in the long tail can have a disproportionately large influence on our model. It can therefore help to transform the variable before building the model in order to reduce skewness. Taking the natural logarithm or the square root of each value are two simple transformations."
]
},
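{
"cell_type": "markdown",
"metadata": {},
"source": [
"A minimal sketch of the effect of both transformations, assuming a synthetic right-skewed sample (the `sample` series and its lognormal parameters are made up for illustration and are not the Titanic ages). It prints the mean, the median and the skewness before and after each transformation. Note that the logarithm is only defined for positive values; `np.log1p` is a common choice when a variable contains zeros."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"import numpy as np\n",
"import pandas as pd\n",
"\n",
"# Hypothetical right-skewed sample, for illustration only (not the Titanic data)\n",
"rng = np.random.RandomState(0)\n",
"sample = pd.Series(rng.lognormal(mean=3.0, sigma=0.5, size=1000))\n",
"\n",
"# In a right-skewed distribution the long tail pulls the mean above the median\n",
"print('mean:', sample.mean(), 'median:', sample.median())\n",
"\n",
"# Skewness close to 0 indicates a roughly symmetric distribution\n",
"print('skewness (original):', sample.skew())\n",
"print('skewness (log):     ', np.log(sample).skew())\n",
"print('skewness (sqrt):    ', np.sqrt(sample).skew())"
]
},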
{
"cell_type": "code",
"execution_count": 14,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"<matplotlib.axes._subplots.AxesSubplot at 0x7f2f94d63358>"
]
},
"execution_count": 14,
"metadata": {},
"output_type": "execute_result"
},
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAXUAAAEDCAYAAADKhpQUAAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\nAAALEgAACxIB0t1+/AAAFWdJREFUeJzt3W+MXXd95/H37GzGJteOMjFXxmtvTJZUX4oqHqSslk0g\nNtkoWWQoFLulamRBXbbRKkW7Gy/S9gEJmHbV0o1LCSpNswRMClJg1QWsSClEKamW7KJskRLQhu8S\nw9odO+DBGjr2FGwznn1w7qQTY/uec33PzLln3q8nuX+Oz/3k/vncM79zzu+OLSwsIElqh3+00gEk\nScNjqUtSi1jqktQilroktYilLkktYqlLUov8434LREQH+DQwCUwA+4DvAx8HzgHPZuZddYaUJJVT\nZkv93cC3M/MWYBfwx8AfAe/NzDcCV0fE7fVFlCSVVabUfwhs6F3eAJwArsvMb/RuOwjcWkM2SVJF\nfUs9Mx8BtkbEd4CvAu8DZpYschzYVEs6SVIlfUs9Iu4ADmfmzwG3AH9+3iJjdQSTJFXXd0cpcBPw\nlwCZ+c2IeNl5/24zcOxSK1hYWFgYG7P7JamiysVZptSfB14P/PeI2AqcBL4XETdl5teAdwAfvWSq\nsTGmp09Wzbbsut315hwicw6XOYdnFDJCkbOqMqX+APBQRHwVGAfupDik8c8iYgz4emY+UfmRJUlD\n17fUM3MOeOcF7rp5+HEkSZfDM0olqUUsdUlqEUtdklrEUpekFrHUJalFLHVJahFLXZJaxFKXpBax\n1CWpRSx1SWoRS12SWsRSl6QWsdQlqUUsdUlqEUtdklrEUpekFrHUJalFyvycnVpmfn6eqakjpZff\nsuVaxsfHa0wkaVgs9VVoauoIe/c/ykRnQ99lz8yd4L67d7B163XLkEzS5epb6hGxB9gNLABjwC8C\nbwA+DpwDns3Mu+oMqeGb6Gxg7VUbVzqGpCHrO6aemQ9l5psy8xbgXuAA8BHgvZn5RuDqiLi95pyS\npBKq7ii9B/gD4JWZ+Y3ebQeBW4eaSpI0kNKlHhGvA44A88DMkruOA5uGnEuSNIAqW+rvAT7Vuzy2\n5Paxn11UkrQSqhz9sh347d7lpYdNbAaO9fvH3e76Cg+1clZDztnZTqXlJyc7Az/eang+l5M5h2cU\nMg6iVKlHxCbgZGb+tHf9uYi4MTOfAt4BfLTfOqanT15W0OXQ7a5fFTlnZuYqLz/I462W53O5mHN4\nRiEjDPbFU3ZLfRPF2Pmi/wA8EBFjwNcz84nKjyxJGrpSpd470mXHkuvPATfXFUqSNBjnfpGkFrHU\nJalFLHVJahFLXZJaxFKXpBax1CWpRSx1SWoRS12SWsRfPmqwi/3s3Oxs54Kn+tfxs3MLC+c4enSq\n9PL+9J20siz1BmvCz86dmZth/yMzTHT6F7s/fSetPEu94Zrws3NNyCCpHMfUJalFLHVJahFLXZJa\nxFKXpBax1CWpRSx1SWoRS12SWsRSl6QWKXXyUUTcAbwPOAvcA3wTeJjiS+EFYHdmnq0rpCSpnL5b\n6hFxDUWR3wi8BXg7sA+4PzO3AYeAPXWGlCSVU2b45VbgK5n595n5g8y8E9gOHOzdf7C3jCRphZUZ\nfnkl0ImILwJXAx8Erlwy3HIc2FRPPElSFWVKfQy4BvhlioL/q95tS+/vq9tdXzXbimhSztnZTqXl\nJyc7pfJXXe/lZGjS83kp5hyuUcg5ChkHUabUfwA8lZnngO9GxEngbESsyczTwGbgWL+VTE+fvLyk\ny6DbXd+onBeaM73f8mXyV13voBma9nxejDmHaxRyjkJGGOyLp8yY+peBWyJiLCI2AOuAx4Fdvft3\nAo9VfmRJ0tD1LfXMPAb8N+B/AY8CdwH3Au+KiCeBSeBAnSElSeWUOk49Mx8EHjzv5tuGH0eSdDk8\no1SSWsRSl6QWsdQlqUUsdUlqEUtdklrEUpek
Fil1SKOGY35+nqmpI6WXP3p0qsY0ktrIUl9GU1NH\n2Lv/USY6G0otf2r6edZ1r685laQ2sdSX2URnA2uv2lhq2dOnTtScRlLbOKYuSS1iqUtSi1jqktQi\nlroktYilLkktYqlLUotY6pLUIpa6JLWIpS5JLWKpS1KL9J0mICK2AZ8HvgWMAc8Cfwg8TPGl8AKw\nOzPP1phTklRC2blfvpqZv7p4JSIeAu7PzL+IiN8D9gAP1BFQ7VR1xkqALVuuZXx8vKZEUjuULfWx\n865vB+7sXT4I7MVSVwVVZ6w8M3eC++7ewdat19WcTBptZUv9NRHxBeAaYB9w5ZLhluPApjrCqd2q\nzFgpqZwypf4d4AOZ+fmI+GfAX533787fir+gbnf9APGWX505Z2c7ta0bYHKyUyp/nTnOz3CxPINk\nKPv/Nwjfn8M1CjlHIeMg+pZ6Zh6j2FFKZn43Ir4PvC4i1mTmaWAzcKzfeqanT15u1tp1u+trzTkz\nM1fbuhfXXyZ/nTmWZrjU8zlIhrL/f1XV/boPizmHZxQywmBfPH0PaYyIX4+Ivb3LrwA2Ap8EdvUW\n2Qk8VvmRJUlDV2b45UvAZyPibcAVFDtInwE+HRG/BRwGDtQXUZJUVpnhl1PAL13grtuGH0eSdDk8\no1SSWsRSl6QWsdQlqUXKnnwkraiFhXMcPTpVenmnFNBqZalrJJyZm2H/IzNMdPoXu1MKaDWz1DUy\nnFZA6s8xdUlqEUtdklrEUpekFrHUJalFLHVJahGPftHQnH8s+exs56JT7FY55lxSeZa6hqbKseSn\npp9nXff6ZUglrS6Wuoaq7LHkp0+dWIY00upjqbdEldPoHfqQ2stSbwmHPiSBpd4qDn1I8pBGSWoR\nS12SWqTU8EtErAW+BewDngAepvhCeAHYnZlna0soSSqt7Jb6+4HFgdh9wP2ZuQ04BOypI5gkqbq+\npR4RAbwaeBQYA7YBB3t3HwRurS2dJKmSMlvq9wF3UxQ6QGfJcMtxYFMdwSRJ1V1yTD0idgNPZebh\nYoP9Z4xd6MYL6XbXV4y2MurMOTvbqW3deqnJyU6l19L353CNQs5RyDiIfjtKdwDXRcRbgc3AGeBU\nRKzJzNO9246VeaDp6ZOXFXQ5dLvra815scmtNHwzM3OlX8u6X/dhMefwjEJGGOyL55Klnpm/tng5\nIu4B/h9wI7AL+AywE3is8qNKkmpR5Tj1xaGWe4F3RcSTwCRwYOipJEkDKT1NQGZ+cMnV22rIIkm6\nTJ5RKkktYqlLUotY6pLUIpa6JLWIpS5JLWKpS1KLWOqS1CKWuiS1iKUuSS1iqUtSi1jqktQiped+\nkdpofn6eQ4cOlZ4WecuWaxkfH685lTQ4S12r2tTUEfbuf5SJzoa+y56ZO8F9d+9g69brliGZNBhL\nXaveRGcDa6/auNIxpKFwTF2SWsRSl6QWsdQlqUUsdUlqEUtdklqk79EvEfEy4FPARmAN8LvAM8DD\nFF8KLwC7M/NsfTElSWWU2VJ/K/B0Zm4H3gnsB/YBH8vMbcAhYE9tCSVJpfXdUs/Mzy25ei3wt8A2\n4M7ebQeBvcADQ08nSaqk9MlHEfE1YDPFlvtXlgy3HAc21ZBNklRR6VLPzJsi4rXAZ4CxJXeNXeSf\nvES3u75itJVRZ87Z2U5t69ZLTU52Sr2WVV+Tsuuti5+j4RmFjIMos6P0BuB4Zk5l5rMRMQ6cjIg1\nmXmaYuv9WL/1TE+fvPy0Net219eas+ykUbp8MzNzpV7Lqq9J2fXWoe7357CMQs5RyAiDffGU2VF6\nM8WYORGxEVgHPA7s6t2/E3is8iNLkoauzPDLnwKfiIi/BtYC/xb4G+DhiPgt4DBwoL6IkqSyyhz9\n8hPgjgvcddvw40iSLodnlEpSi1jqktQilroktYilLkktYqlLUov4G6VSSQsL5zh6dKr08lu2XMv4\n+HiNiaSf
ZalLJZ2Zm2H/IzNMdPoX+5m5E9x39w62br1uGZJJ/8BSlyqY6Gxg7VUbVzqGdFGWulqn\nyjBJleGUuszPzzM1daT
"text/plain": [
"<matplotlib.figure.Figure at 0x7f2f94d530f0>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"# Plot the distribution again with more bins\n",
"df['Age'].hist(bins=30, range=(0, df['Age'].max()))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Now we analyse the relationship between Age and Survived."
]
},
{
"cell_type": "code",
"execution_count": 15,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"<seaborn.axisgrid.FacetGrid at 0x7f2f94c24160>"
]
},
"execution_count": 15,
"metadata": {},
"output_type": "execute_result"
},
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAY4AAAFhCAYAAABnFk0rAAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\nAAALEgAACxIB0t1+/AAAIABJREFUeJzs3Xd8XHed7//XVPWuUbVkNesr9+44jp2CE6cQElIg9ABh\n7y4E7mV3L3dZHo/7W2Bh9y6wWdoDdpfdBcKSRgLpDolJd+zEvetry1az+kij3kYz5/fHjGzZURnJ\nIx1p5vN8PPKINad9TuTorW8532MxDAMhhBAiVFazCxBCCLGwSHAIIYSYFgkOIYQQ0yLBIYQQYlok\nOIQQQkyLBIcQQohpsYeyk1LqIWAz4Ae+qrXeP2bbjcB3gRFgp9b6O0qpOOBXQDYQA/y91vpFpdQv\ngfWAO3j497XWO8N1M0IIIWbflMGhlLoWKNNab1FKVQD/BWwZs8uPgJuAJuB1pdSTwCpgn9b6B0qp\nQuAV4MXg/l/XWr+IEEKIBSmUrqrtwNMAWutKIFUplQiglCoG2rXWjVprA9gJbNdaP6G1/kHw+EKg\nPvylCyGEMEMoXVU5wP4xX7uDn1UF/902ZlsrUDL6hVJqN5AP3D5mny8rpf4aaAG+rLXumFnpQggh\nzDCTwXFLqNu01tcAdwK/DX70MIGuqu3AEeBbM7i+EEIIE4XS4mgk0LIYlUdgPGN0W+6YbflAo1Jq\nHdCqtT6vtT6ilLIrpTK11q+N2fdZ4GeTXXhkxGfY7bYQShRCiHlvsl+6F5RQguNl4JvAL4KB0KC1\n7gPQWtcqpZKCA+CNBLqkPhH892LgL5VS2UCC1todHDj/mta6GrgeOD7ZhT2e/pndVRi4XEm0tfWY\ndn2zRON9R+M9g9y3GdeNFFMGh9Z6j1LqQHC8wgc8qJS6H+jUWj8DfBF4DDCAR7XWVUqpfwX+Uyn1\nJhALfCl4up8Cjyul+oBe4HPhvyUhhBCzyTKfl1Vva+sxrTj5bSx6ROM9g9y3CdeNmK4qeXJcCCHE\ntEhwCCGEmBYJDiGEENMiwSGEEGJaJDiEEEJMiwSHEEKIaZHgEEIIMS0SHEIIIaZFgkMIIcS0SHAI\nIYSYFgkOIYQQ0yLBIYQQYlokOIQQQkyLBIcQQohpkeAQQggxLRIcQgghpkWCQwghxLSE8s5xIS7R\n6O7jt6+cpqNnCJ/PT15mAh/fvoTs9HizSxNCzAEJDjEth8+4+ffnTjA47CMlwYnFAkfPtnOq1sNd\n20rYsakAqyVi3pAphBiHBIcI2XunWvi3Z07gsFv5H3csY/OyHAzDYL9u47cva554rYphr487thab\nXaoQYhbJGIcISf/gCI+8chqHw8rffmo9m5flAGCxWNhYkcW3HriKjORYnn67mgO6zeRqhRCzSYJD\nhOTZ3dV093u5/eoiFuckvW97SoKTr9yzEqfDyn88f5Lzbb0mVCmEmAsSHGJKje4+/nTgPK7UWG7e\nVDDhfoXZSXzhg8sY8vr45YuV+A1jDqsUQswVCQ4xpSdeq8LnN/j49nIcdtuk+26oyGLT0iyqm7rZ\nc7x5jioUQswlCQ4xqY7uQY6dbac0P5nVZRkhHfPRG8pw2q387vWzDAyNzHKFQoi5JsEhJvXO8WYM\nYOvKXCwhTrNNT47ltqsX0903zHPv1MxqfUKIuSfBISZkGAa7jzfjsFvZWJE9rWNv2VRIRnIsu/af\np6tveJYqFEKYQYJDTOhsYzctHf2sK3cRHzu9R36cDhu3bS5kxOdn1/76WapQCGEGCQ4xoXeONQFw\nzYqcGR1/zcpckuMdvHrwPP2DMtYhRKSQ4BDj8o74eO9UK6mJTpYVpc/oHE6HjZs2FjAw5OONww1h\nrlAIYRYJDjEuXd9J/9AIm5ZmY7XOfO2pG9bmE+u08fK+erwjvjBWKIQwiwSHGNfJag8AK0pm1toY\nFR/r4Po1+XT1DbO/UpYiESISSHCIcZ2o6cBu
s1K+KPWKz3X92jwA6a4SIkJIcIj36eobpr61lyWL\nUnA6Jn9SPBRZafEsK0rj9PkuGt19YahQCGGmkOZYKqUeAjYDfuCrWuv9Y7bdCHwXGAF2aq2/o5SK\nA34FZAMxwHe01i8opRYBvyEQWE3Ap7XW3jDejwiDU7UdACwvvrJuqrGuW5PPyRoPbx5p5GPbl4Tt\nvEKIuTdli0MpdS1QprXeAnwB+PFlu/wIuAvYCtyklKoAPgTs01pfD9wHPBTc99vAT7TW1wFngc+H\n4yZEeI2ObywrSgvbOdcuySQp3sHuY00ySC7EAhdKV9V24GkArXUlkKqUSgRQShUD7VrrRq21AewE\ntmutn9Ba/yB4fCEw+gTY9cBzwT8/B9wYlrsQYWMYBidqOkiMc1CY/f7l02fKbrOydWUufYMj7Jf3\ndQixoIUSHDnA2P/T3cHPxtvWCuSOfqGU2g38N/DV4EfxY7qmLtlXzA/nW3vx9AyxdHFa2F8Be+3q\nwCD5O7JqrhAL2kwGxyf7aXLJNq31NcCdwG/H2S4vpp6Hjp4J/B4QzvGNUdnp8RTnJnOypkPWrxJi\nAQtlcLyRiy0MgDwCA9uj28a2GvKBRqXUOqBVa31ea31EKWVTSrmAHqVUjNZ6aHTfyS6clhaPfYr3\nP8wmlyt8XTULReUrpwG4alXerNz/jZsK+cUzxzlV38WHtpWE/fwzFY3fa5D7FjMTSnC8DHwT+EUw\nEBq01n0AWutapVSSUqqQQAjcDnwi+O/FwF8qpbKBRK11m1JqF3AvgRbIPcBLk13Y4+mf2V2FgcuV\nRFtbj2nXN8uZOg9xMXYcGLNy/8sKUrBYYNd7tWyucIX9/DMRrd9rue+5v26kmLKrSmu9BzgQHK/4\nIfCgUup+pdSdwV2+CDwGvAE8qrWuAv4VyFJKvUlgEPxLwX2/CdyvlHoDSAN+Hc6bEVemb9BLQ1sf\nxblJYR/fGJWSGMOyxWmca+ym1cRfDIQQMxfScxxa629c9tGxMdveBrZctv8g8MlxztMM7Jh+mWIu\n1DQFfgsrzk2e1etctSyHEzUe3j3ZwoeuKZ7Vawkhwk+eHBcXVDd1A7MfHOuVC7vNyt6TLbN6HSHE\n7JDgEBfMVXDExdhZWZJOU3u/LEEixAIkwSGAwIN/5xq7yUiJJS0pZtavt14FBsYPnpaHAYVYaCQ4\nBACeniG6+oYpLwzfMiOTWVOWic1q4YAEhxALjgSHAKA6ODC+pODKl1EPRXysg6WL06ht7sHdNTAn\n1xRChIcEhwAujm+UF8xNiwNg3Wh3laxdJcSCIsEhgIvBUTZHLQ6AtUtcWEC6q4RYYCQ4BIZhUN/a\nS1ZqHAlxjjm7bkqCkyWLUqg630VX79CcXVcIcWUkOATdfcP0DnjJdyXM+bXXlrswgCNn2+f82kKI\nmZHgEJxvCzxLke9KnPNrrynLBOBIlXvOry2EmBkJDkFDWy8Ai0xocWSnx5OTHs/JGo+8GVCIBUKC\nQ1xocSwyocUBsLosgyGvj8q6TlOuL4SYHgkOQYO7F7vNQlZa3Kyc3zAM6nrOc9pzluquWlr7L+2W\nWl0q3VVCLCQhrY4rIpffMGhw95GbkYDdFt7fI7qGunnz/DvsazlM+2DHJduKkxdz3aItrM1aSdmi\nFOJj7BypcvPJm8qxzNKS7kKI8JDgiHLuzgGGvf6wz6g67j7Fw6cep8/bj9PmZGP2WlxxGXj9IzT0\nNXGq/TTVJ2t5pe51/nzl/awoSee9U600tPWxKMucLjMhRGgkOKJcuMc3/IafP1S9wKv1b2G32rl3\nyR1sydtEjM15yX6t/W5eqvkT7zYf4Hv7f8KWwg/CKThc5ZbgEGKekzGOKDc6oyo/88pbHIZh8OSZ\n53i1/i2y4118bf2XuaFg6/tCAyArPpPPLLuP+8rvon9kgD91PoUtxc3xc/I8hxDznQRHlAtni2NX\n3Ru8cX43
uQnZ/O/1D7IoKW/KY65ddDUPrn4ACxCz5ChV7ib6B0euuBYhxOyR4IhyDe4+4mJspCdf\n2Ts49rcc5umzL5Iak8K
"text/plain": [
"<matplotlib.figure.Figure at 0x7f2f94c21940>"
]
},
"metadata": {},
"output_type": "display_data"
}
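],
"source": [
"# Visualise the density of Age for each value of Survived to see if there is some relationship\n",
"# (seaborn >= 0.9 names this parameter height; older releases call it size)\n",
"sns.FacetGrid(df, hue=\"Survived\", height=5).map(sns.kdeplot, \"Age\").add_legend()"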
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We do not observe significant differences."
]
},
{
"cell_type": "code",
"execution_count": 16,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"<seaborn.axisgrid.FacetGrid at 0x7f2f94ba0d30>"
]
},
"execution_count": 16,
"metadata": {},
"output_type": "execute_result"
},
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAagAAADQCAYAAABStPXYAAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\nAAALEgAACxIB0t1+/AAAFyxJREFUeJzt3XucXGV9x/HPZiEhbDbmtkIMEKjiTxHbSvsqmqgJgYJ5\n4Q2IL6yYctEmtXi31tuLW6AgKKAgIERDQ5SLWoqm1AgRglKulZdcVH5yDbmSDVnY3QDZZWf6xzkL\nk2XZnTnnmd1nZr7vfzJz5pzf/HJmnv3Nec5zntNULBYRERGJzZjRTkBERGQwKlAiIhIlFSgREYmS\nCpSIiERJBUpERKKkAiUiIlHaZbQTaDRmNh/4KvASMAF4HFjs7p054x4PjHH3K3PG+S3wDXf/TYZt\nDwYuAHqB7cA/uvszefKRxlLP7SPd/qvAEuAt7v54nlwagQrUCDKzXYEVwAHuviVddg7wCeDCPLHd\nfXn+DHO7Eljo7r8zsy8AZwOLRzknqRH13j7M7Gvpww2jmkgNUYEaWeOB3YFWYAuAu/d/aTGzJ4BD\n3f1xM5sDnOXu7zGzW4HfA38N3At0uPs56TbfSOO9QPJ5jhvk9QnAqcAlwBvT9a9x9wvNbDxwLTAN\neDTdfidm9g/AIqD/qu4mYJO7f6xknZnAbu7+u3TRT4C7cuwraTx12z5SF7t7t5ktyrGPGooK1Ahy\n904zOx34vZndCawBfubuf05XGTitR+nzLnc/xMz+ClgGnJMuPxb4KPCRdP0fkRzJDHz9c8AGd19k\nZmOAu8zsZuBdwPPuPtvM9gSeGCTva4BrhvnvvQHYXPJ8M7DnMNuIvKzO2wfu3j3cOrIzDZIYYe5+\nHrAP8ENgJklD6O8Gaxpi0zvS7e8HxprZvmb2VqDX3f9YEv+B13j9EOCo9Nfmr0l+Cb4JeDtwe7rt\nZuDhQP/VJl79B0VkSA3UPqQMOoIaYWY23t07gOuA68zsp8C3gcvZ+Q/62AGb9pQ8vprkF2ELyS/C\ngQZ7fQewxN2vH5DPoUChZFHzIDmX04WxDphR8vwNqK9dKlTH7UMyUIEaQWZ2OHCemb275HD/jSR9\n2wCdwN4kI5fmDRHqGuAqkv76I8t8/XaS7ozr0y6MbwFnAn8EZgGXmdnegA0MVk4XhruvN7NtZvYu\nd78T+Djwi6G2ESlVz+1DslEX3why95uApcCvzewWM1tD0rVwcrrK+cAyM/sfoLS/ujggzpPpsi3u\n/vQg7zPY65cAXWZ2B0l3SIe7P0syamqamd1G0iDvzvFfPAH4lpn9Bvg7khPPImWp9/ZhZpekXYh7\nAD9Oz3HJEJrKud2GmR0I3ABc4O6Xpr8klgG7khxaf9zdt5jZcSQnG/uApe6+rHqpi4hIPRv2CMrM\ndgcuAlaXLD4T+L67zyUpXF9M1zuF5ND7EOALZjYpeMYiItIQyuniexGYD2wqWfYpoP9kYjswFTgY\nuMfdu939RZI+3dkBcxURkQYy7CAJdy8AO8ysdNkLAOnJxJOBM0iueWkv2bQdmB4yWRERaRyZR/Gl\nxWkFsNrdb02HWpYa6poFAIrFYrGpadjVRGpJsC+02ofUoYq+0HmGmV8JuLuflT7fyM5HTDOAO4cK\n0NTURHt7V44UXltbW2tVYlcrbq3GVs6vjh2K2kftx67FnKsZu9L2kalApaP1drj7kpLFdwNLzWwi\nyYVts0hG9ImIiFRs2AJlZgeRXH8wE+g1swXA64EX0zH9ReCP7v7pdCr5m0gK1OnuXp3yLiIida+c\nQRL3kQwbH1Y6Tcj1w64oIiIyDM0kISIiUVKBEhGRKKlAiYhIlFSgREQkSipQIiISJRUoERGJkgqU\niIhESXfUDaSvr4/165/KFWPKlLcFykZEpPapQAWyfv1TfPaKVYxtnZZp+56uraz4SgsTJ74+cGYi\nIrVJBSqgsa3TGDdpj9FOQ0SkLugclIiIREkF
SkREoqQCJSIiUVKBEhGRKKlAiYhIlFSgREQkSmUN\nMzezA4EbgAvc/VIz2wtYQVLgNgEL3b03vRX854A+YKm7L6tS3iIiUueGPYIys92Bi4DVJYuXABe7\n+xzgMeCkdL1TgHkkd+D9gplNCp+yiIg0gnK6+F4E5pMcKfWbC6xMH68E/h44GLjH3bvd/UXgdmB2\nuFRFRKSRDFug3L3g7jsGLG5x99708RZgOrAH0F6yTnu6XEREpGIhpjpqqnD5TtraWgOkkC92X18f\nTz75ZFnrdnZuGXR5d/e2ctMaUgz7I5a41YxdzZxD0r6t/di1mHO1Y5cra4HqMrNx6ZHVDGADsJGd\nj5hmAHcOF6i9vStjCkNra2stO/batU/kmugVYPumR2iZvn/m7fvFsD9iiFvN2NXOOSTt29qOXYs5\nVzN2pe0ja4FaDRwDXJ3+uwq4B/iBmU0ECsAskhF9NSHvRK89nVsDZiMiIsMWKDM7CDgfmAn0mtkC\n4DhguZktBtYCy929z8y+CtxEUqBOd/fqlHcREal7wxYod7+PZNj4QIcPsu71wPUB8hIRkQanmSRE\nRCRKKlAiIhIlFSgREYmSCpSIiERJBUpERKKkAiUiIlFSgRIRkSipQImISJRUoEREJEoqUCIiEiUV\nKBERiZIKlIiIREkFSkREoqQCJSIiUVKBEhGRKKlAiYhIlDLd8t3MWoCrgMnAWGAJsBm4jORuug+4\n+8mhkhQRkcaT9QjqBOBhd58HLAC+C1wIfMbd3wNMMrMjwqQoIiKNKGuB2gpMTR9PBZ4B9ktvDw+w\nEjgsZ24iItLAMhUod78OmGlmjwBrgC8DHSWrbAGm585OREQaVtZzUMcBa919vpm9HbgBeLZklaZy\nY7W1tWZJIWjszs6WquVQqRj2Ryxxqxm7mjmHpH1b+7FrMedqxy5XpgIFzAZ+BeDuD5rZ+AGxZgAb\nywnU3t6VMYWhtbW1lh27o2N7VXLIIob9EUPcasauds4had/WduxazLmasSttH1nPQT0KvBPAzGYC\nXcCfzGx2+vrRwKqMsUVERDIfQV0OLDOzNUAzsJhkmPkVZtYE3O3ut4RJUUREGlGmAuXu24FjB3np\nvfnSERERSWgmCRERiZIKlIiIREkFSkREoqQCJSIiUVKBEhGRKKlAiYhIlFSgREQkSipQIiISJRUo\nERGJkgqUiIhESQVKRESipAIlIiJRUoESEZEoZb3dhohIJn19faxf/1TF23V2trzq5qJ77bUPzc3N\noVKTyKhARaJYKLBu3TomTMh3d181WInd+vVP8dkrVjG2dVquOD1dW7lo0fuYOXO/QJlJbFSgItHb\nvY2vX31PrkarBivVVM6Rz2BHOQNt2LCesa3TGDdpj5DpSR3KXKDM7Djgy0AvcCrwILCC5LzWJmCh\nu/eGSLJRqNFKzEId+Wzf9Agt0/cPlJXUs0wFysymkBSldwCtwBLgI8DF7n69mf07cBLJreFFpE6E\n+BHV07k1UDZS77KO4jsMuNndn3f3p919MTAXWJm+vjJdR0REJJOsXXz7Ai1m9nNgEnAGsHtJl94W\nYHr+9EREpFFlLVBNwBTgKJJidWu6rPT1srS1tWZMIVzszs6WquUw0iZPbnnN/3e19nUMn2EscUOL\nad/G2E6G+r5Xqha/a7Uau1xZC9TTwB3uXgAeN7MuoNfMxrn7DmAGsLGcQO3tXRlTGFpbW2vZsYcb\ndVRLOjq2D/r/rmR/VKJacasZu9o5hxTTvo2xnbzW971Stfpdq7XYlbaPrOegbgLmmVmTmU0FJgCr\ngQXp68cAqzLGFhERyVag3H0j8DPgLuBG4GTgNOB4M7sNmAwsD5WkiIg0nszXQbn7UmDpgMWH50tH\nREQkocliRUQkSipQIiISJRUoERGJkgqUiIhESbOZi0hNKhYKbNiwPkisKVPeFiSOhKUCJSI1qbd7\nG+feuI2x
rZtzxenp2sqKr7QwceLrA2UmoahAiUjN0i1q6pvOQYmISJRUoEREJEoqUCIiEiUVKBER\niZIKlIiIREkFSkREoqR
"text/plain": [
"<matplotlib.figure.Figure at 0x7f2f94ba0940>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"# Plot a histogram of Age for each value of Survived\n",
"g = sns.FacetGrid(df, col='Survived')\n",
"g.map(plt.hist, \"Age\", color=\"steelblue\")"
]
},
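{
"cell_type": "markdown",
"metadata": {},
"source": [
"We observe that the age distribution of the passengers who did not survive is right skewed: most of its mass is on the left, with a long tail towards older ages. Most children survived."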
]
},
{
"cell_type": "code",
"execution_count": 17,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"array([<matplotlib.axes._subplots.AxesSubplot object at 0x7f2f94ba0b38>,\n",
" <matplotlib.axes._subplots.AxesSubplot object at 0x7f2f94a86358>], dtype=object)"
]
},
"execution_count": 17,
"metadata": {},
"output_type": "execute_result"
},
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAYQAAAEGCAYAAABlxeIAAAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\nAAALEgAACxIB0t1+/AAAFZdJREFUeJzt3X+Q3PV93/HnWSAIK8mS1TNQKcgk2O9M6yZTph270ACW\nCYTitGnAk04psUMyYRKndRFxxp3WYOS202IjOzhOYyvGJcTu2Mn4l6wWEwImwTh2UzI1nob3BAyS\n9QN0Pp99ugv64bvrH9/v4ZOQdLvf/e7ud1fPxwyj2+/u973v3dsPr/t+v5/vd8cWFhaQJOllg25A\nktQMBoIkCTAQJEklA0GSBBgIkqSSgSBJAuCMQTegU4uIzcB7gRawC/jFzNw32K6kwYiIM4D/CtwC\nbHQs1MsthAaLiHOA/wHclJk/BnwB+PBgu5IG6nPANOAJVD1gIDTbZuDpzPy/5e17gKsiojXAnqRB\n2pqZdwBjg25kFBkIzfYa4OnFG5k5C0wCFw2sI2mAMvOrg+5hlBkIzXYOcOi4ZS9QHE+QpFoZCM02\nC5x93LJzgJkB9CJpxBkIzfYk8OrFGxHxcmAt8NcD60jSyDIQmu1h4IKIuKS8fQvwhcx8YYA9SRpR\nY17+utki4jLgbopdRU8Bb83MA4PtSuq/iHgl8Eh5c3HCxfeBN2bm/oE1NkLaCoSIeC3wWWBbZv5O\nRPwwxRTIM4EjwL/KzAMRcQPwdmAO2J6Z9/SudUlSnZbdZVSeHHU38OCSxe8Bfjczr6AIii3l495F\nMXf+DcAtEbG29o4lST3RzjGEQ8A1wNJNsl8FPl3+PAGsB14HfC0zZzLzEPAocGmNvUqSemjZaxll\n5jxwOCKWLnsBICJeBrwNuAM4jyIcFk0A59fZrCSpdyrPMirD4D7gwcx8+AQP8dRySRoi3Vzt9GNA\nZuZ/LG/v49gtgg3AV05VYGFhYWFszNxQ7YbuQ+VYUA+1/cGqFAjlbKLDmbl1yeKvAtsjYg0wD1xC\nMePo5F2OjTExcbBKCy8xPr66cbWa2NPpUGt8fHUN3fTXqI+FptZqYk+9qNWuZQMhIi4G7gI2AUcj\n4nrglcChiHiY4jK0/y8zfz0i3gk8QBEI787Mel6RJKnn2jmo/DjFNNJlZean+cHsI0nSEPHSFZIk\nwECQJJUMBEkSYCBIkkoGgiQJMBAkSSUDQZIEGAiSpJKBIEkCDARJUslAkCQBBoIkqWQgSJKA7r4g\n57QxNzfHnj27O15verpFq7WeFStW9KArSaqXgdCGPXt2c+u2naxsre9ovSOzk9y15Vo2bbqwR51J\nUn0MhDatbK3n7DXnDroNSeoZjyFIkgADQZJUMhAkSYCBIEkqGQiSJMBAkCSVDARJEmAgSJJKBoIk\nCTAQJEklA0GSBLR5LaOIeC3wWWBbZv5ORGwE7qMIlP3AjZl5NCJuAN4OzAHbM/OeHvUtSarZslsI\nEXEOcDfw4JLFW4EPZublwNPATeXj3gVsBt4A3BIRa+tvWZLUC+3sMjoEXEOxJbDoCmBH+fMO4KeA\n1wFfy8yZzDwEPApcWl+rkqReWjYQMnM+Mw8ft7iVmUfLnw8A5wPnAhNLHjNRLpckDYE6DiqPdbhc\nktRAVb8g52BEnFVuOWwA9gL7OHaLYAPwleUKjY+vrthC57Xm5uZ49tlnl60zPX3gmNszM9+p3NO6\nda3aXmM/36tRqDVsmvo+jnqtJvZUd612VQ2EB4HrgE+U/94PfA34vYhYA8wDl1DMODqliYmDFVs4\n1vj46mVr7dr1TKWvwpyZeIpV4xdV6mtqaraW19jO67PWD+oMo6a9j6dDrSb21Ita7Vo2ECLiYuAu\nYBNwNCKuB24A7o2Im4FdwL2ZORcR7wQeoAiEd2dmPa+oRlW+CvPwzGSPupGk5lg2EDLzcYpppMe7\n6gSP/TTw6Rr6kiT1mWcqS5IAA0GSVDIQJEmA
gSBJKhkIkiTAQJAklQwESRJgIEiSSgaCJAkwECRJ\nJQNBkgQYCJKkkoEgSQIMBElSyUCQJAEGgiSpZCBIkgADQZJUMhAkSYCBIEkqGQiSJMBAkCSVDARJ\nEmAgSJJKBoIkCTAQJEklA0GSBMAZVVaKiBbw+8A6YCWwFXgO+G/APPD1zHxbXU1Kknqv6hbCW4En\nM3MzcD3wW8D7gX+dmT8JrI2Iq+tpUZLUD1UD4dvA+vLn9cAkcGFmPl4u2wFc2WVvkqQ+qhQImflJ\nYFNE/DXwJeAdwNSShxwAzu+6O0lS31QKhIi4AdiVma8GNgN/cNxDxrptTJLUX5UOKgOXAl8EyMwn\nIuKHjqu1AdjXTqHx8dUVW+i81vR0q7bnate6da3aXmM/36tRqDVsmvo+jnqtJvZUd612VQ2Ep4DX\nA5+JiE3AQeCZiLg0M78M/BxwdzuFJiYOVmzhWOPjq5etNTU1W8tzdWJqaraW19jO67PWD+oMo6a9\nj6dDrSb21Ita7aoaCB8G7omILwErgJsppp1+JCLGgK9m5kMVa0uSBqBSIGTmLPDzJ7jrsu7akSQN\nimcqS5IAA0GSVDIQJEmAgSBJKhkIkiTAQJAklQwESRJgIEiSSgaCJAkwECRJJQNBkgQYCJKkkoEg\nSQIMBElSyUCQJAEGgiSpZCBIkgADQZJUMhAkSYCBIEkqGQiSJMBAkCSVzhh0A5KG09zcHHv27G77\n8dPTLaamZl+8vXHjBaxYsaIXrakiA6GHFhbm2bt3T6V1HSxquj17dnPrtp2sbK3veN0js5PcteVa\nNm26sAedqSoDoYeOzE6x7ZNTrGx1FgoOFg2Lla31nL3m3EG3oZoYCD3mgFGTLbfb5/jdPEtV3fpV\ncxkI0mmsm90+MxNPsWr8oh50pUGpHAgRcQPwDuAocBvwBHAfxcyl/cCNmXm0jiYl9U7VrdjDM5M9\n6EaDVGnaaUS8giIELgHeBPwssBX4YGZeDjwN3FRXk5Kk3qt6HsKVwB9n5t9k5vOZeTNwBbCjvH9H\n+RhJ0pCousvoVUArIj4HrAXuAM5ZsovoAHB+9+1JkvqlaiCMAa8A/jlFODxcLlt6vyRpiFQNhOeB\nxzJzHvhmRBwEjkbEWZl5GNgA7Gun0Pj46ootdF5rerpV23P12rp1rZe8nn6+V6NQa9gM4n0c5Jg4\n0We8E3W9X039/A5iLFQNhAeAj0XEnRRbCquA+4HrgY8D15W3lzUxcbBiC8caH1+9bK2Tzaduoqmp\n2WNeTzuvr12jXmtYQ2UQ7+Mgx8Txn/FO1PlZadrntxe12lXpoHJm7gP+CPhzYCfwNuB24C0R8Qiw\nDri3Sm1J0mBUPg8hM7cD249bfFV37UiSBsXLX0uSAANBklQyECRJgIEgSSoZCJIkwECQJJUMBEkS\nYCBIkkoGgiQJ8Cs0JQ3AwsJ8V9/J/IpX/N0au9EiA0FS3x2ZnWLbJ6dY2eo8FI7MTnLPe1qsWfPK\nHnR2ejMQJA1E1e9yVu94DEGSBBgIkqSSgSBJAgwESVLJQJAkAQaCJKlkIEiSAANBklQyECRJgIEg\nSSp56YoGOtGFv6anW0xNzZ5yvY0bL2DFihW9bE3SCDMQGqjKhb+OzE5y15Zr2bTpwh52JmmUGQgN\n5YW/JPWbxxAkSYCBIEkqdbXLKCLOBr4BbAUeAu6jCJn9wI2ZebTrDiVJfdHtFsK7gMny563ABzPz\ncuBp4KYua0uS+qhyIEREAD8G7ATGgMuBHeXdO4Aru+5OktQ33Wwh3AVsoQgDgNaSXUQHgPO7aUyS\n1F+VAiEibgQey8xdJ3nI2EmWS5IaqupB5WuBCyPiZ4ANwBFgJiLOyszD5bJ97RQaH19dsYXOa01P\nt2p7riZat67V9vvZz/d9ULWGzSDex2EeE3W9X039/A5iLFQKhMz8F4s/R8RtwLPAJcD1wMeB64D7\n26k1MXGw
SgsvMT6+etlay136YdhNTc229X628161q4m1hjVUBvE+DvOYqOuz0rTPby9qtauOM5UX\ndw/dDtwXEb8C7ALuraG
"text/plain": [
"<matplotlib.figure.Figure at 0x7f2f94b1f080>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"# Alternative to Seaborn: the matplotlib-based plotting built into pandas\n",
"df.hist(column='Age', by='Survived', sharey=True)"
]
},
{
"cell_type": "code",
"execution_count": 18,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"array([<matplotlib.axes._subplots.AxesSubplot object at 0x7f2f94accba8>,\n",
" <matplotlib.axes._subplots.AxesSubplot object at 0x7f2f9492acc0>], dtype=object)"
]
},
"execution_count": 18,
"metadata": {},
"output_type": "execute_result"
},
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAX8AAAEGCAYAAACNaZVuAAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\nAAALEgAACxIB0t1+/AAAFAtJREFUeJzt3X+Q3HV9x/HnEUggm4v50TVQMmQQ9N1ax5mi03bQQvwx\nUAtqW1A7OtSCTrUWx0raqjMdFeq0FiRj0TJSREVGO6AdFURQsVRbRq2/RmSmvEeCQzhAcoSLuRzk\nB8n1j93ImUnudvf2u/u9+zwf/2R/3Ge/r/vufl7Z+36/+92R6elpJEllOWrYASRJg2f5S1KBLH9J\nKpDlL0kFsvwlqUCWvyQV6OhhB9DTIuKlwBVAA3gAuDAzHx5uKmnwIuJo4J+BdwLrnQf95zv/moiI\n5cC/Axdl5m8AXwauGW4qaWi+BOwE/CBSRSz/+ngpsCUzf9y+/gngrIhoDDGTNCyXZealwMiwgyxW\nln99PAfYcvBKZk4B24FTh5ZIGpLM/O6wMyx2ln99LAd2H3Lbk7S2/0tSX1n+9TEFHHvIbcuBXUPI\nImmRs/zr417g2QevRMQzgFXAT4eWSNKiZfnXx53ASRFxevv6O4EvZ+aTQ8wkaZEa8ZTO9RERZwBX\n0drccx/w55m5bbippMGKiGcC32xfPXggxFPAyzLzkaEFW2TmLP+IOA74FLAOWAZ8ADgfeAHwWPvH\nrsjM26qLKUnqp04+4ftK4HuZ+aGIOAn4OnAX8O7M/Eql6SRJlZiz/DPzphlXTwIebF/2wxeStEB1\nvM0/Iu4CTgTOBTbx9GagR4GLM/PxqkJKkvqr46N9MvNFwKuAzwCfprXZ52XAj4FLq4knSarCnJt9\nIuI0YFtmjmXm3e2z7f0kMw/u7L0ZuHq2x5ienp4eGXErkSqx4F5YzgdVpKsXVSc7fM8ANgDvjIh1\nwArgmoj4m8z8GbARuGfWRCMjjI9PdpNrIJrNUXN1qI6ZoJVroanjfKjz81u3XHXMBN3PhU7K/2PA\ndRHxLVqnH3gbrVMO3BgRU+3LF3aZU5I0RJ0c7bMbeMNh7vqd/seRJA2Cp3eQpAJZ/pJUIMtfkgpk\n+UtSgSx/SSqQ5S9JBbL8JalAnXzIS5q3/fv3Mza2tetx69efxJIlSypIJJXN8tdAjI1tZdPmW1na\nWNvxmL1T27nyknPYsOHkCpNJZbL8NTBLG2s5duW6YceQhNv8JalIlr8kFcjyl6QCWf6SVCDLX5IK\nZPlLUoEsf0kqkOUvSQWy/CWpQJa/JBXI8pekAln+klSgOU/sFhHHAZ8C1gHLgA8APwZuoPWfxyPA\nBZm5r7qYkqR+6uSd/yuB72XmRuB1wGbgMuCjmXkmsAW4qLKEkqS+m/Odf2beNOPqScCDwJnAW9q3\n3QJsAq7pezpJUiU6Pp9/RNwFnEjrL4Gvz9jMsw04oYJskqSKdFz+mfmiiHg+8BlgZMZdI0cY8iua\nzdEuow2GuTo3n0w7dzZ6Grd6daOW62K+6vg71TET1DNXHTN1q5MdvqcB2zJzLDPvjoglwGRELMvM\nPbT+Gnh4rscZH5+cf9o+azZHzdWh+WaamJjqedxsy12ok3CxPb9VqWOuOmaC7udCJzt8z6C1TZ+I\nWAesAO4Azm/ffx5we1dLlSQNVSebfT4GXBcR3wKOBf4S+AFwQ0T8BfAAcH11ESVJ/dbJ0T67gTcc\n5q6z+h9HkjQIfsJXkgpk+UtSgSx/SSqQ5S9JBbL8JalAlr8kFcjyl6QCWf6SVCDLX5IKZPlLUoEs\nf0kqkOUvSQWy/CWpQJa/JBXI8pekAln+klQgy1+SCmT5S1KBLH9JKpDlL0kFsvwlqUCWvyQV6OhO\nfigiLgdeDCwBPgi8CngB8Fj7R67IzNsqSShJ6rs5yz8iNgLPzczTI2IN8CPgG8C7M/MrFeeTJFWg\nk3f+3wS+2768A2jQ+gtgpKpQkqRqzVn+mTkN
PNm++mbgVmA/cHFEXAI8ClycmY9XllKS1FcdbfMH\niIhXAxcCZwEvBLZn5t0R8S7gUuDts41vNkfnk7My5urcfDLt3Nnoadzq1Y1arov5quPvVMdMUM9c\ndczUrU53+J4NvAc4OzMngTtn3H0zcPVcjzE+PtlTwCo1m6Pm6tB8M01MTPU8brblLtRJuNie36rU\nMVcdM0H3c2HOQz0jYiVwOXBuZv6ifdvnI+Lk9o9sBO7pLqYkaZg6eef/OmAtcFNEjADTwCeBGyNi\nCthFa3OQJGmB6GSH77XAtYe564b+x5EkDYKf8JWkAln+klQgy1+SCmT5S1KBLH9JKpDlL0kFsvwl\nqUCWvyQVyPKXpAJZ/pJUIMtfkgpk+UtSgSx/SSqQ5S9JBbL8JalAlr8kFcjyl6QCWf6SVCDLX5IK\nZPlLUoEsf0kqkOUvSQU6upMfiojLgRcDS4APAt8DbqD1n8cjwAWZua+qkJKk/prznX9EbASem5mn\nA68APgxcBnw0M88EtgAXVRlSktRfnWz2+SbwmvblHUADOBO4uX3bLcDL+x9NklSVOTf7ZOY08GT7\n6puAW4GzZ2zm2QacUE08SVIVOtrmDxARr6a1eecs4L4Zd410Mr7ZHO0u2YCYq3PzybRzZ6OncatX\nN2q5Luarjr9THTNBPXPVMVO3Ot3hezbwHlrv+CcjYjIilmXmHuBE4OG5HmN8fHJ+SSvQbI6aq0Pz\nzTQxMdXzuNmWu1An4WJ7fqtSx1x1zATdz4VOdviuBC4Hzs3MX7RvvgM4r335POD2rpYqSRqqTt75\nvw5YC9wUESPANPBG4LqIeAvwAHB9dRElSf3WyQ7fa4FrD3PXWf2PI0kaBD/hK0kFsvwlqUCWvyQV\nyPKXpAJZ/pJUIMtfkgrU8ekdJID9+/czNra163EPPTRWQRpp4el1DgGsX38SS5Ys6UsOy19dGRvb\nyqbNt7K0sbarcbvG72NF89SKUkkLR69zaO/Udq685Bw2bDi5Lzksf3VtaWMtx65c19WYPbu2V5RG\nWnh6mUP95jZ/SSqQ5S9JBbL8JalAlr8kFcjyl6QCWf6SVCDLX5IKZPlLUoEsf0kqkOUvSQWy/CWp\nQJa/JBXI8pekAnV0Vs+IeB7wRWBzZl4dEZ8EXgA81v6RKzLztooySpL6bM7yj4jlwFXAHYfc9e7M\n/EolqSRJlepks89u4BXAIxVnkSQNyJzv/DPzALAnIg696+KI2AQ8ClycmY9XkE9atPbu3cv9P/tZ\n1+OOOgqefepzGBkZqSCVStHrN3l9GtiemXdHxLuAS4G3zzag2RztcVHVMlfnms1Rdu5sDHSZq1c3\narku5qvZHOU7//t93vfxuzh62YruBj8xxn9cHYyO9ne91HU91zHXfDLNZw71cz70VP6ZeeeMqzcD\nV881Znx8spdFVarZHDVXhw5mmpiYGuhyJyamZl0XdSyGToyPT7JjxxMsXb6aY457RldjDxyY5LHH\ndrF7d//y1PE1B/XMNd9M85lDs82HbudCT4d6RsTnI+LgtwhvBO7p5XEkScPRydE+pwFXAhuAfRFx\nPvAR4MaImAJ2ARdWmlKS1Fed7PD9IfCSw9z1hf7HkSQNgp/wlaQCWf6SVCDLX5IKZPlLUoEsf0kq\nkOUvSQWy/CWpQJa/JBXI8pekAln+klQgy1+SCmT5S1KBev0yF0lDMj19gK1bH6DRWN712PXrT2LJ\nkiUVpCrP/v37GRvb2vW4hx4aqyBN9yx/aYHZM7WD915zJ0sba7sat3dqO1decg4bNpw89w9rTmNj\nW9m0+daun4dd4/exonlqRak6Z/lLC9DSxlqOXblu2DGK18vzsGfX9orSdMdt/pJUIMtfkgpk+UtS\ngSx/SSqQ5S9JBbL8JalAHR3qGRHPA74IbM7MqyNiPXADrf88HgEuyMx91cWUJPXTnO/8I2I5cBVw\nx4ybLwM+
kplnAluAi6qJJ0mqQiebfXYDr6D1Dv+gjcAt7cu3AC/vbyxJUpXmLP/MPJCZew65uTFj\nM8824IS+J5MkVaYfO3x
"text/plain": [
"<matplotlib.figure.Figure at 0x7f2f94adb518>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"# We can observe the detail for children\n",
"df[df.Age < 20].hist(column='Age', by='Survived', sharey=True)"
]
},
{
"cell_type": "code",
"execution_count": 19,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"0.48170731707317072"
]
},
"execution_count": 19,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"#Mean of survival for young\n",
"df[df.Age < 20]['Survived'].mean()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"There were null values, we will recap at the end of this notebook how to manage them.\n",
"\n",
"We are going now to see the distribution of passengers younger than 20 that survived."
]
},
{
"cell_type": "code",
"execution_count": 20,
"metadata": {
"collapsed": false,
"scrolled": true
},
"outputs": [
{
"data": {
"text/plain": [
"<matplotlib.axes._subplots.AxesSubplot at 0x7f2f9493aac8>"
]
},
"execution_count": 20,
"metadata": {},
"output_type": "execute_result"
},
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAW8AAAEmCAYAAACtaxGwAAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\nAAALEgAACxIB0t1+/AAAFPVJREFUeJzt3Xt0XOV57/GvLMeyLewgG4mbixe5PQ6QnDakLXFoDHHC\nJemC5sIhhRDAuRCKgYLpAZdQMIRACXeoSRooEAjlktOyzIG2KSEB0ksKpQ0ni/Cecgk6MjYWtmzZ\nMhK2rP4xY8c4sjQejWb0ar6ftbw8s/ee/T7L3vPTq3fv/e6GgYEBJEl5mVDrAiRJu87wlqQMGd6S\nlCHDW5IyZHhLUoYMb0nK0MRSNoqIq4BDgUbgSuAY4GDg9eIm30wp/d2oVChJ+jXDhndEHAYckFKa\nGxEzgP8AfghckFJ6ZJTrkyQNopSe9+PAT4uv1wLNFHrgDaNVlCRpaA27codlRHwF+DDQD+wNTAJe\nAxamlNaMSoWSpF9T0pg3QEQcC5wKHAF8EFidUno2Is4HlgBn7uyzmzf3D0yc2DjSWiWp3ux0hKPU\nE5ZHAouBI1NK64Efbbd6GbB0qM93dW0spRmVqLV1Gp2d62tdhjQoj8/KaW2dttN1w14qGBHTgauA\n308prSsu+35E7F/c5DDg5yMvU5JUqlJ63scDM4H7I6IBGABuB+6LiB5gA4XhFElSlezSCctydXau\nd97ZCvLXUo1lHp+V09o6badj3t5hKUkZMrwlKUOGtyRlqOTrvJW3/v5+Ojraq9rmrFn70djo9f3S\naDC860RHRzs/Pe8c2pqaqtLeqr4+uPo6Zs/ef/iNJe0yw7uOtDU1sc/kKbUuQxpTVq5cwRe+8Dnm\nzHkvW7ZsYfPmzZxwwhf4yEcO+7Vtv/GNJRx++Hw+9KFDq1/oDgxvSXVv9uzZ3HjjtwDo7u5mwYIT\nOeSQuUyaNKnGle2c4S1J25k+fTozZ+7Bc8/9nNtu+zYDAwPsuedeXHjhJdu22bixh0su+Rp9fb30\n9vZyzjl/wpw5B3D33XfwxBM/prFxAnPnfoSTTjpl0GWVYHhLqnvb36u4YsWrrFu3joceepDPfe7z\nfPjDv8ctt9zE888/t22b1atXc8wxf8Chh87jmWee5u677+TrX/9z7r33eyxb9g9MmDCBBx/83wCD\nLqsEw1tS3Wtvf4WzzvoqW7ZsoalpMhdddClXXHEpZ599HgCnn16YNHVr+M6YMYM77riVe+65i02b\n3mTKlKkAHH74xzj77NP5+MeP4ogjjh5k2VEVq9nwllT3th/z3qqxsZGBgS2Dbn///X9NW9ueXHTR\npTz//C9YuvQGABYtOp/29ld47LF/ZOHCr3Drrd/dYdlp3Hrrd5kwYeS32HiTjqS6N9gUT+997wH8\n+78/DcBtt32bp5/+t23brlu3jn333ReAJ574EZs2baKnZwN33HEr++03m1NO+RJvf/vb6exc9WvL\nenp6KlKz4S2p7jUMMv3TggVfYdmyv+HMM09jxYpXOfjg39627VFHfYJ77/0e5557Jgce+D7WrFnN\nE0/8mLVr1/LlL5/M2Wf/EQcd9H723HMv1q7t2rbswAPfx7RpO5+je5dqdlbB/JQza9srr7zMyxde\nULXrvF/tfYP9L7/Sm3TqkLMKVo6zCkrSOGN4S1KGDG9JypDhLUkZMrwlKUPepCOprozG3Pa1mLve\n8JZUVzo62ll07cNMap5Zkf292bOaa8795LCXxb700gssXnwexx9/Ip/+9HEjbtfwllR3JjXPZPL0\nPavWXm9vL9dffzUf/ODvVGyfjnlL0iibNGkSV199IzNn7lGxfRrekjTKJkyYUPEHOxjekpQhw1uS\nMuQJS0l1582e1TXbV6UmAzS8JdWVWbP245pzP1nxfQ4lpee5+ebrWLlyJRMnNvL4449x+eXfHNH0\nsIa3pLrS2NhY9amKI+Zw003frug+HfOWpAwZ
3pKUIcNbkjJkeEtShgxvScqQV5tIqitOCStJGero\naGfxw0uYPGNqRfbXu2YjV3zy4mEvP1y69AaeffZn9Pf38/nPn8K8eYePqN2SwjsirgIOBRqBK4Gn\ngLsoDLusAE5KKW0aUSWSVCWTZ0xlSttuVWvvmWee5pe/fJlvfeuv6O5ex6mnnjji8B52zDsiDgMO\nSCnNBY4GrgcuBW5OKc0DXgQWjKgKSRrHfuu3Duayy64EYLfdptHX1zvi2+RLOWH5OLD1sQ9rgWZg\nHrCsuOwh4GMjqkKSxrGGhgaamiYD8NBDD3LIIR+moaFhRPscdtgkpTQAvFF8+0XgYeDI7YZJVgF7\nj6gKSaoDTz75Yx555CGuu+7mEe+r5BOWEXEsheGRI4AXtls17I+PlpapTJxY3TOx411r665NaNPd\n3czLo1TLzrS0NO9ynRofxvL/e3d3c8X3Wcqx/uSTT3LvvXdx5523j2hCqq1KPWF5JLCYQo97fUSs\nj4imlFIfsC/w6lCf7+raOOJC9SutrdPo7Fy/S5/p6uoZpWqGbnNX61T+yjk+q6mrq4feNZXLpN41\nG4c91nt6NnDFFX/ODTfcQm8v9PaW9u8z1A+EYcM7IqYDVwHzU0rriosfBT4D3FP8++9LqkSSamzW\nrP244pMXV3yfQ/nhD/+R7u51/NmfXcDAwAANDQ187WtLaGsr/yHIpfS8jwdmAvdHRAMwAJwM3BYR\npwGvAHeWXYEkVVEtpoQ95phPccwxn6roPks5Yfkd4DuDrDqiopVIkkrm3CaSlCHDW5IyZHhLUoYM\nb0nKkLMKSqorTgkrSRnq6Gjnp+edQ1tTU0X2t6qvD66+bsjLD/v6ern88iWsWbOaTZs2cfLJX2Tu\n3ENH1K7hLanutDU1sc/kKVVr7yc/eZI5cw7ghBNOYuXKlZxzzh8Z3pI01s2f//Ftr197bSVtbXuN\neJ+GtyRVyemnL6Czs5OrrrpuxPvyahNJqpJbbvkrrrjiGpYsuWjE+zK8JWmUpfQ8q1a9BsC73/0e\n+vv7Wbt27Yj26bCJpLqzqq+vovsabpqrn/3sGVauXMFZZy1izZrV9Pa+we677z6idg1vSXVl1qz9\n4OqRjzlvtT/DTwl77LGf4corL+OMM77Mm2/2sWjR+SNu1/CWVFdqMSVsU1MTF1/89Yru0zFvScqQ\n4S1JGTK8JSlDhrckZcjwlqQMGd6SlCHDW5IyZHhLUoYMb0nKkOEtSRkyvCUpQ4a3JGXI8JakDBne\nkpQhw1uSMmR4S1KGfBhDjfT399PR0V7WZ7u7m+nq6tmlzyxf3lFWW5LGJsO7Rjo62ln88BImz5ha\nlfbWvbSaM6vSkqRqMLxraPKMqUxp260qbfWu2Qi8UZW2JI0+x7wlKUOGtyRlqKRhk4g4CHgQuDal\ntDQibgcOBl4vbvLNlNLfjVKNkqQdDBveETEVuBF4dIdVF6SUHhmVqiRJQypl2KQXOBpYMcq1SJJK\nNGzPO6W0BeiLiB1XLYyIRcBrwMKU0ppRqE+SNIhyT1h+l8KwyXzgZ8CSypUkSRpOWdd5p5R+tN3b\nZcDSobZvaZnKxImN5TQ1bnV3N9e6hFHX0tJMa+u0WpehGvD/ffSVFd4R8X3gT1JKLwOHAT8favuu\nro3lNDOu7ert7Tnq6uqhs3N9rctQlbW2TvP/vUKG+iFYytUmHwCuAWYDmyLis8BNwH0R0QNsAE6t\nTKmSpFKUcsLyGeDwQVb9beXLkSSVwjssJSlDhrckZcjwlqQMGd6SlCHDW5IyZHhLUoYMb0nKkOEt\nSRkyvCUpQ4a3JGXI8JakDBnekpQhw1uSMlTWfN7jUX9/Px0d7VVrb/nyjqq1JWn8MbyLOjraWXTt\nw0xqnlmV9jZ0vsAe86vSlKRxyPDezqTmmUyevmdV2urbsBpYUZW2JI0/jnlLUoYMb0nKkOEtSRky\nvCUpQ4a3
JGXI8JakDHmpoKSaq/ZNcrNm7UdjY2PV2hsNhrekmuvoaOen551DW1PTqLe1qq8Prr6O\n2bP3H/W2RpPhLWlMaGt
"text/plain": [
"<matplotlib.figure.Figure at 0x7f2f94aca438>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"df.query('Age < 20 and Survived == 1').groupby(['Sex','Pclass']).size().unstack(['Pclass']).plot(kind='bar')"
]
},
{
"cell_type": "code",
"execution_count": 21,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"<matplotlib.axes._subplots.AxesSubplot at 0x7f2f94834d30>"
]
},
"execution_count": 21,
"metadata": {},
"output_type": "execute_result"
},
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAW8AAAE4CAYAAACUt3JbAAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\nAAALEgAACxIB0t1+/AAAFqlJREFUeJzt3X+UZ3V93/HnOMMu7LDouBkBXdkSmnn3WONpJfEH0rCA\nARELpgikSQgV2pgiORgIVhpNEHM8FmWjGAhK1RA0LcJRBIGoREptk1QTLJaT9B1FZDNgYIGBWQf5\nNUz/uHfLOM7Oj+/c74/P3efjnDnMfO/9fu/7PbO8vvf7ufd+7tDc3BySpLI8r98FSJJWz/CWpAIZ\n3pJUIMNbkgpkeEtSgQxvSSrQyEpWioiLgcOBYeADwAnAocBD9SofzMxbulKhJOnHLBveEbEVeFlm\nHhYRLwS+CfwZ8K7MvLnL9UmSFrGSPe/bgf9Vf/8oMEq1Bz7UraIkSUsbWs0VlhHxa8DrgFngQGAd\n8ABwdmY+0pUKJUk/ZsUHLCPiROCtwNnA1cB/yMyjgTuB93anPEnSYlZ6wPJY4ALg2MzcCdw2b/EN\nwOVLPf+ZZ2bnRkaGOy5SkvZQux2eXskBy/2Ai4GjM/Ox+rHrgPMz8x5gK3DXUq8xNfX4aopds/Hx\njezYsbOn2+wl+yub/ZWr172Nj2/c7bKV7HmfCmwCPhsRQ8Ac8CngmoiYAX5ANZwiSeqRZcM7M68E\nrlxk0dXNlyNJWgmvsJSkAhneklQgw1uSCmR4S1KBVnSet7Ras7OzTE5u7+i509OjTE3NrPp5mzcf\nxPCw1xNoz2B4qysmJ7dz3rabWDe6qSfbe2rmYS4593i2bDm4J9uT+s3wVtesG93E3vvt3+8ypFZy\nzFuSCmR4S1KBDG9JKpDhLUkFMrwlqUCGtyQVyPCWpAIZ3pJUIMNbkgpkeEtSgQxvSSqQ4S1JBTK8\nJalAhrckFcjwlqQCGd6SVCDDW5IKZHhLUoEMb0kqkOEtSQUyvCWpQIa3JBXI8JakAhneklQgw1uS\nCmR4S1KBDG9JKpDhLUkFMrwlqUAjK1kpIi4GDgeGgQ8A3wCupgr/7wOnZebT3SpSkvSjlt3zjoit\nwMsy8zDgOODDwEXAH2TmEcDdwBndLFKS9KNWMmxyO3By/f2jwChwBHBD/diNwOubL02StDvLDptk\n5hzww/rHM4GbgGPnDZM8CBzYnfIkSYtZ0Zg3QEScSDU8cgzwnXmLhpZ77tjYBkZGhldf3RqMj2/s\n6fZ6bdD7m54e7fk2x8ZGB/73skspdXaqzf0NSm8rPWB5LHAB1R73zojYGRHrM/NJ4CXA/Us9f2rq\n8bVXugrj4xvZsWNnT7fZSyX0NzU105dtDvrvBcr4+61Fm/vrdW9LvVGs5IDlfsDFwJsy87H64VuB\nk+rvTwL+dI01SpJWYSV73qcCm4DPRsQQMAecDnwiIt4G3Atc1b0SJUkLreSA5ZXAlYssOqb5ciRJ\nK+EVlpJUIMNbkgpkeEtSgQxvSSqQ4S1JBTK8JalAhrckFcjwlqQCGd6SVCDDW5IKZHhLUoEMb0kq\nkOEtSQUyvCWpQIa3JBXI8JakAhneklQgw1uSCmR4S1KBDG9JKpDhLUkFMrwlqUCGtyQVyPCWpAIZ\n3pJUIMNbkgpkeEtSgQxvSSqQ4S1JBTK8JalAI/0uYE81OzvL5OT2jp47PT3K1NTMqp+3efNBDA8P\nd7RNSYPF8O6TycntnLftJtaNburJ9p6aeZhLzj2eLVsO7sn2JHWX4d1H60Y3sfd++/e7DEkFcsxb\nkgpkeEtSgVY0bBIRLweuB7Zl5uUR8SngUOChepUPZuYtXapRkrTAsuEdERuAS4FbFyx6V2be3JWq\nJElLWsmwyRPAccD3u1yLJGmFlt3zzsxngScjYuGisyPiPOAB4OzMfKQL9UmSFtHpAcs/pho2ORq4\nE3hvcyVJkpbT0XnemXnbvB9vAC5fav2xsQ2M
jPT2yr7x8Y093d5qTU+P9nybY2OjPfu9tL2/tSql\nzk61ub9B6a2j8I6I64DzM/MeYCtw11LrT0093slmOjY+vpEdO3b2dJur1cnl7U1ss1e/l7b3txYl\n/Ptcizb31+velnqjWMnZJq8ELgG2AE9HxFuAjwLXRMQM8APgrc2UKklaiZUcsLwDOHKRRZ9vvhxJ\n0kp4haUkFcjwlqQCGd6SVCDDW5IKZHhLUoEMb0kqkOEtSQUyvCWpQIa3JBXI8JakAhneklQgw1uS\nCmR4S1KBOprPuxdmZ2eZnNze0XOnp0c7mk968+aDGB7u7U0jJKkTAxvek5PbOW/bTawb3dST7T01\n8zCXnHs8W7Yc3JPtSdJaDGx4A6wb3cTe++3f7zIkaeA45i1JBTK8JalAhrckFcjwlqQCGd6SVCDD\nW5IKNNCnCkpS09pyAaDhLWmP0pYLAA1vSXucNlwA6Ji3JBXI8JakAhneklQgw1uSCmR4S1KBDG9J\nKpDhLUkFMrwlqUCGtyQVyPCWpAIZ3pJUoBXNbRIRLweuB7Zl5uURsRm4mir8vw+clplPd69MSdJ8\ny+55R8QG4FLg1nkPXwR8NDOPAO4GzuhOeZKkxaxk2OQJ4DiqPexdtgI31t/fCLy+2bIkSUtZNrwz\n89nMfHLBw6PzhkkeBA5svDJJ0m41MZ/3UAOvIWmAtOVuM23WaXjvjIj19R75S4D7l1p5bGwDIyOr\n+6NMT492WFrnxsZGGR/f2JNt2V/zetnfWg16nXfffXfP7zbzyfedwiGHHNL1bbXl32an4X0rcBLw\nJ/V//3SplaemHl/1Bjp5516rqakZduzY2bNt9Zr9DYbx8Y0DX+fU1EzP7zbTq79fSf82lwr8ZcM7\nIl4JXAJsAZ6OiLcAvwxcFRFvA+4Frlp1VZKkji0b3pl5B3DkIouOab4cSdJKeIWlJBXI8JakAhne\nklQgw1uSCmR4S1KBDG9JKpDhLUkFMrwlqUCGtyQVyPCWpAIZ3pJUIMNbkgpkeEtSgQxvSSqQ4S1J\nBTK8JalAhrckFcjwlqQCGd6SVCDDW5IKZHhLUoEMb0kqkOEtSQUyvCWpQIa3JBXI8JakAhneklQg\nw1uSCmR4S1KBDG9JKpDhLUkFMrwlqUCGtyQVyPCWpAIZ3pJUIMNbkgo00smTIuII4FrgLmAI+FZm\nntNkYZKk3esovGv/LTNPaawSSdKKrWXYZKixKiRJq7KWPe+XRcT1wAuBizLz1oZqkiQto9Pw/jZw\nYWZeGxE/CdwWEYdk5jOLrTw2toGRkeFVbWB6erTD0jo3NjbK+PjGnmzL/prXy/7WatDrbPPfry29\ndRTemXk/1QFLMvO7EfEPwEuAexdbf2rq8VVvY2pqppPS1mRqaoYdO3b2bFu9Zn+DYXx848DX2ea/\nX0m9LRX4HY15R8QvRcR59fcHAC8C7uvktSRJq9fpsMkNwJ9ExInAXsCv727IRJLUvE6HTX4AnNBw\nLZKkFfIKS0kqkOEtSQUyvCWpQGu5SEfaY83OzjI5ub2j505Pj3Z0utrmzQcxPLy66yXUXoa31IHJ\nye2ct+0m1o1u6sn2npp5mEvOPZ4tWw7uyfY0+AxvqUPrRjex937797sM7aEc85akAhneklQgw1uS\nCmR4S1KBDG9JKpDhLUkFMrwlqUCGtyQVyPCWpAIZ3pJUIMNbkgpkeEtSgQxvSSqQ4S1JBTK8JalA\nhrckFcjwlqQCGd6SVCDDW5IKZHhLUoEMb0kqkOEtSQUyvCWpQIa3JBXI8JakAhneklQgw1uSCmR4\nS1KBDG9JKtBIp0+MiG3Aa4BngXdk5l81VpUkaUkd7XlHxM8B/zgzDwP+LXBpo1VJkpbU6bDJ0cD1\nAJn5f4EXRMS+jVUlSVpSp+F9ALBj3s8P1Y9Jknqg4zHvBYYaep0f8dTMw9142b5vqx/btL+yt2l/\n5W6vW9sa
mpubW/WTIuJ3gfsz88r657uBV2TmTMP1SZIW0emwyZeBtwBExCuB+wxuSeqdjva8ASLi\n/cARwCzw9sz8P00WJkn
"text/plain": [
"<matplotlib.figure.Figure at 0x7f2f9480f0f0>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"# Passengers older than 25 that survived grouped by Sex\n",
"\n",
"df.query('Age < 20 and Survived == 1').groupby(['Sex','Pclass']).size().plot(kind='bar')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We are going to improve it a bit."
]
},
{
"cell_type": "code",
"execution_count": 22,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"<matplotlib.axes._subplots.AxesSubplot at 0x7f2f949e0940>"
]
},
"execution_count": 22,
"metadata": {},
"output_type": "execute_result"
},
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAW8AAAEOCAYAAABGlJbrAAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\nAAALEgAACxIB0t1+/AAAFMBJREFUeJzt3XuclNV9x/HPsgusu4CwuFAFIV7aYzBpXjXWmtRUDAk2\nasQWEpKmxNxMK7EvRQJiYqWEQpQkqBhRg4lFjXnlZmnwEg2IJr3oK2kMIW08jVUui6hcxgUWl732\nj1npintndmbP8Hn/4+w8zzznN56d7x7O8zxnSlpbW5EkpWVQoQuQJPWe4S1JCTK8JSlBhrckJcjw\nlqQEGd6SlKCynuwUQlgGnAOUAjcAFwPvBHa17fKVGOMj/VKhJOlNug3vEMJkYFKM8d0hhCrgGWA9\nsCDG+HA/1ydJ6kBPRt5PAk+3PX4VqCQ7Ai/pr6IkSV0r6c0dliGEzwJ/CjQDxwNDgJeBK2KMe/ql\nQknSm/T4hGUIYRrwSeAK4F7gmhjjFGAjsKh/ypMkdaSnJyzPB64Fzo8x7gM2tNv8I2BlV69vampu\nLSsr7XORknSU6nR6uicnLEcAy4ApMcbatud+AMyLMb4ATAZ+09UxMpkDvSk2OdXVw9m5c1+hy1Af\n2X/pKva+q64e3um2noy8ZwKjge+FEEqAVuBu4LshhDpgP9npFElSnnQb3jHGVcCqDjbdm/tyJEk9\n4R2WkpQgw1uSEmR4S1KCDG9JSlCPrvOWpJ5obm6mpmZr3tqrqjo9b20NNEUf3g888H0effRhBg8e\nTENDA5/97GzOPPOsQpclFaWamq3MXf4QQypH93tbDXW7+dbiSkaMGNPvbQ1ERR3eL720g7Vr/5lv\nfvM+Bg0axPbtNdxww2LDW+pHQypHUz5ibKHLKHpFPee9f/9+GhsbaWhoAGDcuPHceuudbN78Alde\neTlXXTWbL3xhHnV1+9m48RmuuWYOABs3/op5864sZOmS1KWiDu9TT/19TjttEh/60MUsXbqIxx9f\nR3NzMzfd9BXmz/8iN9+8kj/+4z/hgQe+zzve8Ucce+xIfv7zp1m1aiVXX31NocuXpE4V9bQJwHXX\nLWLr1s08/fRTfOc797BmzQ949tnfcuON/0hraytNTY2cdlr2pMfs2Vdy2WWXctFFF3P88ScUuHJJ\n6lzRh3dDQwMTJryFCRPewowZM/noR6dTX/8aK1bc8aZ96+r2M3ToUHbu3FmASiWp54p62uTBB9ew\nbNmSQz/v27eX1tYWzjzzLJ566t8BWL/+MX75y18AcMstX2XRoqXs2vUK//VfXS6UKEkFVdQj7wsu\nuJgtW7Zw2WWXUlFRQXNzM3PmzOeEE07gxhuX8O1vr2bo0KEsXLiEDRvWMWbMWE455VRmz76SxYuv\n584772bQoKL++yYpUb36GrS+2rlzX/83UkDFvqZwsbP/cmfLlhe49s6n8nKpYP3el7ljwZSivs67\nunp4p1/G4LBSkhJkeEtSggxvSUqQ4S1JCTK8JSlBhrckJWhAXOfdH2sAjx8/gdLS0i73aWpqYvbs\nz/CWt5zEF76wMCftvvTSDq677hruuuuenBxPkjoyIMI712sAN9Tt5mtXX8jEiSd1ud+uXbtoamrM\nWXC/rqTTKzMlKTcGRHhDYdYA/vrXl7N9ew1Lly7iwIED7N+/r+0uzHmcfPKpzJx5CR/84CU88cTj\njBs3nhDeyoYN6zjxxAlcf/1innvudyxffiMVFeU0NbWwePGNbzj+xo3P8I1vrKSsbDBjx45l/vwv\nUlY2YP6XS0rYUT3nfcUVczjxxImMGzees89+NzffvJK5cxdw6603AdDS0sJpp03irrvuYdOmjYwb\nN45Vq1azceMz1NXtJ5PZw5w581m9ejVvf/s7eOyxR95w/Ftu+So33LCcW25ZyciRo9iwYV0h3qak\nIuQwENi0aSO1ta/y6KMPAxz68gaAt751EgBV
VaM59dQ/aHtcxf79+6mqGs3tt6/g9tub2LHjJaZO\n/cCh12Uye9i2bRtf/OI8Wltbqa+vZ+TIUXl8V5KKmeENDB48hKuums/pp7/tTdtKS8s6fNzamh1Z\nz5r1CS644P2sWHE79fWvHdpeVjaY6urqDpeelaQjdVRPm7xu0qS38dOfbgDghRee53vfu7/L/bNr\nebVSW1vLCSeMp6Ghgaee+jcaGxsP7TN8+HBKSkrYvPkFAH74w+/y/PPP9ddbkHSUGTAj74a63QU5\nVkkJzJgxkyVLFvK5z11GS0sLV1017/Wtb9jvjY9LmD79wyxYcDUnn3wSM2bM5KabvsKUKVMP7XfN\nNdexdOkihgwZwujRxzFt2vQje2OS1GZALAlbqOu8c8UlRdNm/+WOS8LmVldLwg6IkXdpaWm312RL\nkv6fc96SlCDDW5ISZHhLUoIMb0lKkOEtSQkaEFebpHSp4NKlizjvvCm8613n5PzYktRTPQrvEMIy\n4BygFLgB+DlwL9mR+w5gVoyxsfMjdK2mZivXPrSI8qqKvh7iDer3HODLFy708kNJRavb8A4hTAYm\nxRjfHUKoAp4B1gNfjzH+MISwBPgUcOeRFFJeVcExY4YdySF67ZFHHuSZZ/6T2tpX2bz5BS677HLW\nrXuUzZs3c/31X2L9+p/w7LP/TUPDQaZNm85FF0079NqWlhaWLVvCjh0vUlLSysc//hnOOOPMvNYv\n6ejVk5H3k8DTbY9fBSqBc4G/aXtuLTCXIwzvQtm+vYbbblvF2rVruO++1dx997d56KEf8fDDaznp\npJP5u7+bw8GDB5k585I3hPdPfvJjjjuumgUL/p6ysiY+9rFZrF79nQK+E0lHk27DO8bYCry+XN6n\ngYeA89tNk7wCHN8/5fW/0057KwCjRx/HKaecSklJCVVVo2loaKC2tpbLL/8UZWWDqa199Q2v27Tp\n12za9Ct+/etfUVY2iMbGBpqamvyyBUl50eOkCSFMIzs9MhVovzxe0l/61f6kZvvHL720gxdf3M5t\nt93FoEGDmDr13De8bvDgwXz8459iypSpro0hKe96esLyfOBasiPufSGEfSGEoTHGg8A44MWuXj9q\nVAVlZZ1f+bF3b2UvSu6ZUaMqqa4e3uU+w4eXc8wxQ6iuHs6xxx5DefngQ49j/C1Tpkxh7NhjWb9+\nPa2tLYwcWU55+WCOPbaCs88+k8cff5yPfGQ6u3fv5r777mHOnDk5fx/Kj+5+V9Qz/fFZ7s7R2nc9\nOWE5AlgGTIkx1rY9vQ6YDtzf9t8fd3WMTOZAl21kMnXU7+l6n96o33OATKau29Hwvn31vPZaAzt3\n7qO29jXq6xsPPT7zzLN47rnn+chH/or3vGcy73rXOVx77XU0NzdTW3uAs846hyee+BkzZnyI0tJB\nzJr1aUffifJfTrmTydTlvc1i7ruu/jB1uyRsCOEyYCHwP2SnSFqBS4FvAkOBLcAnY4zNnR3DJWE1\nkNl/ueOSsLl1REvCxhhXAas62DS1g+f6xCVhJal3vD1ekhJkeEtSggxvSUqQ4S1JCTK8JSlBhrck\nJcjwlqQEGd6SlCDDW5ISZHhLUoIMb0lKkOEtSQkyvCUpQYa3JCXI8JakBBnekpQgw1uSEmR4S1KC\nDG9JSpDhLUkJMrwlKUGGtyQlyPCWpAQZ3pKUIMNbkhJkeEtSggxvSUqQ4S1JCTK8JSlBhrckJais\n0AVI7TU3N1NTszWvbVZVnZ7X9qRcMLw1oNTUbGXu8ocYUjk6L+011O3mW4srGTFiTF7ak3LF8NaA\nM6RyNOUjxha6DGlAc85bkhJkeEtSgno0bRJCeBuwBlgeY1wZQrgbeCewq22Xr8QYH+mnGiVJh+k2\nvEMIFcAKYN1hmxbEGB/ul6okSV3qybRJPfABYEc/1yJJ6qFuR94xxhbgYAjh8E1XhBDmAi8DV8QY\n9/RDfZKk
DvT1hOU9ZKdNpgAbgUW5K0mS1J0+XecdY9zQ7scfASu72n/UqArKykr70lQyqquHF7qE\norB3b2VB2rX/cqMQ/Xe
"text/plain": [
"<matplotlib.figure.Figure at 0x7f2f947c5400>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"# We pass 'Sex' from columns to rows with unstack, so that now Pclass is in the columns\n",
"df.query('Age < 20 and Survived == 1').groupby(['Sex','Pclass']).size().unstack(['Sex']).plot(kind='bar')"
]
},
{
"cell_type": "code",
"execution_count": 23,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"<matplotlib.axes._subplots.AxesSubplot at 0x7f2f9496c828>"
]
},
"execution_count": 23,
"metadata": {},
"output_type": "execute_result"
},
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAW8AAAEOCAYAAABGlJbrAAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\nAAALEgAACxIB0t1+/AAAF9lJREFUeJzt3WuUVeWd5/FvXRCoAtsCC6IgxEuvh5D0ZLUxJjExYDCa\neMMZSJiMQ0x7SVqCywvxlk4kaEsMKioq0aBj0JhZmrTtiHdR1EnSZpnWoGbisxIjYCEKyIGCgqKu\n86IKFmhdDsW51HP4ft54ztn77P0/PoffeerZez+7rL29HUlSWsqLXYAkac8Z3pKUIMNbkhJkeEtS\nggxvSUqQ4S1JCarMZqUQwiDgdeAq4FngXjqCfw0wPcbYnLcKJUkfkm3P+4fA+52PrwJuiTFOAN4E\nzspHYZKk7vUa3iGEAIwDHgXKgAnAks7FS4Dj81adJKlL2fS8bwAupiO4Aap3GSZZCxyUj8IkSd3r\nMbxDCNOB38UYV3azSlk3r0uS8qi3A5YnA4eGEE4FRgFNwJYQwsAY4/bO197pbSctLa3tlZUVe12s\nJO1juu0g9xjeMcb/vuNxCOFKYAVwDDAVuA+YAjzR294zma1Z1pmm2tqhrFu3udhlqI9sv3SVetvV\n1g7tdtmenOe94xdgNnBmCOF5oAZY3PfSJEl9kdV53gAxxjm7PD0hD7VIkrLkFZaSlCDDW5ISZHhL\nUoIMb0lKkOEtSQnK+myTVD344K948snHGDBgAE1NTXz72zM46qiji12WJO2Vkg7vd99dw5Il/85d\nd/2C8vJyVq+u49prrza8JSWvpIdNtmzZQnNzM01NTQCMGjWaW265gxUr3uKCC87jwgtn8P3vX0JD\nwxaWL3+Fyy67CIDly//IJZdcUMzSJalHJR3eRxzx94wbN56vfe005s6dw7PPLqW1tZUbb7yOSy/9\nF266aSGf/vRnePDBX/HJT/4jf/d3B/DSS79n0aKFXHzxZcUuX5K6Vdbe3p73naxbtzn/O+nBqlUr\n+P3vX+Sppx5j8OAq3njjz4wb9zHa29tpaWlm3LiPc8EFs9i4cSPnnnsmp5xyGmeeeXbW2y/1+RVK\nne2XrlJvu9raoX2bmKoUNDU1MWbMRxkz5qNMnTqNb3xjCo2N21iw4PYPrdvQsIWBAweybt26IlQq\nSdkr6WGTRx55iHnzrtn5fPPmetrb2zjqqKN58cXfAfDMM0/x8st/AODmm69nzpy5rF+/lj/96fWi\n1CxJ2SjpnvdJJ53GypUrOffcM6mqqqK1tZWLLrqUgw8+mJ/85Bruu28xAwcOZPbsa1i2bCkjRozk\n8MOPYMaMC7j66iu54467KS8v6d83SYnaJ8a8863Ux91Kne2XrlJvu57GvO1WSlKCDG9JSpDhLUkJ\nKukDlpIKq7W1lbq6VQXbX319NZlMQ8H2N3r0GCoq+sfN1A1vSTlTV7eKKx6dw6BhVcUuJecaN2zl\nxyfPZuzYQ4tdCpBFeIcQBgM/B0YCA4F/pePu8Z8C1neudl2M8fE81SgpIYOGVTF4xJBil1Hysul5\nnwq8FGO8PoQwBnga+C1weYzxsVwUkY8/tbL586alpYUZM87hox89lO9/f3ZO9vvuu2v4wQ8u4847\n78nJ9iSpK72Gd4zxgV2ejgHe7nzc7fmHe6qubhWz5j/KftXDc7K9pob3ueHik3v982b9+vW0tDTn\nLLh3KMvZ/xlJ6lrWY94hhN8Co4BTgFnAd0MIFwPvATNjjBv2ppD9qoczaP+Re7OJPXbrrfNZvbqO\nuXPnsHXrVrZs2dx5FeYlHHbYEUybdjqnnno6zz33LKNGjSaEj7Fs2VIOOWQMV155NX/961+YP/8n\nVFUNoqWljauv/slu21++/BV+9rOFVFYOYOTIkVx66b9QWelhBkl7L+tTBWOMnwdOA+4D7qFj2GQS\nsByYk5/y8mvmzIs45JCxjBo1ms9+9hhuumkh
s2Zdzi233AhAW1sb48aN58477+G115YzatQoFi1a\nzPLlr9DQsIVMZgMXXXQpixcv5h/+4ZM89dTuw/4333w91147n5tvXsgBB9SwbNnSYnxMSSUomwOW\nRwJrY4x1McZXQwiVwGsxxh0HKx8GFva0jZqaKiorux9/rq+v3oOSs1NTU01t7dAe12lqqqeyspwY\n/0Qmk+HZZ58EoLm5idraoZSXl3HssZ9hyJAhjBhRy2c+cyS1tUMZMaKWgQPh8MMP4frrr+enP21k\n7dq1nHrqqQwbVk1lZQXl5U3U1b3Nj350Oe3t7TQ2NjJ69Ed6rUnFYbvkRj7+Lfcn2eRKoWTzN/wX\ngbHARSGEkcAQ4I4QwvdijG8BE4Eep+DLZLb2uIN8nKeZyTT0OufBhg0NtLS00d5ezsyZs/j4xz+x\nc9m6dZtpa2snk9nGtm3ttLS0sXFjI4MGbaalpZX167cwd+5VTJ/+LU466cssWPBTtm7d1rnNVjZt\n2s6BB9Zyww237bbPUp6HIVWlPj9GIRXynOtiyCZXcqmnH4pshk1uB0aEEF4AlgAzgAXA/SGEZcBJ\nJDpsssP48Z/ghReWAfDWW3/jgQd+2eP6HXN5tbNp0yYOPng0TU1NvPjib2lubt65ztChQykrK2PF\nircA+Ld/u5+//e2v+foIkvYx2Zxt0gic0cWinN7Ft6nh/aJsq6wMpk6dxjXXzOa73z2XtrY2Lrzw\nkh1Ld1tv98dlTJnydS6//GIOO+xQpk6dxo03XsekSSfsXO+yy37A3Llz2G+//Rg+/EAmT56ydx9M\nkjr1iylhi3Wed674Z3fabL/cWbnyLeb8x3UleZHOtrVbmP25Swp6hWW/vw1aRUVFv7nkVJJS4KyC\nkpQgw1uSEmR4S1KCDG9JSpDhLUkJ6hdnm6R0quDcuXM47rhJfO5zX8j5tiUpW/0ivHN9943+dscL\nScq1fhHeUJy7bzz++CO88sp/smnTRlaseItzzz2PpUufZMWKFVx55VU888zTvPHG/6OpaTuTJ0/h\nlFMm73xvW1sb8+Zdw5o171BW1s43v3kORx55VEHrl7Tv6jfhXSyrV9dx222LWLLkIX7xi8Xcffd9\nPProwzz22BIOPfQwzj//IrZv3860aafvFt5PP/0EBx5Yy+WX/5DKyhbOOGM6ixf/7yJ+Ekn7kn0+\nvMeN+xgAw4cfyOGHH0FZWRnDhg2nqamJTZs2cd55Z1FZOYBNmzbu9r7XXnuV1177I6+++kcqK8tp\nbm6ipaXFmy1IKoh9Pml2Pai56+N3313DO++s5rbb7qS8vJwTTpiw2/sGDBjAN795FpMmneDcGJIK\nzlMFu/HGG39m5MiRlJeX85vfPE9bWystLS07l48f/3FeeOE5AN5//33uuOO2brYkSbnXb3rejRt6\nvmFDobf16U8fzdtvv83553+HY4+dyDHHHMsNN1y7c/mXvvRlXn75D5x33llUVJQzffrZe71PScqW\nU8LmgMMmabP9cscpYXPLKWElqcQ45i1JCTK8JSlBvQ6bhBAGAz8HRgIDgX8FlgP30hH+a4DpMcbm\n7rYhScqtbHrepwIvxRgnAtOA+cBVwK0xxgnAm8BZeatQkvQh2dw9/oFdno4B3gYmAN/pfG0JMAu4\nI+fVSZK6lPXZJiGE3wKj6OiJP73LMMla4KA81CZJ6kbWByxjjJ8HTgPuA3Y997Db8xAlSfmRzQHL\nI4G1Mca6GOOrIYQKYHMIYWCMcTsdvfF3etpGTU0VlZWFuWCmWGprhxa7BO0F2y836uuri11CXtXU\nVPeb70o2wyZfBMYCF4UQRgJDgMeBqXT0wqcAT/S0gUwmd5e+90deoZc22y93MpmGYpeQV5lMQ0G/\nKz39UGQT3rcDd4UQXgAGAecB/wncG0L4NrASWJyDOiVJWcrmbJNG4IwuFp2Q+3IkSdnwCktJSpDh\nLUkJMrwl
KUGGtyQlyPCWpAQZ3pKUIMNbkhJkeEtSggxvSUqQ4S1JCTK8JSlBhrckJcjwlqQEGd6S\nlCDDW5ISZHhLUoIMb0l
"text/plain": [
"<matplotlib.figure.Figure at 0x7f2f94732128>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"# Now we make that the plot shows both values combined, and change the labels\n",
"df.query('Age < 20 and Survived == 1').groupby(['Sex','Pclass']).size().unstack(['Sex']).plot(kind='bar', \\\n",
" \n",
" stacked=True) "
]
},
{
"cell_type": "code",
"execution_count": 24,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"<matplotlib.text.Text at 0x7f2f9463f908>"
]
},
"execution_count": 24,
"metadata": {},
"output_type": "execute_result"
},
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAATcAAAJqCAYAAABO2geGAAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\nAAALEgAACxIB0t1+/AAAIABJREFUeJzt3Xl4XXWd+PF3SIDSBUkxVATpIIwfrTiPIoOjqAXrKAyK\nDgioiIgiLjAuiAtuLOOgg4IgWJRFQBRH3BBlk4KAP3B82ARB+YyAApWt0EBKS0pp8/vjnGAI2Zub\n23z7fj0PDzfn3nvu954075zlnpOWnp4eJKk06zR7AJLUCMZNUpGMm6QiGTdJRTJukopk3CQVqa3Z\nA9DTRcQq4HbgSaAVeAQ4LDMvb+rAJlhE7Ae8KzP/tdlj0eTkmtuapweYm5lzMjOAjwM/ioiNmzyu\nZvBDmBoz19zWPC31fwBk5jURcTvwSuCXEXEAcAjVWt19wL6ZeU9EPBf4LvAcYH3gfzLzC4NNB4iI\nLwLvrKefBxySmT0R8WvgfGB3YEvgqsx8Z/2c9wBfBu4HjgfOyMx1RjC/q4F/B96Xmf/b9w1HxKeB\nA4EVwAWZ+Yl+928CnAX8A7AecFJmfr2+72Dgw/UyexTYPzP/NNj0fvM9A+gEXgq8ALgOeHtmdkfE\ni4CTgU2BbuC9mXl9RMwFjgYWAk9k5r795rkn8MX6+/ME8JHMvCoiNqvnF1TR/lhmXhwRh1D9MntL\n/fxLgPMy82S0WlxzmxzWBZZHRAfwDWBevVZ3B/CF+jEfA67MzG2AlwDPj4hZg02PiH2BtwHbAVvV\n/32oz2u+CZhH9UP/uoh4ZUS0A98EXpeZLwPeSL12NYL5bZuZLx4gbDsA763H9hJgh4jYo9/7/zxw\nR2a+CHg98OWI2CwipgNHAdvV930V2HWw6YMs27cCu2fm5sBGwPsjooUqzmfWy/mDwM8jovfn5WXA\n/P5hq30T2CUz51DFdbd6+lnADfX8/g34Xr08jweeGxH/GhFvAaYbtvFh3NZwEbELMAu4OjMXARtm\n5n313b8Bnl/ffhB4Yx2LJzJzn8x8YIjpbwK+k5mPZeYq4HSqNbVeP87MJzJzGfB/wBbAK4DsswbU\n94dwuPldOMhb/DeqtbVlmbkC2BH4ad8HZOZHgI/Wt/9Ctda4JdUa1SrggIjYJDN/kplfG2L6QH6e\nmY/Ut88DXgW8EOjIzDPr1/wtsKi+D2BZZl45yPweAD4UEVtk5jWZeWhETAV2ogoZmXkn1fdu13pZ\nHQgcS7VGeMAg89UouVm6ZroiIp6k+uXzV6o1gWX1msOXIuLN9X0bAlk/57h62nxg04iYn5lHDDD9\nm5l5JNVayqERcSDVplsrVQh7Pdrn9sr6/nZgcZ/pf+tze7j59X1eX8/uO5/M7AaIiKceEBHbA0dH\nxPOoovUcYJ3MfDIi5gGfA46KiJuAgzLzlsGmD/D6fcfVWb/HjYBpEfHHenoLMAPYmOoAz2DvBao1\ntS8A10fE3VT7TP9cz+Oa+n21ANOAy+r3fGNEdAFP9t901tgZtzXT3D5rZ33tTbWG9OrM7Kz3v70T\noF4DOAY4JiK2Bi6OiN9k5mUDTL8auJdqrWX+KMbVRfVD3uu5fW6PZX4AD1EFDoCImDnAY84Gjs3M\nU+rHLOy9IzNvAvaKiDbg08C3qJbPgNMHmPez+9yeSRWue4FH603Lp6n3uQ2qXrN8b/3Y/YDvU+0r\nXAm8PDMfH2Ceu1Ltb5wSEbtk5kVDvYZGxs3SNVPLINM3Af5ah21jYC9gOkBEfCsiXl8/7i9UBxt6\nBpm+Cvg5sG9EbFA//8B6v9lQrgdeEhHPr/dLva/PfWOZH1QHLnaLiGfVIToPeEO/x3QAN9Tz3Q+Y\nCkyPiG0i4tyIWDczn6Q6ILBqsOmDvP7OEbFhRLRS7X+7KjPvAhb27vuLiGdHxDm9720w9eN+FRG9\nvwB+B/Rk5krgl1T74IiIqRFxer3fcBrV5upB
wEeAbw73OhoZ47bmGerjDz8Anh0R/0e1RvA54HkR\n8VWq/V//VW9K3QJcU3827lv9pv82My/PzPOAXwA31Pe9GbhkkDH0AGTm/cBngSuA/wWu6n3AKOdH\nn+f9jmqH/031+K7LzP/p97AvAudFxO+pwvZt4FRgKVWwb42IPwCHUx2dvGWA6R8dZAiXAT8D7gEe\nBs6op78dODgi/lS/30sHWuvq914eAi4Cro2IW4BzqNfiqMI2t57fdcDtmfk34Ajg/Mz8Y2ZeCywA\nvjTU62hkWkZyPbeImEL1D+8o4HKqzYR1+PtHEVY0cpBaM0XEHOA3mTkpP4NXfxTkz5l5dLPHovE3\n0jW3L1D9VoMqcCdm5lyqjyK8d9BnqSgR0RoRf6t38EO1dvPbZo5JGsywcYvq8M4LgQuo9gXNpdr8\noP7/6wd5qgpT7zv6MHBWRNwGvIZqP9Fk5RkQBRvJ0dJjqXZ2vqf+elqfzdAHqT7BrbVEZv6c6uDB\npJeZbnUUbMg1t/po1zX10aOBDHZUT5Kaarg1t12BLesPjW5Gda7cYxGxfmYur6fdO9yLPPnkyp62\nttbVHqwk9TPoCtaQccvMt/fejuqk6L9SnYLyNqqPIuwBXDzcq3d2LhvhONc8HR0zWLRoSbOHsVZx\nmU+8ybrMOzpmDHrfaD7n1lvIw4H9IuJKqlNVzhr70CSpMUZ8+lV9PmKv/p8gl6Q1imcoSCqScZNU\nJOMmqUjGTVKRjJukInmxSmkNt3LlShYuvHtc57n55lvQ2jr0B+vvv/8+9txzN7797TOYM2ebp6Yf\ncMC7ef7zt+Kznz38Gc+56KJfcuedd3DQQYNdYWriGDdpDbdw4d184rgLWG/a+FxZ6omlD3PsIbsy\ne/aWwz52s802Z8GCS56K29/+tpDHHhv6w74ta8hJmcZNmgTWm7YxUzacNeGvO2fONlx77e/o6emh\npaWFBQsuYfvt/4Xu7m5+9auL+clPfkhraytbbvl8PvnJzz7tuT/96Y+49NKLaW1t5TWvmcvee+8z\noWN3n5ukQbW1tTFnzjbccMN1APy//3clr3zlDgAsX97NccedyPz5p3HXXX/lzjvveOp59913L1dc\ncRknn3w6J510CldccRkPPvjAxI59Ql9N0qSz006v59JLL6a9fSYdHbPYYIOpAMyYMYNPf/oQAO66\n6690df39D6b98Y+3snDhPXzkIx+kp6eHZcse57777mOTTSZu7dO4SRrSdtttz3HH/Tcbb/xsdtxx\nHj09PaxYsYLjjjuGs876H9rb2/nUpz7+tOest966vOpVr+bQQw9r0qjdLJU0jLa2Nl760m254ILz\nefWrXwPAsmVLaWtro729nQceuJ/MP7FixRNPPecFL3ghN9xwHcuXd9PT08MJJxzLE088MdhLNGbc\nE/pqksbkiaUPD/+gBs5rp53m8cgjjzB16jQAnvWsjdhuu+15//vfzdZbB/vs825OPPHr7LXXOwCY\nNes57LXXOzjooAOfOqCw3nrrjdt7GIkR/fWr1bVo0ZJJe636yXqdq8nMZf50E/E5t8m6zDs6Zozt\nYpWSmq+1tXVEn0nT07nPTVKRjJukIhk3SUUybpKKZNwkFcmjpdIarpmXPHr3u9/OC1/4oqdOnP/H\nf3wB//Efh4zbOPbcczfOPvtcpkyZMm7z7GXcpDXcwoV3c9gFRzJl5tRxmV/34mV8edfDR/Txktmz\nZ/ONb3xrXF53YI27PpJxkyaBKTOnssEm05s9DABOOWU+N9/8e1atWsUee+zFvHlv4Oijj2SjjdrJ\nvI1HHulkn33248ILz6er61FOOukUAI444vMsX95Nd3c3H//4J3nhC+cA1ef7H3roIb7ylaN48skn\nWWedVj7zmc+v9kn27nOTNKj+JzDddNPveeCB+zjppFM44YSTOfPM0546Z7StrY0TTpjPVlttza23\n3szxx89n
yy234oYbruPhhx9mt93eygknnMwHPnAQ3/te799yr9bcTjvtZN7xjn05/vj57Lnn2znj\njNNWe+yuuUka1N133/X
"text/plain": [
"<matplotlib.figure.Figure at 0x7f2f946a22b0>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"#Small touches\n",
"\n",
"pclass_labels = ['First', 'Second', 'Third']\n",
"sex_labels = {'Female': 0, 'Male': 1}\n",
"\n",
"plt = df.query('Age < 20 and Survived == 1').groupby(['Sex','Pclass']).size().unstack(['Sex']).plot(kind='bar', \n",
" stacked=True, rot=0, subplots=False, figsize=(5,10))\n",
"plt.set_xticklabels(pclass_labels)\n",
"plt.legend(labels=sex_labels)\n",
"plt.set_xlabel('Passenger class')\n",
"plt.set_title('Passenger class per sex')"
]
},
{
"cell_type": "code",
"execution_count": 25,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"<matplotlib.text.Text at 0x7f2f945c0b00>"
]
},
"execution_count": 25,
"metadata": {},
"output_type": "execute_result"
},
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAZoAAAEMCAYAAAD9OXA9AAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\nAAALEgAACxIB0t1+/AAAIABJREFUeJzt3X2cVHXd//HXsIjAArqLC6YIItZH0W4004AS0QwNwxRR\nk7zNu8LL0jQzQ7y71DRRFCFJBS9vKr3qRxalAubNQ80LJG8Q/SSrgSQg4gLCstws8/vjnMVh3Zuz\n6373zM6+n4+HD2bOzJx5z+y67/me851zMtlsFhERkVA6pB1AREQKm4pGRESCUtGIiEhQKhoREQlK\nRSMiIkGpaEREJKiOaQcQqYuZbQUWAVuAImA1cLm7P5lqsFZmZqcD33P3I9POItJcGtFIvsoCQ919\noLsbcBHwiJn1TDlXGvRlN2nTNKKRfJWJ/wPA3Z83s0XAIOAvZnY2cDHRaGcZcKq7v2tmuwH/A+wK\n7Aj8zt3H1bccwMyuBE6Jl88ALnb3rJn9HXgUOB7oDzzj7qfEjzkDuAFYDtwGTHP3DgnW9xxwHPB9\nd/9H7gs2s8uAc4HNwEx3/0mt23sB9wF7Ap2ASe5+a3zbBcAP4/dsDXCmu79R3/Ja650GVABfAj4H\nzANOdvcqM9sXmAJ8BqgCznL3l8xsKHA9sBTY5O6n1lrnaODK+OezCbjQ3Z8xs93j9RlRgf7Y3R8z\ns4uJPlgcGz/+cWCGu09B2jyNaKQt2QHYaGZlwO3AEfFopxwYF9/nx8DT7r4/8HlgLzPrXd9yMzsV\nOAE4CBgQ//eDnOc8BjiC6A/w4WY2yMxKgDuBw939AGA48agjwfoOdPf96iiZIcBZcbbPA0PMbFSt\n1/8LoNzd9wW+AdxgZrubWTfgGuCg+LabgRH1La/nvf0OcLy79wF2Bs4xswxRUU6P3+fzgT+ZWc3f\njQOAybVLJnYncLS7DyQqupHx8vuA+fH6vgU8EL+ftwG7mdmRZnYs0E0lUzhUNNImmNnRQG/gOXdf\nCfRw92Xxzc8Ce8WX3weGx3+4N7n7GHdf0cDyY4B73X2du28F7iEawdT4X3ff5O6VwL+AvsAhgOeM\nDHL/IDa2vr/W8xK/RTSKqXT3zcBhwB9z7+DuFwI/ii+/QzSa6k800tgKnG1mvdz9D+7+qwaW1+VP\n7r46vjwDGAzsA5S5+/T4OV8AVsa3AVS6+9P1rG8F8AMz6+vuz7v7JWbWFRhGVCq4+9tEP7sR8Xt1\nLnAL0Ujp7HrWK22QNp1JPnvKzLYQfSD6N9En5Mr4E/V1Zvbt+LYegMePmRAvmwx8xswmu/tVdSy/\n092vJvr0fomZnUu0eamIqJRqrMm5XB3fXgJ8mLP8PzmXG1tf7uNy7ZK7HnevAjCzbXcws4OB681s\nD6IC2RXo4O5bzOwI4ArgGjN7BRjr7gvqW17H8+fmqohf485AsZktjJdngO5AT6LJGfW9FohGMOOA\nl8xsCdE+trfidTwfv64MUAzMiV/zP81sLbCl9uY9adtUNJLPhuaMWnKdRDRy+Jq7V8T7a04BiD8Z\n3wTcZGZ7A4+Z2bPuPqeO5c8B7xF9mp/chFxrif7g1tgt53Jz1gfwAVHZAGBmpXXc537gFnefGt9n\nac0N7v4KcKKZdQQuA35N9P7UubyOde+Sc7mUqETeA9bEm7+2E++jqVc84jorvu/pwINE+5aqgS+7\n+4Y61jmCaP9UZzM72t3/1tBzSNuhTWeSzzL1LO8F/DsumZ7AiUA3ADP7tZl9I77fO0QTBbL1LN8K\n/Ak41cy6xI8/N97P0pCXgM+b2V7xfozv59zWnPVBNOlgpJntFJfCDOCbte5TBsyP13s60BXoZmb7\nm9nDZraDu28h2pm/tb7l9Tz/UWbWw8yKiPbXPOPui4GlNfuKzGwXM3uo5rXVJ77fE2ZWU8YvAll3\nrwb+QrTPBjPramb3xPuZiok2qY0FLgTubOx5
pO1Q0Ui+amhK72+BXczsX0SflK8A9jCzm4n2l/x3\nvLlnAfB8/N2bX9da/oK7P+nuM4A/A/Pj274NPF5PhiyAuy8Hfg48BfwDeKbmDk1cHzmPe5FoZ/0r\ncb557v67Wne7EphhZi8TlcxdwG+A9UTl+bqZvQaMJ5rltaCO5T+qJ8Ic4P8B7wKrgGnx8pOBC8zs\njfj1zqprNFLrtXwA/A2Ya2YLgIeIRzdEJTM0Xt88YJG7/we4CnjU3Re6+1xgNnBdQ88jbUdG56MR\n+XTMbCDwrLu3ye/4xNOb33L369POIoVJ+2hEmijevLQEOM7d/4/oU/8L6aYSyV/adCbSRPG+hh8C\n95nZm8DXifYrtFXarCFBadOZiIgEpRGNiIgE1a730WzZUp2tqKhMO8Z2Skq6okzJ5GMuZUpGmZLL\nx1xlZd3r++pBndr1iKZjx6K0I3yCMiWXj7mUKRllSi5fczVFuy4aEREJT0UjIiJBqWhERCQoFY2I\niASlohERkaBUNCIiEpSKRkREgmrXX9gUEalLdXU1S5cuadF19unTl6Kihr8Ts3z5MkaPHsldd01j\n4MD9ty0/++zT2GuvAfz85+M/8Zi//e0vvP12OWPH1ncGiPSpaEREalm6dAk/mTCTTsUtc+aHTetX\nccvFI+jXr3+j99199z7Mnv34tqJZsmQJ69Z91OBjMk36nn7rU9GIiNShU3FPOvfo3erPO3Dg/syd\n+yLZbJZMJsPMmTM5+OCvUlVVxRNPPMYf/vB7ioqK6N9/Ly699OfbPfaPf3yEWbMeo6ioiK9/fSgn\nnTSm1fPXRftoRETySMeOHRk4cH/mz58HwJw5cxg0aAgAGzdWMWHCHUyefDeLF/+bt98u3/a4Zcve\n46mn5jBlyj1MmjSVp56aw/vvr0jlNdSmEY2ISJ4ZNuwbzJr1GCUlpey666506dIVgO7du3PZZRcD\nsHjxv1m7ds22xyxc+DpLl77LhReeTzabpbJyA8uWLaNXr9YfldWmohERyTMHHXQwEyb8kp49d2H4\n8OFks1k2b97MhAk3cd99v6OkpISf/vSi7R7TqdMODB78NS655PKUUtdPm85ERPJMx44d+dKXDmTm\nzEcZNmwYAJWV6+nYsSMlJSWsWLEc9zfYvHnTtsd87nP7MH/+PDZurCKbzTJx4i1s2rSpvqdoVRrR\niIjUYdP6Vamua9iwI1i9ejXdunUDYKedduaggw7mnHNOY++9jTFjTuOOO27lxBO/C0Dv3rty4onf\nZezYc7dNBujUqVOLvYZPo12fyrm8vDxbUbE+7RjbKSkppjmZqqurgQxFRS0/SG1uptAaypXkOwsh\nlJV1Z+XKhqeitjZlSiY3U1rfo2ksV75o6onP2vWI5qxxD7fYPPm0rVu5iG4HlNO5tGvaUVJX9WEl\nN4wYn+g7CyJ1KSoq0u9PC2rXRZPWPPkQNq5bRefSrnTp1S3tKCIi29FkABERCUpFIyIiQaloREQk\nKBWNiIgE1a4nA4iI1CXN0wScdtrJ7LPPvtsOqvmFL+zP2Wdf0GI5Ro8eyf33P0znzp1bbJ2NUdGI\niNSydOkSLp95dYt9XaApU+779evH7bf/etv1lv8eTeufU0BFIyJSh3z6usDUqZN59dWX2bp1K6NG\nncgRR3yT66+/mp13LsH9TVavrmDMmNP5618fZe3aNUyaNBWAq676BRs3VlFVVcVFF13KPvsMBKIv\n6X/wwQfceOM1bNmyhQ4divjZz34R7ACc2kcjIpJHah+sZd68eaxYsYxJk6YyceIUpk+/e9sxzDp2\n7MjEiZMZMGBvXn/9VW67bTL9+w9g/vx5rFq1ipEjv8PEiVM477yxPPDAffEaoxHN3XdP4bvfPZXb\nbpvM6NEnM23a3cFek0Y0IiJ5ZMmSxdsO9Z/JZBgyZBCvv/76tmUAq1Z9AMDAgfsB0LPnLvTrtycA\npaU9Wbdu
HaWlpUyffjcPPXQ/mzdv2naqgRoLFrzKu+8uYfr0u8lms+y8c0mw16SiERHJI7X30cyc\n+QeOOWYk3/veGZ+4b+7
"text/plain": [
"<matplotlib.figure.Figure at 0x7f2f946038d0>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"#The same horizontal\n",
"pclass_labels = ['First', 'Second', 'Third']\n",
"sex_labels = {'Female': 0, 'Male': 1}\n",
"\n",
"plt = df.query('Age > 25 and Survived == 1').groupby(['Sex','Pclass']).size().unstack(['Sex']).plot(kind='barh', \n",
" stacked=True, rot=0, subplots=False)\n",
"plt.set_yticklabels(pclass_labels)\n",
"plt.legend(labels=sex_labels)\n",
"\n",
"plt.set_ylabel('Passenger class')\n",
"plt.set_title('Passenger class per sex')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Feature Sex"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We are now going to explore the Sex attribute"
]
},
{
"cell_type": "code",
"execution_count": 26,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"Sex\n",
"female 314\n",
"male 577\n",
"dtype: int64"
]
},
"execution_count": 26,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# How many passengers by sex\n",
"df.groupby('Sex').size()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We see that men (577) outnumber women (314)."
]
},
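{
"cell_type": "markdown",
"metadata": {},
"source": [
"To put these counts in proportion, the following is a minimal sketch that rebuilds the two totals shown above as a Series and converts them into shares of all passengers. The names `counts` and `shares` are illustrative and not part of the original notebook. On the full DataFrame, `df['Sex'].value_counts(normalize=True)` should give the same proportions."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"import pandas as pd\n",
"\n",
"# Counts copied from the groupby output above\n",
"counts = pd.Series({'female': 314, 'male': 577}, name='passengers')\n",
"\n",
"# Share of each sex among all passengers\n",
"shares = counts / counts.sum()\n",
"shares.round(3)"
]
},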
{
"cell_type": "code",
"execution_count": 27,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"<matplotlib.axes._subplots.AxesSubplot at 0x7f2f94585780>"
]
},
"execution_count": 27,
"metadata": {},
"output_type": "execute_result"
},
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAYQAAAESCAYAAAD9gqKNAAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\nAAALEgAACxIB0t1+/AAAEtVJREFUeJzt3X2QXXV9x/H3NguBbBKzwIIxVMSqX2UsnTo4OIkQCBIf\nWvEhUKshpcQHrNKJqHWwikaQltqBQVDHNhCkKU7VNqNGRhqsD4hQA1of6Ey/KiiYB82qV/Nkwmaz\n/eP8Anfj7ubm4dy77L5fM5k995zzO/e7M2fzub/fOed3u4aGhpAk6fc6XYAkaXwwECRJgIEgSSoM\nBEkSYCBIkgoDQZIEQHfdbxARi4G/AQaA9wHfB1ZRhdEmYElmDpT9lgGDwIrMXFl3bZKkx3XV+RxC\nRBwD3Av8MTADuBI4AvhCZq6OiKuBR6gC4tvAacBu4D7gjMz8dW3FSZKGqbuH8CLgzszcAewALomI\nh4BLyvY1wDuBHwDrMnMbQETcDcwDbq+5PklSUXcgPA3oiYjPAbOADwDTMnOgbN8MzAZOAPqb2vWX\n9ZKkNqk7ELqAY4BXUYXDV8q65u2jtZMktVHdgfBz4J7M3AM8FBFbgYGImJqZu4A5wAZgI8N7BHOo\nrj2MavfuwaHu7ik1lS1JE9aoH7jrDoS1wC0R8SGqnsJ04A7gfOA2YFF5vQ64KSJmAnuAuVR3HI2q\n0dhRY9mSNDH19c0YdVutdxkBRMQbgTcAQ8BVwP1UdxVNBR4GLs7MwYh4NfAuqkC4ITP/bazj9vdv\ndZpWSTpAfX0zRu0h1B4IdTEQJOnAjRUIPqksSQIMBElSYSBIkoA2zGU0Xg0ODrJ+/SOdLkPj0Ikn\nPpUpU7ylWZPPpA2E9esf4b3X/ztHTT+m06VoHNm57Vd88G3nc9JJJ3e6FKntJm0gABw1/Rimzezr\ndBmSNC54DUGSBBgIkqTCQJAkAQaCJKkwECRJgIEgSSoMBEkSYCBIkgoDQZIEGAiSpMJAkCQBBoIk\nqTAQJEmAgSBJKgwESRJgIEiSCgNBkgQYCJKkwkCQJAEGgiSpMBAkSYCBIEkquus8eETMBz4DPAB0\nAd8D/hFYRRVGm4AlmTkQEYuBZcAgsCIzV9ZZmyRpuHb0EL6amQsy8+zMXAZcCdyYmfOBB4GlETEN\nuAJYAJwNXBYRs9pQmySpaEcgdO3z+ixgTVleA5wLnA6sy8xtmbkTuBuY14baJElFrUNGxSkR8Vng\nGKrewbTMHCjbNgOzgROA/qY2/WW9JKlN6u4h/BBYnpmvBP4SuJnhIbRv72F/6yVJNam1h5CZG6ku\nKpOZD0XEz4DTImJqZu4C5gAbgI0M7xHMAe4d69i9vdPo7p5y0LVt2dJz0G01sfX29tDXN6PTZUht\nV/ddRq8DZmfmtRHxZKqhoVuA84HbgEXAHcA64KaImAnsAeZS3XE0qkZjxyHV1mhsP6T2mrgaje30\n92/tdBlSLcb6sFP3NYTPA5+MiFcARwCXAN8F/iUi3gQ8DNyamYMRcTmwlioQlmemf5GS1EZ1Dxlt\nA84bYdPCEfZdDayusx5J0uh8UlmSBBgIkqTCQJAkAQaCJKkwECRJgIEgSSoMBEkSYCBIkgoDQZIE\nGAiSpMJAkCQBBoIkqTAQJEmAgSBJKgwESRJgIEiSCgNBkgQYCJKkwkCQJAEGgiSpMBAkSYCBIEkq\nDARJEmAgSJIKA0GSBBgIkqTCQJAkAQaCJKkwECRJAHTX/QYRcRTwAHAl8GVgFVUQbQKWZOZARCwG\nlgGDwIrMXFl3XZKk4drRQ7gC+GVZvhK4MTPnAw8CSyNiWtlnAXA2cFlEzGpDXZKkJrUGQkQE8Gzg\ndqALmA+sKZvXAOcCpwPrMnNbZu4E7gbm1VmXJOl31d1DuBZ4O1UYAPRk5kBZ3gzMBk4A+pva9Jf1\nkqQ2qu0aQkQsAe7JzIerjsLv6Bpp5Rjrh+nt
nUZ395SDLY8tW3oOuq0mtt7eHvr6ZnS6DKnt6ryo\n/CfAyRHxcmAO8CiwLSKmZuausm4DsJHhPYI5wL37O3ijseOQims0th9Se01cjcZ2+vu3droMqRZj\nfdipLRAy88/3LkfE+4CfAHOB84HbgEXAHcA64KaImAnsKfssq6suSdLI2vUcwt5hoPcDF0XE14Be\n4NZyIflyYG35tzwz/XgmSW1W+3MIAJn5gaaXC0fYvhpY3Y5aJEkj80llSRJgIEiSCgNBkgQYCJKk\nwkCQJAEGgiSpMBAkSYCBIEkqDARJEmAgSJIKA0GSBBgIkqTCQJAkAQaCJKkwECRJgIEgSSoMBEkS\nYCBIkgoDQZIEGAiSpMJAkCQBLQZCRHxihHX/edirkSR1TPdYGyNiMfBm4LkRcVfTpiOBE+osTJLU\nXmMGQmbeFhFfBW4D3t+0aQ/wvzXWJUlqszEDASAzNwBnRcSTgGOArrJpFvCrGmuTJLXRfgMBICI+\nDCwF+nk8EIaAp9dUlySpzVoKBGAB0JeZO+ssRhIMDg6yfv0jnS5D49CJJz6VKVOm1Hb8VgPhh4aB\n1B7r1z/C8tVXc3RvT6dL0Tjy28Z2lr/6PZx00sm1vUergbC+3GV0N7B778rMfN9YjSLiaOATVHck\nTQU+CHwXWEV1y+smYElmDpQ7mpYBg8CKzFx5YL+KNHEc3dtDz3EzOl2GJplWH0z7JfBfwC6q/7D3\n/tuflwP3ZeZZwGuA64ArgY9k5nzgQWBpREwDrqAamjobuCwiZh3A7yFJOkSt9hCuOpiDZ+anm14+\nFfgpMB+4pKxbA7wT+AGwLjO3AUTE3cA84PaDeV9J0oFrNRB2U91VtNcQ8Bvg2FYaR8Q3gDlUPYY7\nM3OgbNoMzKYaUupvatJf1kuS2qSlQMjMx4aWIuJI4Bzgj1p9k8ycFxGnUj3g1tW0qWuUJqOtf0xv\n7zS6uw/+avuWLV6w08h6e3vo6+vc+L3npkZT97nZag/hMZn5KPDFiHgncM1Y+0bE84DNmbk+M78X\nEVOArRExNTN3UfUaNgAbGd4jmAPcO9axG40dB1r6Pu23H1J7TVyNxnb6+7d29P2lkRyOc3OsQGn1\nwbSl+6z6far/tPfnTOAkqovEJwDTgS8C51P1FhYBdwDrgJsiYibVtBhzqe44kiS1Sas9hDOaloeA\nLcCftdDu48DN5ZbVo4C/Ar4FrIqINwEPA7dm5mBEXA6spQqE5ZnZuY9okjQJtXoN4WKAiDgGGMrM\nRovtdgKLR9i0cIR9VwOrWzmuJOnwa3XIaC7Vw2QzgK6I+CVwYWbeX2dxkqT2afXBtGuAV2Tm8ZnZ\nB7yW6iEzSdIE0WogDGbmA3tfZOb/0DSFhSTpia/Vi8p7ImIRcGd5/RJam7pCkvQE0WogvBm4EbiJ\n6i6g7wBvrKsoSVL7tTpktBDYlZm9mXks1ZPEL6uvLElSu7UaCBcCr256vRB43eEvR5LUKa0GwpTM\nbL5mMEQL8w1Jkp44Wr2G8PmIuAf4OlWInAP8R21VSZLarqUeQmZ+EHgX1XTVm4C3ZObVdRYmSWqv\nlmc7zcy7qb5CU5I0AbV6DUGSNMEZCJIkwECQJBUGgiQJMBAkSYWBIEkCDARJUmEgSJIAA0GSVBgI\nkiTAQJAkFQaCJAkwECRJhYEgSQIMBElSYSBIkgADQZJUtPyNaQcrIj4EvBCYAlwD3AesogqjTcCS\nzByIiMXAMmAQWJGZK+uuTZL0uFp7CBFxFnBKZs4FXgpcD1wJfCQz5wMPAksjYhpwBbAAOBu4LCJm\n1VmbJGm4uoeMvgZcUJZ/DfQA84HPl3VrgHOB04F1mbktM3dSfXfzvJprkyQ1qXXIKDOHgN+Wl68H\nbgdenJkDZd1mYDZwAtDf1LS/rJcktUnt1xAAIuIVwFJgIfCjpk1dozQZbf1jenun0d095aBr2rKl\n56DbamLr
7e2hr29Gx97fc1OjqfvcbMdF5RcD76bqGWyNiK0RMTUzdwFzgA3ARob3COYA94513EZj\nxyHV1WhsP6T2mrgaje3
"text/plain": [
"<matplotlib.figure.Figure at 0x7f2f946ae550>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"# Plot with seaborn\n",
"sns.countplot(x='Sex', data=df)"
]
},
{
"cell_type": "code",
"execution_count": 28,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"<matplotlib.axes._subplots.AxesSubplot at 0x7f2f9452c2b0>"
]
},
"execution_count": 28,
"metadata": {},
"output_type": "execute_result"
},
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAXUAAAESCAYAAAACDEUqAAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\nAAALEgAACxIB0t1+/AAAEfNJREFUeJzt3XuQXnV9x/H3moXgbpJmhSXGoIi3rzKWmTo4OIkQiBJv\nFXUCY2tMU9E200InIrYT2yIxltbSgbGijm0gNM3gjJfJqJFpJjBaFI0G63j7o18VFMhFsziPbrJp\nwrJJ/zgnugl7eVie3Wf3l/drZmfP8zuX57u7Zz/nt79z2Y5jx44hSSrDM9pdgCSpdQx1SSqIoS5J\nBTHUJakghrokFcRQl6SCdDazUESsBP4aGAQ+CPwQ2EJ1UNgHrMrMwXq5tcAQsDEzN01K1ZKkEXWM\nd516RDwL2An8ATAX2ACcBnw5M7dGxE3AI1Qh/13gQuAJ4AHg4sz89eSVL0karpme+muBezLzEHAI\nWBMRDwFr6vnbgPcDPwZ2ZeZBgIi4H1gC3N3yqiVJI2om1J8PdEfEF4H5wIeArswcrOfvBxYCC4C+\nYev11e2SpCnSTKh3AM8C3kYV8F+t24bPH209SdIUaibUfwl8MzOPAg9FxAFgMCJmZ+YRYBGwB9jL\niT3zRVRj8aN64omhY52dsyZWuSSdukbtNDcT6juAOyPiZqoe+xxgO3AlcBewon69C7g9IuYBR4HF\nVFfCjKrRONRM8WpSb+9c+voOtLsM6UncN1urt3fuqPPGvU49M/cCnwe+RXXS8xrgRmB1RNwH9ACb\nM/MwsI7qILADWJ+Z/hQlaQqNe0njZOrrO+Bzf1vI3pCmK/fN1urtnTvq8It3lEpSQQx1SSqIoS5J\nBWnq2S+Spp+hoSF2736k3WU0pb+/m0ZjoN1ljOucc57HrFkz+zJrQ12aoXbvfoTrb72b07vPbHcp\nRXh84Ffc8r43ce6557W7lKfFUJdmsNO7z+SMeQvaXYamEcfUJakghrokFcRQl6SCGOqSVBBDXZIK\nYqhLUkEMdUkqiKEuSQUx1CWpIIa6JBXEUJekghjqklQQQ12SCmKoS1JBDHVJKoihLkkFMdQlqSCG\nuiQVxFCXpIIY6pJUEENdkgpiqEtSQTrHWyAilgKfA34EdAA/AP4F2EJ1UNgHrMrMwYhYCawFhoCN\nmblpsgqXJD1Zsz31/87MZZl5WWauBTYAt2XmUuBB4OqI6AJuAJYBlwHXRcT8SalakjSiZkO946TX\nlwLb6ultwOXARcCuzDyYmYeB+4ElrShSktSccYdfaudHxBeAZ1H10rsyc7Cetx9YCCwA+oat01e3\nS5KmSDM99Z8A6zPzrcCfAndw4sHg5F78eO2SpEkybk89M/dSnSglMx+KiF8AF0bE7Mw8AiwC9gB7\nObFnvgjYOda2e3q66OycNdHaNYLe3rntLkFTpL+/u90lFKenp3vG/w41c/XLO4CFmXlLRDybapjl\nTuBK4C5gBbAd2AXcHhHzgKPAYqorYUbVaBx6etXrBL29c+nrO9DuMjRFGo2BdpdQnEZjYEb8Do11\n4GlmTP1LwKcj4i3AacAa4PvAf0bEnwMPA5szcygi1gE7qEJ9fWZO/++OJBWkmeGXg8AVI8xaPsKy\nW4GtLahLkjQB3lEqSQUx1CWpIIa6JBXEUJekghjqklQQQ12SCmKoS1JBDHVJKoihLkkFMdQlqSCG\nuiQVxFCXpIIY6pJUEENdkgpiqEtSQQx1SSqIoS5JBTHUJakghrokFcRQl6SCGOqSVBBDXZIKYqhL\nUkEMdUkqiKEuSQUx1CWpIIa6JBXEUJekghjqklSQzmYWiogzgB8BG4CvAFuoDgj7gFWZORgRK4G1\nwBCwMTM3TU7JkqTRNNtTvwH4VT29AbgtM5cCDwJXR0RXvcwy4DLguoiY3+piJUljGzfUIyKAlwJ3\nAx3AUmBbPXsbcDlwEbArMw9m5mHgfmDJpFQs
SRpVMz31W4D3UQU6QHdmDtbT+4GFwAKgb9g6fXW7\nJGkKjTmmHhGrgG9m5sNVh/1JOkZqHKP9BD09XXR2zmpmUTWpt3duu0vQFOnv7253CcXp6eme8b9D\n450ofRNwXkS8GVgEPA4cjIjZmXmkbtsD7OXEnvkiYOd4b95oHJpQ0RpZb+9c+voOtLsMTZFGY6Dd\nJRSn0RiYEb9DYx14xgz1zPyj49MR8UHg58Bi4ErgLmAFsB3YBdweEfOAo/Uya59m3ZKkp+ipXKd+\nfEjlRmB1RNwH9ACb65Oj64Ad9cf6zJz+hztJKkxT16kDZOaHhr1cPsL8rcDWVhQlSZoY7yiVpIIY\n6pJUEENdkgpiqEtSQQx1SSqIoS5JBTHUJakghrokFcRQl6SCGOqSVBBDXZIKYqhLUkEMdUkqiKEu\nSQUx1CWpIIa6JBXEUJekghjqklQQQ12SCmKoS1JBDHVJKoihLkkFMdQlqSCGuiQVxFCXpIIY6pJU\nkM52FzDdDQ0NsXv3I+0uoyn9/d00GgPtLmNc55zzPGbNmtXuMqQiGerj2L37EQ5+ZDvP7Tqr3aU0\npafdBYzj0UOPsXvd6zn33PPaXYpUpHFDPSKeCfwHsACYDfwD8H1gC9XwzT5gVWYORsRKYC0wBGzM\nzE2TVPeUem7XWbxwzoJ2l1GMRrsLkArWzJj6m4EHMvNS4O3ArcAG4OOZuRR4ELg6IrqAG4BlwGXA\ndRExf1KqliSNaNyeemZ+dtjL5wGPAkuBNXXbNuD9wI+BXZl5ECAi7geWAHe3smBJ0uiaHlOPiG8A\ni6h67vdk5mA9az+wkGp4pm/YKn11uyRpijQd6pm5JCIuAO4COobN6hhlldHaf6unp4vOzul9FUR/\nf3e7SyhOT083vb1z213GjOe+2Xol7JvNnCh9BbA/M3dn5g8iYhZwICJmZ+YRqt77HmAvJ/bMFwE7\nx9p2o3Fo4pVPkUZjYNpfUTLTNBoD9PUdaHcZM95MuHx1ppkp++ZYB55mTpReAlwPEBELgDnAvcCV\n9fwVwHZgF3BhRMyLiDnAYuDrEy9bkvRUNRPqnwLOjoivUZ0U/QvgRmB1RNxHdWn05sw8DKwDdtQf\n6zNz+h/yJKkgzVz9chhYOcKs5SMsuxXY2oK6JEkT4LNfJKkghrokFcRQl6SCGOqSVBBDXZIKYqhL\nUkEMdUkqiKEuSQUx1CWpIIa6JBXEUJekghjqklQQQ12SCmKoS1JBDHVJKoihLkkFMdQlqSCGuiQV\nxFCXpIIY6pJUEENdkgpiqEtSQQx1SSqIoS5JBTHUJakghrokFcRQl6SCGOqSVJDOZhaKiJuBVwOz\ngI8ADwBbqA4K+4BVmTkYESuBtcAQsDEzN01K1ZKkEY3bU4+IS4HzM3Mx8Abgo8AG4OOZuRR4ELg6\nIrqAG4BlwGXAdRExf7IKlyQ9WTPDL/cBV9XTvwa6gaXAl+q2bcDlwEXArsw8mJmHgfuBJa0tV5I0\nlnGHXzLzGPB/9ct3A3cDr8vMwbptP7AQWAD0DVu1r26XJE2RpsbUASLiLcDVwHLgp8NmdYyyymjt\nv9XT00Vn56xmS2iL/v7udpdQnJ6ebnp757a7jBnPfbP1Stg3mz1R+jrgA1Q99AMRcSAiZmfmEWAR\nsAfYy4k980XAzrG222gcmljVU6jRGKCn3UUUptEYoK/vQLvLmPEajYF2l1CcmbJvjnXgaeZE6Tzg\nZuAPM/M3dfO9wIp6egWwHdgFXBgR8yJiDrAY+PrTqFuS9BQ101N/O3Am8NmI6ACOAauBOyJiDfAw\nsDkzhyJiHbADOAqsz8zpf8iTpII0c6J0I7BxhFnLR1h2K7C1BXVJkibAO0olqSCGuiQVxFCXpIIY\n6pJUEENdkgpiqEtSQQx1SSqIoS5JBTHUJakghrokFcRQl6SCGOqSVBBDXZIKYqhLUkEMdUkqiKEu\nSQUx1CWp
IIa6JBXEUJekghjqklQQQ12SCmKoS1JBDHVJKoihLkkFMdQlqSCGuiQVxFCXpIIY6pJU\nkM5mFoqIlwNfAG7NzE9
"text/plain": [
"<matplotlib.figure.Figure at 0x7f2f94539908>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"# Same graph with matplotlib and pandas\n",
"colors_sex = ['#ff69b4', 'b']\n",
"df.groupby('Sex').size().plot(kind='bar', rot=0, color=colors_sex)"
]
},
{
"cell_type": "code",
"execution_count": 29,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"Sex\n",
"female 233\n",
"male 109\n",
"Name: Survived, dtype: int64"
]
},
"execution_count": 29,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# How many passengers survived by sex\n",
"df.groupby('Sex')['Survived'].sum()"
]
},
{
"cell_type": "code",
"execution_count": 30,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"Sex\n",
"female 0.742038\n",
"male 0.188908\n",
"Name: Survived, dtype: float64"
]
},
"execution_count": 30,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Survival rate by sex (the mean of the 0/1 Survived column)\n",
"df.groupby('Sex')['Survived'].mean()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We see that 74% of women survived, while only 19% of men did."
]
},
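{
"cell_type": "markdown",
"metadata": {},
"source": [
"The mean of a 0/1 column is the fraction of ones, so the survival rate per sex is the number of survivors divided by the group size. The following minimal sketch checks this with the totals printed above. The names `survivors`, `totals` and `rates` are illustrative and not part of the original notebook."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"import pandas as pd\n",
"\n",
"# Totals copied from the outputs above\n",
"survivors = pd.Series({'female': 233, 'male': 109})\n",
"totals = pd.Series({'female': 314, 'male': 577})\n",
"\n",
"# Survivors divided by group size should match groupby('Sex')['Survived'].mean()\n",
"rates = survivors / totals\n",
"rates.round(6)"
]
},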
{
"cell_type": "code",
"execution_count": 31,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"<matplotlib.axes._subplots.AxesSubplot at 0x7f2f944d2cc0>"
]
},
"execution_count": 31,
"metadata": {},
"output_type": "execute_result"
},
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAYEAAAESCAYAAAAbq2nJAAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\nAAALEgAACxIB0t1+/AAAGAxJREFUeJzt3X2UHXWd5/F3Jx1I0gmmgXZCxCAY+MIw4pyobAwSF6Ms\n4zriOOCoMIOwuHt2woyZYXZGo+6iDqjjMWJ8GIUxuCrrIwo4AhtWUZCnNRlxmAe+PgAGkox24GJC\n85Cn3j9uRW7fTacrD3Vvp+v9OqdPqupXVffb53Tup+pXVb/qGR4eRpJUT5O6XYAkqXsMAUmqMUNA\nkmrMEJCkGjMEJKnGDAFJqrHeqj8gIpYDC4AdwNLMXN3StgQ4B9gGrM7MP6+6HknSMyo9E4iIRcC8\nzFwIXAisaGmbCfwFcEpmLgJOjIiTq6xHkjRS1d1Bi4FrATLzPmBWRMwo2rYATwOHREQvMA14tOJ6\nJEktqg6B2cBgy/zGYhmZ+TTwXuB+4AHg7sz8acX1SJJadPrCcM/OiaI7aBkwDzgaWBARL+hwPZJU\na1VfGF5PceRfmANsKKZPAH6WmQ2AiLgNeBFw72g727Zt+3Bv7+SKSpWkCatntIaqQ2AVcAlwZUTM\nB9Zl5lDR9iBwQkQcXHQNvRj41u521mg8UWGpksaDlSuvYNWqGzj99FdzwQX/udvlTAgDAzNHbeup\nehTRiLgMeDmwHVgCzAcey8zrIuKtwAXAVuCOzHz77vY1OLjZIU+lCeypp57k/PPfzPDwMD09k7jq\nqquZOnVat8s64A0MzOzamQCZuaxt0b0tbVcCV1Zdg6QDw9atW9l5YDo8vIOtW7caAhXziWFJqjFD\nQJJqzBCQpBozBCSpxgwBSaoxQ0CSaswQkKQaMwQkqcYMAUmqMUNAkmrMEJCkGqt87CBJY9u+fTsP\nP7y222V03dDQ0Ij5hx5aS19fX5eqGR+OPHIukydXN4S+ISCNAw8/vJZLvn4p0/rr/YW3Y8v2EfMr\nvv8pJh1U33eIPNkY4pLXv5Ojjjq6ss8wBKRxYlp/H32Hjz7uex1sf3obj7XMTz9sBpMP9muqSl4T\nkKQaMwQkqcYMAUmqMUNAkmrMEJCkGqv8sntELAcWADuApZm5ulg+B7gaGAZ6gGOAv8rML1VdkySp\nqdIQiIhFwLzMXBgRxwMrgYUAmbkeOK1YbzJwC3B9lfVIkkaqujtoMXAtQGbeB8yKiBm7WO8twDWZ\n+UTF9UiSWlQdArOBwZb5jcWydhcCn6m4FklSm05fGO5pXxARC4B/zczHO1yLJNVe1ReG1zPyyH8O\nsKFtndcA/6fMzvr7p9PbW99xRDRxbdpU7zGDNLr+/j4GBqobTqTqEFgFXAJcGRHzgXWZOdS2zkuA\nL5bZWaPhJQNNTI1G+3+LeuqZ1NJZ0NM2X1ONxhCDg5v3aR+7C5FKu4My805gTUTcDlwOLImI8yLi\nzJbVZgO/rLIOSQeGSVMmM+O4QwGYceyhTJrimX/VKn9OIDOXtS26t639hVXXIOnA0X/yHPpPntPt\nMmrDJ4YlqcYMAUmqMUNAkmrMEJCkGjMEJKnGDAFJqjFDQJJqzBCQpBozBCSpxgwBSaoxQ0CSaswQ\nkKQaMwQkqcYMAUmqMUNAkmrMEJCkGjMEJKnGDAFJqjFDQJJqrPJ3DEfEcmABsANYmpmrW9qOBL4I\nTAH+ITP/uOp6JEnPqPRMICIWAfMycyFwIbCibZUPAx/KzAXA9iIUJEkdUnV30GLgWoDMvA+YFREz\nACKiB3gZ8M2i/U8y8+GK65Ektag6BGYDgy3zG4tlAAPA48DlEXFbRFxWcS2SpDaVXxNo09M2/Rzg\nI8Ba4FsR8TuZeeNoG/f3T6e3d3LFJUqdt2lTX7dL0DjV39/HwMDMyvZfdQis55kjf4A5wIZieiPw\nYGY+CBAR3wZOBEYNgUbjiWqqlLqs0Rjqdgka
pxqNIQYHN+/TPnYXIlV3B60CzgKIiPnAuswcAsjM\n7cD9EfH8Yt0XAVlxPZKkFpWeCWTmnRGxJiJuB7YDSyLiPOCxzLwO+DPgs8VF4nsz85tV1iNJGqny\nawKZuaxt0b0tbT8DTq26BknSrvnEsCTVmCEgSTVmCEhSjRkCklRjhoAk1ZghIEk1ZghIUo0ZApJU\nY4aAJNWYISBJNWYISFKNGQKSVGOGgCTVmCEgSTVmCEhSjZV+n0BE/AZwVDH788z8RTUlSZI6ZcwQ\niIg3AO8AjgAeKhbPjYh1wPsz86sV1idJqtBuQyAiPlus85bM/FFb2wuB/xYR/zEz31JZhZKkyox1\nJvCN4l3A/58iFM6NiDP3f1mSpE4YKwR+uzji36XMfO9oIbFTRCwHFgA7gKWZubql7QFgbdE2DJyT\nmRvKFi9J2jdjhcDO9mOLn1uBycDLgR+OtfOIWATMy8yFEXE8sBJY2LLKMHBGZj65p4VLkvbdbkMg\nM98NEBHXAydn5vZifgrw5RL7XwxcW+zrvoiYFREzMvPxor2n+JEkdUHZ5wTmMvLLephnbhfdndnA\nYMv8xmJZq09FxG0RcVnJWiRJ+0nZ5wS+Bfw4ItbQ7L+fT3GEv4faj/rfDdwEPApcFxGvz8yvj7Zx\nf/90ensn78XHSuPbpk193S5B41R/fx8DAzMr23+pEMjMdxa3i76A5hf5ezLzX0psup6RR/5zgF9f\n+M3ML+ycjogbiv2PGgKNxhNlypUOOI3GULdL0DjVaAwxOLh5n/axuxAp1R0UEQcDp9O8LnANMDMi\nppbYdBVwVrGP+cC6zBwq5g+JiJuK6wvQvNj8T2XqkSTtH2WvCXwSeD5wWjE/H/jsWBtl5p3Amoi4\nHbgcWBIR50XEmZm5iWY3010RcRvwyyJgJEkdUvaawPGZeUpE3AKQmX8bEW8qs2FmLmtbdG9L28eA\nj5WsQZK0n5U9E9hW/DsMEBF9wLRKKpIkdUzZEPhqRHwbOCYiVgD3AFdXV5YkqRPK3h308Yi4G/j3\nwNPAGzNzTZWFSZKqVyoEIuIu4HPAZzLz0WpLkiR1StkLwxcDfwD8MCLuAT4PXJ+ZWyqrTJJUuVLX\nBDLz9sz8U+B5wEeAM4B1FdYlSeqAPXm95CzgdcDZwDHAp6sqSpLUGWWvCfxv4ESa4wVdmpl3VFqV\nJKkjyp4JfBS4KTN3VFmMJKmzxnrH8Ecz8200XzT/9ogY0Z6ZiyqsTZJUsbHOBFYW/76r6kIkSZ03\n1pvFflRMfpDmcwJf8jkBSZo4fE5AkmrM5wQkqcZ8TkCSamxPnxP4Bj4nIEkTRtkzge8Br87M7VUW\nI0nqrLLvE3ilASBJE0/ZM4G1EfFd4C7g13cEZeZ/r6IoSVJnlA2BB4qfPRYRy4EFwA5gaWau3sU6\n7wcWZOZp7W2SpOqUDYH37c3OI2IRMC8zF0bE8TSfQF7Yts4JwKm0nGFIkjpjT140v7XlZwswWGK7\nxTRHHiUz7wNmRcSMtnU+DCwrWYckaT8q+47hX4dFRBxE88v9hSU2nQ20dv9sLJb9tNjXecAtwM9L\n1itJ2o9KPyy2UzFUxI0R8RfAB/Zw856dExHRD5xPM1Ce29o2mv7+6fT2Tt7Dj5TGv02b+rpdgsap\n/v4+BgZmVrb/sg+LXdC26LnAc0psup7mkf9Oc4ANxfQrgMOB24CpwDER8eHMvHi0nTUaT5QpVzrg\nNBpD3S5B41SjMcTg4OZ92sfuQqTsmcCpLdPDwCbgDSW2WwVcAlwZEfOBdZk5BJCZ1wDXAETEUcBV\nuwsASdL+V/aawPk7p4sxhH6VmcMltrszItZExO3AdmBJcR3gscy8bm+LliTtH2O9Wewk4N2ZeXYx\nfzXwe8CvIuLMzPy/Y31AZrbf+XPvLtb5Oc3uIUlSB411i+gKmi+T2XnP/0uB36B5MfeyakuTJFVt\nrBCYlJnf
LKZ/l+abxTZn5r9Q4m4eSdL4NlYIbG2ZPg347h5sK0ka58a6MPxkRJwJHALMpflgFxER\ngDfsS9IBbqwQeBvwt0A
"text/plain": [
"<matplotlib.figure.Figure at 0x7f2f944de630>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"# Graphical representation\n",
"# The estimator parameter changes the statistic shown by each bar (e.g. estimator=np.median).\n",
"# For example, estimator=np.size gives the same chart as countplot:\n",
"#sns.barplot(x='Sex', y='Survived', data=df, estimator=np.size)\n",
"sns.barplot(x='Sex', y='Survived', data=df)"
]
},
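{
"cell_type": "markdown",
"metadata": {},
"source": [
"The height of each bar is the value of the estimator within each group, which by default is the mean. As a rough sketch, the same numbers can be obtained with a pandas aggregation. The frame `toy` below is a small made-up sample used only for illustration, not the Titanic data."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"import pandas as pd\n",
"\n",
"# Small made-up sample, only to illustrate the aggregation (not the Titanic data)\n",
"toy = pd.DataFrame({\n",
"    'Sex': ['female', 'female', 'female', 'male', 'male', 'male', 'male'],\n",
"    'Survived': [1, 1, 0, 0, 0, 1, 0],\n",
"})\n",
"\n",
"# mean is the default statistic of barplot; size is the count that countplot draws\n",
"toy.groupby('Sex')['Survived'].agg(['mean', 'median', 'size'])"
]
},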
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We can now check whether men and women follow the same age distribution."
]
},
{
"cell_type": "code",
"execution_count": 87,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"<seaborn.axisgrid.FacetGrid at 0x7f2f8e5c07b8>"
]
},
"execution_count": 87,
"metadata": {},
"output_type": "execute_result"
},
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAakAAADQCAYAAAC9dp7mAAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\nAAALEgAACxIB0t1+/AAAElBJREFUeJzt3X+Q3HV9x/HnYdDG5hArpwFEBkl9m4LOmFib4UecEKTQ\nWp1gHFp1mmJBC9Ep/hhH0tpWijAlTUDaqh1U/DFKnaoNojANItiA0THQaVNb3ookaSHpeBEYAx0i\nSa5/fL8Hy5rbXfZ2bz+XfT5mMtnb7/e++9q7fd9rv/vjuyMTExNIklSiwwYdQJKkqVhSkqRiWVKS\npGJZUpKkYllSkqRiWVKSpGLNGXSAYRIR5wAfBPYB84D7gXdm5s8GGqyNiPhz4FmZ+WeDzqJD0yBn\nIyLOBdYCl2fm9X3Y/mvrbZ/e620PA/ekZkhEHA58HnhzZi7PzN8AtgN/ONBg0oAVMBu/BVzVj4Jq\n4BtSu+Se1MyZCzwXGAV+ApCZl04ujIhXAOuofieHA+8CtgHfB87OzG0RcT3w/cz8WMP3/R7wDp4a\nghFgV2a+pfHCI2Ib8HHgbOBo4P3AO4GFwGWZ+fmICODvgSeAI4A/zcxbm7azDJjco3oCuDAzd0zj\n5yINbDYi4k3AbwOnRsR+4J+Bj9WZ5gFrMvNb9fZ3U83LrwGXAr8DvBK4MzMvjojnAp8Dnl9fly9n\n5lWNVzQijmva/p9k5m3T+Nkd8kY84sTMiYgPAB8CNgN3UN2If1gv+3fgjfXAvRL4dGa+OiJeB7wH\n+CuqgfnNLi97G/DhzPxMPXDHZuZZ9UMR12Tmq+rT+zPzzohYAvxNZv765MN9wJXAvwJLMvORiHgD\n8PuZubL7n4o08Nm4HtiUmZ+OiK8DazPz2xHxIuC7wInAp4DDMnNVRKwC/hp4KdUdtYeA+cCvAKdm\n5hci4tlUhftiYDHwl5m5dKrtZ+aBbrIPA/ekZlBmXhUR1wFnAWcA342IS4GvAgF8KiJG6tXn1d9z\na0SsBD4LnDLNCN+p/38A+J+G00fUp3cBayPiCuDZwAuavv9kqr2wr9Y5DwMcLk1bAbMxaRkwLyIm\n773vBV5Yn76r/v8B4L8ycw9AROwGnkdVSksj4mLg58BzqIqrk+3/b4/yH3IsqRkUEXMz82HgS8CX\nIuIfqe6R3QDszcwzpvjW+cD/1f/vbNpmRw/31fZNcXpy+P8W+EJmfjYiTgJuavr+vcCOFjmlrhQw\nG5MeB1bUWRq3BVPPz+S2LwGenZmn1t8zfpDt7z3Y9jU1XzgxQyLiLGBzRMxrOPtE4L76FUzb6lc4\nEREvi4gP1adXUT0W/maqe5OHN243M2/IzGWZeUb9b1mbIWzlRcB/1qfPo7on2OiHwFF1gRERSyPi\nwi4vSwKKm407gd+tt39URFzdwVWYvJP35PzUD4XP5RdnaFMX2x9qltQMycyNwHXAbRHxrYi4g2rX\nf3W9yirg0oj4NnA9sDEijqZ6We57M/MHwNeBj3QZoZMnH9cBn4+IW6iG6aGIWDv5vZn5OPA2qj8I\ntwMfpnr+QOpaYbPxx8CKiPiXepvfPMg6U33/p4HzI+KbwPHAF+p/rbbviyba6OiFExFxMrABWN/4\n6pl62ZlUN459wC2ZeXk/gkqShk/bPan6ZZXX8tQ9imYfBVYApwFnRcTLexdPkjTMOnm473HgHKpX\nfj1NRJwA/DQzd2bmBHAzsLy3ESVJw6ptSWXmgczcO8Xi+UDjK1h+QvUSZUmSpq3XL0EfabfCxMTE\nxMhI29Wk2aDvN2TnRYeQrm7I0y2pnTx9z+lYmt6r0GxkZITx8T3TvNjeGRsbNU8LJeUpKQtUefrN\neZlaSVnAPO10Oy/P9CXoT2vC+phtoxHxkoiYA7we2NhVEkmSmrTdk4qIRVTvnzkeeKI+IOPXgG2Z\neSNwEfAPVO8FuCEz7+tjXknSEGlbUpl5D9Ub
66Zafie9O26WJElP8ogTkqRiWVKSpGJZUpKkYllS\nkqRiWVKSpGJZUpKkYllSkqRiWVKSpGJZUpKkYllSkqRiWVKSpGJZUpKkYllSkqRiWVKSpGJZUpKk\nYllSkqRiWVKSpGJZUpKkYllSkqRiWVKSpGJZUpKkYs3pZKWIWA8sAQ4Al2TmloZlq4G3AvuALZn5\n3n4ElSQNn7Z7UhGxFFiQmacAFwDXNiwbBd4PnJqZS4GTIuI1/QorSRounTzctxzYAJCZ9wJHRsS8\netnPgb3AERExB5gLPNSPoJKk4dNJSc0Hxhu+3l2fR2buBS4D7ge2Ad/LzPt6HVKSNJw6ek6qycjk\nifrhvjXAAmAPcHtEvCIzt7bawNjYaBcX2z/maa2kPCVlmSmlXeeS8pSUBczTD52U1E7qPafaMcCu\n+vRC4MeZ+TBARGwCFgMtS2p8fM8zT9onY2Oj5mmhpDwlZYGZ+wNQ2nUuJU9JWcA87XQ7L5083LcR\nWAkQEYuABzPzsXrZdmBhRDyn/vrVwI+6SiJJUpO2e1KZuTki7o6Iu4D9wOqIWAU8kpk3RsRa4I6I\neAL4Tmbe1efMkqQh0dFzUpm5pumsrQ3LrgOu62UoSZLAI05IkgpmSUmSimVJSZKKZUlJkoplSUmS\nimVJSZKKZUlJkoplSUmSimVJSZKKZUlJkoplSUmSimVJSZKKZUlJkoplSUmSimVJSZKKZUlJkopl\nSUmSimVJSZKKZUlJkoplSUmSimVJSZKKNaeTlSJiPbAEOABckplbGpa9GLgBOBy4JzMv7kdQSdLw\nabsnFRFLgQWZeQpwAXBt0yrrgLWZuQTYX5eWJEnT1snDfcuBDQCZeS9wZETMA4iIEeA04KZ6+bsz\n84E+ZZUkDZlOSmo+MN7w9e76PIAx4FHgmojYFBFX9DifJGmIdfScVJORptPHAlcD/w18IyLOycxb\nWm1gbGy0i4vtH/O0VlKekrLMlNKuc0l5SsoC5umHTkpqJ0/tOQEcA+yqT+8GtmfmdoCIuA04CWhZ\nUuPje55x0H4ZGxs1Twsl5SkpC8zcH4DSrnMpeUrKAuZpp9t56eThvo3ASoCIWAQ8mJmPAWTmfuD+\niDixXncxkF0lkSSpSds9qczcHBF3R8RdwH5gdUSsAh7JzBuB9wCfqV9EsTUzb+pvZEnSsOjoOanM\nXNN01taGZT8GTu9lKEmSwCNOSJIKZklJkoplSUmSimVJSZKKZUlJkoplSUmSimVJSZKKZUlJkopl\nSUmSimVJSZKKZUlJkoplSUmSimVJSZKKZUlJkoplSUmSimVJSZKKZUlJkoplSUmSimVJSZKKZUlJ\nkoplSUmSijWnk5UiYj2wBDgAXJKZWw6yzpXAksxc1tuIkqRh1XZPKiKWAgsy8xTgAuDag6yzEDgd\nmOh5QknS0Ork4b7lwAaAzLwXODIi5jWtsw5Y0+NskqQh10lJzQfGG77eXZ8HQESsAm4HdvQ2miRp\n2HX0nFSTkckTEfF84Hyqva3jGpe1MjY22sXF9o95WispT0lZZkpp17mkPCVlAfP0QycltZOGPSfg\nGGBXffoM4ChgE/BLwEsjYl1mvq/VBsfH93QRtT/GxkbN00JJeUrKAjP3B6C061xKnpKygHna6XZe\nOnm4byOwEiAiFgEPZuZjAJn5lcw8uX5RxQrgnnYFJUlSp9qWVGZuBu6OiLuAa4DVEbEqIt7Y93SS\npKHW0XNSmdn8yr2tB1lnB9XDf5Ik9YRHnJAkFcuSkiQVy5KSJBXLkpIkFcuSkiQVy5KSJBXLkpIk\nFcuSkiQVy5KSJBXLkpIkFcuSkiQVy5KSJBXLkpIkFcuSkiQVy5KSJBXLkpIkFcuSkiQVy5KSJBXL\nkpIkFcuSkiQVy5KSJBVrTicrRcR6YAlwALgkM7c0LFsGXAHsAzIzL+hHUEnS8Gm7JxURS4EFmXkK\ncAFwbdMq
nwDOzczTgSMi4uzex5QkDaNOHu5bDmwAyMx7gSMjYl7D8sWZuas+PQ68oLcRJUnDqpOS\nmk9VPpN21+cBkJmPAkT
"text/plain": [
"<matplotlib.figure.Figure at 0x7f2f8e6c6208>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"g = sns.FacetGrid(df, col='Sex')\n",
"g.map(plt.hist, \"Age\", color=\"steelblue\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Both sexes seem to follow a similar age distribution. We can split the histograms further by passenger class."
]
},
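{
"cell_type": "markdown",
"metadata": {},
"source": [
"A visual comparison of histograms can be backed by summary statistics per group. The following is a minimal sketch on a small made-up frame `toy` (hypothetical ages, not the Titanic data). Applied to the real data, the same call would be `df.groupby('Sex')['Age'].describe()`."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"import pandas as pd\n",
"\n",
"# Made-up ages, including one missing value, only for illustration\n",
"toy = pd.DataFrame({\n",
"    'Sex': ['female', 'female', 'female', 'male', 'male', 'male'],\n",
"    'Age': [22.0, 38.0, None, 35.0, 54.0, 2.0],\n",
"})\n",
"\n",
"# describe() ignores missing ages and reports count, mean, std and quartiles per group\n",
"toy.groupby('Sex')['Age'].describe()"
]
},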
{
"cell_type": "code",
"execution_count": 88,
"metadata": {
"collapsed": false
},
"outputs": [
],
"source": [
"g = sns.FacetGrid(df, col='Sex', row='Pclass')\n",
"g.map(plt.hist, \"Age\", color=\"steelblue\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We see there are more young men in third class. "
]
},
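{
"cell_type": "markdown",
"metadata": {},
"source": [
"`FacetGrid` looks up the `row` and `col` columns in the DataFrame, so it raises a `KeyError` if one of them (for example `Pclass`) has been dropped or renamed earlier in the session. The following is a minimal sketch of a check that can be run before building the grid. The helper `missing_columns` and the frame `toy` are hypothetical and only serve as an illustration."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"import pandas as pd\n",
"\n",
"def missing_columns(frame, required):\n",
"    # Hypothetical helper: list the required column names that are absent from frame\n",
"    return [name for name in required if name not in frame.columns]\n",
"\n",
"# Made-up frame without 'Pclass', standing in for a df whose column was dropped\n",
"toy = pd.DataFrame({'Sex': ['female', 'male'], 'Age': [29.0, 40.0]})\n",
"\n",
"# A non-empty result means the data should be reloaded before plotting\n",
"missing_columns(toy, ['Sex', 'Pclass', 'Age'])"
]
},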
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Feature Pclass"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We have already seen how passengers are distributed by Pclass."
]
},
{
"cell_type": "code",
"execution_count": 34,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"Pclass\n",
"1 216\n",
"2 184\n",
"3 491\n",
"dtype: int64"
]
},
"execution_count": 34,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df.groupby('Pclass').size()"
]
},
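{
"cell_type": "markdown",
"metadata": {},
"source": [
"The raw counts are easier to compare as percentages with readable class names. The following is a minimal sketch that reuses the counts printed above and the labels 'First', 'Second' and 'Third' from the earlier plot. The names `counts` and `labels` are illustrative and not part of the original notebook."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"import pandas as pd\n",
"\n",
"# Counts copied from the output above, indexed by Pclass\n",
"counts = pd.Series({1: 216, 2: 184, 3: 491}, name='passengers')\n",
"\n",
"# Readable labels for the three classes\n",
"labels = {1: 'First', 2: 'Second', 3: 'Third'}\n",
"\n",
"# Percentage of passengers per class\n",
"(100 * counts / counts.sum()).rename(index=labels).round(1)"
]
},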
{
"cell_type": "code",
"execution_count": 35,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"<matplotlib.axes._subplots.AxesSubplot at 0x7f2f9406ba58>"
]
},
"execution_count": 35,
"metadata": {},
"output_type": "execute_result"
},
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAYQAAAESCAYAAAD9gqKNAAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\nAAALEgAACxIB0t1+/AAAETBJREFUeJzt3X+sX3V9x/Hn9V4tvbftepFrZUUalyXvhZAtYUaWVm0p\nAv5iGMp0Wjtm1flrScU5g3HYynAxLBCGzrAVO5BhNnUdsQMRGRFF2Ypx07m4t45F6m3ZepEv9gdt\nKbd3f5zPlXvLvbffW+75ntve5yNp+v2e7/me+7r35t7X/Zwfn9M1MjKCJEnPazqAJGl2sBAkSYCF\nIEkqLARJEmAhSJIKC0GSBEBPnRuPiJXAF4EfAF3A94E/B26jKqNHgXWZeTgi1gIbgGFgc2ZuqTOb\nJGm8rjqvQyiF8P7MfNOYZVuAf8rMrRHxCWAHVUF8F3gZ8DTwEPDKzHyitnCSpHE6scuo66jnq4Bt\n5fE24ALgXGB7Zu7LzIPAA8CKDmSTJBW17jIqzoqIO4BTgauB3sw8XF7bDZwOLAGGxrxnqCyXJHVI\n3SOEHwObMvONwO8Dn2V8CR09ejjWcklSTWodIWTmLqqDymTm/0TE/wIvi4h5mXkIWArsBHYxfkSw\nFHhwqm0//fTwSE9Pdz3BJenkNekf3HWfZfRW4PTMvC4iXky1a+hvgMuA24E1wN3AduDmiFgEHAGW\nU51xNKlW68k6o0vSSWlgYOGkr9V9ltEC4PPAYuD5wCbge8DngHnAI8DbM3M4Ii4FPkxVCDdm5t9N\nte2hob1O0ypJ0zQwsHDSEUKthVAnC0Ga3YaHhxkc3NF0jDnhjDPOpLu7vV3oUxVCJ84ykjQHDQ7u\n4L6NH+W0+fObjnJSe+zAAVZ//BMsW/bS57wtC0FSbU6bP58X9/Y1HUNtci4jSRJgIUiSCgtBkgRY\nCJKkwkKQJAEWgiSpsBAkSYCFIEkqLARJEmAhSJIKC0GSBFgIkqTCQpAkARaCJKmwECRJgIUgSSos\nBEkSYCFIkgoLQZIEWAiSpMJCkCQBFoIkqbAQJEmAhSBJKiwESRJgIUiSCgtBkgRYCJKkwkKQJAEW\ngiSpsBAkSYCFIEkqLARJEmAhSJKKnro/QEScAvwAuBq4D7iNqogeBdZl5uGIWAtsAIaBzZm5pe5c\nkqTxOjFCuAr4WXl8NfCpzFwJPAysj4jess5q4DzgiohY3IFckqQxai2EiAjg14A7gS5gJbCtvLwN\nuAA4F9iemfsy8yDwALCizlySpGere4RwHfBBqjIA6MvMw+XxbuB0YAkwNOY9Q2W5JKmDajuGEBHr\ngG9n5iPVQOFZuiZaOMXycfr7e+np6T7eeJJqtmdPX9MR5oz+/j4GBhY+5+3UeVD59cBLI+JiYCnw\nFLAvIuZl5qGybCewi/EjgqXAg8faeKv15MwnljRjWq39TUeYM1qt/QwN7W1r3amKo7ZCyMzfHX0c\nER8DfgIsBy4DbgfWAHcD24GbI2IRcKSss6GuXJKkiXXqOoTR3UAbgcsj4n6gH7i1HEi+Erin/NuU\nme1VnSRpxtR+HQJAZn58zNMLJ3h9K7C1E1kkSRPzSmVJEmAhSJIKC0GSBFgIkqTCQpAkARaCJKmw\nECRJgIUgSSosBEkSYCFIkgoLQZIEWAiSpMJCkCQBFoIkqbAQJEmAhSBJKiwESRJgIUiSCgtBkgRY\nCJKkwkKQJAEWgiSpsBAkSYCFIEkqLARJEmAhSJIKC0GSBFgIkqTCQpAkARaCJKmwECRJgIUgSSos\nBEkSYCFIkgoLQZIEQE+dG4+I+cAtwBJgHnAN8D3gNqoyehRYl5mHI2ItsAEYBjZn5pY6s0mSxqt7\nhHAx8FBmrgLeDFwPXA18OjNXAg8D6yOiF7gKWA2cB1wREYtrziZJGqPWEUJmfmHM0zOBnwIrgXeX\nZduADwE/ArZn5j6AiHgAWAHcWWc+SdIzai2E
URHxLWAp1Yjha5l5uLy0GzidapfS0Ji3DJXlkqQO\n6chB5cxcAfw2cDvQNealronfMelySVJN6j6ofA6wOzMHM/P7EdEN7I2IeZl5iGrUsBPYxfgRwVLg\nwam23d/fS09Pd13RJT1He/b0NR1hzujv72NgYOFz3k7du4xeBSyjOki8BFgAfAW4jGq0sAa4G9gO\n3BwRi4AjwHKqM44m1Wo9WWNsSc9Vq7W/6QhzRqu1n6GhvW2tO1Vx1L3L6CbgRRHxDaoDyO8FNgKX\nR8T9QD9wa2YeBK4E7in/NmVme5+dJGlG1H2W0UFg7QQvXTjBuluBrXXmkSRNziuVJUmAhSBJKiwE\nSRJgIUiSCgtBkgRYCJKkwkKQJAFtFkJE3DLBsq/OeBpJUmOmvDCt3LTmPcDZ5WrjUS+gmqFUknSS\nmLIQMvP2iPg61bxDG8e8dAT4zxpzSZI67JhTV2TmTmBVRPwScCrPTE29GHi8xmySpA5qay6jiPgL\nYD3VjWtGC2EE+JWackmSOqzdye1WAwNlsjpJ0kmo3dNOf2wZSNLJrd0RwmA5y+gB4OnRhZn5sVpS\nSZI6rt1C+Bnwz3UGkSQ1q91C+NNaU0iSGtduITxNdVbRqBHg58ALZzyRJKkRbRVCZv7i4HNEvAA4\nH/iNukJJkjpv2pPbZeZTmfkV4IIa8kiSGtLuhWnrj1r0EmDpzMeRJDWl3WMIrxzzeATYA7xp5uNI\nkprS7jGEtwNExKnASGa2ak0lSeq4dncZLQduAxYCXRHxM+BtmfmdOsNJkjqn3YPKnwQuycwXZeYA\n8Bbg+vpiSZI6rd1CGM7MH4w+ycx/Y8wUFpKkE1+7B5WPRMQa4Gvl+WuA4XoiSZKa0G4hvAf4FHAz\n1d3S/h14V12hJEmd1+4uowuBQ5nZn5kvpLpJzuvqiyVJ6rR2C+FtwKVjnl8IvHXm40iSmtJuIXRn\n5thjBiM8cytNSdJJoN1jCF+OiG8D36QqkfOBf6gtlSSp49q9UvmaiPg6cC7V6OB9mfkvdQabKcPD\nwwwO7mg6xpxwxhln0t3d3XQMScep3RECmfkA1S00TyiDgzv4kxu+xCkLTm06yknt4L7HueYDl7Fs\n2UubjiLpOLVdCCeyUxacSu+igaZjSNKsNu37IUiSTk61jxAi4lrgFUA31ZxID1FNlPc84FFgXWYe\njoi1wAaqK6A3Z+aWurNJkp5R6wghIlYBZ2XmcuC1wA3A1cCnM3Ml8DCwPiJ6gauA1cB5wBURsbjO\nbJKk8ereZXQ/8Dvl8RNAH7AS+HJZto3qVpznAtszc19mHqQ6eL2i5mySpDFq3WWUmSPAgfL0HcCd\nwEWZebgs2w2cDiwBhsa8dagslyR1SEfOMoqIS4D1VFNe/PeYlya72tmroCWpwzpxUPki4CNUI4O9\nEbE3IuZl5iFgKbAT2MX4EcFS4MGpttvf30tPz7Evgtqzp++4s2t6+vv7GBhY2HQMzRL+7HXOTP3s\n1VoIEbEIuBY4PzN/XhbfC6wBPl/+vxvYDtxc1j8CLKc642hSrdaTbWVotfYfV3ZNX6u1n6GhvU3H\n0Czhz17nTOdnb6riqHuE8GbghcAXIqKLatqLy4HPRsS7gUeAWzNzOCKuBO6hKoRNmelvFknqoLoP\nKm8GNk/w0oUTrLsV2FpnHknS5LxSWZIEWAiSpGJOTG6nE5fTl3eGU5cLLATNcoODO9i09RPM7/cU\nxrocaO1n06UfdepyWQia/eb399F3mtc3SHXzGIIkCbAQJEmFhSBJAiwESVJhIUiSAAtBklRYCJIk\nwEKQJBUWgiQJsBAkSYWFIEkCLARJUmEhSJIAC0GSVFgIkiTAQpAkFRaCJAmwECRJhYUgSQIsBElS\nYSFIkgALQZJUWAiSJMBCkCQVFoIkCbAQJEmFhSBJAiwESVJhIUiSAAtBklRYCJIkAHrq/gARcTZw\nB3B9Zn4m
Is4AbqMqo0eBdZl5OCLWAhuAYWBzZm6pO5sk6Rm1jhAiohe4Ebh3zOKrgU9l5krgYWB9\nWe8qYDVwHnBFRCyuM5s
"text/plain": [
"<matplotlib.figure.Figure at 0x7f2f9417f2b0>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"# Distribution\n",
"sns.countplot('Pclass', data=df)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Most passengers are in 3rd class."
]
},
{
"cell_type": "code",
"execution_count": 36,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"<matplotlib.axes._subplots.AxesSubplot at 0x7f2f9409a0f0>"
]
},
"execution_count": 36,
"metadata": {},
"output_type": "execute_result"
},
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAYEAAAESCAYAAAAbq2nJAAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\nAAALEgAACxIB0t1+/AAAF7VJREFUeJzt3X+U3XV95/HnZCZAMgnNAKMYY1IVfGOt6ImWE1GjEGtp\naxeq1NIqRS3uWcWtbO0vsVZslVo9REx/IdbIqlStUkGWH4WKPxCxx2TVZau8bXcrIQmriVxMMklg\nMpn9494hd4Yk853JfO6dyff5OGdO7vf3e+Y7mdf31+fz7RkdHUWSVE/zul2AJKl7DAFJqjFDQJJq\nzBCQpBozBCSpxgwBSaqxvtIbiIi1wCpgP3BpZm5ojV8KXAeMAj3A04A/zMxPl65JktRUNAQiYjVw\nSmaeGRGnAeuBMwEycytwVmu+XuBLwBdK1iNJGq/05aA1wA0AmXkfsCQiFh1kvtcB12fm7sL1SJLa\nlA6Bk4FtbcPbW+Mmuhj4aOFaJEkTdPrGcM/EERGxCvheZu7qcC2SVHulbwxvZfyR/1LgwQnzvAL4\n5yor27dvZLSvr3eGSpOk2njcAfiY0iFwO3A58JGIWAlsycyhCfP8HPCpKitrNLxlIElTNTi4+JDT\nil4Oysx7gI0RcTdwFXBJRFwUEee2zXYy8KOSdcwF69dfwwUXnMf69dd0uxRJNdIzl7qS3rZt59wp\ndgr27t3D61//m4yOjtLTM4+Pfew6jjtuQbfLknSUGBxcfMjLQbYYngWGh4cZC+PR0f0MDw93uSJJ\ndWEISFKNGQKSVGOGgCTVmCEgSTVmCEhSjRkCklRjhoAk1ZghIEk1ZghIUo0ZApJUY4aAJNWYISBJ\nNWYISFKNGQKSVGOGgCTVmCEgSTVW+h3DXTEyMsLmzZu6XUZlQ0PjX7v8wAOb6O/v71I1U7Ns2XJ6\ne3u7XYakaToqQ2Dz5k388VWf47hFJ3S7lEpGRx4dN7z2k3fR03tMl6qpbu+uh3jPpeezYsVTu12K\npGk6KkMA4LhFJ7Dw+MFul1HJ/n172dU2vGDxiczrO65r9UiqD+8JSFKNGQKSVGOGgCTVWPF7AhGx\nFlgF7AcuzcwNbdOWAZ8C5gP/MzPfXLoeSdIBRc8EImI1cEpmnglcDKybMMuVwAcycxUw0goFSVKH\nlL4ctAa4ASAz7wOWRMQigIjoAV4E3NSa/l8zc3PheiRJbUqHwMnAtrbh7a1xAIPALuCqiLgrIq4o\nXIskaYJOtxPomfD5ycAHgU3AzRHxi5l566EWHhhYSF/f5K1Td+yYG61tjwYDA/0MDi7udhmSpql0\nCGzlwJE/wFLgwdbn7cAPMvMHABHxReBZwCFDoNHYXWmjjcbQ5DNpRjQaQ2zbtrPbZUg6jMMdqJW+\nHHQ7cD5ARKwEtmTmEEBmjgD/NyKe3pr3eUAWrkeS1KbomUBm3hMRGyPibmAEuCQiLgIezswbgf8G\nXNu6SXxvZt5Ush5J0njF7wlk5mUTRt3bNu3/AC8uXYMk6eBsMSxJNWYISFKNGQKSVGOGgCTVmCEg\nSTVmCEhSjRkCklRjhoAk1ZghMBv0tHeK1zNhWJLKMQRmgXm981kw+EwAFgyexrze+V2uSFJddLor\naR3C8ctfwPHLX9DtMiTVjGcC0gxYv/4aLrjgPNavv6bbpUhTYghIR2jv3j3ccUfzNRh33HEbe/fu\n6XJFUnWGgHSEhoeHGR0dBWB0dD/Dw8NdrkiqzhCQpBozBCSpxgwBSaoxQ0CSaswQkKQaMwQkqcYM\nAUmqMUNAkmrMEJCkGivegVxErAVWAfuBSzNzQ9u0/wA2taaNAq/JzAdL1yRJaioaAhGxGjglM8+M\niNOA9cCZbbOMAudkpp2tSFIXlL4ctAa4ASAz7wOWRMSituk9rS9JUheUDoGTgW1tw9tb49pdHRF3\nRcQVhWuRJE3Q6ZfKTDzqfydwG/AQcGNEvDIz
//FQCw8MLKSvb/JXL+7Y0X9ERaq6gYF+BgcXd7uM\nrjrmmP3jhk88cRE/9VP1/plo7igdAlsZf+S/FHjsxm9mfnLsc0TcAjwbOGQINBq7K2200Riaap2a\npkZjiG3bdna7jK7auXPXuOEf/3gXjz7qg3eaPQ53oFb6N/V24HyAiFgJbMnModbw8RFxW0SMvVD3\nJcD/LlyPJKlN0TOBzLwnIjZGxN3ACHBJRFwEPJyZN0bEzcA3ImI38K3MvL5kPZKk8YrfE8jMyyaM\nurdt2l8Cf1m6BknSwXnhUpJqzBCQpBozBCSpxjrdTkCa1MjICJs3b+p2GZUNDY1/JPmBBzbR3z/7\n26osW7ac3t7J293o6GYIaNbZvHkTl//je1kwMPv/kALsf3Rk3PC6r13NvGNm9x/XPY0hLn/lO1ix\n4qndLkVdZghoVlow0E//SXOj1e3II/t4uG144YmL6D3W/1qaG7wnIEk1VvlwJSKeCKxoDd6fmT8s\nU5IkqVMmDYGIeDXwduBJwAOt0csjYgvw55n52YL1SZIKOmwIRMS1rXlel5nfmTDtOcDvR8QvZ+br\nilUoSSpmsjOBz2fmjQeb0AqF10bEuTNfliSpEyYLgee2jvgPKjP/9FAhIUma/SYLgbHpp7a+vgr0\n0uz2+VsF65IkdcBhQyAz3wkQEV8AzsjMkdbwfOAz5cuTJJVUtZ3Acsa/GnKUA4+LSpLmqKrtBG4G\nvh8RG4H9wErghmJVSZI6olIIZOY7Wo+LPpvmGcG7M/O7JQuTJJVX6XJQRBwLvJzmfYHrgcURcVzR\nyiRJxVW9J/A3wNOBs1rDK4FrSxQkSeqcqiFwWmb+LrAbIDP/FlharCpJUkdUDYF9rX9HASKiH1hQ\npCJJUsdUDYHPRsQXgadFxDrg28B15cqSJHVC1aeD/ioi/gV4KfAIcEFmbixZmCSpvEohEBHfAD4O\nfDQzH5rKBiJiLbCKZvuCSzNzw0Hm+XNgVWaeNXGaJKmcqpeD3gacBnwrIm6MiPMj4pjJFoqI1cAp\nmXkmcDGw7iDzPBN4Ma37DZKkzqkUApl5d2b+DvDTwAeBc4AtFRZdQ6tlcWbeByyJiEUT5rkSuKxq\nwZKkmTOV10suAc4Dfg14GvDhCoudDLRf/tneGvfvrXVeBHwJuL9qHZKkmVP1nsA/Ac+ieVT/3sz8\n+jS391gndBExALye5tnCUxjfQd1BDQwspK+vd9KN7NjRP83yNFUDA/0MDi6e0XXOtf3XM6/tV7dn\nwvAsVmLfae6peibwIeC2zNw/xfVvpXnkP2Yp8GDr89nAScBdwHE0Hz+9MjPfdqiVNRq7K2200Ria\nYpmarkZjiG3bds74OueSefN7WfSME9j1/YdYdOoJzJs/+YHKbFBi32l2OlzYT/aO4Q9l5ltpvmj+\njyJi3PTMXD3Jtm8HLgc+EhErgS2ZOdRa9nrg+tZ2VgAfO1wASLPZwBlLGTjDRvSaeyY7E1jf+veP\np7PyzLwnIjZGxN3ACHBJ6z7Aw76WUpK6b7I3i32n9fEvaLYT+PRU2wlk5sQnf+49yDz307w8JEnq\noKr3BN4G/DrNdgLfBj4BfCEzHy1WmSSpuNLtBCRJs1jpdgKSpFlsqu0EPs+RtROQJM0iVc8EvgL8\nUmaOlCxGktRZVTuQe5kBIElHn6pnApsi4svAN4DHngjKzD8pUZQkqTOqhsB/tL4kSUeRqiHwZ0Wr\nkCR1RdUQ2Mf4l76MAj8BTpzxiiRJHVP1HcOP3UBuvVFsDfCcUkVJkjqj6tNBj8nMRzPzVuDnC9Qj\nSeqgqo3F3jBh1FOAJ898OZKkTqp6T+DFbZ9HgR3Aq2e+HElSJ1W9J/D6sc+tPoR+kpmjh1lEkjQH\nHPaeQEScHhGfbRu+juYrI7dGxBmli5MklTXZjeF1NF8mQ0SsBl4APJHm00FXlC1NklTaZCEwLzNv\nan3+FZpv
FtuZmd8FesqWJkkqbbIQGG77fBbw5SksK0ma5Sa7MbwnIs4FjgeWA18CiIgAegvXJkkq\nbLIQeCvwt8AA8JuZORw
"text/plain": [
"<matplotlib.figure.Figure at 0x7f2f9405ffd0>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"# Survivors per class\n",
"sns.barplot(x='Pclass', y='Survived', data=df)"
]
},
{
"cell_type": "markdown",
"metadata": {
"collapsed": true
},
"source": [
"As expected, passenger class is very significant, since most survivors are in first class.\n",
"\n",
"We can also see the distribution of classes per sex."
]
},
{
"cell_type": "code",
"execution_count": 37,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"<seaborn.axisgrid.FacetGrid at 0x7f2f94db5400>"
]
},
"execution_count": 37,
"metadata": {},
"output_type": "execute_result"
},
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAWgAAAEZCAYAAAC6m7+xAAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\nAAALEgAACxIB0t1+/AAAF7lJREFUeJzt3X2QXXWd5/F3k9aEJEQ6GJAJkmV06rtSrlOLjMwmSMKD\nMM4guIBSY2BYHhwdRxef1o01MmQcnGVxpRAfFieAIMNUiWyWkeVhgBFBhCK4oyJT7ldFF0hwoZGL\nSRoSQtL7xzkNl7bTaZJ7+vyafr+quvrc83S/fUN97o/fOb/f6RseHkaSVJ7d2i5AkjQ2A1qSCmVA\nS1KhDGhJKpQBLUmFMqAlqVD9TZ48InYHrgD2AWYC5wEnAW8Gnqh3+2xm3hQRy4Gzga3Aqsy8vMna\nJKl0fU3eBx0R7wb2z8z/FhH7A7cC3wWuzcwbu/abDfwzcDDwHHAf8NbMfKqx4iSpcI22oDPzmq6X\n+wOP1Mt9o3Y9BFiTmRsBIuIuYAlwQ5P1SVLJGg3oERHxXWAhcCzwMeDPI+KjwGPAh4DXAINdhwwC\n+05GbZJUqkm5SJiZS4DjgKuBrwErMvNI4AfAyjEOGd3ClqRpp9GAjoiDImI/gMy8n6rF/qN6GeB6\n4I3AOl7cYl4IPDreuZ97busw4I8//kztH42j6S6Ow4BFwEciYh9gLvCViPh4Zv4CWAY8AKwBLo2I\necA2YDHVHR3b1ek83WTd0i7bunUra9c+3GoN++23PzNmzGi1hvEsWLBH2yUUremAvgS4LCLuBGYB\nHwA2Al+PiKF6+fTM3BQRK4BbqAJ6ZWZuaLg2qVFr1z7Mpy66lllz57fy/ps2Psl5Hz6JRYsOaOX9\nteuavotjE7B8jE1vGWPf1cDqJuuRJtusufOZPW9B22VoinIkoSQVyoCWpEIZ0JJUKANakgplQEtS\noQxoSSqUAS1JhTKgJalQBrQkFcqAlqRCGdCSVCgDWpIKZUBLUqEMaEkqlAEtSYUyoCWpUAa0JBXK\ngJakQhnQklQoA1qSCmVAS1KhDGhJKpQBLUmFMqAlqVD9TZ48InYHrgD2AWYC5wE/BK6i+nL4JXBq\nZm6JiOXA2cBWYFVmXt5kbZJUuqZb0O8A7svMZcDJwIXAp4EvZuZS4EHgjIiYDZwDHAEcDnwkIvZs\nuDZJKlqjLejMvKbr5f7AI8BS4H31uuuBjwM/AdZk5kaAiLgLWALc0GR9klSyRgN6RER8F1hI1aK+\nNTO31JseB/al6gIZ7DpksF4vSdPWpFwkzMwlwHHA1UBf16a+sY/Y7npJmjaavkh4EPB4Zq7NzPsj\nYgawISJmZuZmqlb1OuBRXtxiXgjcM965BwZm098/o6nSpV22fv2ctktgYGAOCxbs0XYZ2klNd3Ec\nBiyiuui3DzAXuAk4iao1fSJwM7AGuDQi5gHbgMVUd3RsV6fzdINlS7uu0xlquwQ6nSEGBze0XcZ2\n+eUxvqa7OC4B9o6IO6kuCP4ZcC5wWkTcAQwAV2bmJmAFcEv9szIzy/2vSpImQdN3cWwClo+x6egx\n9l0NrG6yHkmaShxJKEmFMqAlqVAGtCQVyoCWpEIZ0JJUKANakgplQEtSoQxoSSqUAS1JhTKgJalQ\nBrQkFcqAlqRCGdCSVCgDWpIKZUBLUqEMaEkqlAEtSYUyoCWpUAa0JBXKgJakQhnQklQoA1qSCmVA\nS1KhDGhJKlR/028QERcAhwIzgPOB44A3A0/Uu3w2M2+KiOXA2cBWYFVmXt50bZJUskYDOiKWAQdm\n5uKImA98H/gnYEVm3ti132zgHOBg4DngvohYnZlPNVmfJJWs6S6OO4B31ctPAXOoWtJ9o/Y7BFiT\nmRszcxNwF7Ck4dokqWiNtqAzcxh4pn55FnADVRfGByPio8BjwIeA1wCDXYcOAvs2WZskla7xPmiA\niDgeOB04mqob41eZeX9EfAJYCdw96pDRLezf
MDAwm/7+Gb0uVeqZ9evntF0CAwNzWLBgj7bL0E6a\njIuExwCfBI7JzA3A7V2brwe+DHwDeEfX+oXAPeOdt9N5useVSr3V6Qy1XQKdzhCDgxvaLmO7/PIY\nX6N90BExD7gAODYzf12vuzYiDqh3WQY8AKwBDo6IeRExF1gMfKfJ2iSpdE23oE8G9gKuiYg+YBj4\nKvD1iBgCNgKnZ+amiFgB3AJsA1bWrW1Jmraavki4Clg1xqarxth3NbC6yXokaSpxJKEkFcqAlqRC\nGdCSVCgDWpIKZUBLUqEMaEkqlAEtSYUyoCWpUAa0JBXKgJakQhnQklQoA1qSCmVAS1KhDGhJKpQB\nLUmFMqAlqVAGtCQVyoCWpEIZ0JJUKANakgplQEtSoQxoSSqUAS1JhTKgJalQ/U2/QURcABwKzADO\nB+4DrqL6cvglcGpmbomI5cDZwFZgVWZe3nRtklSyRlvQEbEMODAzFwNvBy4CPg18MTOXAg8CZ0TE\nbOAc4AjgcOAjEbFnk7VJUuma7uK4A3hXvfwUMAdYCnyzXnc98DbgEGBNZm7MzE3AXcCShmuTpKI1\n2sWRmcPAM/XLM4EbgGMyc0u97nFgX2AfYLDr0MF6vSRNW433QQNExPHAGcDRwM+6NvVt55DtrX/e\nwMBs+vtn9KA6qRnr189puwQGBuawYMEebZehnTQZFwmPAT5J1XLeEBEbImJmZm4GFgLrgEd5cYt5\nIXDPeOftdJ5uqmSpJzqdobZLoNMZYnBwQ9tlbJdfHuNr+iLhPOAC4NjM/HW9+jbgxHr5ROBmYA1w\ncETMi4i5wGLgO03WJkmla7oFfTKwF3BNRPQBw8BpwGUR8T7gIeDKzNwaESuAW4BtwMrMLPdrX5Im\nQdMXCVcBq8bYdPQY+64GVjdZjyRNJY4klKRCTSigI+KKMdb9Y8+rkSQ9b9wujnr49fuBN0bEnV2b\nXkl177IkqSHjBnRmXh0R3wauBs7t2rQN+JcG65KkaW+HFwkzcx2wLCJeBcznhUEkewJPNlibJE1r\nE7qLIyI+TzUScJAXAnoY+O2G6pKkaW+it9kdASyoJzKSJE2CiQb0Tw1nSVNVROwGfIHq5oYtwADw\nscws+lraRAN6bX0Xx13AcyMrM/MvG6lKknrrTcBrM/M4gIh4PfD6iDiFau6fVwL/Hbgf+AeqwXT/\nDviTzDy9nZInHtC/Av6pyUIkqUH/AmyKiMuAO6nm+hkEfjszT46I3YHbM/P3I+KvqZ7+9AZemDeo\nFRMN6L9utApJalA9B/27I2I+1QNC/gp4M1VoX05188Nz9b63RsTfANdm5sa2aoaJB/RzVHdtjBgG\nfk01EZIkFS0ilgJ71XP+3BQR9wO/AK7KzDPrff51/Xs5cB1wZET8XX2rcSsmFNCZ+fyQ8Ih4JXAk\n8LtNFSVJPfYD4EsRcRqwierxe28D3l5PZfEq4OaI2ACcTtUH/U3gb4E/aqVidmI2u8x8luob6ONU\n/TSSVLR6PvpTxth0xxjrjqp//4gWwxkmPlDljFGrXkt15VOS1JCJtqDf2rU8DKwH3t37ciRJIyba\nB306QH0FdDgzO41WJUmacBfHYuAqYA+gLyJ+BZySmd9rsjhJms4m+kSV84HjM3PvzFwA/DFwYXNl\nSZImGtBbM/OBkReZ+X26hnxLknpvohcJt0XEicCt9es/ALY2U5Kk6WrhwoUzgNf1+LQPrlu3rud5\nFRFfBb6RmTf2+twjJhrQ76eaCepSqqep/AB4b1NFSZq2XvdvDj8zZ82d35OTbdr4JD+6/bIAftKT\nE06yiQb00cDmzBwAiIhvAX8IfLGpwiRNT7Pmzmf2vAWT+p71CMOlwKuBA4FPUV1rewPVAJeTgd8D\nZgGXZOblXcfuRjXi8ADgFcC5mXl7L+qaaB/0KcAJXa+PBt7TiwIkqRCvr6cjPR9YAbyzXj4d+EVm\nHgYcxm9O
Hvce4NHMPBL498BFvSpooi3oGZnZ3YczzAuPvhpXRLyRauKRCzPzy3W/zZuBJ+pdPpuZ\nN9UTlJxN1be9qvsbSpI
"text/plain": [
"<matplotlib.figure.Figure at 0x7f2f8ffd6198>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"sns.factorplot('Pclass',data=df,hue='Sex',kind='count')"
]
},
{
"cell_type": "code",
"execution_count": 38,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"Pclass Sex \n",
"1 female 0.968085\n",
" male 0.368852\n",
"2 female 0.921053\n",
" male 0.157407\n",
"3 female 0.500000\n",
" male 0.135447\n",
"Name: Survived, dtype: float64"
]
},
"execution_count": 38,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df.groupby(['Pclass', 'Sex']).Survived.mean()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We see most women in first class and second survived, 96% and 92% respectively."
]
},
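{
"cell_type": "markdown",
"metadata": {},
"source": [
"The grouped means above can be easier to compare side by side as a small table. The next cell is a minimal sketch and not part of the original analysis: `survival_rate_table` is a hypothetical helper, and `toy` is a made-up six-row frame (not Titanic data) used only so that the cell runs on its own. With the real data you would pass `df` instead."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"import pandas as pd\n",
"\n",
"def survival_rate_table(frame):\n",
"    # Mean of Survived per (Pclass, Sex), with one column per sex\n",
"    return frame.groupby(['Pclass', 'Sex'])['Survived'].mean().unstack('Sex')\n",
"\n",
"# Made-up illustrative rows, NOT taken from the Titanic dataset\n",
"toy = pd.DataFrame({\n",
"    'Pclass': [1, 1, 3, 3, 3, 3],\n",
"    'Sex': ['female', 'male', 'female', 'female', 'male', 'male'],\n",
"    'Survived': [1, 0, 1, 0, 0, 0],\n",
"})\n",
"\n",
"rates = survival_rate_table(toy)\n",
"rates"
]
},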
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Feature Fare"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We are going to analyse the feature *Fare* and will take the opportunity to introduce how to manage outliers.\n",
"\n",
"As we see in the PairGrid chart, Fare is directly related to the Passenger class."
]
},
{
"cell_type": "code",
"execution_count": 39,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"<matplotlib.axes._subplots.AxesSubplot at 0x7f2f8ff4c1d0>"
]
},
"execution_count": 39,
"metadata": {},
"output_type": "execute_result"
},
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAX0AAAEDCAYAAADZUdTgAAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\nAAALEgAACxIB0t1+/AAAFHZJREFUeJzt3X+QXWd93/H3SpGFtVqNZM0ixLoYOs18PZ5MZko1YypR\n60djuxSICRKBiUZ1ImiVjpNRjXHH+UO2UdNCSa1SG/KjCgLhkCnQUR00HhzjcXBlcCOnzNgwSb4Y\nG+R4JdC1R45XxpIvu9s/7jFdCUl7dffes7v3eb9mPHPuc87V83yt1ed59vy4d2BychJJUhkWzPYA\nJEn1MfQlqSCGviQVxNCXpIIY+pJUEENfkgryc9MdEBGDwOeBFcAlwG7gh8AfABPAk5l5U3XsrcCW\nqn13Zn61R+OWJHWgnZX+rwN/m5mbaAX6fwP+K/DbmfnPgOURcX1EvBn4VWAt8G5gT0QM9GTUkqSO\ntBP6zwMrq+2VwAvAWzLzW1XbQeBaYCPw1cwcz8zngR8AV3V3uJKkmZg29DPzi8AVEfEU8HXgVuDE\nlEOOA6uBVUBjSnujapckzRHThn5EbAWOZObPA5uAPznrkPOdwvHUjiTNMdNeyAXWAX8OkJnfjohL\nz3rfCDAKHAWuPKv96IX+4MnJycmBAecGSbpIHQdnO6H/PeBtwP+KiCuAMeD7EbEuM78BvBe4G3gK\n+HBE3A68HnhjZv71BUc9MECjMdbp2Oe84eEh65un+rk2sL75bnh4qOP3thP6fwTsi4ivAwuBHbRu\n2fzv1d05f5mZDwNExF7gEK1bNn+z41FJknpi2tDPzJeB959j1zXnOPbTwKe7MC5JUg/4RK4kFcTQ\nl6SCGPqSVBBDX5IKYuhLUkEMfUkqiKEvSQUx9CWpIIa+JBXE0Jekghj6klQQQ1+SCmLoS1JBDH1J\nKkg7n6ffM3d8/A8Zn1xUW39LL13IB7durq0/SZprZjX0/+qZV1kwVN93pw8PPFtbX5I0F3l6R5IK\nYuhLUkEMfUkqyLTn9CNiO7ANmAQGgH8CvB34A1pfgP5kZt5UHXsrsKVq352ZX+3RuCVJHWjni9H3\nAfsAIuIa4H3AJ4HfzsxvRcQXIuJ6IIFfBd4GrAAORcQDmTnZs9FLki7KxZ7euR34z8CbM/NbVdtB\n4FpgI/DVzBzPzOeBHwBXdWugkqSZazv0I2IN8CwwDpyYsus4sBpYBTSmtDeqdknSHHExK/0PAZ+r\ntgemtA/87KEXbJckzZKLeThrA/Bb1fbKKe0jwChwFLjyrPajMxlcty1atIDh4aFa+6y7v7r1c339\nXBtYX6naCv2IWA2MZeZPqtd/ExFrM/ObwHuBu4GngA9HxO3A64E3ZuZf92jcHWk2J2g0xmrrb3h4\nqNb+6tbP9fVzbWB9891MJrR2V/qraZ27f83NwB9FxADwl5n5MEBE7AUO0bpl8zc7HpUkqSfaCv3q\nTp13Tnn9N8A15zju08CnuzY6SVJX+USuJBXE0Jekghj6klQQQ1+SCmLoS1JBDH1JKoihL0kFMfQl\nqSCGviQVxNCXpIIY+pJUEENfkgpi6EtSQQx9SSqIoS9JBTH0Jakghr4kFcTQl6SCGPqSVJC2viM3\nIrYCtwJN4Hbg28C9tCaNY8C2zGxWx+0ExoG9mbmvJ6OWJHVk2pV+RFxGK+jXAu8C3gPsBu7JzPXA\n08D2iFgC7AI2ARuBmyNiea8GLkm6eO2s9H8J+Fpm/hj4MbAjIp4BdlT7DwIfAb4LHM7MkwAR8Siw\nDri/66OWJHWkndB/MzAYEX8GLAc+CizJzGa1/ziwGlgFNKa8r1G1S5LmiHZCfwC4DPgVWhPAX1Rt\nU/ef731zyqJFCxgeHqq1z7r7q1s/19fPtYH1laqd0P8R8M3MnACeiYgxoBkRizPzNDACjAJHOXNl\nPwI81u0Bz0SzOUGjMVZbf8PDQ7X2V7d+rq+f
awPrm+9mMqG1c8vmg8CmiBiIiJXAUuAhYEu1fzPw\nAHAYWBMRyyJiKa0Lv4c6HpkkqeumDf3MPAr8T+D/0LooexNwB3BjRDwCrAD2Z+Yp4DZak8SDwJ2Z\n2b9TrSTNQ23dp5+Ze4G9ZzVfd47jDgAHujAuSVIP+ESuJBXE0Jekghj6klQQQ1+SCmLoS1JBDH1J\nKoihL0kFMfQlqSCGviQVxNCXpIIY+pJUEENfkgpi6EtSQQx9SSqIoS9JBTH0Jakghr4kFcTQl6SC\nGPqSVJBpvyM3ItYDXwa+AwwATwK/B9xLa9I4BmzLzGZEbAV2AuPA3szc16uBS5IuXrsr/a9n5qbM\n3JiZO4HdwD2ZuR54GtgeEUuAXcAmYCNwc0Qs78moJUkdaTf0B856vQE4WG0fBK4FrgYOZ+bJzDwF\nPAqs68YgJUndMe3pncpVEXEfcBmtVf6SzGxW+44Dq4FVQGPKexpVuyRpjmgn9J8C7szML0fEPwT+\n4qz3nf1bwHTts2bRogUMDw/V2mfd/dWtn+vr59rA+ko1behn5lFaF3LJzGci4ofAmohYnJmngRFg\nFDjKmSv7EeCx7g+5c83mBI3GWG39DQ8P1dpf3fq5vn6uDaxvvpvJhDbtOf2I+LWIuKXafgOt0zif\nBbZUh2wGHgAO05oMlkXEUmAtcKjjkUmSuq6d0ztfAf40Im4AFgE7gCeAz0fEvwGOAPszczwibgMe\nBCZonRLq36lWkuahdk7vnAR++Ry7rjvHsQeAA10YlySpB3wiV5IKYuhLUkEMfUkqiKEvSQUx9CWp\nIIa+JBXE0Jekghj6klQQQ1+SCmLoS1JBDH1JKoihL0kFMfQlqSCGviQVxNCXpIIY+pJUEENfkgpi\n6EtSQQx9SSpIO1+MTkS8DvgOsBt4GLiX1oRxDNiWmc2I2ArsBMaBvZm5rzdDliR1qt2V/i7ghWp7\nN3BPZq4Hnga2R8SS6phNwEbg5ohY3u3BSpJmZtrQj4gArgTuBwaA9cDBavdB4FrgauBwZp7MzFPA\no8C6noxYktSxdlb6dwEfphX4AIOZ2ay2jwOrgVVAY8p7GlW7JGkOueA5/YjYBnwzM4+0Fvw/Y+Bc\njRdon1WLFi1geHio1j7r7q9u/VxfP9cG1leq6S7kvhN4S0S8GxgBXgVORsTizDxdtY0CRzlzZT8C\nPNaD8c5IszlBozFWW3/Dw0O19le3fq6vn2sD65vvZjKhXTD0M/MDr21HxO3AD4C1wBbgC8Bm4AHg\nMPDHEbEMmKiO2dnxqCRJPXEx9+m/dsrmDuDGiHgEWAHsry7e3gY8WP13Z2b27zQrSfNUW/fpA2Tm\nR6e8vO4c+w8AB7oxKElSb/hEriQVxNCXpIIY+pJUEENfkgpi6EtSQQx9SSqIoS9JBTH0Jakghr4k\nFcTQl6SCGPqSVBBDX5IKYuhLUkEMfUkqiKEvSQUx9CWpIIa+JBXE0Jekghj6klSQab8jNyIuBT4H\nrAIWA78LPAHcS2vSOAZsy8xmRGwFdgLjwN7M3NejcUuSOtDOSv/dwOOZuQF4P7AH2A18KjPXA08D\n2yNiCbAL2ARsBG6OiOU9GbUkqSPTrvQz80tTXr4J+DtgPbCjajsIfAT4LnA4M08CRMSjwDrg/m4O\nWJLUuWlD/zUR8Q1ghNbK/2uZ2ax2HQdW0zr905jylkbVLkmaI9oO/cxcFxG/CHwBGJiya+A8bzlf\n+6xZtGgBw8NDtfZZd3916+f6+rk2sL5StXMh963A8cx8LjOfjIiFwFhELM7M07RW/6PAUc5c2Y8A\nj/Vi0J1qNidoNMZq6294eKjW/urWz/X1c21gffPdTCa0di7kXgPcAhARq4ClwEPAlmr/ZuAB4DCw\nJiKWRcRSYC1wqOORSZK6rp3Q/0Pg9RHxv2ldtP23wB3AjRHxCLAC2J+Zp4DbgAer/+7MzP6daiVp\nHmrn7p1T
wNZz7LruHMceAA50YVySpB7wiVxJKoihL0kFMfQlqSCGviQVxNCXpIIY+pJUEENfkgpi\n6EtSQQx9SSqIoS9JBTH
"text/plain": [
"<matplotlib.figure.Figure at 0x7f2f8ffa4ba8>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"df['Fare'].hist()"
]
},
{
"cell_type": "code",
"execution_count": 40,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"array([[<matplotlib.axes._subplots.AxesSubplot object at 0x7f2f8feb4160>,\n",
" <matplotlib.axes._subplots.AxesSubplot object at 0x7f2f8fe02e48>]], dtype=object)"
]
},
"execution_count": 40,
"metadata": {},
"output_type": "execute_result"
},
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAXwAAAEMCAYAAADHxQ0LAAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\nAAALEgAACxIB0t1+/AAAHP9JREFUeJzt3XuQVeWZ7/FvC9JIAwViBwlGYmoyj0PNZOo4VJkBR5CJ\nOJbXBBytUIwJyYTJMRnG25SZDF445iTHHJgcNRlzUKJxtCY6RYyEaNAyGm8nGJ0RPck8KgqES6Ql\njVyE7qa7zx/v28fdm+7ea2/W2pdev09VV++9Lu96evXaz3r3u9613qbe3l5ERGT4O6bWAYiISHUo\n4YuI5IQSvohITijhi4jkhBK+iEhOKOGLiOTEyFoHkDdm1gO8ARyOk5qAze5+bu2iEklH0fE9AtgD\nfMXdnxhindnAne7+0epEmV9K+NXXC8x29521DkQkA/2ObzObCaw1s993990l1pOMKeFXX1P8OYKZ\nGXAnMInwv7ne3f81zusB/gG4HJgO/AHwHWAKcAhY7O4vZh69yND6Hd/u/pyZvQH8KfBjM/sr4KuE\nBP8L4K8LVzaz44C7gT8GjgXWuPu1cd4lwPWEbw6dwN+6+88Hm57lH9mo1IZfX74JPOzu04HPAXeZ\n2YjCBdz9D+LLHwJ3u7sBfwP8yMz0/5R6dCzQYWbTCMf4me5+KtACfLlo2S8CLXH+acBn4rcEgG8D\n58bPx38FLiwxXYooQdTGk2b2q4Kf7wK4+4XAirjMs8BoQg2+z4/j71OBVne/O673PNAGzESkjpjZ\nucBkwvE8D3jW3d+Osz8N/FPh8u6+Erg4vn4X+L/AR+Lst4EvmtnJ7v6cu19TYroUUZNObQzYhh8/\nHF81sxN4v02z8KT8u/h7AtBiZr+K75uAcYSmIJFae9LMDhOO3c3AX7j7e/G43tO3kLt3AoSWzMDM\nPgqsiM2bPcBJwOo4+0JgGfCimW0FroxNN4NNlyJK+LVxRBu+mY0EHgAWuPtPzWwUcJCBL2btAN6N\nX2FF6s1gnRLeIbTlA2Bm44Djipa5Hfhl/LaLmT3TN8Pd3wIWx+mXA/cDJw02PbW/ZhhRk079aAHG\nAH0XXv8O6CDU3Ptx9y3ANjObD2BmJ5jZ/fGCl0itDdgpAfgJMMvMTjazJuAOYqIu8AHg3wHM7Gzg\n94Cx8RhfH08SEC749pjZpIGmp/nHDCdK+NU3YPez2F55C/AfZvYi8DrwEKFnw5gB1rsM+JKZ/Rp4\nEnjM3Q9mFrVIMoN2r3T37cAXgJ8B/wl0AyuLFrsZWGlmG4E/A26KPx8FHgFeMLNXCbX4xbGr56PF\n01P9i4aRplLPwzezFuD7wERgFLAc+C3wz4Qz6UZ3vyIuey2wIE5f7u6PZBe6SDbijUAPAq8Saqsb\nCb1L7iVUknYCi9y9y8wWAksJyWuVu68euFSR2kuS8K8APujuXzWzEwln5x3Ate7+kpndRzghOOFD\n8nHCyeFpYLq764YKaSgx4V/h7n9ZMG018GN3X2NmXwO2Ek4ALwEzCHeWvgD8mbvvGaBYkZpL0qTz\nDu/3/pgE7AZOcfeX4rS1wNnAWcAj7t7t7u8Qrs7roqI0quJ26DmEYx3eP+ZPBza4+353PwQ8A8yq\nWoQiZSqZ8N39B8A0M3ud0FZ8LdBesMguQl/xyYS+4H3a6N+HXKSRTDezh8zs52b2CWCMu3fFeTrm\npSGVTPixjXJLfLDRXOBfihYZ7Ir8YNNF6t3rwI3ufjHwGeAu+ndh1jEvDSlJk84s4KcA7v4Kod/s\nCQXzpwLbCe36U4qm7xiq4N5wAUE/+jnan1S5+w53fzC+fpPQSWGimTXHRXTM66fWPxVJcuPVG4QL\nsT+Mz8LYB7xlZrPc/VngU8CthFrRVWZ2PaEv7Qfd/VeDFQrQ1NREW9u+SmMvqbV1XKblV2MbKj/Z\nNtJkZp8Gprj7ithRYTLwPUIPtPuA+YSugBuA
O81sPKFn2kxCj51BZX3MV6Ia/6NyKJ7SKj3mkyT8\n7wKrzexJwtPolhBqPP873jzxi75nXZvZKkLvnB7CA71EGtHDwP1mdhHhwV9LgJeB75vZF4AtwD3u\n3m1m1wHrCcf8je5eX5lBpEDJbpkZ6x0OtctG/hsavfy4jUZqO8/0mK9EvdVg8xBPd3c327ZtrXj9\nGTM+VtExr2fpiIhU2bZtW7l65TpGtZT/vMPOA7t56v6PVbRdJXwRkRoY1TKJ0eMnV3WbepaOiEhO\nKOGLiOSEEr6ISE4o4YuI5IQSvohITijhi4jkhBK+iEhOKOGLiOSEEr6ISE4o4YuI5IQSvohITijh\ni4jkRE0fnnbDN+6gu/fYxMt/6MQJfOr8eRlGJCIyfNU04f/yzU6OGZd8zOdDnW9nGI2IyPCmJh0R\nkZxQwhcRyQklfBGRnCjZhm9mi4FFQC/QBPwJcAbwz4SBmze6+xVx2WuBBXH6cnd/JKO4RUSkTCUT\nvruvBlYDmNmZwCXAt4Avu/tLZnafmZ0DOPCXwMeBicDTZvaou9d0lHQREQnKbdK5HvgfwIfd/aU4\nbS1wNnAW8Ii7d7v7O8BmYHpagYqIyNFJnPDNbAawFegG2gtm7QKmAJOBtoLpbXG6iIjUgXJq+J8H\n7o6vmwqmNx256JDTRUSkBsq58WoO8KX4elLB9KnAdmAHcGrR9B1HE1yx5uaRtLaOK2udcpevRNbb\nUPkikoZECd/MpgD73P1wfP9rM5vp7s8BnwJuBV4HrjKz64EPAB9091+lGWxHx2Ha2vYlXr61dVxZ\ny1ci622o/GTbEJHSktbwpxDa6vtcCXzXzJqAX7j7EwBmtgp4mtAt82/SDFRERI5OooQfe+ScV/D+\n18CZAyz3beDbqUUnIiKp0Z22IiI5oYQvIpITSvgiIjmhhC8ikhNK+CIiOaGELyKSE0r4IiI5oYQv\nIpITNR3EXKSemdlo4FVgOfAEcC+hkrQTWOTuXWa2EFhKeIrsqjh+hEhdUg1fZHDLgN3x9XLgNnef\nDWwCFpvZmLjMXMJ4EFea2YSaRCqSgBK+yADMzAhPf11HeNT3bMJgP/D+oD+nAxvcfb+7HwKeAWbV\nIFyRRJTwRQa2AriK98d1aHH3rvhag/5IQ1IbvkgRM1sEPOfuW0JF/whHNehPPT7Oud5iGu7x7N3b\nkmp5SSnhixzpPOAUM7uAMJBPJ7DfzJrdvYP+g/4U1uinAs+XKjzr8QHKVY0xC8qRh3ja2w+kWl5S\nSvgiRdz9sr7XcUCfzcBMYAFwHzAfeBTYANxpZuMJY0DMJPTYEalLasMXGVpfM80NwOVm9hQwEbgn\nXqi9Dlgff2509/qpmooUUQ1fZAjuflPB23kDzF8DrKleRCKVUw1fRCQnlPBFRHIiUZNOvH38WqAL\nuB54Bd1mLiLSUErW8M3seEKSnwmcD1yMbjMXEWk4SWr4nwAec/f3gPeAJWb2JrAkzl8LXAO8RrzN\nHMDM+m4zX5d61CIiUrYkCf/DQIuZ/QiYANwEjNFt5iIijSVJwm8Cjgc+SUj+P6P/LeRHdZt5OZqb\nR5Z9i3M1btHOehsqX0TSkCThv014rkgP8KaZ7QO60rrNvBwdHYfLusW5GrdoZ70NlZ9sGyJSWpJu\nmeuBuWbWZGaTgLHA44TbzKH/beYzzGy8mY0lXOR9OoOYRUSkAiUTvrvvAP4N+D+EC7BXoNvMRUQa\nTqJ++O6+ClhVNFm3mYuINBDdaSsikhNK+CIiOaGELyKSE0r4IiI5oYQvIpITSvgiIjmhhC8ikhNK\n+CIiOaGELyKSE0r4IiI5oYQvIpITSvgiIjmhhC8ikhNK+CIiOaGELyKSE0r4IiI5oYQvIpITSvgi\nIjmhhC8ikhMlx7Q1s9nAg8CrQBOwEfgmcC/hhLETWOTuXWa2EFgKdAOr3H11VoGLiEh5ktbwn3T3\nue5+lrsv
BZYDt7n7bGATsNjMxgDLgLnAWcCVZjYhk6hFRKRsSRN+U9H7OcDa+HotcDZwOrDB3fe7\n+yHgGWBWGkGKiMjRK9m
"text/plain": [
"<matplotlib.figure.Figure at 0x7f2f8feba390>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"df.hist(['Fare','Pclass'])"
]
},
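{
"cell_type": "markdown",
"metadata": {},
"source": [
"The histogram suggests a long right tail for *Fare*. One way to check this numerically is to compare the mean with the median and to compute the sample skewness with `Series.skew()`, where a positive value indicates a right tail. The next cell is only a sketch: `toy_fares` holds made-up values rather than data from the Titanic set, so that the cell runs on its own. On the real data the same calls would be applied to `df['Fare']`."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"import pandas as pd\n",
"\n",
"# Made-up illustrative fares, NOT taken from the Titanic dataset\n",
"toy_fares = pd.Series([7.25, 7.9, 8.05, 13.0, 26.0, 30.0, 512.33])\n",
"\n",
"mean_fare = toy_fares.mean()\n",
"median_fare = toy_fares.median()\n",
"skewness = toy_fares.skew()  # positive for a right-skewed sample\n",
"\n",
"print('mean:', mean_fare, 'median:', median_fare, 'skewness:', skewness)"
]
},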
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We see the distribution is right sweked. We are going to detect outliers using a box plot"
]
},
{
"cell_type": "code",
"execution_count": 41,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"<matplotlib.axes._subplots.AxesSubplot at 0x7f2f8fd941d0>"
]
},
"execution_count": 41,
"metadata": {},
"output_type": "execute_result"
},
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAXUAAAEDCAYAAADKhpQUAAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\nAAALEgAACxIB0t1+/AAAD9JJREFUeJzt3XuMXGd5x/Hv2psary/yhq6MMaUXIT0IVVUaJUpltzhx\nGyPSUqtxGqK4FsKlChJS3LQxoiihxrSo4qbKoP6TxDS1EqEgRRg3KDJR0og0ETaNwuWPPjRBTVvb\n4CUs7Tq+sOxO/5gxOl5mdmc3s549734/kpV3zjkz5/nD+c3j95zzzkCj0UCSVIZl/S5AktQ7hrok\nFcRQl6SCGOqSVBBDXZIKYqhLUkEGuzkoInYCe4EJ4CPAt4FDNL8UTgG7MnOiddweYBK4LzMPLkjV\nkqS2Bma7Tz0irgSeA34TWAPsB64A/jkzH42IvwX+i2bIPw9cA/wUOA78Tmb+eOHKlyRVddOp/x7w\n1cw8C5wF7oiI7wF3tPYfAe4Gvgscy8wzABHxDLAZeKznVUuS2uom1H8FWBURh4F1wEeBocycaO0/\nDWwA1gOjlfeNtrZLki6TbkJ9ALgS+COaAf9Ua1t1f6f3SZIuo25C/QfAs5k5BXwvIsaBiYhYkZkX\ngI3ACeAkl3bmG2nOxXf0059ONgYHl8+vcklaujo2zd2E+lHg8xHxCZod+2rgceAW4CFgR+v1MeD+\niFgLTAGbaN4J09HY2NluipckVYyMrOm4b9a7XwAi4s+A9wEN4GPAN2je7bICeBl4b2ZORsTNwAdp\nhvqBzPzCTJ87OjruEpGSNEcjI2s6dupdhfpCMdQlae5mCnWfKJWkghjqklQQQ12SCmKoS1JBDHVJ\nKoihLkkFMdQlqSCGuiQVxFCXpIIY6pJUEENdkgpiqEtSQQx1SSqIoS5JBTHUJakghrokFcRQl6SC\nGOqSVBBDXZIKYqhLUkEMdUkqiKEuSQUx1CWpIIa6JBXEUJemOXToIIcOHex3GdK8GOrSNE8//SRP\nP/1kv8uQ5sVQlyoOHTrI1NQUU1NTduuqJUNdqqh26HbrqqPB2Q6IiC3AF4HvAAPAt4BPAodofimc\nAnZl5kRE7AT2AJPAfZlpqyNJl1G3nfq/ZObWzLwhM/cA+4HPZuYW4CVgd0QMAfcCW4EbgLsiYt2C\nVC0tkC1btrYdS3XRbagPTHt9PXCkNT4C3AhcBxzLzDOZeR54BtjciyKly2XXrt0sW7aMZcuWsWvX\n7n6XI83ZrNMvLW+LiC8BV9Ls0ocyc6K17zSwAVgPjFbeM9raLtWKHbrqrJtQ/w9gX2Z+MSJ+DXhq\n2vumd/GzbZcWNTt01dmsoZ6ZJ2leKCUzvxcR3weuiYgVmXkB2AicAE5yaWe+EXhups8eHh5icHD5\nfGuXJE3Tzd0vtwMbMvPTEfEGmtMsnwduAR4CdgCPA8eA+yNiLTAFbKJ5J0xHY2NnX1v1krQEjYys\n6bhvoNFozPjmiFgNPAysA64A9gHfBP4JWAG8DLw3Mycj4mbggzRD/UBmfmGmzx4dHZ/55JKknzMy\nsqbj9Pasob6QDHVJmruZQt0nSiWpIIa6JBXEUJekghjqklQQQ12SCmKoS1JBDHVJKoihLkkFMdQl\nqSCGuiQVxFCXpIIY6pJUEENdkgpiqEtSQQx1SSqIoS5JBTHUJakghrokFcRQl6SCGOqSVBBDXZIK\nYqhLUkEMdUkqiKEuSQUx1CWpIIa6JBXEUJekghjqklQQQ12SCjLYzUER8TrgO8B+4EngEM0vhFPA\nrsyciIidwB5gErgvMw8uTMmSpE667dTvBV5pjfcDn83MLcBLwO6IGGodsxW4AbgrItb1ulhJ0sxm\nDfWICOCtwGPAALAFONLafQS4EbgOOJaZZzLzPPAMsHlBKpYkddRNp/5p4C9oBjrAqsycaI1PAxuA\n9cBo5T2jre2SpMtoxjn1iNgFPJuZLzcb9p8z
0G7jDNsvMTw8xODg8m4OlSR1YbYLpb8P/GpEvAvY\nCPwEOBMRKzLzQmvbCeAkl3bmG4HnZjv52NjZeRUtSUvZyMiajvtmDPXMvO3iOCI+AvwnsAm4BXgI\n2AE8DhwD7o+ItcBU65g9r7FuSdIczeU+9YtTKn8NvCcingaGgQdbF0c/BBxt/dmXmeM9rVSSNKuB\nRqPRt5OPjo737+SSVFMjI2s6Xrf0iVJJKoihLkkFMdQlqSCGuiQVxFCXpIIY6pJUEENdkgpiqEtS\nQQx1SSqIoS5JBTHUJakghrokFcRQl6SCGOqSVBBDXZIKYqhLUkEMdUkqiKEuSQUx1CWpIIa6JBXE\nUJekghjqklQQQ12SCmKoS1JBDHVJKoihLkkFMdQlqSCGuiQVZHC2AyJiJfCPwHpgBfA3wDeBQzS/\nFE4BuzJzIiJ2AnuASeC+zDy4QHVLktroplN/F3A8M68H3g18BtgPfC4ztwAvAbsjYgi4F9gK3ADc\nFRHrFqRqSVJbs3bqmflI5eWbgf8GtgB3tLYdAe4Gvgscy8wzABHxDLAZeKyXBUuSOps11C+KiH8F\nNtLs3L+amROtXaeBDTSnZ0YrbxltbZckXSZdh3pmbo6I3wAeAgYquwY6vKXT9p8ZHh5icHB5tyVI\nkmbRzYXSq4HTmfk/mfmtiFgOjEfEisy8QLN7PwGc5NLOfCPw3EyfPTZ2dv6VS9ISNTKypuO+bi6U\nvh34S4CIWA+sBp4Abmnt3wE8DhwDromItRGxGtgEfG3+ZUuS5mqg0WjMeEBEvA54APgl4HXAPuDf\naN7SuAJ4GXhvZk5GxM3AB4Ep4EBmfmGmzx4dHZ/55JKknzMysqbj9Pasob6QDHVJmruZQt0nSiWp\nIIa6JBXEUJekghjqklQQQ12SCmKoS9Pcc89e7rlnb7/LkOal62UCpKXi5MkT/S5Bmjc7dami2qHb\nrauODHWpotql27Grjgx1SSqIoS5VvPGNG9uOpbpw7Rdpmt27bwfg4MGH+1yJ1J5rv0jSEmGoSxVH\nj36l7ViqC0Ndqjh8+NG2Y6kuDHVJKoihLlVs335z27FUF4a6VLFt202sXDnEypVDbNt2U7/LkebM\ntV+kaYaHh/tdgjRvhro0jcsDqM6cfpEqDhz4VNuxVBeGulTxwgvPtx1LdWGoS1JBDHWpYtWqVW3H\nUl0Y6lLFm9705rZjqS4Mdali+/YdbcdSXRjqUoULeqnuDHWpwrtfVHddPXwUEZ8AfhtYDvwdcBw4\nRPNL4RSwKzMnImInsAeYBO7LzIMLUrUkqa1ZO/WIuB54W2ZuAt4J/D2wH/hcZm4BXgJ2R8QQcC+w\nFbgBuCsi1i1U4dJCuOqqq9uOpbroZvrlaeCPW+MfA6uALcCXW9uOADcC1wHHMvNMZp4HngE297Zc\naWHdeefdbcdSXcw6/ZKZDeBc6+WfAo8B78jMida208AGYD0wWnnraGu7VCt26Kqzrhf0iojtwG5g\nG/BiZVenH0Dt+MOoFw0PDzE4uLzbEqTL4mMf+2i/S5DmrdsLpe8A/opmhz4eEeMRsSIzLwAbgRPA\nSS7tzDcCz830uWNjZ+dXtSQtYSMjazru6+ZC6VrgE8AfZOb/tjY/AVx8MmMH8DhwDLgmItZGxGpg\nE/C111C3JGmOuunU3w28HngkIgaABvAe4IGIuAN4GXgwMycj4kPAUWAK2JeZ4wtUtySpjYFGo9G3\nk4+Ojvfv5JJUUyMjazpes/SJUkkqiKEuSQUx1CWpIIa6JBXEUJemOXr0Ky67q9rq+olSaak4fPhR\nALZtu6nPlUhzZ6cuVRw9+hXOnTvLuXNn7dZVS4a6VHGxS58+lurCUJekghjqUsX27Te3HUt1YahL\nFdWLo14oVR0Z6lLF3r13th1LdWGoSxWvvPLDtmOpLgx1SSqIoS5JBTHUpYqVK4fajqW6MNSlCm9p\nVN0Z6lLF
D37w/bZjqS4MdaniqaeeaDuW6sJQl6SCGOqSVBBDXaq46qqr246lujDUpYo777y77Viq\nC0Ndqjhw4FNtx1JdGOp
"text/plain": [
"<matplotlib.figure.Figure at 0x7f2f8fda1e48>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"sns.boxplot(data=df['Fare'])"
]
},
{
"cell_type": "code",
"execution_count": 42,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"<matplotlib.axes._subplots.AxesSubplot at 0x7f2f8fd53f28>"
]
},
"execution_count": 42,
"metadata": {},
"output_type": "execute_result"
},
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAXUAAAEDCAYAAADKhpQUAAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\nAAALEgAACxIB0t1+/AAAEAlJREFUeJzt3X9s3Hd9x/GnE6fpnDiKC7eQeSKbYHujapqgqsqUUPJj\nA4QYMJZKTIuyrilStCEayrItm1SSRttUMVpBi6axtulCVTaxgdiybiWtSn+NDndCg/YP3oWwhZFk\nyW0yk42X4iTeH3cB+/CPs3vn8338fEjW9/y97933/Yf9urc/38/3456JiQkkSWVY0ekCJEmtY6hL\nUkEMdUkqiKEuSQUx1CWpIIa6JBWkt5mDImIX8LvAOPBh4HngQWofCmeA3Zk5Xj9uH3ARuDczj7Sl\naknStHrmmqceEVcBzwJvAPqBw8Aq4B8y83MR8cfAt6mF/FeAa4ELwHPA9Zn53faVL0marJlO/ZeA\nRzNzDBgD9kbEt4C99eePAfuBF4GhzBwFiIhngC3Awy2vWpI0rWZC/aeANRHxd8B64HagLzPH68+f\nAzYCG4DqpNdV6/slSYukmVDvAa4C3kMt4L9Y3zf5+ZleJ0laRM2E+lngS5l5CfhWRIwA4xGxOjNf\nAgaBU8Bppnbmg9TG4md04cLFid7elQurXJKWrxmb5mZC/TjwQER8hFrHvhZ4BLgBeAjYWf9+CLgv\nItYBl4DN1GbCzGh4eKyZ4qVFV6n0U62OdLoMaVqVSv+Mz805Tz0zTwN/C/wLtYue7wcOAjdGxJPA\nAHA0M88DB6h9CBwHDmWmvxWStIjmnNLYTtXqiOv+akmyU9dSVqn0zzj84h2lklQQQ12SCmKoS1JB\nDHVJKoihLkkFMdQlqSCGuiQVxFCXpIIY6pJUEENdkgpiqEtSQQx1SSqIoS5JBTHUJakghrokFcRQ\nl6SCGOqSVBBDXZIKYqhLUkEMdUkqiKEuSQUx1CWpIIa6JBXEUJekghjqklQQQ12SCmKoS1JBejtd\ngLTU7Lv7aUbGxunvW8XHb7m+0+VI8zJnqEfEVuBvgBeAHuBrwJ8CD1Lr9M8AuzNzPCJ2AfuAi8C9\nmXmkXYVL7TIyNj5lK3WTZodfnsjMHZm5PTP3AYeBezJzK3AC2BMRfcBtwA5gO3BrRKxvS9VSG/X3\nrZqylbpJs8MvPQ3fbwP21h8fA/YDLwJDmTkKEBHPAFuAh19+mdLi+fgt11Op9FOtjnS6FGnemg31\nqyPi88BV1Lr0vsy8/LfpOWAjsAGoTnpNtb5fkrRImhl++QZwKDN/BfhN4H6mfhg0dvFz7Zcktcmc\nnXpmnqZ2oZTM/FZE/BdwbUSszsyXgEHgFHCaqZ35IPDsbO89MNBHb+/KhdYutVWl0t/pEqR5a2b2\ny68DGzPzzoh4FbVhlgeAG4CHgJ3AI8AQcF9ErAMuAZupzYSZ0fDw2MurXmoTx9S1lM3WcPRMTEzM\n+uKIWAt8GlgPrAIOAV8FPgWsBk4CN2XmxYj4VeD3qIX63Zn517O9d7U6MvvJpQ4x1LWUVSr9Mw5v\nzxnq7WSoa6ky1LWUzRbqLhMgSQUx1CWpIIa6JBXEUJekghjqklQQQ12SCmKoS1JBDHVJKoihLkkF\nMdQlqSCGuiQVxFCXpIIY6pJUEENdkgpiqEtSQQx1SSqIoS5JBTHUJakghrokFcRQl6SCGOqSVBBD\nXZIKYqhLUkEMdUkqiKEuSQUx1CWpIIa6JBXEUJekghjqklSQ3mYOiogrgReAw8DjwIPUPhDOALsz\nczwidgH7gIvAvZl5pD0lS5Jm0mynfhvwP/XHh4F7MnMrcALYExF99WN2ANuBWyNifauLlSTNbs5Q\nj4gAXgc8DPQAW4Fj9aePAW8B3ggMZeZoZp4HngG2tKViSdKMmunU7wQ+RC3QAdZk5nj98TlgI7AB\nqE56TbW+X5K0iGYdU4+I3cCXMvNkrWH/ET3T
7Zxl/xQDA3309q5s5lBp0VUq/Z0uQZq3uS6UvgP4\n6Yh4JzAIfB8YjYjVmflSfd8p4DRTO/NB4Nm5Tj48PLagoqV2q1T6qVZHOl2GNK3ZGo5ZQz0zf+3y\n44j4MPAfwGbgBuAhYCfwCDAE3BcR64BL9WP2vcy6JUnzNJ956peHVA4CN0bEk8AAcLR+cfQAcLz+\ndSgzbXMkaZH1TExMdOzk1epI504uzcLhFy1llUr/jNctvaNUkgpiqEtSQQx1SSqIoS5JBTHUJakg\nhrokFcRQl6SCGOqSVBBDXZIKYqhLUkEMdUkqiKEuSQUx1CWpIIa6JBXEUJekghjqklQQQ12SCmKo\nS1JBDHVJKoihLkkFMdQlqSCGuiQVxFCXpIIY6pJUEENdkgpiqEtSQQx1SSqIoS5JBemd64CI+DHg\nL4ENwGrgj4CvAg9S+1A4A+zOzPGI2AXsAy4C92bmkTbVLUmaRjOd+juB5zJzG/Be4C7gMPCJzNwK\nnAD2REQfcBuwA9gO3BoR69tStSRpWnN26pn5mUnfvhr4T2ArsLe+7xiwH3gRGMrMUYCIeAbYAjzc\nyoIlSTObM9Qvi4h/Bgapde6PZuZ4/alzwEZqwzPVSS+p1vdLkhZJ06GemVsi4ueBh4CeSU/1zPCS\nmfb/wMBAH729K5stQVpUlUp/p0uQ5q2ZC6XXAOcy8zuZ+bWIWAmMRMTqzHyJWvd+CjjN1M58EHh2\ntvceHh5beOVSG1Uq/VSrI50uQ5rWbA1HMxdK3wz8DkBEbADWAo8BN9Sf3wk8AgwB10bEuohYC2wG\nnl542ZKk+Wom1P8c+PGIeIraRdHfAg4CN0bEk8AAcDQzzwMHgOP1r0OZaasjSYuoZ2JiomMnr1ZH\nOndyaRYOv2gpq1T6Z7xm6R2lklQQQ12SCmKoS1JBDHVJKoihLkkFMdQlqSCGuiQVxFCXpII0vaCX\ntFzsuePxHzw+cmBHByuR5s9OXZIKYqhLUkFc+0Wahmu/aClz7RdJWiYMdUkqiLNfpAa3PzDEybOj\nbNqwloM3XdfpcqR5sVOXGpw8OzplK3UTQ11qsGnD2ilbqZs4+0WahrNftJQ5+0WSlglDXZIKYqhL\nUkGc0ig12PvRJxi/cIlVvSv45P5tnS5Hmhc7danB+IVLU7ZSNzHUpQYreqZupW5iqEsNVqyo/Vqs\nXOmvh7qPP7VSgx3XDHJF7wq2v2Gw06VI8+bNR9I0vPlIS9lsNx85+0Vq4OwXdbOmQj0iPgK8CVgJ\n3AE8BzxIbfjmDLA7M8cjYhewD7gI3JuZR9pStdRGzn5RN5tzTD0itgFXZ+Zm4O3Ax4DDwCcycytw\nAtgTEX3AbcAOYDtwa0Ssb1fhUrus6l0xZSt1k2Y69SeBL9cffxdYA2wF9tb3HQP2Ay8CQ5k5ChAR\nzwBbgIdbWbDUbp/cv80xdXWtOUM9MyeA/6t/ezO1kH5bZo7X950DNgIbgOqkl1br+yVJi6TpC6UR\n8W5gD/BW4JuTnprpKuyct24MDPTR27uy2RKkRVWp9He6BGnemr1Q+jbgD6h16CMRMRIRqzPzJWAQ\nOAWcZmpnPgg8O9v7Dg+PLaxqqc0cftFSNlvD0cyF0nXAR4Bfzsz/re9+DNhZf7wTeAQYAq6NiHUR\nsRbYDDz9MuqWJM1TM536e4FXAJ+JiB5gArgRuD8i9gIngaOZeTEiDgDHgUvAocy01ZGkReQdpdI0\nHH7RUua/s5OkZcJQl6SCGOqSVBBDXZIKYqhLUkEMdUkqiKEuSQXxn2RIDW5/YIiTZ0fZtGEtB2+6\nrtPlSPNipy41OHl2dMpW6iaGutRg04a1U7ZSN3GZAGkaLhOgpcxlAiRpmfBCqdRgzx2P/+DxkQM7\nOliJNH926pJUEENdkgpiqEtSQQx1qYFTGtXNnNIoTcMpjVrKZpvS6OwXqcG+u59mZGyc/r5VfPyW\n6ztdjjQv
Dr9IDUbGxqdspW5iqEtSQQx1qcGq3hVTtlI38UKpNA0vlGopc+0XSVomnP0iNdj70ScY\nv3CJVb0r+OT+bZ0uR5o
"text/plain": [
"<matplotlib.figure.Figure at 0x7f2f8fd679b0>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"# We can see the same with matplotlib.\n",
"# There is a bug and if you import seaborn, you should add 'sym='k.' to show the outliers\n",
"df.boxplot(column='Fare', return_type='axes', sym='k.')"
]
},
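{
"cell_type": "markdown",
"metadata": {},
"source": [
"The points drawn beyond the whiskers follow the usual box plot convention: values more than 1.5 times the interquartile range (IQR) below the first quartile or above the third quartile. The next cell is a minimal sketch of that rule, not the method used elsewhere in these notes: `iqr_outliers` is a hypothetical helper and `toy_fares` holds made-up values rather than data from the Titanic set. With the real data you could call `iqr_outliers(df['Fare'])`."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"import pandas as pd\n",
"\n",
"def iqr_outliers(values, k=1.5):\n",
"    # Return the entries lying more than k * IQR outside the quartiles\n",
"    q1 = values.quantile(0.25)\n",
"    q3 = values.quantile(0.75)\n",
"    iqr = q3 - q1\n",
"    lower = q1 - k * iqr\n",
"    upper = q3 + k * iqr\n",
"    return values[(values < lower) | (values > upper)]\n",
"\n",
"# Made-up illustrative fares, NOT taken from the Titanic dataset\n",
"toy_fares = pd.Series([7.25, 7.9, 8.05, 13.0, 26.0, 30.0, 512.33])\n",
"\n",
"outliers = iqr_outliers(toy_fares)\n",
"outliers"
]
},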
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Since Fare depends on Pclass, we are going to show outliers per passenger class."
]
},
{
"cell_type": "code",
"execution_count": 43,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"OrderedDict([('Fare',\n",
" <matplotlib.axes._subplots.AxesSubplot at 0x7f2f8fdb73c8>)])"
]
},
"execution_count": 43,
"metadata": {},
"output_type": "execute_result"
},
{
"data": {
"text/plain": [
"<matplotlib.figure.Figure at 0x7f2f8fcdb0f0>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"df.boxplot(column='Fare', by='Pclass', return_type='axes', sym='k.')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We see that most outliers are in class 1. In particular, some fares are higher than 500, which look like errors."
]
},
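{
"cell_type": "markdown",
"metadata": {},
"source": [
"By default, the boxplot draws as outliers the points that lie more than 1.5 times the interquartile range (IQR) beyond the quartiles. As a minimal sketch, we can count such points per class with the same rule. The `toy` DataFrame and the helper `iqr_outliers` are made up for illustration; they are not the Titanic data nor part of pandas.\n",
"\n",
"```python\n",
"import pandas as pd\n",
"\n",
"# Hypothetical fares for two classes (not the Titanic data)\n",
"toy = pd.DataFrame({\n",
"    'Pclass': [1, 1, 1, 1, 1, 3, 3, 3, 3, 3],\n",
"    'Fare': [30.0, 50.0, 60.0, 80.0, 512.0, 7.0, 7.5, 8.0, 8.5, 9.0],\n",
"})\n",
"\n",
"def iqr_outliers(fares):\n",
"    # Flag values beyond 1.5 * IQR from the first and third quartiles\n",
"    q1, q3 = fares.quantile(0.25), fares.quantile(0.75)\n",
"    iqr = q3 - q1\n",
"    return (fares < q1 - 1.5 * iqr) | (fares > q3 + 1.5 * iqr)\n",
"\n",
"# Number of flagged fares in each class\n",
"counts = {pclass: int(iqr_outliers(fares).sum())\n",
"          for pclass, fares in toy.groupby('Pclass')['Fare']}\n",
"print(counts)\n",
"```\n",
"\n",
"Passing `df` instead of `toy` should give the per-class counts for the real data."
]
},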
{
"cell_type": "code",
"execution_count": 44,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>PassengerId</th>\n",
" <th>Survived</th>\n",
" <th>Pclass</th>\n",
" <th>Name</th>\n",
" <th>Sex</th>\n",
" <th>Age</th>\n",
" <th>SibSp</th>\n",
" <th>Parch</th>\n",
" <th>Ticket</th>\n",
" <th>Fare</th>\n",
" <th>Cabin</th>\n",
" <th>Embarked</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>258</th>\n",
" <td>259</td>\n",
" <td>1</td>\n",
" <td>1</td>\n",
" <td>Ward, Miss. Anna</td>\n",
" <td>female</td>\n",
" <td>35.0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>PC 17755</td>\n",
" <td>512.3292</td>\n",
" <td>NaN</td>\n",
" <td>C</td>\n",
" </tr>\n",
" <tr>\n",
" <th>679</th>\n",
" <td>680</td>\n",
" <td>1</td>\n",
" <td>1</td>\n",
" <td>Cardeza, Mr. Thomas Drake Martinez</td>\n",
" <td>male</td>\n",
" <td>36.0</td>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" <td>PC 17755</td>\n",
" <td>512.3292</td>\n",
" <td>B51 B53 B55</td>\n",
" <td>C</td>\n",
" </tr>\n",
" <tr>\n",
" <th>737</th>\n",
" <td>738</td>\n",
" <td>1</td>\n",
" <td>1</td>\n",
" <td>Lesurer, Mr. Gustave J</td>\n",
" <td>male</td>\n",
" <td>35.0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>PC 17755</td>\n",
" <td>512.3292</td>\n",
" <td>B101</td>\n",
" <td>C</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" PassengerId Survived Pclass Name \\\n",
"258 259 1 1 Ward, Miss. Anna \n",
"679 680 1 1 Cardeza, Mr. Thomas Drake Martinez \n",
"737 738 1 1 Lesurer, Mr. Gustave J \n",
"\n",
" Sex Age SibSp Parch Ticket Fare Cabin Embarked \n",
"258 female 35.0 0 0 PC 17755 512.3292 NaN C \n",
"679 male 36.0 0 1 PC 17755 512.3292 B51 B53 B55 C \n",
"737 male 35.0 0 0 PC 17755 512.3292 B101 C "
]
},
"execution_count": 44,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df[df.Fare > 400]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We can replace these values with the median, the mean, or the second-highest value. Here we use the second-highest fare (263.0)."
]
},
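{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a rough sketch of the three options, we can compute each candidate replacement on a small Series. The `fares` values are made up for illustration, and the threshold of 400 is the one used in this section.\n",
"\n",
"```python\n",
"import pandas as pd\n",
"\n",
"# Hypothetical fares standing in for df.Fare (not the Titanic data)\n",
"fares = pd.Series([7.25, 8.05, 26.0, 263.0, 512.3292, 512.3292])\n",
"\n",
"outliers = fares > 400\n",
"inliers = fares[~outliers]\n",
"\n",
"# Candidate replacement values, computed without the outliers\n",
"candidates = {\n",
"    'median': inliers.median(),\n",
"    'mean': inliers.mean(),\n",
"    'second_highest': inliers.max(),\n",
"}\n",
"print(candidates)\n",
"```\n",
"\n",
"The second-highest value keeps these passengers at the top of the fare range, whereas the median or the mean would move them towards the centre of the distribution."
]
},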
{
"cell_type": "code",
"execution_count": 45,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>PassengerId</th>\n",
" <th>Survived</th>\n",
" <th>Pclass</th>\n",
" <th>Name</th>\n",
" <th>Sex</th>\n",
" <th>Age</th>\n",
" <th>SibSp</th>\n",
" <th>Parch</th>\n",
" <th>Ticket</th>\n",
" <th>Fare</th>\n",
" <th>Cabin</th>\n",
" <th>Embarked</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>258</th>\n",
" <td>259</td>\n",
" <td>1</td>\n",
" <td>1</td>\n",
" <td>Ward, Miss. Anna</td>\n",
" <td>female</td>\n",
" <td>35.0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>PC 17755</td>\n",
" <td>512.3292</td>\n",
" <td>NaN</td>\n",
" <td>C</td>\n",
" </tr>\n",
" <tr>\n",
" <th>737</th>\n",
" <td>738</td>\n",
" <td>1</td>\n",
" <td>1</td>\n",
" <td>Lesurer, Mr. Gustave J</td>\n",
" <td>male</td>\n",
" <td>35.0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>PC 17755</td>\n",
" <td>512.3292</td>\n",
" <td>B101</td>\n",
" <td>C</td>\n",
" </tr>\n",
" <tr>\n",
" <th>679</th>\n",
" <td>680</td>\n",
" <td>1</td>\n",
" <td>1</td>\n",
" <td>Cardeza, Mr. Thomas Drake Martinez</td>\n",
" <td>male</td>\n",
" <td>36.0</td>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" <td>PC 17755</td>\n",
" <td>512.3292</td>\n",
" <td>B51 B53 B55</td>\n",
" <td>C</td>\n",
" </tr>\n",
" <tr>\n",
" <th>88</th>\n",
" <td>89</td>\n",
" <td>1</td>\n",
" <td>1</td>\n",
" <td>Fortune, Miss. Mabel Helen</td>\n",
" <td>female</td>\n",
" <td>23.0</td>\n",
" <td>3</td>\n",
" <td>2</td>\n",
" <td>19950</td>\n",
" <td>263.0000</td>\n",
" <td>C23 C25 C27</td>\n",
" <td>S</td>\n",
" </tr>\n",
" <tr>\n",
" <th>27</th>\n",
" <td>28</td>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" <td>Fortune, Mr. Charles Alexander</td>\n",
" <td>male</td>\n",
" <td>19.0</td>\n",
" <td>3</td>\n",
" <td>2</td>\n",
" <td>19950</td>\n",
" <td>263.0000</td>\n",
" <td>C23 C25 C27</td>\n",
" <td>S</td>\n",
" </tr>\n",
" <tr>\n",
" <th>341</th>\n",
" <td>342</td>\n",
" <td>1</td>\n",
" <td>1</td>\n",
" <td>Fortune, Miss. Alice Elizabeth</td>\n",
" <td>female</td>\n",
" <td>24.0</td>\n",
" <td>3</td>\n",
" <td>2</td>\n",
" <td>19950</td>\n",
" <td>263.0000</td>\n",
" <td>C23 C25 C27</td>\n",
" <td>S</td>\n",
" </tr>\n",
" <tr>\n",
" <th>438</th>\n",
" <td>439</td>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" <td>Fortune, Mr. Mark</td>\n",
" <td>male</td>\n",
" <td>64.0</td>\n",
" <td>1</td>\n",
" <td>4</td>\n",
" <td>19950</td>\n",
" <td>263.0000</td>\n",
" <td>C23 C25 C27</td>\n",
" <td>S</td>\n",
" </tr>\n",
" <tr>\n",
" <th>311</th>\n",
" <td>312</td>\n",
" <td>1</td>\n",
" <td>1</td>\n",
" <td>Ryerson, Miss. Emily Borie</td>\n",
" <td>female</td>\n",
" <td>18.0</td>\n",
" <td>2</td>\n",
" <td>2</td>\n",
" <td>PC 17608</td>\n",
" <td>262.3750</td>\n",
" <td>B57 B59 B63 B66</td>\n",
" <td>C</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" PassengerId Survived Pclass Name \\\n",
"258 259 1 1 Ward, Miss. Anna \n",
"737 738 1 1 Lesurer, Mr. Gustave J \n",
"679 680 1 1 Cardeza, Mr. Thomas Drake Martinez \n",
"88 89 1 1 Fortune, Miss. Mabel Helen \n",
"27 28 0 1 Fortune, Mr. Charles Alexander \n",
"341 342 1 1 Fortune, Miss. Alice Elizabeth \n",
"438 439 0 1 Fortune, Mr. Mark \n",
"311 312 1 1 Ryerson, Miss. Emily Borie \n",
"\n",
" Sex Age SibSp Parch Ticket Fare Cabin Embarked \n",
"258 female 35.0 0 0 PC 17755 512.3292 NaN C \n",
"737 male 35.0 0 0 PC 17755 512.3292 B101 C \n",
"679 male 36.0 0 1 PC 17755 512.3292 B51 B53 B55 C \n",
"88 female 23.0 3 2 19950 263.0000 C23 C25 C27 S \n",
"27 male 19.0 3 2 19950 263.0000 C23 C25 C27 S \n",
"341 female 24.0 3 2 19950 263.0000 C23 C25 C27 S \n",
"438 male 64.0 1 4 19950 263.0000 C23 C25 C27 S \n",
"311 female 18.0 2 2 PC 17608 262.3750 B57 B59 B63 B66 C "
]
},
"execution_count": 45,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Show the highest fares\n",
"df.sort_values('Fare', ascending=False).head(8)"
]
},
{
"cell_type": "code",
"execution_count": 46,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>PassengerId</th>\n",
" <th>Survived</th>\n",
" <th>Pclass</th>\n",
" <th>Name</th>\n",
" <th>Sex</th>\n",
" <th>Age</th>\n",
" <th>SibSp</th>\n",
" <th>Parch</th>\n",
" <th>Ticket</th>\n",
" <th>Fare</th>\n",
" <th>Cabin</th>\n",
" <th>Embarked</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>258</th>\n",
" <td>259</td>\n",
" <td>1</td>\n",
" <td>1</td>\n",
" <td>Ward, Miss. Anna</td>\n",
" <td>female</td>\n",
" <td>35.0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>PC 17755</td>\n",
" <td>263.000</td>\n",
" <td>NaN</td>\n",
" <td>C</td>\n",
" </tr>\n",
" <tr>\n",
" <th>88</th>\n",
" <td>89</td>\n",
" <td>1</td>\n",
" <td>1</td>\n",
" <td>Fortune, Miss. Mabel Helen</td>\n",
" <td>female</td>\n",
" <td>23.0</td>\n",
" <td>3</td>\n",
" <td>2</td>\n",
" <td>19950</td>\n",
" <td>263.000</td>\n",
" <td>C23 C25 C27</td>\n",
" <td>S</td>\n",
" </tr>\n",
" <tr>\n",
" <th>27</th>\n",
" <td>28</td>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" <td>Fortune, Mr. Charles Alexander</td>\n",
" <td>male</td>\n",
" <td>19.0</td>\n",
" <td>3</td>\n",
" <td>2</td>\n",
" <td>19950</td>\n",
" <td>263.000</td>\n",
" <td>C23 C25 C27</td>\n",
" <td>S</td>\n",
" </tr>\n",
" <tr>\n",
" <th>341</th>\n",
" <td>342</td>\n",
" <td>1</td>\n",
" <td>1</td>\n",
" <td>Fortune, Miss. Alice Elizabeth</td>\n",
" <td>female</td>\n",
" <td>24.0</td>\n",
" <td>3</td>\n",
" <td>2</td>\n",
" <td>19950</td>\n",
" <td>263.000</td>\n",
" <td>C23 C25 C27</td>\n",
" <td>S</td>\n",
" </tr>\n",
" <tr>\n",
" <th>737</th>\n",
" <td>738</td>\n",
" <td>1</td>\n",
" <td>1</td>\n",
" <td>Lesurer, Mr. Gustave J</td>\n",
" <td>male</td>\n",
" <td>35.0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>PC 17755</td>\n",
" <td>263.000</td>\n",
" <td>B101</td>\n",
" <td>C</td>\n",
" </tr>\n",
" <tr>\n",
" <th>438</th>\n",
" <td>439</td>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" <td>Fortune, Mr. Mark</td>\n",
" <td>male</td>\n",
" <td>64.0</td>\n",
" <td>1</td>\n",
" <td>4</td>\n",
" <td>19950</td>\n",
" <td>263.000</td>\n",
" <td>C23 C25 C27</td>\n",
" <td>S</td>\n",
" </tr>\n",
" <tr>\n",
" <th>679</th>\n",
" <td>680</td>\n",
" <td>1</td>\n",
" <td>1</td>\n",
" <td>Cardeza, Mr. Thomas Drake Martinez</td>\n",
" <td>male</td>\n",
" <td>36.0</td>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" <td>PC 17755</td>\n",
" <td>263.000</td>\n",
" <td>B51 B53 B55</td>\n",
" <td>C</td>\n",
" </tr>\n",
" <tr>\n",
" <th>311</th>\n",
" <td>312</td>\n",
" <td>1</td>\n",
" <td>1</td>\n",
" <td>Ryerson, Miss. Emily Borie</td>\n",
" <td>female</td>\n",
" <td>18.0</td>\n",
" <td>2</td>\n",
" <td>2</td>\n",
" <td>PC 17608</td>\n",
" <td>262.375</td>\n",
" <td>B57 B59 B63 B66</td>\n",
" <td>C</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" PassengerId Survived Pclass Name \\\n",
"258 259 1 1 Ward, Miss. Anna \n",
"88 89 1 1 Fortune, Miss. Mabel Helen \n",
"27 28 0 1 Fortune, Mr. Charles Alexander \n",
"341 342 1 1 Fortune, Miss. Alice Elizabeth \n",
"737 738 1 1 Lesurer, Mr. Gustave J \n",
"438 439 0 1 Fortune, Mr. Mark \n",
"679 680 1 1 Cardeza, Mr. Thomas Drake Martinez \n",
"311 312 1 1 Ryerson, Miss. Emily Borie \n",
"\n",
" Sex Age SibSp Parch Ticket Fare Cabin Embarked \n",
"258 female 35.0 0 0 PC 17755 263.000 NaN C \n",
"88 female 23.0 3 2 19950 263.000 C23 C25 C27 S \n",
"27 male 19.0 3 2 19950 263.000 C23 C25 C27 S \n",
"341 female 24.0 3 2 19950 263.000 C23 C25 C27 S \n",
"737 male 35.0 0 0 PC 17755 263.000 B101 C \n",
"438 male 64.0 1 4 19950 263.000 C23 C25 C27 S \n",
"679 male 36.0 0 1 PC 17755 263.000 B51 B53 B55 C \n",
"311 female 18.0 2 2 PC 17608 262.375 B57 B59 B63 B66 C "
]
},
"execution_count": 46,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Replace the outliers with the second-highest fare\n",
"df.loc[df.Fare > 400, 'Fare'] = 263.0\n",
"\n",
"# Check that the outliers are gone\n",
"df.sort_values('Fare', ascending=False).head(8)"
]
},
{
"cell_type": "code",
"execution_count": 47,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"OrderedDict([('Fare',\n",
" <matplotlib.axes._subplots.AxesSubplot at 0x7f2f8fcb91d0>)])"
]
},
"execution_count": 47,
"metadata": {},
"output_type": "execute_result"
},
{
"data": {
"text/plain": [
"<matplotlib.figure.Figure at 0x7f2f8fcb4d30>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"df.boxplot(column='Fare', by='Pclass', return_type='axes', sym='k.')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Feature Embarked"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We can analyze the distribution based on the port of embarkation (C = Cherbourg; Q = Queenstown; S = Southampton). "
]
},
{
"cell_type": "code",
"execution_count": 48,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"Embarked\n",
"C 168\n",
"Q 77\n",
"S 644\n",
"dtype: int64"
]
},
"execution_count": 48,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df.groupby('Embarked').size()"
]
},
{
"cell_type": "code",
"execution_count": 49,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"<matplotlib.axes._subplots.AxesSubplot at 0x7f2f8fbdad30>"
]
},
"execution_count": 49,
"metadata": {},
"output_type": "execute_result"
},
{
"data": {
"text/plain": [
"<matplotlib.figure.Figure at 0x7f2f8fb5d198>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"# Distribution\n",
"sns.countplot(x='Embarked', data=df)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Since there are missing values, we will replace them with the most frequent value ('S'), and we will also encode the feature, since it is categorical.\n",
"\n",
"We can check whether the port of embarkation has an impact on survival."
]
},
{
"cell_type": "code",
"execution_count": 50,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"Embarked\n",
"C 0.553571\n",
"Q 0.389610\n",
"S 0.336957\n",
"Name: Survived, dtype: float64"
]
},
"execution_count": 50,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df.groupby(['Embarked']).Survived.mean()"
]
},
{
"cell_type": "code",
"execution_count": 51,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"<matplotlib.axes._subplots.AxesSubplot at 0x7f2f8fb017b8>"
]
},
"execution_count": 51,
"metadata": {},
"output_type": "execute_result"
},
{
"data": {
"text/plain": [
"<matplotlib.figure.Figure at 0x7f2f8fac9358>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"sns.barplot(x='Embarked', y='Survived', data=df)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"It seems that passengers who embarked at C (Cherbourg) had a higher chance of survival.\n",
"We can break this down by sex."
]
},
{
"cell_type": "code",
"execution_count": 52,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"<matplotlib.axes._subplots.AxesSubplot at 0x7f2f8faf1550>"
]
},
"execution_count": 52,
"metadata": {},
"output_type": "execute_result"
},
{
"data": {
"text/plain": [
"<matplotlib.figure.Figure at 0x7f2f8fabd320>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"sns.barplot(x=\"Embarked\", y='Survived', hue='Sex', data=df)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Passengers who embarked at Cherbourg also show a higher survival rate within each gender."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We have to fill the null values (there are 2) and encode this variable, since it is categorical. We will do so after reviewing the rest of the features."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Features SibSp"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We analyse the distribution."
]
},
{
"cell_type": "code",
"execution_count": 53,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"SibSp\n",
"0 608\n",
"1 209\n",
"2 28\n",
"3 16\n",
"4 18\n",
"5 5\n",
"8 7\n",
"dtype: int64"
]
},
"execution_count": 53,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df.groupby('SibSp').size()"
]
},
{
"cell_type": "code",
"execution_count": 54,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"<matplotlib.axes._subplots.AxesSubplot at 0x7f2f8fa57588>"
]
},
"execution_count": 54,
"metadata": {},
"output_type": "execute_result"
},
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAYQAAAESCAYAAAD9gqKNAAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\nAAALEgAACxIB0t1+/AAAFbdJREFUeJzt3X+UX3V95/HnNCPBTBIziUMMyZJat/u21O3uupzFDSoQ\nJZSqy9ZgdcWsNbobtnoOpdJz6J7lR1ntuqzQrdrWntRU5OBZYY3WSIWIFgQJDa6t6LZ9i2BJJ8HN\nECZMSEpIJrN/3E/Cd4bM5Dth7vfOJM/HOTncn995DSf5vu7n3u+9366RkREkSfqppgNIkqYHC0GS\nBFgIkqTCQpAkARaCJKmwECRJAHTX+eIRsRZYA4wAXcC/BF4P/CFwCHg4Mz9Ytv1N4JKy/PrM/Fqd\n2SRJo3V16j6EiHgj8A7g54ErM/O7EXEr8DkggduB1wG9wH3AmZnpTRKS1CGdPGV0DfDfgZ/OzO+W\nZZuAC4Dzga9l5nBmPgn8HXBmB7NJ0kmvI4UQEWcB24BhYLBl1U5gCbAYGGhZPlCWS5I6pFMjhA8A\nny3TXS3Lu1646YTLJUk1qfWicovzgA+V6UUty5cC24EdwKvHLN8x0QsePDg80t09awojStJJYdwD\n7toLISKWAHsy82CZ/5uIWJGZDwBvBz4BPAL8RkRcA5wGnJ6Zfz3R6w4O7qs5uSSdePr65o27rhMj\nhCVU1woOuwL4o4joAv4iM78JEBHrqT5ddAi4rAO5JEktOvax06k2MLBnZgaXpAb19c0b95SRdypL\nkgALQZJUWAiSJMBCkCQVFoIkCbAQJEmFhSBJAiwESVJhIUiSAAtBklRYCJIkwEKQJBUWgiQJsBAk\nSYWFIEkCLARJUmEhSJIAC0GSVFgIkiTAQpAkFRaCJAmwECRJhYUgSQKgu+4fEBGXAr8JHACuAb4P\n3EJVRk8AazLzQNnucmAYWJ+ZG+rOJkl6XtfIyEhtLx4RC4EtwL8A5gHXAy8BvpqZGyPio8A2qoL4\nLnAWcBB4CHhDZu4e77UHBvaMCj48PEx//7Zafo8Xa9myM5g1a1bTMSSJvr55XeOtq3uE8Gbg65m5\nD9gHrIuIx4B1Zf0m4Ergh8DWzHwGICLuB84B7mj3B/X3b2P7/7qdpQsWTmX+F2377qfgXe9g+fJX\nNh1FkiZUdyH8NNATEX8KLAB+G5iTmQfK+p3AEmAxMNCy30BZPilLFyxk+aK+FxVYkk5WdRdCF7AQ\n+GWqcvjzsqx1/Xj7Tai3dw7d3c+fhhka6mHwuGPWq7e3h76+eU3HkKQJ1V0I/w94IDMPAY9FxB7g\nQETMzsz9wFJgO7CD0SOCpVTXHsY1OLhvzPzeqcw9pQYH9zIwsKfpGJI04cFp3R873QysjIiuiFgE\nzAXuBi4p61cDdwJbgbMiYn5EzAVWAPfVnE2S1KLWQsjMHcD/Bh6kukD8QeBa4L0RcS/QC9ycmc8C\nV1EVyGbgusz0kFqSOqj2+xAycz2wfsziVUfZbiOwse48kqSj805lSRJgIUiSCgtBkgRYCJKkwkKQ\nJAEWgiSpsBAkSYCFIEkqLARJEmAhSJIKC0GSBFgIkqTCQpAkARaCJKmwECRJgIUgSSosBEkSYCFI\nkgoLQZIEWAiSpMJCkCQBFoIkqbAQJEkAdNf54hFxLnA78AOgC3gY+B/ALVRl9ASwJjMPRMSlwOXA\nMLA+MzfUmU2SNFonRgj3ZObKzDw/My8Hrgc+mZnnAo8CayNiDnA1sBI4H7giIhZ0IJskqehEIXSN\nmT8P2FSmNwEXAGcDWzPzmcx8FrgfOKcD2SRJRa2njIozI+LLwEKq0cGczDxQ1u0ElgCLgYGWfQbK\ncklSh9RdCI8A12Xm7RHxM8Cfj/mZY0cPx1p+RG/vHLq7Zx2ZHxrqYfDFJK1Rb28PfX3zmo4hSROq\ntRAycwfVRWUy87GI+AlwVkTMzsz9wFJgO7CD
0SOCpcCWiV57cHDfmPm9U5h8ag0O7mVgYE/TMSRp\nwoPTWq8hRMS7I+LDZfoVVKeG/gS4pGyyGrgT2EpVFPMjYi6wArivzmySpNHqPmX0FeDzEXEx8BJg\nHfA94HMR8R+Bx4GbM3M4Iq4CNgOHqE4zeUgtSR1U9ymjZ4B/c5RVq46y7UZgY515JEnj805lSRJg\nIUiSCgtBkgRYCJKkwkKQJAEWgiSpsBAkSYCFIEkqLARJEmAhSJIKC0GSBFgIkqTCQpAkARaCJKmw\nECRJgIUgSSosBEkSYCFIkgoLQZIEWAiSpMJCkCQBFoIkqbAQJEkAdNf9AyLiVOAHwPXAN4FbqIro\nCWBNZh6IiEuBy4FhYH1mbqg7lyRptE6MEK4GdpXp64FPZua5wKPA2oiYU7ZZCZwPXBERCzqQS5LU\notZCiIgAXg3cAXQB5wKbyupNwAXA2cDWzHwmM58F7gfOqTOXJOmF6h4h3Aj8BlUZAPRk5oEyvRNY\nAiwGBlr2GSjLJUkdVNs1hIhYAzyQmY9XA4UX6DrawgmWj9LbO4fu7llH5oeGehicdMrO6O3toa9v\nXtMxJGlCdV5Ufgvwyoh4G7AUeA54JiJmZ+b+smw7sIPRI4KlwJZjvfjg4L4x83unKPbUGxzcy8DA\nnqZjSNKEB6e1FUJmvuvwdERcA/wdsAK4BLgVWA3cCWwF/jgi5gOHyjaX15VLknR0nboP4fBpoGuB\n90bEvUAvcHO5kHwVsLn8uS4zPZyWpA6r/T4EgMz87ZbZVUdZvxHY2IkskqSja2uEEBGfPcqyu6Y8\njSSpMROOEModxJcBr4mIb7WsOoXq46KSpBPEhIWQmbdGxD1UF4GvbVl1CPi/NeaSJHXYMa8hZOZ2\n4LyIeBmwkOcvEC8AnqoxmySpg9q6qBwRvwespbqL+HAhjAA/U1MuSVKHtfspo5VAX/mIqCTpBNTu\nfQiPWAaSdGJrd4TQXz5ldD9w8PDCzLymllSSpI5rtxB2Ad+oM4gkqVntFsJ/rTWFJKlx7RbCQapP\nFR02AjwNLJryRJKkRrRVCJl55OJzRJwCvAn4Z3WFkiR13qSfdpqZz2Xm16i+/lKSdIJo98a0tWMW\n/SOqL7KRJJ0g2r2G8IaW6RFgCPiVqY8jSWpKu9cQ3gcQEQuBkcycrl9fLEk6Tu2eMloB3ALMA7oi\nYhfwnsz8Tp3hJEmd0+5F5Y8BF2fmaZnZB/w74Kb6YkmSOq3dQhjOzB8cnsnMv6TlERaSpJmv3YvK\nhyJiNfD1Mv+LwHA9kSRJTWi3EC4DPgn8MdW3pf0V8B/qCiVJ6rx2TxmtAvZnZm9mLqL6kpxfqi+W\nJKnT2i2E9wBvb5lfBbx76uNIkprS7imjWZnZes1ghOe/SnNcEfFS4LPAYmA28BHge1QfYf0p4Alg\nTWYeiIhLgcuprk2sz8wN7f4SkqQXr91C+EpEPADcR/VG/ibgi23s9zbgocz8eEScQXVR+tvApzLz\nixHxUWBtRNwCXA2cRfXppYciYmNm7p7k7yNJOk7t3qn8kYi4BzibanTwa5n5YBv73dYyewbw98C5\nwLqybBNwJfBDYGtmPgMQEfcD5wB3tPdrSJJerHZHCGTm/VRfoTlpEfFtqofhvQ34emYeKKt2Akuo\nTikNtOwyUJZLkjqk7UJ4MTLznIj4BeBWRl97GO86xDGvT/T2zqG7e9aR+aGhHqbrA5Z6e3vo65vX\ndAxJmlCthRARrwV2ZmZ/Zj4cEbOAPRExOzP3U40atgM7GD0iWApsmei1Bwf3jZnfO6XZp9Lg4F4G\nBvY0HUOSJjw4nfQX5EzSG4EPA0TEYmAucDdwSVm/GrgT2AqcFRHzI2IusILqArYkqUPqLoRPA6dF\nxLeoLiD/J+Ba4L0RcS/QC9ycmc8CVwGby5/rMtNDaknqoFpPGZU3+kuPsmrVUbbdCGysM48kaXx1\njxAkSTOE
hSBJAiwESVJhIUiSAAtBklRYCJIkwEKQJBUWgiQJsBAkSYWFIEkCLARJUmEhSJIAC0GS\nVFgIkiTAQpAkFRaCJAm
"text/plain": [
"<matplotlib.figure.Figure at 0x7f2f8fa64278>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"# Distribution of SibSp values (x passed by keyword, as in the other seaborn calls)\n",
"sns.countplot(x='SibSp', data=df)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We can see that most passengers traveled without siblings or spouses.\n",
"\n",
"We now analyse whether this had an impact on survival."
]
},
{
"cell_type": "code",
"execution_count": 55,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"SibSp\n",
"0 0.345395\n",
"1 0.535885\n",
"2 0.464286\n",
"3 0.250000\n",
"4 0.166667\n",
"5 0.000000\n",
"8 0.000000\n",
"Name: Survived, dtype: float64"
]
},
"execution_count": 55,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df.groupby('SibSp').Survived.mean()"
]
},
{
"cell_type": "code",
"execution_count": 56,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"array([<matplotlib.axes._subplots.AxesSubplot object at 0x7f2f8f9e30f0>,\n",
" <matplotlib.axes._subplots.AxesSubplot object at 0x7f2f8f91a160>], dtype=object)"
]
},
"execution_count": 56,
"metadata": {},
"output_type": "execute_result"
},
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAYQAAAEJCAYAAACUk1DVAAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\nAAALEgAACxIB0t1+/AAAGLlJREFUeJzt3X+Q3PV93/HnIRCyhM5gchayNMhMbb89LkknmCmthBE/\nbAjDr7RAnYkGx+BMiG08CuCkcmNAwW6aGEMdHBMnwsKYQgdIFWJZgGUSFxebVqSOIXTG7xqZCuuH\n0YEFJ8mcJE7XP77fE3fnk25377u339M9HzM32u+v9753tbev+/747HYNDg4iSdIRnW5AklQPBoIk\nCTAQJEklA0GSBBgIkqSSgSBJAuDITjegQ4uIs4FbgDnAJuDKzNza2a6kzoiII4E/Ba4FFvq7UC33\nEGosImYD/xW4KjPfDXwD+MvOdiV11N8CfYADqNrAQKi3s4GNmfl0Ob0aODci5nSwJ6mTbs7MPwK6\nOt3I4chAqLd3ARuHJjJzN/Ay8I6OdSR1UGb+r073cDgzEOptNtA/at5rFOcTJKlSBkK97QZmjZo3\nG9jVgV4kHeYMhHr7IfDOoYmIeDNwLPCjjnUk6bBlINTbt4ETI2JxOX0t8I3MfK2DPUk6THX58df1\nFhFnALdTHCp6DvhwZm7vbFfS5IuItwKPl5NDF1y8DpyTmds61thhpKFAiIhZwLPAzcDfA/dQ7F1s\nA67IzH0RsQxYDgwAqzJzddu6liRVrtFDRjdQXO4IRSh8MTOXUiT0VeUAqhsorps/C7g2Io6tullJ\nUvuMGwgREcC7gXUUg0GWAmvLxWuBDwCnARsyc1dm9gNPAEva0rEkqS0a2UO4FbiON0YGzsnMfeXt\n7cB8YB7QO2yb3nK+JGmKOGQgRMQVwPcyc9NBVjnY8HGHlUvSFDPep51eAJwUERcBC4C9wK6IODoz\n95TztgBbGblHsAB4crw7//q69YOtfkTV+04/jeOOfXNrG+twN+X+IBkcHBzs6ppybWtqaPiFdchA\nyMzfGLodETcC/w9YDFwG3AtcCjwKbADujIhuYH+5zvLx7vwv/uZZjjjmxEZ7PWBgXz/9e77HWWec\nfmBeT89cent3Nl1rLFXVqmNP06FWT8/cCrqZXF1dXbV7HqdDrTr21I5ajWrm+xCGUuYm4J6I+B2K\nz+e/OzMHImIFsJ4iEFZm5riP5ogjjuSIGUc10UJhcP9A09tIkg6t4UAoP3J2yLljLF8DrKmiKUnS\n5POjKyRJgIEgSSoZCJIkwECQJJUMBEkSYCBIkkoGgiQJMBAkSSUDQZIEGAiSpJKBIEkCDARJUslA\nkCQBBoIkqWQgSJIAA0GSVDIQJEmAgSBJKo37FZoR8Sbgq8A84Gjgs8BlwHuBl8rVbsnMRyJiGbAc\nGABWZebqdjQtSapeI9+pfBHwVGZ+PiJOBL4FfBdYkZkPD60UEbOBG4BTgdeBpyJiTWa+0oa+JUkV\nGzcQMvOBYZMnAj8pb3eNWvU0YENm7gKIiCeAJcC6CvqUJLVZI3sIAETEd4EFwIXA9cDHI+I64EXg\nE8AJQO+wTXqB+dW1Kklqp4ZPKmfmEuBi4F7gaxSHjM4BfgCsHGOT0XsQkqQaa+Sk8inA9szcnJnP\nRMSRwD9l5tAJ5bXAHcCDFOcbhiwAnqy64SHd3bPo6Zk7Yt7o6YmoqlYde5outaaauj6Ph3utOvZU\nda1GNXLI6AxgEXBtRMwDjgH+MiI+mZnPA2cCzwIbgDsjohvYDyymuOKoLfr6+unt3Xlguqdn7ojp\niaiqVh17mg61pmqo1O15nA616thTO2o1qpFA+DLwlYj4DjAL+BiwC7g/InaXt6/MzP6IWAGspwiE\nlZlZzSOSJLVdI1cZ9QPLxlj0L8dYdw2wpoK+JEmTzJHKkiTAQJAklQwESRJgIEiSSgaCJAkwECRJ\nJQNBkgQYCJKkkoEgSQIMBElSyUCQJAEGgiSp
ZCBIkgADQZJUMhAkSYCBIEkqGQiSJMBAkCSVxv0K\nzYh4E/BVYB5wNPBZ4GngHopA2QZckZn7ImIZsBwYAFZl5uo29S1JqlgjewgXAU9l5pnAB4HbgJuB\nP8/MpcBG4KqImA3cAJwNnAVcGxHHtqVrSVLlxt1DyMwHhk2eCPwEWApcXc5bC3wS+L/AhszcBRAR\nTwBLgHVVNixJao9xA2FIRHwXWECxx/CtzNxXLtoOzKc4pNQ7bJPecr4kaQpo+KRyZi4BLgbuBbqG\nLeoae4uDzpck1VAjJ5VPAbZn5ubMfCYiZgA7I+LozNxDsdewBdjKyD2CBcCT7WgaoLt7Fj09c0fM\nGz09EVXVqmNP06XWVFPX5/Fwr1XHnqqu1ahGDhmdASyiOEk8DzgGeAS4jGJv4VLgUWADcGdEdAP7\ngcUUVxy1RV9fP729Ow9M9/TMHTE9EVXVqmNP06HWVA2Vuj2P06FWHXtqR61GNXLI6MvAWyPiOxQn\nkD8K3AT8VkQ8DhwH3J2Z/cAKYH35szIzq3lEkqS2a+Qqo35g2RiLzh1j3TXAmgr6kiRNMkcqS5IA\nA0GSVDIQJEmAgSBJKhkIkiTAQJAklQwESRJgIEiSSgaCJAkwECRJJQNBkgQYCJKkkoEgSQIMBElS\nyUCQJAEGgiSpZCBIkgADQZJUGvcrNAEi4nPA6cAM4E+Ai4H3Ai+Vq9ySmY9ExDJgOTAArMrM1dW3\nLElqh3EDISLOBN6TmYsj4i3APwJ/B6zIzIeHrTcbuAE4FXgdeCoi1mTmK23pXJJUqUYOGT0OXF7e\nfgWYQ7Gn0DVqvdOADZm5KzP7gSeAJVU1Kklqr3H3EDJzEHitnPxtYB3FIaFrIuI64EXgE8AJQO+w\nTXuB+ZV2K0lqm4ZPKkfEJcCVwDXAPcC/z8xzgB8AK8fYZPQehCSpxho9qXwe8CngvMzcCXx72OK1\nwB3Ag8BFw+YvAJ6sqM9f0N09i56euSPmjZ6eiKpq1bGn6VJrqqnr83i416pjT1XXalQjJ5W7gc8B\n52Tmq+W8vwZ+PzOfB84EngU2AHeW6+8HFlNccdQWfX399PbuPDDd0zN3xPREVFWrjj1Nh1pTNVTq\n9jxOh1p17KkdtRrVyB7CB4HjgQciogsYBO4C7o+I3cAu4MrM7I+IFcB6ikBYWe5NSJKmgEZOKq8C\nVo2x6J4x1l0DrKmgL0nSJHOksiQJMBAkSSUDQZIEGAiSpJKBIEkCDARJUslAkCQBBoIkqWQgSJIA\nA0GSVDIQJEmAgSBJKhkIkiTAQJAklQwESRJgIEiSSgaCJAlo7Cs0JbXZf/rCal548bWWtn3nwjdz\nxb+7pOKONB01FAgR8TngdGAG8CfAUxRfoXkEsA24IjP3RcQyYDkwAKzKzNVt6Vo6zGx/9XU2713Y\n0rZvebW34m40XY17yCgizgTek5mLgfOBLwA3A3+emUuBjcBVETEbuAE4GzgLuDYijm1X45KkajVy\nDuFx4PLy9ivAHGAp8PVy3lrgA8BpwIbM3JWZ/cATwJJq25Uktcu4h4wycxAYOrj5EWAdcF5m7ivn\nbQfmA/OA4fuuveV8SdIU0PBJ5Yi4BLgKOBd4btiiroNscrD5kqQaavSk8nnApyj2DHZGxM6IODoz\n9wALgC3AVkbuESwAnqy64SHd3bPo6Zk7Yt7o6YmoqlYde5outaaLWbNmTonfhbrWqmNPVddq1LiB\nEBHdwOeAczLz1XL2Y8ClwH3lv48CG4A7y/X3A4sprjhqi76+fnp7dx6Y7umZO2J6IqqqVceepkOt\n6RYq/f17a/+7UNdadeypHbUa1cgewgeB44EHIqILGAR+C/hKRFwNbALuzsyBiFgBrKcIhJWZWc0j\nkiS1XSMnlVcBq8ZYdO4Y664B1lTQlyRpkvnRFZIkwECQJJUMBEkSYCBIkkoGgiQJMBAkSSUDQZIE\nGAiSpJKB
IEkCDARJUslAkCQBBoIkqWQgSJIAA0GSVDIQJEmAgSBJKhkIkiTAQJAklRr5TmUi4mTg\nIeC2zLwjIu4C3gu8VK5
"text/plain": [
"<matplotlib.figure.Figure at 0x7f2f8f943f60>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"df.hist(column='SibSp', by='Survived', sharey=True)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"On its own, SibSp does not provide much information. While the mean survival rate of all passengers is 38%, passengers with no siblings or spouses aboard (SibSp = 0) have a survival rate of about 35%. Surprisingly, passengers with 1 sibling or spouse have a higher rate, about 54%. Next, we look at the distribution by gender."
]
},
{
"cell_type": "code",
"execution_count": 57,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"SibSp Sex \n",
"0 female 174\n",
" male 434\n",
"1 female 106\n",
" male 103\n",
"2 female 13\n",
" male 15\n",
"3 female 11\n",
" male 5\n",
"4 female 6\n",
" male 12\n",
"5 female 1\n",
" male 4\n",
"8 female 3\n",
" male 4\n",
"dtype: int64"
]
},
"execution_count": 57,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df.groupby(['SibSp', 'Sex']).size()"
]
},
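{
"cell_type": "markdown",
"metadata": {},
"source": [
"We see that for SibSp = 0 men outnumber women by about 2.5 to 1, whereas for SibSp = 1 and SibSp = 2 the numbers of men and women are almost the same. For SibSp > 2 the counts are too small to compare reliably. Now we calculate the survival probability."
]
},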
{
"cell_type": "code",
"execution_count": 58,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"SibSp Sex \n",
"0 female 0.787356\n",
" male 0.168203\n",
"1 female 0.754717\n",
" male 0.310680\n",
"2 female 0.769231\n",
" male 0.200000\n",
"3 female 0.363636\n",
" male 0.000000\n",
"4 female 0.333333\n",
" male 0.083333\n",
"5 female 0.000000\n",
" male 0.000000\n",
"8 female 0.000000\n",
" male 0.000000\n",
"Name: Survived, dtype: float64"
]
},
"execution_count": 58,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df.groupby(['SibSp', 'Sex']).Survived.mean()"
]
},
{
"cell_type": "code",
"execution_count": 59,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"<matplotlib.axes._subplots.AxesSubplot at 0x7f2f8f84b710>"
]
},
"execution_count": 59,
"metadata": {},
"output_type": "execute_result"
},
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAYEAAAESCAYAAAAbq2nJAAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\nAAALEgAACxIB0t1+/AAAHyVJREFUeJzt3XucVWXd9/HPHGWYAZkZJhQRkoM/D1k+eEtgnjE8lA/c\nacljWUZ28Bjelk+iGaJ4TBBuj2BTmWVl3ZmmKFhaeXpeQabcFr/EFBjARw4DzAwzzGHv+4+9wT3j\nHNbAXnvPnvV9v16+2Ou4fwzj/u51rWtdV148HkdERKIpP9sFiIhI9igEREQiTCEgIhJhCgERkQhT\nCIiIRJhCQEQkwgrDfgMz+wjwGDDP3e/tsO00YC7QCixx95vCrkdERN4X6pWAmQ0EFgLPdrHLAuDf\ngeOBKWZ2WJj1iIhIe2E3BzUBZwIbO24ws0OALe6+wd3jwFPA5JDrERGRFKGGgLvH3H1XF5sPADal\nLL8HHBhmPSIi0l5fujGcl+0CRESiJvQbw93YQPtv/gcl13WptbUtXlhYEGpR0jc1NjZy9tlnE4/H\nycvL44knnqCkpCTbZYnkii6/ZGcyBNoV4e5rzGyQmY0k8eH/aeD87k5QW7szxPKkL6ur28HuwQ7j\n8TgbN25l0KDBWa5KJDdUVQ3qcluoIWBm44E7gVFAi5mdAzwOvO3uvwUuBn4OxIFH3H11mPWIiEh7\noYaAu/8VOKWb7S8Ax4VZg4iIdK0v3RgWEZEMUwiIiESYQkBEJMIUAiIiEaYQEBGJMIWAiEiEZfOJ\nYRGRnBKLxZg//w5qa7dQUFBIXV0dl102k9Gjx2S7tL2mEBARCeitt97kvffe5bbb5gNQU7OOmpp1\nLF26hE2b3qO1tYVp085lzJhxXHPNVcybdzdvvLGSp59+klmzvpfl6junEBARCeiQQ8ZQXLwft9wy\nh6OPHs9HP3o0Q4aUs2HDeubMuYVdu5q4/PJvsGjRj7jwwq9w//13s2bN29x00+3ZLr1LCgERkYAK\nCwu58cZb2bFjO2+88d9UVz+A+yqKi4u5+eYbACgoSAxyeeyxE3nggXs5+eRTGThwYDbL7pZCQEQk\noFdfXcGOHds56aRTmTTpE4wdO47PfvZ/c/rpZ3HNNdcDsGbNOwAsXbqEE044iRUr/sLpp59FVdWH\nslh51xQCIiIBjRtnzJt3G0uW/I7i4v1oampk/vx7eOWVl5g7dzb19fV8/OOTGDhwIE8++QTz59/N\n8cefxO23z+WOOxZku/xO5e0enjcXbNpUlzvFSlrV1e3gq1/94p7lxYsf0lDSIgFVVQ3qcj4BPScg\nIhJhCgERkQhTCIiIRJhCQEQkwhQCIiIRphAQEYkwPScgIv1OW1sbNTVr03rOESNG7nkaOJ1uvvkG\nTjllMpMmHZ/2cwehEBCRfqemZi3X3fUrBpRVpOV8TfVbuWnmuYwadUhazteXKAREpF8aUFbBwMFV\nGX3PJUt+x6uvrmD79m28887bfPWrF/Pss8/wzjvvcP31c/j975exatXfaW7exdSp5/DpT0/dc2ws\nFuP22+eyceMGWltb+cpXvs748f8Wes0KARGRNFq/voZ77lnME088xsMP/5gf/vCnPPnk4zz11BMc\ncshoLr/8Snbt2sV5501rFwLLlj3N0KFVfOc732X79m1cccXF/PjHj4Rer0JARCSNDjvscAAqK4cy\nZsxY8vLyqKiopLm5me3bt3PxxTMoLCxi+/Zt7Y5bufJ1Vq78G6+//jfi8TgtLc20trZSWBjux7RC\nQEQkjVJvHqe+fvfdjWzYsJ577nmQ/Px8pkw5qd1xRUVFfPGLM5g8eUrGagV1ERURyYhVq/7BsGHD\nyM/P54UX/kgs1kZra+ue7UcccSR/+tPzANTWbuWBB+7JSF26EhCRfqmpfmufOtexx05g3bp1XH75\n1znhhJM57rgTuPPOW/dsP/XUT/LXvy7n4otn
EIvFmTHja/v8nkFoKGnJCRpKWnojl54TyITuhpLW\nlUAvVVcvYunSp5gy5ayMJbWI9E5BQUG/7NMfBt0T6IWmpkaWLVsCJLpzNTU1ZrkiEZF9oxDohZaW\nFnY3n8XjMVpaWrJcUe9VVy9i+vRpVFcvynYpItIHKAQiRFcyItKRQiBC+sOVjIikl0JARCTC1DtI\nRPqdbHQRbW1t5ZJLLuLDHz6EWbO+l5b3fPfdjVx33f/lwQcfSsv5OqMQEJF+p6ZmLbP/ay4l5aVp\nOV9jbQOzP3Ntt91ON2/eTGtrS9oCYLe8Lnv4p0foIWBm84CJQAyY6e7LU7ZdCnweaAWWu/t/hF2P\niERDSXkppUMHZez97r57HuvX13DzzTewc+dO6uvraGtr48orv83o0WM577xpnH32NJ5//g8cdNAI\nzA7nueee5eCDR3L99TeyevWbzJt3G0VFReTl5XHjjbe1O/9rr73KokX3UlhYxLBhw7j66mvTMrhc\nqPcEzOxEYKy7HwdcBCxM2TYI+BbwCXc/ETjSzCaEWY+ISFguu+xKDj54FAcdNIKJE4/jrrvu5aqr\nvsN//ud8IDFfwGGHHcGDDz7EypWvcdBBB7F48Y957bVXaWiop7Z2K1deeTULFtzHUUd9jKVLl7Q7\n/4IF3+fWW+exYMG9DBlSznPPPZuWusO+EpgMPAbg7qvMbIiZlbl7PdAM7AIGm1kDUAKkb7APEZEs\nWLnyNbZv38YzzzwFQHNz855thx9+BAAVFZWMHXto8nUF9fX1VFRUct99C2lqamLLls1MmXLmnuNq\na7eybt06rr3228TjcZqamhgypDwt9YYdAgcAy1OWNyfXrXb3XWY2B/gXsBP4ubuvDrkeEZFQFRUV\nM3Pm1Rx55Ec+sK2goLDT1/F44pv+BRdcyLHHTuSRRx5u9xxPYWERVVVVLFx4f9rrzfSN4T23OJLN\nQbOAsUAd8JyZHeXuK7s6uLx8IIWF2RvAqbg41m65srKM/ffPXJvjvsrl+nO5dsm8HTvSc0M4VXl5\nKVVVXf/ONTfvoLAwnwkTjmH58hc5+eRJrF69mhdeeIELL7yQ/Pw8hg4to6SkhMLCfCorE+crKMin\nomIgDQ11fOQjxv7778eKFa9w9NFHU1FRSmFhAaNHD6ewsIAdO95jzJgxPPzww0yYMIFDDz10n/9e\nYYfABhLf/HcbDmxMvj4ceMvdawHM7M/AMUCXIVBbuzOkMoOpq6tvt7xlSz3NzbnzqEUu15/LtUvm\n1dY20FjbkLbzNdY2UFvbwKZNdV3us3VrA21tMc44Yxpz536Pz31uOrFYjJkzv82mTXXEYrB5cz0D\nBrTS1hZjy5YGiorqaGuLsXXrTqZOPZevfe3rjBhxMFOnnsv8+XcwadLJtLa2sWlTHd/61iy+9a2r\nKS4uprJyKJMnf6rbelJ1F16hDiVtZpOA2e5+upmNB+5K3gTGzD4EvAAclWwaWgrc4O4vdnW+MIeS\nDtKvuKGhgTlzrtuzfP31N1Fa2vk3jr447GwuD8ecy7VL5mko6fayNpS0u79sZivM7EWgDbjUzL4E\nbHP335rZHcDzZtYCvNRdAIQtSL/iWHNbu+WFL9xPfvEHfymC9CkWkfBoKOngQr8n4O6zOqxambJt\nMbA47BqC6qlfcduuVlKnhh5YWUbBfnreTkRylxpVRUQiTF9jpU/oqQ23oaH9Tb5169Z2eT8Gcrv9\nViSTFALSJ/R0Tybo/RjQPRmR3lAISJ/R3T0Z3Y8RCYf+L+ongnZxTdVdk4qaU0SiQSHQT6iLq4js\nDYVAP6IuriLSW+oiKiISYQoBEZEIUwiIiESYQkBEJMIUAr2Ql58yEF9eh2URkRykEOiF/KICyg6t\nAKBsXAX5RepHLyK5Tf0De6l8wnDKJwzPdhkiImmhKwERkQhTCIiIRJhCQEQkwhQCIiIRphAQEYkw\nhYCISIQp
BEREIkwhICISYQoBEZEIUwiIiESYQiBCNACe7Ivq6kVMnz6N6upF2S5F0kghECEaAE/2\nVlNTI8uWLQFg2bKnaWp
"text/plain": [
"<matplotlib.figure.Figure at 0x7f2f8f890358>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"sns.barplot(x=\"SibSp\", y='Survived', hue='Sex', data=df)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We observe that when SibSp > 2, the survival probability drops sharply: for women it falls to less than half of its value for SibSp of at most 2, and for men it is close to zero. We now check whether there is a difference in age."
]
},
{
"cell_type": "code",
"execution_count": 60,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"SibSp Sex \n",
"0 female 28.631944\n",
" male 32.615443\n",
"1 female 30.738889\n",
" male 29.461505\n",
"2 female 16.541667\n",
" male 28.230769\n",
"3 female 16.500000\n",
" male 8.750000\n",
"4 female 8.333333\n",
" male 6.416667\n",
"5 female 16.000000\n",
" male 8.750000\n",
"8 female NaN\n",
" male NaN\n",
"Name: Age, dtype: float64"
]
},
"execution_count": 60,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df.groupby(['SibSp', 'Sex']).Age.mean()"
]
},
{
"cell_type": "code",
"execution_count": 61,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"<matplotlib.axes._subplots.AxesSubplot at 0x7f2f8f7dab38>"
]
},
"execution_count": 61,
"metadata": {},
"output_type": "execute_result"
},
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAX4AAAESCAYAAAD67L7dAAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\nAAALEgAACxIB0t1+/AAAH61JREFUeJzt3Xt8VPWd//FXrpALaIgpCggq4kfRuq5drVgteFmsVsUV\nurpa3Upbra12tf5atboWr1VUUKtWoeK17U/txdUKCrRetlb7sNYK2vqpF1QCWhBCSAIh1/3jTCCJ\nSWYS5szJzHk/Hw8ezJlzzswn55G85zvf8z3fk9fe3o6IiMRHftQFiIhIZin4RURiRsEvIhIzCn4R\nkZhR8IuIxIyCX0QkZgrDfHEzKwHuA0YCQ4BrgBnAZ4CPE5vd6O6LwqxDRES2CTX4gROAl939JjMb\nCywBXgAucfeFIb+3iIj0INTgd/dHOi2OBVYmHueF+b4iItK7vExcuWtmLwCjgeOBi9jW9fMP4Dx3\nXx96ESIiAmTo5K67fw44Efgp8ABBV89RwGvAlZmoQUREAmGf3D0QWOPu1e6+zMwKgeXu3nFi93Hg\nzr5eo6Wltb2wsCDMMkVEclGvXephn9z9PDAOuNDMRgLlwN1m9v/cfQUwBXi9rxeoqdkUcokiIrmn\nqmpYr+vCDv67gHvM7HlgKPBNoB542MwaEo/PCrkGybAFC+axePFCpk49jpkzz466HBHpJiMnd7fH\n2rV1g7tA6aKxcTNnnXUa7e3t5OXlc++9P2Xo0JKoyxKJnaqqYb129eTslbsLFszj1FNPYsGCeVGX\nEivNzc10NCba29tobm6OuCIR6S4ng7+xcTNLlgQXAy9Z8hSNjZsjrkhEZPDIyeBXq1NEpHc5GfzS\nlbq9RKQzBX+OU7eXiHSn4M9x6vYSke7CHscvktXa2tqYO/dGamrWUVBQSF1dHeeddwF77DE+6tJE\nBkzBL9KHd955izVrPuKGG+YCUF29kurqlSxevIi1a9fQ0tLMSSfNYPz4CVx66UXMmXM7b7yxnKee\nepLvf/8HEVcv0jMFv0gfdt99PMXFQ/jhD6/igAMOZP/9D2DHHStYvXoVV131Q7ZsaeT887/BvHn3\n8ZWvfJW77rqd999fwTXXzI66dJFeKfgHKU17MDgUFhZy9dXXs3FjLW+88ToLFtyN+5sUFxdz3XXB\nxLIFBcEkggcddAh3330nU6YcSWlpaZRli/Qp64K/tbWV6uoP+tymoaGhy/LKlR9QVlbW47Zjxozd\n+oc7WHQfiXPaaWdo2oOIvPrqK2zcWMvkyUcyadLn2HPPCXzpSydyzDHHcemlVwDw/vvvAbB48SIO\nP3wyr7zyMscccxxVVZ+KsHKR3mVd8FdXf8Dlt/yCoeUjet2mvbWpy/Kch/6XvILiT2zXWL+eay6Y\nwbhxu6e9zu3R00gcBX80Jkww5sy5gUWLfkNx8RAaGzczd+4dvPTSH7j22lnU19fz2c9OorS0lCef\nfIK5c2/nsMMmM3v2tdx4461Rly/So6wLfoCh5SMoHV7V6/q2lkbqOy2XDKskv3Bo+IVJzikvL+eK\nK67+xPP//M+f+cRzt94a3Fpi/Pg9FfoyqGkcv4hIzCj4RURiRsEvIhIzCn4RkZhR8IuIxIyCX0Qk\nZrJyOKdId6lc2NdfYVzcd911V3LEEUcxadJhaX1dkf7IzeDP6/zHmtdtWXJRKhf29cdgvbhPJB1y\nMvjzC4ooqdqHzWv/RknV3uQXFEVdkmRAsgv70m3Rot/w6quvUFu7gffeW8HXv34uS5c+zXvvvccV\nV1zFb3+7hDff/CtNTVuYNm06xx8/beu+bW1tzJ59LR9+uJqWlha++tVzOPDAf8lY7RJvORn8AMPH\nTmL42Elpea3BOmFaHOYtGuxWrarmjjvm88QTj/HQQ/dz770/5cknH2fhwifYffc9OP/8C9myZQun\nnHJSl+BfsuQpdtqpiksu+W9qazfw7W+fy/33
/zzCn0TiJNTgN7MS4D5gJDAEuAZ4DXiQ4MTyh8AZ\n7j5obws1mCdMq67+gFm/upaSip6DHKCtqbXL8m2/v4v84k+G++aaBmadfJm6Nvpp7733AaCycifG\nj9+TvLw8RoyopKmpidraWs49dyaFhUXU1m7ost/y5ctYvvwvLFv2F9rb22lubqKlpYXCwpxti8kg\nEvZv2QnAy+5+k5mNBZYALwC3u/svzexaYCZwd8h1DNhgnzCtpKKMsp2G9bq+dUsLnSOntLKcgiEK\nl3Tp/A2p8+OPPvqQ1atXcccdPyE/P5+pUyd32a+oqIgzz5zJUUdNzVitIh1CHc7p7o+4+02JxbHA\nSmAy8HjiuSeAo8OsQSQKb775N0aOHEl+fj6///1ztLW10tLSsnX9xIn78vzzzwJQU7Oeu+++I6JK\nJY4y0vQzsxeA0QTfAJZ06tpZA+ySiRok9zXWrx80r3XQQQezcuVKzj//HA4/fAqHHno4N998/db1\nRx75r/z5z3/i3HNn0tbWPqjOHUnuy0jwu/vnzGx/4KdAXqdVeb3sslVFRSmFhdu+Qm/c2Ht/9kBU\nVJRRVdV7V0lxcVuX5crKcnbYofft0yHV98z0sUhFFMcLYMSIfbnjyvQej912263Pk91nnvkfWx9P\nm3Ys06Yd+4nHfbnpphu2v0iRAQj75O6BwBp3r3b3ZWZWANSZ2RB330LwLWB1X69RU7Op23JDL1sO\nTE1NA2vX1vW6vq6uvsvyunX1NDWFe8Fzqu+Z6WORiiiOV4fhw9N7x6v16zcl30hkkOqrERf2X+Tn\ngYsAzGwkUA4sBWYk1k8Hngq5BhER6STsrp67gHvM7HlgKHAu8ArwoJmdDbwP3B9yDYNKOsfer1pV\nndbaRCQeQg1+d28ETu9hVWzHsKXznsG1a1awU3quURORGNGA7gik657BwciT7euTF5H40bTMIiIx\noxa/5IQopmVuaWnhm9/8Grvttjvf//4P0vKeH330IZdffjE/+ckDaXk9kZ7EOvjb29uSniDVJGfZ\nIZV5i/ojlbmLPv74Y1pamtMW+h3ykl7dIrJ9Yh38Wxo2MP+P92uSsxyRbN6idLv99jmsWlXNdddd\nyaZNm6ivr6O1tZULL/wue+yxJ6ecchInnHASzz77O0aPHoPZPjzzzFJ23XUsV1xxNW+//RZz5txA\nUVEReXl5XH111wu6XnvtVebNu5PCwiJGjhzJ9753mSZxk7SIfR9/R1j09q+0srzL9qWV5T1ul66W\npmSP8867kF13Hcfo0WM45JBDueWWO7nookv40Y/mAsGc+3vvPZGf/OQBli9/jdGjRzN//v289tqr\nNDTUU1Ozngsv/B633vpjPv3pf2Lx4kVdXv/WW2/i+uvncOutd7LjjhU888zSKH5MyUFqPki/JOtL\n70/XGORG99jy5a9RW7uBp59eCEBT07bhuPvsMxGAESMq2XPPvRKPR1BfX8+IEZX8+Me30djYyLp1\nHzN16rZpHmpq1rNy5Uouu+y7tLe309jYyI47VmTwp5JcpuCXfkl2HUKq1yBA7tzesKiomAsu+B77\n7rvfJ9YVFBT2+Li9PWjRn3HGVzjooEP4+c8forFx89b1hYVFVFVVcdttd4VbvMRS7Lt6pP86rkPo\n6V/JsMou25YMq+x123TdHzdqEyfux/PPPwPAihXv8sgjP+tz++D2Du3U1tYyatQYmpqaeOmlF2hu\n3nY/omHDhpGXl8d7760A4Je/fJh33307rB9BYkYt/hyXl99piEhet+UcszmNk9al+lp5eTBjxilc\ne+0P+Na3vk5bWxsXXPDdjrVdtuv6OI/p0/+dSy75DmPG7MqMGacwd+6NXW7McvHFl3PddVdSXFxM\nZeVOTJs2fft/MBEU/Dkvv6iA8r1GUP/39ZRPGEF+UXb3p/dmzJixzDr5srS/Zl923nkX5s8Pxttf\nc83sT6x/
9NH/2fq4Y7vOj0888d848cR/2/r84YdP6bJ+//0PYN68+wZUu0hfFPwxUHHwKCoOHhV1\nGaEqKCjI+nMFIpmiPn4
"text/plain": [
"<matplotlib.figure.Figure at 0x7f2f8f890a90>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"sns.barplot(x=\"SibSp\", y='Age', hue='Sex', data=df)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Indeed, when SibSp > 2, the mean age is considerably lower (under 17 for both sexes, where age is recorded). We now check the relationship with Pclass."
]
},
{
"cell_type": "code",
"execution_count": 62,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"SibSp Pclass\n",
"0 1 137\n",
" 2 120\n",
" 3 351\n",
"1 1 71\n",
" 2 55\n",
" 3 83\n",
"2 1 5\n",
" 2 8\n",
" 3 15\n",
"3 1 3\n",
" 2 1\n",
" 3 12\n",
"4 3 18\n",
"5 3 5\n",
"8 3 7\n",
"dtype: int64"
]
},
"execution_count": 62,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df.groupby(['SibSp', 'Pclass']).size()"
]
},
{
"cell_type": "code",
"execution_count": 63,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"SibSp Pclass\n",
"0 1 0.562044\n",
" 2 0.416667\n",
" 3 0.236467\n",
"1 1 0.746479\n",
" 2 0.581818\n",
" 3 0.325301\n",
"2 1 0.800000\n",
" 2 0.500000\n",
" 3 0.333333\n",
"3 1 0.666667\n",
" 2 1.000000\n",
" 3 0.083333\n",
"4 3 0.166667\n",
"5 3 0.000000\n",
"8 3 0.000000\n",
"Name: Survived, dtype: float64"
]
},
"execution_count": 63,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df.groupby(['SibSp', 'Pclass']).Survived.mean()"
]
},
{
"cell_type": "code",
"execution_count": 64,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"<matplotlib.axes._subplots.AxesSubplot at 0x7f2f8f8a3a58>"
]
},
"execution_count": 64,
"metadata": {},
"output_type": "execute_result"
},
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAYEAAAESCAYAAAAbq2nJAAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\nAAALEgAACxIB0t1+/AAAHIdJREFUeJzt3Xt0VeWd//F3LiSEJECAIFoG1Eq/eKljvbSISpfgtauj\nrdWfVq2Mjk6X4gVLf9MRb0VFnbYgolVHZrBV66V1LOpvxELVKl7GKTpeWsevF64BLyEcCIlJCMn5\n/bEPcBIJOZC9z8nJ/rzWcnH2JXt/A8fzOfvZ+3megmQyiYiIxFNhrgsQEZHcUQiIiMSYQkBEJMYU\nAiIiMaYQEBGJMYWAiEiMFUd9AjM7CFgAzHb3uzptOxa4GdgCuLtfGHU9IiKyXaRXAmY2AJgL/LGL\nXe4BTnP3Y4CBZnZSlPWIiEhHUTcHNQMnAx93sf0wd9+6rRYYGnE9IiKSJtIQcPd2d2/ZyfYGADPb\nEzgeeDrKekREpKOc3xg2s+HAk8DF7p7IdT0iInES+Y3hnTGzSoJv/1e5+7Pd7b9lS1uyuLgo+sJE\nRPqWgq42ZDMEdlTEbIKnhhZncoBE4vNwKxIRiYHq6soutxVEOYqomR0KzAJGA63AGoKmn+XAImA9\n8CpBQCSBh9z937o6Xm3tJg15KiKyi6qrK7u8Eog0BMKmEBAR2XU7C4Gc3xgWEZHcUQiIiMSYQkBE\npIfmz7+Xs876DvPn35vrUnaZQkBEpAeam5tYvHghAIsXP0Nzc1OOK9o1CgERkR5obW1l6wM2yWQ7\nra2tOa5o1ygEUubPv5dFi57JdRkiIlmV0x7D2fA///M6N988gzFjjGSynZaWFi6/fBp7771PrksT\nEcm5Ph8CAIcf/g1+8pOrAXjrrTe57755VFUNoa5uHY2NDVx22Y+27VtfX88tt8ygsnIg69bVcu21\nN9DW1s6sWbcwePAQNmxYz7RpV7Fy5XIef/y3lJdXADB9+vU5+d1ERHoiFiEQdEYObNiwnvr6jZSW\nlnLjjbeycuUK6urWbdu+ceMGzjzzHA455FAefvhBlix5gSFDhrLHHntyxRXTqKuro7i4mFdeeYlJ\nk05k4sTjWLVqBVu2bKG4OCZ/nSLSZ8TiU2vp0v9m+vT/SzKZZNCgQZx44rdYu3YNAKNH783o0Xvz\n5ptvAFBaWsof/rCQF198nuXLlzF+/DEcddQxrF69iiuvnMLQoUO5/PJpnHfe+dx//338/ve/46CD\nDuaHP5ySy19RRGS3xCIE0puDAN599y+89tqrACxb9hGrV69MbUny6KO/4YgjvsHEiccxb97dtLe3\nUVOzmuOPP5Hvf/9cnnpqAX/4w9OMHXsAF198GcXFxdxww7V88IEzZozl4LcTEdl9sQiBzg444CCq\nqqq47rqrqK/fyBVX/Jhlyz4CCvja1w7n4Ycf4K233mCPPUawaNEzjB17APfffx/Dhg2joWETl1xy\nBe+++xceeuh+Bg0aTEFBAXvvvW+ufy0RkV2mAeRERHpg06Z6LrrovG3L8+bdT2XlwBxW9EUaQE5E\nRHZIISAiEmMKARGRGFMIiIjEmEJARCTGFAIiIjHWp/oJtLW1UVOzKtRjjhw5iqKiom73W7bsQ666\n6seceeY5nHbaGaHWICISlT4VAjU1q7hmzmP0rxgSyvGaG9Zz09TTGT165yOONjc3M2fOLzj88K+H\ncl4RkWzpUyEA0L9iCAMGVmf1nCUlJfziF3N58MFfZfW8IiI9pXsCISgsLKSkpCTXZYiI7DKFgIhI\njCkERERiTCEQsnwakE9EpM/dGG5uWJ/1Y7m/x5133sYnn3xCcXERL7zwHDNn/pzKysrQahERiUKf\nCoGRI0dx09TTQz9md8zGcscd/xrqeUVEsiHyEDCzg4AFwGx3v6vTtuOAmcAWYKG739STcxUVFXX7\nTL+IiGwX6T0BMxsAzAX+2MUutwPfBY4GTjCz
sVHWIyIiHUV9Y7gZOBn4uPMGM9sHqHP3te6eBJ4G\nJkVcj4iIpIk0BNy93d1butg8AqhNW/4M2DPKekREpKPe9Ihol3NgiohINHL5dNBaOn7z/1JqXZeq\nqgZQXNz9iJ4iItlSUtLeYXno0AoGDcqfx8OzGQIdvum7+0ozqzSzUQQf/t8Gzt7ZARKJz3d6glwO\nJX3XXbfz9ttv0dbWxrnn/j3f/OaxodYhIr3Tpk0NHZbr6hrYvLk3NbJAdXXXoRRpCJjZocAsYDTQ\nambfA54Elrv7E8DFwCNAEnjY3T/syflqalbx08dnUlZV3sPKA02JRn562tXdPnb6xhtLWbFiOffc\nM5/6+o2cf/45CgERyQuRhoC7vwF0+Wno7i8B48M8Z1lVOeXDsnsp9rWvHcaBBx4EQEVFJS0tzSST\nSQoKdJtDRHq33nXNkqcKCgooLe0PwFNPLWDcuKMUACKSF/rUsBG5tmTJn3j66ae47bY7c12KiEhG\nFAIhee21V3nggV8xe/adDBgQzj0JEZGoKQRC0NjYwF13zeX22++moqIi1+WIiGSsz4VAU6Ix68d6\n9tnF1Ndv5Lrr/nnbDeFrrpnB8OF7hFaLiEgUCvJpEpTa2k07LTaX/QREJJ42barnoovO27Y8b979\nVFYOzGFFX1RdXdnlkyp96kpAQ0mLiOwaPSIqIhJjCgERkRhTCIiIxJhCQEQkxhQCIiIx1qeeDsrV\nI6ItLc3MnDmD9evraG1tZfLkf2D8+KNDrUNEJAp9KgRqalbx3PVXM6ysLJTjrWtqYuKMmd0+dvrS\nS0sYO/YAzj77B3zyySdceeUlCgERyQt9KgQAhpWVMSLLY/dMmnT8tteffvoJw4ePyOr5RSQzUbQW\nNDZ2HFlg9epVlJeH9xkUdYfVPhcCuXTxxRdQW1vLz352W65LEZEdCLu1AKClveP0ku/ePovSwnBu\nt2baGtETCoEQ3X33fD744H1mzLiWX//64VyXIyI7EHZrQVNbG2xMbFseXjaAsjwaakZPB4XA/T0+\n++xTAMaM+QptbW1s2LAhx1WJiHRPIRCCt956g0ceeRCA9evraG5uYvDgwTmuSkSke32uOWhdU1PW\nj3Xqqd/j1ltvZMqUi9i8uYVp034SWg0iIlHqUyEwcuQoJs6YGfoxu1NaWsr1198U6nlFRLKhT4WA\nhpIWEdk1uicgIhJjCgERkRhTCIiIxJhCQEQkxhQCIiIxphAQEYkxhYCISIxF3k/AzGYD44B2YKq7\nL03bNgU4B9gCLHX3H0Vdj4iIbBfplYCZTQD2c/fxwIXA3LRtlcCPgaPcfQJwoJl9Pcp6RESko6ib\ngyYBCwDc/T1gsJlVpLZtBlqAgWZWDJQB6yOuR0RE0kQdAiOA2rTldal1uHsLcAOwDFgOvObuH0Zc\nj4iIpMn22EEFW1+kmoOmA/sBm4Dnzeyr7v5OVz9cVTWA4uL8maxBRHqX+vrsTj0bhqqqcqqrKyM7\nftQhsJbUN/+UvYCPU6/3Bz5y9wSAmS0BDgO6DIFE4vOIyhSROEgkGrvfqZdJJBqprd3Uo2PsLESi\nbg5aBJwOYGaHAmvcfeu/wgpgfzMrTS0fDnwQcT0iIpIm0isBd3/VzF43s5eBNmCKmU0GNrj7E2b2\nc+BPZtYKvOLuL0dZj4iIdBT5PQF3n95p1Ttp2+YB86KuQUREdkw9hkVEYkwhICISYwoBEZEYy+ie\ngJmdApwE7J1atQJ4xt2fjKYsERHJhp2GgJkdBDxI0Kv3j8B/pjaNBv7ezGYA57r7XyOtUkREItHd\nlcDtwFmpcX86u8vMxgJ3AseFXpmIiESuuxA4yd1bAcysChgDJAF393p3f8/MTo66SBERicZObwyn\nBcCVwIfAHOAO4CMzuzh9HxERyT+ZdhabDOzr7hth21XB88DdURUmIiLRy/QR0U+2BgBAatC35dGU\nJCIi2ZLp
lcAyM1tAMCBcIXAsUGdmFwC4+/yI6hMRkQhlGgJlQAI4IrVcDxQBxxDcKFYIiIjkoYxC\nwN3P3/razAYDG909GVl
"text/plain": [
"<matplotlib.figure.Figure at 0x7f2f8f8cd240>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"sns.barplot(x=\"Sex\", y='SibSp', hue='Pclass', data=df)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We see that in 3rd class, females had a higher average SibSp."
]
},
{
"cell_type": "code",
"execution_count": 65,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"<matplotlib.axes._subplots.AxesSubplot at 0x7f2f8f6b6e80>"
]
},
"execution_count": 65,
"metadata": {},
"output_type": "execute_result"
},
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAYEAAAESCAYAAAAbq2nJAAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\nAAALEgAACxIB0t1+/AAAHsRJREFUeJzt3XmcVOWV//FPLyxNN0uzKCIDGsSDSxJFxyguiaBGs5kY\nMxo1cYkziWHiEjKTETWKEfVnAlEniQn80ho1LjGJC7+4QBaNC/E3aFwyjicKCjRohKZZuumGprvm\nj6qG6rK76nZTt25X3e/79eJl3bVOtd117n2e556nLJFIICIi8VQedQAiIhIdJQERkRhTEhARiTEl\nARGRGFMSEBGJMSUBEZEYqwz7DczsYOAhYL67/zhj2wnAXGAH8Ji7Xxd2PCIiskuodwJmNgS4Ffhd\nD7vcAnwOOAY4ycymhBmPiIh0FXZzUCtwCvBO5gYz2xdocPe17p4AHgVmhByPiIikCTUJuHuHu2/r\nYfNYYF3a8nvAXmHGIyIiXfWnjuGyqAMQEYmb0DuGs1hL1yv/vVPrerRjR3uisrIi1KBk97S0tPDp\nT3+aRCJBWVkZixYtoqqqKuqwAssW//Lly7nkZ7Opqq0Ofr7GZm75yvVMmjQprJBFgujxIruQSaBL\nEO6+0syGmtkEkl/+nwLOynaCxsatIYYn+bBly2Y6ixImEgneeWcDQ4cOiziq4LLF39jYTFVtNdWj\nh/bqnI2NzaxbtyXvsYoENWZMz7+zoSYBM5sKzAMmAm1m9nngEeAtd38YuAi4D0gA97r7m2HGIyIi\nXYWaBNz9ReD4LNufAaaFGYOIiPSsP3UMi4hIgSkJiIjEmJKAiEiMKQmIiMSYkoCISIwpCaTU1S1g\n8eLHow5DRKSgonxiuCD+8pcXuP76OUyebCQSHWzbto2LL57FPvvsG3VoIiKRK/kkAHD44R/h29++\nAoCXX36J229fSG3tSBoa1tPc3MQ3vvHNnftu3ryZG26Yw9Chw1i/fh1XXXUt7e0dzJt3AyNGjGTj\nxg3MmnU5K1e+xW9+80uqq2sAmD376kg+m4jI7ohFEkg+kJy0ceMGNm/exKBBg/jud29k5cq3aWhY\nv3P7pk0bOeOMsznkkKnce+/dPP30U4wcOYo999yLSy6ZRUNDA5WVlTz33DPMmPFxpk8/gVWr3mbH\njh1UVsbkxykiJSMW31rLlv1/Zs/+NxKJBMOHD+fjH/8Ea9euAWDixH2YOHEfXnrpRQAGDRrEE088\nxp/+9EfeemsF06Ydy9FHH8vq1au47LKZjBo1iosvnsWXv3w+d955Ow8++AAHH/whvvrVmVF+RBGR\nPolFEkhvDgJ47bW/8vzzSwFYsWI5q1evTG1JcP/9v+Af//EjTJ9+AgsX3kZHRzv19as58cSP88Uv\nnsOiRQ/xxBOPMmXKgVx00TeorKzk2muv4o03nMmTLYJPJyLSd7FIApkOPPBgamtr+c53Lmfz5k1c\ncsm3WLFiOVDGoYcezr333sXLL7/InnuOZfHix5ky5UDuvPN2Ro8eTVPTFr7+9Ut47bW/cs89dzJ8\n+AjKysrYZ58PRP2xRER6reSTwKGHHsahhx72vvUXXzyry/K+++76Ej/mmON2vj7rrC8DMG/eoV32\n33vv8Zx44sn5DFVEpOD0nICISIwpCYiIxJiSgIhIjCkJiIjEmJKAiEiMKQmIiMRYSQ0RbW9vp75+\nVV7POX78BCoqKnLut2LFm1x++bc444yzOe20L+Q1BhGRsJRUEqivX8WVN/+KwTUj83K+1qYNXHfp\n6UycmL3iaGtrKzff/H0OP/yIvLyviEihlFQSABhcM5Ihw8YU9D0HDhzI979/K3fffUdB31dEZHep\nTyAPysvLGThwYNRhiIj0mpKA9At1dQs488zPUle3IOpQRGJFSUAi19rawpIljwGwZMnjtLa2RByR\nSHwoCeRZIpHIvZN00dbWtvPnlkh00NbWFnFE
IvFRch3DrU0bCn4u99f54Q9/wLvvvktlZQVPPfUH\n5s79HkOHDs1bLCIiYSipJDB+/ASuu/T0vJ8zF7Mp/Od//jSv7ysiUggllQQqKipyjukXEZFd1CfQ\nD2mkjIgUipJAP6ORMiJSSEoC/YxGyohIISkJiIjEmJKAiEiMldTooChLSf/4x7fwyisv097ezjnn\nnMdHP3p8XuMQEQlDSSWB+vpVXPObuVTVVuflfC2NzVxz2hU5h52++OIy3n77LX7ykzo2b97E+eef\nrSQgIkUh9CRgZvOBI4EO4FJ3X5a2bSZwNrADWObu39zd96uqraZ6dGGf1D300MM46KCDAaipGcq2\nba0kEgnKysoKGoeISG+F2idgZscB+7n7NOBC4Na0bUOBbwFHu/txwEFmVpSzspSVlTFo0GAAFi16\niCOPPFoJQESKQtgdwzOAhwDc/XVghJnVpLZtB7YBw8ysEqgC8lf4JwJPP/0kjz66iG9+89+jDkVE\nJJCwm4PGAsvSlten1r3p7tvM7FpgBbAVuM/d3ww5ntA8//xS7rrrDubP/yFDhuSnT6K/664jvrm5\nucvy6tWrqK7e9fMI2tEuIoVR6I7hnW0kqeag2cB+wBbgj2b2QXd/taeDa2uHUFnZ8xfI5s35//Kt\nra1mzJjsfQxNTU0sWPBD7rjjDkaO3L35jQcO7OiyPGpUDcOH989qpMuXL3/fnM6J9u1d9pl/99OU\nVSRnXWtt2sCP5pzLpEmTuuzTnz5ztlj6+vsV5HdIJCphJ4G1JK/8O40D3km9PgBY7u6NAGb2NHAY\n0GMSaGzcmvXNGhubaWlszrpPb7Q0NtPY2My6dVuy7vfIIw+yYUMjM2d+Y2eH8JVXzmGPPfbs9Xtu\n2dLUZbmhoYnt2/vn4xyNjc3vm9O5Y0cr6Z+gaugoyisHdzkm8+fZnz5ztlga+/i7FeR3SCRM2S5C\nwk4Ci4FrgIVmNhVY4+6df0lvAweY2SB33wYcDvx2d95s/PgJXHPaFbtzim7PmctnPvM5PvOZz+X1\nfUVECiHUJODuS83sBTN7FmgHZprZucBGd3/YzL4HPGlmbcBz7v7s7ryfSkmLiPRO6H0C7j47Y9Wr\nadsWAgvDjkFERLrXPxubRUSkIJQERERiTElARCTGlARERGKspKqIRlVKetu2VubOncOGDQ20tbVx\n7rlfYdq0Y/Iah4hIGEoqCdTXr+IPV1/B6KqqvJxvfUsL0+fMzTns9JlnnmbKlAM566wv8e6773LZ\nZV9XEhCRolBSSQBgdFUVYwtcu2fGjBN3vv77399ljz3GZtl7l77U3gHV3xGR/Cm5JBCliy66gHXr\n1nHTTT8ItH99/ape1d6BZP2d6y49XQ/FiUheKAnk0W231fHGG39jzpyr+PnP7w10TG9r74iI5JNG\nB+WB++u8997fAZg8eX/a29vZuHFjxFGJiOSmJJAHL7/8IvfddzcAGzY00NrawogRIyKOSkQkt5Jr\nDlrf0lLwc5166ue58cbvMnPmP7N9+zZmzfp23mIQEQlTSSWB8eMnMH3O3LyfM5dBgwZx9dXX5fV9\nRUQKoaSSgEpJi4j0jvoESkRd3QLOPPOz1NUtiDoUESkiSgIloLW1hSVLHgNgyZLHaW3NX7+IiJQ2\nJYES0NbWRiKRACCR6KCtrS3iiESkWCgJiIjEmJKAiEiMKQmIiMSYkoCISIwpCYiIxJiSgIhIjCkJ\niIjEWEmVjRAJqqf5qLPN7LZmTX1BYhMpJCUBiaXuZnWD7DO7bXrvLUYfVbAQRQoicBIwsz2BianF\nle7+93BCEimMzFndIPvMbq1NG4AthQtQpAByJgEz+yfgcmAvYHVq9QQzWwPc4O4PhBifiIiEKGsS\nMLM7Uvuc5+4vZ2z7MPBvZvZJdz8vtAj7oK5uAYsXP8pJJ32CCy74l6jDERHpt3KNDnrQ3c/JTAAA\n7v6yu58D
PBhOaH2jipoiIsHlag46JHXF3y13v9bdH85zTLulu4qagwdXRRyViEj/lCsJdG6fnPr3\nJ6AC+CjwlxDjEhGRAsi
"text/plain": [
"<matplotlib.figure.Figure at 0x7f2f8f61f588>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"sns.barplot(x=\"SibSp\", y='Survived', hue='Pclass', data=df)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"It seems that SibSp is relevant for determining the survival rate."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Feature ParCh"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The feature Parch (Parents-Children Aboard) is somewhat related to the previous one, since it reflects family ties. It is well known that in emergencies, family groups often all die or evacuate together, so it is expected that it will also have an impact on our model."
]
},
{
"cell_type": "code",
"execution_count": 66,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"Parch\n",
"0 678\n",
"1 118\n",
"2 80\n",
"3 5\n",
"4 4\n",
"5 5\n",
"6 1\n",
"dtype: int64"
]
},
"execution_count": 66,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df.groupby('Parch').size()"
]
},
{
"cell_type": "code",
"execution_count": 67,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"<matplotlib.axes._subplots.AxesSubplot at 0x7f2f8f575320>"
]
},
"execution_count": 67,
"metadata": {},
"output_type": "execute_result"
},
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAYQAAAESCAYAAAD9gqKNAAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\nAAALEgAACxIB0t1+/AAAFUNJREFUeJzt3X+QXWd93/H3sotFtJbQSlkLWY4FSWa+jMMkKfGMGZki\nS47l0OCYIEMIiuJG0NjFtKoT6JiZWDYu7VASkwYzKR0Fg/GYtDhRCIqLkV2wsbBApCQQ2uYLsYOV\nlZxosVde/cDyerX94zyS765Xq7vSnnt2pfdrRqN7zj336rM7q/3c5zznR9fY2BiSJL2k6QCSpNnB\nQpAkARaCJKmwECRJgIUgSSosBEkSAD11vnlEbAQ2AGNAF/BzwOuB/wocBb6dmTeUbd8HXFPW35aZ\nX6gzmyRpvK5OnYcQEW8A3gr8FPDezPxmRNwDfBpI4F7gdUAf8AhwUWZ6koQkdUgndxltBv4z8MrM\n/GZZtw24AlgNfCEzRzPzB8D3gYs6mE2SznodKYSIuBjYDYwCQy1P7QOWAUuBwZb1g2W9JKlDOjVC\neBfwqfK4q2V914s3nXK9JKkmtU4qt7gMeE95vKRl/XJgD7AXePWE9XunesPnnx8d6+npnsGIknRW\nOOEH7toLISKWAQcy8/my/P8iYmVmPgq8Bfgo8D3gtyJiM3AecH5m/t+p3ndo6HDNySXpzNPfv+CE\nz3VihLCMaq7gmBuB/xYRXcDXM/NLABGxherooqPA9R3IJUlq0bHDTmfa4OCBuRlckhrU37/ghLuM\nPFNZkgRYCJKkolNHGdVudHSUgYHdTceY1AUXXEh3t0dESZrdzphCGBjYzZ7/fi/LFy1uOso4e/Y/\nDW9/KytWvKrpKJI0pTOmEACWL1rMiiX9TceQpDnJOQRJEmAhSJIKC0GSBFgIkqTCQpAkARaCJKmw\nECRJgIUgSSosBEkSYCFIkgoLQZIEWAiSpMJCkCQBFoIkqbAQJEmAhSBJKiwESRJgIUiSitpvoRkR\n64H3ASPAZuBvgLupyuhJYENmjpTtNgGjwJbMvLPubJKkF9Q6QoiIxVQlsBJ4E/Bm4DbgjsxcBTwG\nbIyI+cDNwBpgNXBjRCyqM5skaby6Rwg/DzyQmYeBw8B1EfE4cF15fhvwXuC7wK7MPAgQETuAS4H7\nas4nSSrqLoRXAr0R8efAIuADwPzMHCnP7wOWAUuBwZbXDZb1kqQOqbsQuoDFwC9TlcOXy7rW50/0\nuin19c2np6f7+PLwcC9DpxyzXn19vfT3L2g6hiRNqe5C+Cfg0cw8CjweEQeAkYiYl5lHgOXAHmAv\n40cEy4GdU73x0NDhCcuHZjL3jBoaOsTg4IGmY0jSlB9O6z7sdDuwJiK6ImIJcC7wIHBNeX4dcD+w\nC7g4IhZGxLlUk9CP1JxNktSi1kLIzL3AnwBfo5ogvgG4Bbg2Ih4G+oC7MvNZ4CaqAtkO3JqZfqSW\npA6q/TyEzNwCbJmweu0k220FttadR5I0Oc9UliQBFoIkqbAQJEmAhSBJKiwESRJgIUiSCgtBkgRY\nCJKkwkKQJAEWgiSpsBAkSYCFIEkqLARJEmAhSJIKC0GSBFgIkqTCQpAkARaCJKmwECRJgIUgSSos\nBEkSYCFIkgoLQZIEQE+dbx4Rq4B7ge8AXcC3gd8F7qYqoyeBDZk5EhHrgU3AKLAlM++sM5skabxO\njBAeysw1mbk6MzcBtwF3ZOYq4DFgY0TMB24G1gCrgRsjYlEHskmSik4UQteE5cuAbeXxNuAK4BJg\nV2YezMxngR3ApR3IJkkqat1lVFwUEZ8DFlONDuZn5kh5bh+wDFgKDLa8ZrCslyR1SN2F8D3g1sy8\nNyJ+HPjyhH9z4ujhZOuP6+ubT09P9/Hl4eFehk4naY36+nrp71/QdAxJmlKthZCZe6kmlcnMxyPi\nH4GLI2JeZh4BlgN7gL2MHxEsB3ZO9d5DQ4cn
LB+aweQza2joEIODB5qOIUlTfjitdQ4hIt4REb9d\nHr+CatfQJ4FryibrgPuBXVRFsTAizgVWAo/UmU2SNF7du4w+D3wmIq4GXgpcB3wL+HRE/CbwBHBX\nZo5GxE3AduAo1W4mP1JLUgfVvcvoIPBLkzy1dpJttwJb68wjSToxz1SWJAEWgiSpsBAkSYCFIEkq\nLARJEmAhSJIKC0GSBFgIkqTCQpAkARaCJKmwECRJgIUgSSosBEkSYCFIkgoLQZIEWAiSpMJCkCQB\nFoIkqbAQJEmAhSBJKiwESRJgIUiSCgtBkgRAT93/QES8DPgOcBvwJeBuqiJ6EtiQmSMRsR7YBIwC\nWzLzzrpzSZLG68QI4WbgqfL4NuCOzFwFPAZsjIj5ZZs1wGrgxohY1IFckqQWtRZCRATwauA+oAtY\nBWwrT28DrgAuAXZl5sHMfBbYAVxaZy5J0ovVPUK4HfgtqjIA6M3MkfJ4H7AMWAoMtrxmsKyXJHVQ\nbXMIEbEBeDQzn6gGCi/SNdnKKdaP09c3n56e7uPLw8O9DE07ZWf09fXS37+g6RiSNKU6J5V/EXhV\nRFwFLAeeAw5GxLzMPFLW7QH2Mn5EsBzYebI3Hxo6PGH50AzFnnlDQ4cYHDzQdAxJmvLDaW2FkJlv\nP/Y4IjYD3wdWAtcA9wDrgPuBXcAfRcRC4GjZZlNduSRJk+vUeQjHdgPdAlwbEQ8DfcBdZSL5JmB7\n+XNrZvpxWpI6rPbzEAAy8wMti2sneX4rsLUTWSRJk2trhBARn5pk3RdnPI0kqTFTjhDKGcTXA6+J\niK+0PHUO1eGikqQzxJSFkJn3RMRDVJPAt7Q8dRT4PzXmkiR12EnnEDJzD3BZRLwcWMwLE8SLgKdr\nzCZJ6qC2JpUj4g+AjVRnER8rhDHgx2vKJUnqsHaPMloD9JdDRCVJZ6B2z0P4nmUgSWe2dkcIA+Uo\nox3A88dWZubmWlJJkjqu3UJ4CvhfdQaRJDWr3UL4D7WmkCQ1rt1CeJ7qqKJjxoBngCUznkiS1Ii2\nCiEzj08+R8Q5wOXAz9QVSpLUedO+2mlmPpeZX6C6/aUk6QzR7olpGyes+jGqG9lIks4Q7c4h/POW\nx2PAMPC2mY8jSWpKu3MIvwEQEYuBscycrbcvliSdonZ3Ga0E7gYWAF0R8RTwa5n5l3WGkyR1TruT\nyh8Crs7M8zKzH/hV4CP1xZIkdVq7hTCamd85tpCZf0XLJSwkSXNfu5PKRyNiHfBAWf4FYLSeSJKk\nJrRbCNcDdwB/RHW3tL8G/lVdoSRJndfuLqO1wJHM7MvMJVQ3yfkX9cWSJHVau4Xwa8BbWpbXAu+Y\n+TiSpKa0u8uoOzNb5wzGeOFWmicUET8CfApYCswDPgh8i+oQ1pcATwIbMnMkItYDm6jmJrZk5p3t\nfhGSpNPXbiF8PiIeBR6h+kV+OfCnbbzuKuAbmfl7EXEh1aT0V4GPZeafRsR/BDZGxN3AzcDFVEcv\nfSMitmbm/ml+PZKkU9TumcofjIiHgEuoRgfvzsyvtfG6z7YsXgj8A7AKuK6s2wa8F/gusCszDwJE\nxA7gUuC+9r4MSdLpaneEQGbuoLqF5rRFxFepLoZ3FfBAZo6Up/YBy6h2KQ22vGSwrJckdUjbhXA6\nMvPSiPhp4B7Gzz2caB7ipPMTfX3z6enpPr48PNzLbL3AUl9fL/39C5qOIUlTqrUQIuK1wL7MHMjM\nb0dEN3AgIuZl5hGqUcMeYC/jRwTLgZ1TvffQ0OEJy4dmNPtMGho6xODggaZjSNKUH06nfYOcaXoD\n8NsAEbEUOBd4ELimPL8OuB/YBVwcEQsj4lxgJdUEtiSpQ+ouhI8D50XEV6gmkP81cAtwbUQ8DPQB\nd2Xms8BNwPby59bM9CO1JHVQrbuMyi/69ZM8tXaSbbcCW+vMI0k6sbpHCJKkOcJCkCQBFoIkqbAQ\nJEmAhSBJ
KiwESRJgIUiSCgtBkgRYCJKkwkKQJAEWgiSpsBAkSYCFIEkqLARJEmAhSJIKC0GSBFgI\nkqTCQpAkARaCJKmwECR
"text/plain": [
"<matplotlib.figure.Figure at 0x7f2f8f57c550>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"# Distribution\n",
"sns.countplot(x='Parch', data=df)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We see that most passengers had no parents or children aboard.\n",
"\n",
"We now analyse the relationship with Survived."
]
},
{
"cell_type": "code",
"execution_count": 68,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"Parch\n",
"0 0.343658\n",
"1 0.550847\n",
"2 0.500000\n",
"3 0.600000\n",
"4 0.000000\n",
"5 0.200000\n",
"6 0.000000\n",
"Name: Survived, dtype: float64"
]
},
"execution_count": 68,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df.groupby('Parch').Survived.mean()"
]
},
{
"cell_type": "code",
"execution_count": 69,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"<matplotlib.axes._subplots.AxesSubplot at 0x7f2f8f53e6d8>"
]
},
"execution_count": 69,
"metadata": {},
"output_type": "execute_result"
},
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAXUAAAESCAYAAAACDEUqAAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\nAAALEgAACxIB0t1+/AAAIABJREFUeJzt3X901PWd7/FnQkIIJJAAEwLE8CvyTgAVQSgFxR+0rlQR\n67qut71W27La1u2pPffcvbeebre9e09v7/ZKrXtOt9btat1qq9ZW8VfFir9AVH4oAiYfQH6HQBIg\n/P6duX/MYKeRSSbJzHx/zOtxjsdkvl+Y9/eEvOYzr/nOd/Ki0SgiIhIO+V4PICIi6aNQFxEJEYW6\niEiIKNRFREJEoS4iEiIKdRGREClIZSczWwjMANqBu51zKxO2VQG/AQqB1c65b2RiUBER6VqXK3Uz\nmw3UOOdmAguA+zvsci/wY+fcDOBMPORFRMQDqdQvc4CnAZxzDUCZmZUAmFkecCnwbHz7N51zOzM0\nq4iIdCGVUK8EWhK+b43fBhABDgP3mdmbZvbDNM8nIiLd0JMXSvM6fD0S+AlwOXCxmc1Nx2AiItJ9\nqbxQuos/r8wBRgBN8a9bga3Oua0AZvYKMBF4MdlfFo1Go3l5eck2i2RNNBpl8Tvb+eWidRw7cRqA\nyeMj/POdMz2eTOScUgrOVEJ9MfB94EEzmwI0OueOADjnzpjZZjMb55z7CJgKPNbpVHl5tLQcSmW2\nQIpESnV8AbD3wHEe/mMD67fso7ioD1+eW8tLK3ZQv3UfTbsPUNAnfGf7huVnl0wuHF8qugx159xy\nM1tlZsuAM8BdZnYb0Oacewb4NvBw/EXTtc65Z3sxt0hGRaNR3vygid++spHjJ88waexgbr+mlsED\n+7FtzyGWrG5ka9MhaqoGeT2qSI+kdJ66c+6eDjetTdj2EXBZOocSyYRzrc4vvXA4Z+vA2upylqxu\npGH7foW6BFZKoS4SZJ2tzhONP68MALd9P9fNHO3BpCK9p1CXUOtqdZ5o4IC+VFeWsrHxAKfPtIey\nV5fwU6hLKKW6Ou/ognFD2b77kHp1CSyFuoROd1bnHV0wbijPL9uiXl0CS6EuodHT1XmiiWOHAOrV\nJbgU6hIKvVmdJyorLWLk0AHq1SWwFOoSaOlYnXdk1WU0rj6iXl0CSaEugZWu1XlHOl9dgkyhLoGT\nidV5Ip2vLkGmUJdAydTqPNHAAX3Vq0tgKdQlEDK9Ou9IvboElUJdfC8bq/OO1KtLUCnUxbeyvTpP\npF5dgkqhnianz7Tz0yfXsO/QSSbXDGF63TCqh5VkdDUZZl6szhOpV5egUqinydNvbmH91v3k5cGL\n7xzhxXe2U1FezLTaCqbXDaMqMkABnwIvV+cdqVeXIFKop8GHW/fx4tvbiJT1Y+HdV/D2mkZWNOzh\n/U2tPL98G88v38bwIf2ZVlvBtLphjBw6wOuRfcnr1XlH6tUliBTqvXTw6EkefO5D8vPzuPP6SZSV\nFjHVIky1CCdOnmHNR62saGjmg4/2smjZVhYt28rIyACmx1fwwwb39/oQPOen1Xki9eoSRAr1XohG\nozz8QgMHDp/kpivGMXbEwL/YXtS3D9PrhjG9bhjHTpxmzaZW3q1vZt2WvfzhzS384c0tVA8r+bii\niZQVe3Qk3vHb6jyRenUJIoV6LyxZ3cj7m1qpG1XONZ+q7nTf4qICZkysZMbESo4eP8V7G2Mr+PVb\n9rF9z2aeen0zY4aXMq12GNPrKjxfpWaaX1fnHalXl6BRqPfQjubDPL5kEyXFhSy4bgL53VhZ9u9X\nyKwLhjPrguEcPnaK1RtaWNHQTP3W/WxpOsQTr25i3MiBTK8dxiW1FZSXFmXwSLLPz6vzjtSrS9Ao\n1HvgxKkz/PyZdZw+085Xrp3Uq9AtKS5k9kUjmH3RCA4ePclq18K79XtwO9r4qPEgv31lI+efV8b0\nugqmWgWDBvRN45FkV1BW54nUq0vQKNR74PFX
NtK09yhzplYxuWZo2v7egf37csXFI7ni4pEcOHyC\nla6FFfV72LijjQ072nj05Q3UVpczra6CqeMjlPYPTsAHaXWeSL26BI1CvZtWuWZee38XVZESbr5y\nXMbuZ1BJEXOmVjFnahX7D51gRUMzK+r3UL9tP/Xb9vPrlzZQN7qc6bUVTLEIA/oVZmyW3gji6rwj\n9eoSJAr1bth74DgPvdBA34J8vjZ/IoUFfbJyv+WlRVw97TyunnYerQeOsbIhVtGs37KP9Vv28chL\njoljBjO9roLJNRH69/PHjzWoq/OO1KtLkPjjtz8A2tujPPjseo6eOM2XrjFGePQGoqGDirnmU9Vc\n86lqmtuOsaJ+DyvqY+fBf/DRXgr6OC4YO5hpdRVMrhlKv77Z/xGHYXWeSL26BIlCPUXPvbWVDTsP\nMNUiXH7RCK/HAaCirJhrPz2aaz89mt37jvJu/R5WNDTz3sZW3tvYSt+CfC4cN4RpdcO4cNwQigoz\n/8wiLKvzROrVJUgU6inYuLONZ5ZtYfDAIm6fW+vLgKoc3J/rZ43h+lljaGw5zIqGZt6tb2ala2Gl\na6GosA8XxS80dsHYwWmvjsK2Ou9IvboEhUK9C0eOn+IXi9YDcMe8ib59QTLRyEgJIyMlzL90DDua\nD8dfZI2F/Lv1zfTr24eLzx/KtLphTBozuNcrzzCuzjtSry5BkVKom9lCYAbQDtztnFuZsG0LsD2+\nLQp80TnXlIFZsy4ajfKrPzr2HjzB9bNGf9ytBkVeXh7Vw0qpHlbKjbPHsm3PId6tjwX88vV7WL5+\nD/2LCrh4/FCm1w2jblR5twI+7KvzROrVJSi6DHUzmw3UOOdmmlkt8B/AzIRdosA1zrljGZrRM29+\n0MTKhmZqqgYxb9Zor8fplby8PEZXDmR05UD+5opxbG46yIr6ZlY0NLNs7W6Wrd1NSXEhU8ZHmF5X\ngVWX0Sc/ecDnwuo8kXp1CYpUVupzgKcBnHMNZlZmZiXOucPx7Xnx/0Klae8RHvvTBoqLCrhj3oRO\nAy5o8vLyGDdiEONGDOLmq2rYtPMAKxqaWdnQzBtrdvHGml0M7F/IVKtgel0F51eVkZ8f+xFHo1He\nWLMrJ1bnHalXlyBIJdQrgZUJ37fGb9uUcNvPzWwM8KZz7p40zueJU6fbeeCZ9Zw81c7Xb5jA0EHh\nvXpifl4e488rY/x5ZfyXOeezYUcb7zY0s8o18+p7jbz6XiODSvoyzSqYNHYIr/9+Le9taMmJ1XlH\n6tUlCHryQmnH3+B/BP4I7AOeMbMbnXO/7/VkHvrdax+xvfkwsy8azrTaCq/HyZr8/DxqR5VTO6qc\nL372fBq2t7Gifg+rXAt/WrWTP63aCZBTq/NE6tUlCFIJ9V3EVuZnjQA+fiHUOffrs1+b2QvABUCn\noR6JlHZvyixaWb+Hl1fuoKqihG/+7RT6FXX/cc/Px9cdlcMGccW0UZw+086ajS2samimpqqMK6dW\nhXp1nuznF4lAdWUpm3YdpKx8AIUFwavkwvJvM5mwH18qUkmsxcD3gQfNbArQ6Jw7AmBmA4EngHnO\nuVPA5cCTXf2FLS2HejxwJrUdPsHCx1ZR0CePBdfWcejgMbo7aSRS6tvj643qIf2pnjU6tMd3VlfH\nVzNiINt3H2Ll2l2Bq2By/WcXdKk+YHW51HDOLQdWmdky4D7gLjO7zczmO+cOAs8Db5vZm0Czc+6p\nXsztmfZolF8+9yGHjp7ib66ooXqYHvHlk2qrywFo2L7f40lEzi2lbuEcL36uTdj2r8C/pnMoL7z0\n7nbWb93PheOG8JlLqrweR3xKvbr4XfBKwQzY0nSQ37++mUED+vKVz9WFui+W3ul4vrqI3+R8qB87\ncZoHFq3nTHuUBddNYGCAP1lIssOqyzh5qp2tTeHtbyW4cj7UH315A837jzH3U9VMHDPY63EkANSr\ni5/ldKi/
vX43b63bzejKUj4/e6zX40hAJPbqIn6Ts6He3HaMR15yFPXtw53zJ+paHpIy9eriZzmZ\nZKfPxC4DcPzkGW69ejz
"text/plain": [
"<matplotlib.figure.Figure at 0x7f2f8f4a3278>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"# Survival probability for each Parch value\n",
"df.groupby('Parch').Survived.mean().plot()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We see that the probability of surviving is higher for Parch values of 1, 2 and 3 than for 0. Since there are very few rows with Parch >= 3 (15 in total), the values in that range are not reliable."
]
},
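{
"cell_type": "markdown",
"metadata": {},
"source": [
"To make the small-sample caveat explicit, a minimal sketch (assuming `df` is the Titanic DataFrame loaded earlier in this notebook) can show the survival rate next to the number of rows in each group."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"# Survival rate (mean) and number of passengers (size) for each Parch value\n",
"df.groupby('Parch')['Survived'].agg(['mean', 'size'])"
]
},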
{
"cell_type": "code",
"execution_count": 70,
"metadata": {
"collapsed": false,
"scrolled": true
},
"outputs": [
{
"data": {
"text/plain": [
"array([<matplotlib.axes._subplots.AxesSubplot object at 0x7f2f8f4fbe10>,\n",
" <matplotlib.axes._subplots.AxesSubplot object at 0x7f2f8f3c1240>], dtype=object)"
]
},
"execution_count": 70,
"metadata": {},
"output_type": "execute_result"
},
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAYQAAAEBCAYAAAB4wNK4AAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\nAAALEgAACxIB0t1+/AAAFnFJREFUeJzt3X+QXWWd5/F3g0KbkF4c5hqyTcE4pfu1LGtmFtlhN5ki\nIQiupchuAeXWZFklYy0zisUAOhtrJxKzU7uODJSrDuVsMAoUbgFTPWgMYHR0mYlmNliOMPzhdzUy\nwSRILkjoJMUlsdP7xzkZ2+aGPt333B9Nv19VXbn33Ofc79P33qc/Off8eIYmJyeRJOmkfndAkjQY\nDARJEmAgSJJKBoIkCTAQJEklA0GSBMCr+t0BvbyIWA3cDCwGdgNXZ+a+/vZK6o+IeBXwp8D1wFmO\nhXq5hTDAImIR8L+BtZn5JuCrwF/0t1dSX30ZGAc8gaoLDITBthrYlZmPlvc3A5dExOI+9knqp42Z\n+XFgqN8deSUyEAbbvwB2Hb+TmYeBZ4E39K1HUh9l5v/tdx9eyQyEwbYIaE1b9gLF/gRJqpWBMNgO\nA8PTli0CDvWhL5Je4QyEwfYD4I3H70TEPwNOB37Ytx5JesUyEAbbt4CzI2J5ef964KuZ+UIf+yTp\nFWrIy18Ptoi4APg0xVdFPwLel5n7+9srqfci4nXAw+Xd4wdc/By4KDOf6lvHXkEqBUJEDAOPAxuB\nC4G3As+UD9+cmQ9GxBrgOmAC2JSZm7vTZUlSN1Q9U3k9xeGOUJwQsi4zHzj+YHkC1XrgPIrEfiQi\nxjLzQJ2dlSR1z4z7ECIigDcBWylOBjn+M9X5wM7MPJSZLWA7sKLmvkqSuqjKFsItwAeB9/GL08U/\nGBE3AE8DHwLOBJpT1mkCy+rrpiSp2152CyEirgK+k5m7y0VDwJ0UXxldBHwf2NBmVU8rl6R5ZqYt\nhHcCr4+IS4GzKM6avSYzHysf3wLcBtwHXDplvVFgx0zFJycnJ4eGzA7Vbt59qBwL6qLKH6zKh51G\nxMeAf6T4w/9HmflERPwB8GbgI8BjFDuVjwHfBf5VZh6c4Wknm82ZmnSm0ViCNRZcjfn4l7XrY6Gd\nXrwfg1Z7AdatPB7mMh/CZ4F7IuIwxSUUrs7MVkSsA7ZRBMKGCmHATZ/4HBOTr551B44dm+Cyi/8N\nv/76X5v1upKk9ioHQmZunHL3t9s8PgaMzab4d398hJOWzH7f88TRFr/1kz0GgiTVyEtXSJIAA0GS\nVDIQJEmAgSBJKhkIkiTAQJAklQwESRJgIEiSSgaCJAkwECRJJQNBkgQYCJKkkoEgSQIMBElSyUCQ\nJAEV50OIiGHgcWAj8E3gLooweQq4KjOPRsQa4DpgAtiUmZu702VJUjdU3UJYDzxb3t4IfCYzVwK7\ngLURsahssxq4ELg+Ik6vu7OSpO6ZMRAiIoA3AVspJmteCWwpH94CXAycD+zMzEOZ2QK2Ayu60mNJ\nUldU2UK4BbiBIgwAFmfm0fL2fmAZsBRoTlmnWS6XJM0TLxsIEXEV8J3M3H2CJkOzXC5JGlAz7VR+\nJ/D6iLgUGAWOAIci4tTMfLFcthfYxy9vEYwCO7rQ338yMjJMo7GkUtuq7TphjcGqMR/163Xp5/ux\n0H7nQf/sv2wgZOZ/OH47Ij4G/COwHLgCuBu4HHgI2AncHhEjwLGyzXXd6XJhfLxFs3lwxnaNxpJK\n7TphjcGrMR91+3Vppxfvx6DVXoh1q5rNeQjHvwa6CXhvRDwMvBa4o9yRvA7YVv5syMz+fMokSXNS\n6TwEgMz8+JS7l7R5fAwYq6NTkqTe80xlSRJgIEiSSgaCJAkwECRJJQNBkgQYCJKkkoEgSQIMBElS\nyUCQJAEGgiSpZCBIkgADQZJUMhAkSYCBIEkqGQiSJKDCfAgR8Rrgi8BS4FTgTyhmTHsr8EzZ7ObM\nfDAi1lDMlDYBbMrMzd3otCSpflUmyLkUeCQz
/ywizga+DnwbWJeZDxxvFBGLgPXAecDPgUciYiwz\nD3Sh35Kkms0YCJl575S7ZwM/KW8PTWt6PrAzMw8BRMR2YAWwtYZ+SpK6rPIUmhHxbWAUeBdwI/DB\niLgBeBr4EHAm0JyyShNYVl9XJUndVHmncmauAN4N3A3cSfGV0UXA94ENbVaZvgUhSRpgVXYqnwvs\nz8w9mflYRLwK+IfMPL5DeQtwG3Afxf6G40aBHXV3+LiRkWEajSWV2lZt1wlrDFaN+ahfr0s/34+F\n9jsP+me/yldGFwDnANdHxFLgNOAvIuLDmfkEsAp4HNgJ3B4RI8AxYDnFEUddMT7eotk8OGO7RmNJ\npXadsMbg1ZiPuv26tNOL92PQai/EulVVCYTPAZ+PiL8BhoEPAIeAeyLicHn76sxsRcQ6YBtFIGzI\nzP580iRJs1blKKMWsKbNQ7/dpu0YMFZDvyRJPeaZypIkwECQJJUMBEkSYCBIkkoGgiQJMBAkSSUD\nQZIEGAiSpJKBIEkCDARJUslAkCQBBoIkqWQgSJIAA0GSVDIQJElAtSk0XwN8EVgKnAr8CfAocBdF\noDwFXJWZRyNiDcUsaRPApszc3KV+S5JqVmUL4VLgkcxcBbwHuBXYCHw2M1cCu4C1EbEIWA+sBi6k\nmHLz9K70WpJUuyozpt075e7ZwE+AlcA15bItwIeB/wfszMxDABGxHVgBbK2zw5Kk7qgypzIAEfFt\nYJRii+HrmXm0fGg/sIziK6XmlFWa5XJJ0jxQeadyZq4A3g3cDQxNeWio/RonXC5JGkBVdiqfC+zP\nzD2Z+VhEnAwcjIhTM/NFiq2GvcA+fnmLYBTY0Y1OA4yMDNNoLKnUtmq7TlhjsGrMR/16Xfr5fiy0\n33nQP/tVvjK6ADiHYifxUuA04EHgCoqthcuBh4CdwO0RMQIcA5ZTHHHUFePjLZrNgzO2azSWVGrX\nCWsMXo35qNuvSzu9eD8GrfZCrFtVla+MPge8LiL+hmIH8h8ANwHvjYiHgdcCd2RmC1gHbCt/NmRm\nfz5pkqRZq3KUUQtY0+ahS9q0HQPGauiXJKnHPFNZkgQYCJKkkoEgSQIMBElSyUCQJAEGgiSpZCBI\nkgADQZJUMhAkSYCBIEkqGQiSJMBAkCSVDARJEmAgSJJKBoIkCag2YxoR8Ungd4CTgU9QzK38VuCZ\nssnNmflgRKyhmCVtAtiUmZvr77IkqRuqzKm8CnhzZi6PiF8B/h74a2BdZj4wpd0iYD1wHvBz4JGI\nGMvMA13puSSpVlW+MnoYuLK8fQBYTLGlMDSt3fnAzsw8VM6yth1YUVdHJUndVWUKzUnghfLu+4Gt\nFF8JXRsRNwBPAx8CzgSaU1ZtAstq7a0kqWsq71SOiMuAq4FrgbuA/5KZFwHfBza0WWX6FoQkaYBV\n3an8duCjwNsz8yDwrSkPbwFuA+4DLp2yfBTYUVM/X2JkZJhGY0mltlXbdcIag1VjPurX69LP92Oh\n/c6D/tmvslN5BPgkcFFmPl8u+0vgI5n5BLAKeBzYCdxetj8GLKc44qgrxsdbNJsHZ2zXaCyp1K4T\n1hi8GvNRt1+Xdnrxfgxa7YVYt6oqWwjvAc4A7o2IIWAS+AJwT0QcBg4BV2dmKyLWAdsoAmFDuTUh\nSZoHquxU3gRsavPQXW3ajgFjNfRLktRjnqksSQIMBElSyUCQJAEGgiSpVOk8BEndteFPP8eTzaNz\nWveNo6fx/v94Rc090kJkIEgD4PnWSTQnz57TusteaM7cSKrAr4wkSYCBIEkqGQiSJMBAkCSVDARJ\nEmAgSJJKBoIkCTAQJEklA0GSBFSfQvOTwO8AJwOfAB6hmA/hJOAp4KrMPBoRayhmSZsANmXm5q70\nWpJUuxm3ECJiFfDmzFwOvAP4FLAR+GxmrgR2AWsjYhGwHlgNXAhcHxGnd6vjkqR6VfnK6GHgyvL2\nAWAxsBL4
SrlsC3AxcD6wMzMPZWYL2A6sqLe7kqRuqTKF5iTwQnn394CtwNsz8/ilGfcDy4ClwNSr\nbDXL5ZKkeaDy1U4j4jJ
"text/plain": [
"<matplotlib.figure.Figure at 0x7f2f8f47c390>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"df.hist(column='Parch', by='Survived', sharey=True)"
]
},
{
"cell_type": "code",
"execution_count": 71,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th>Survived</th>\n",
" <th>SibSp</th>\n",
" <th>Parch</th>\n",
" </tr>\n",
" <tr>\n",
" <th>Pclass</th>\n",
" <th>Sex</th>\n",
" <th>Parch</th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th rowspan=\"7\" valign=\"top\">1</th>\n",
" <th rowspan=\"3\" valign=\"top\">female</th>\n",
" <th>0</th>\n",
" <td>0.984375</td>\n",
" <td>0.484375</td>\n",
" <td>64</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>1.000000</td>\n",
" <td>0.411765</td>\n",
" <td>17</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>0.846154</td>\n",
" <td>1.076923</td>\n",
" <td>13</td>\n",
" </tr>\n",
" <tr>\n",
" <th rowspan=\"4\" valign=\"top\">male</th>\n",
" <th>0</th>\n",
" <td>0.363636</td>\n",
" <td>0.262626</td>\n",
" <td>99</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>0.285714</td>\n",
" <td>0.357143</td>\n",
" <td>14</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>0.625000</td>\n",
" <td>0.750000</td>\n",
" <td>8</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>0.000000</td>\n",
" <td>1.000000</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th rowspan=\"7\" valign=\"top\">2</th>\n",
" <th rowspan=\"4\" valign=\"top\">female</th>\n",
" <th>0</th>\n",
" <td>0.888889</td>\n",
" <td>0.333333</td>\n",
" <td>45</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>0.944444</td>\n",
" <td>0.722222</td>\n",
" <td>18</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>1.000000</td>\n",
" <td>0.545455</td>\n",
" <td>11</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>1.000000</td>\n",
" <td>1.500000</td>\n",
" <td>2</td>\n",
" </tr>\n",
" <tr>\n",
" <th rowspan=\"3\" valign=\"top\">male</th>\n",
" <th>0</th>\n",
" <td>0.089888</td>\n",
" <td>0.224719</td>\n",
" <td>89</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>0.500000</td>\n",
" <td>1.071429</td>\n",
" <td>14</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>0.400000</td>\n",
" <td>0.400000</td>\n",
" <td>5</td>\n",
" </tr>\n",
" <tr>\n",
" <th rowspan=\"13\" valign=\"top\">3</th>\n",
" <th rowspan=\"7\" valign=\"top\">female</th>\n",
" <th>0</th>\n",
" <td>0.588235</td>\n",
" <td>0.341176</td>\n",
" <td>85</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>0.480000</td>\n",
" <td>1.240000</td>\n",
" <td>25</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>0.320000</td>\n",
" <td>2.560000</td>\n",
" <td>25</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>0.500000</td>\n",
" <td>0.500000</td>\n",
" <td>2</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>0.000000</td>\n",
" <td>0.500000</td>\n",
" <td>2</td>\n",
" </tr>\n",
" <tr>\n",
" <th>5</th>\n",
" <td>0.250000</td>\n",
" <td>0.500000</td>\n",
" <td>4</td>\n",
" </tr>\n",
" <tr>\n",
" <th>6</th>\n",
" <td>0.000000</td>\n",
" <td>1.000000</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th rowspan=\"6\" valign=\"top\">male</th>\n",
" <th>0</th>\n",
" <td>0.121622</td>\n",
" <td>0.135135</td>\n",
" <td>296</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>0.266667</td>\n",
" <td>1.900000</td>\n",
" <td>30</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>0.166667</td>\n",
" <td>4.055556</td>\n",
" <td>18</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>0.000000</td>\n",
" <td>1.000000</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>0.000000</td>\n",
" <td>1.000000</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>5</th>\n",
" <td>0.000000</td>\n",
" <td>1.000000</td>\n",
" <td>1</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" Survived SibSp Parch\n",
"Pclass Sex Parch \n",
"1 female 0 0.984375 0.484375 64\n",
" 1 1.000000 0.411765 17\n",
" 2 0.846154 1.076923 13\n",
" male 0 0.363636 0.262626 99\n",
" 1 0.285714 0.357143 14\n",
" 2 0.625000 0.750000 8\n",
" 4 0.000000 1.000000 1\n",
"2 female 0 0.888889 0.333333 45\n",
" 1 0.944444 0.722222 18\n",
" 2 1.000000 0.545455 11\n",
" 3 1.000000 1.500000 2\n",
" male 0 0.089888 0.224719 89\n",
" 1 0.500000 1.071429 14\n",
" 2 0.400000 0.400000 5\n",
"3 female 0 0.588235 0.341176 85\n",
" 1 0.480000 1.240000 25\n",
" 2 0.320000 2.560000 25\n",
" 3 0.500000 0.500000 2\n",
" 4 0.000000 0.500000 2\n",
" 5 0.250000 0.500000 4\n",
" 6 0.000000 1.000000 1\n",
" male 0 0.121622 0.135135 296\n",
" 1 0.266667 1.900000 30\n",
" 2 0.166667 4.055556 18\n",
" 3 0.000000 1.000000 1\n",
" 4 0.000000 1.000000 1\n",
" 5 0.000000 1.000000 1"
]
},
"execution_count": 71,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df.groupby(['Pclass', 'Sex', 'Parch'])[['Parch', 'SibSp', 'Survived']].agg({'Parch': np.size, 'SibSp': np.mean, 'Survived': np.mean})"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We observe that Parch has an important impact for men in first and second class. We are going to check the age."
]
},
{
"cell_type": "code",
"execution_count": 72,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"Survived 0.439024\n",
"Age 27.871951\n",
"dtype: float64"
]
},
"execution_count": 72,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df.query('(Sex == \"male\") and (Pclass == [1, 2]) and (Parch == [1, 2])')[['Survived', 'Age']].mean()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We see that in those cases the mean age is about 28. We can compare this with all men in first and second class."
]
},
{
"cell_type": "code",
"execution_count": 73,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"Survived 0.269565\n",
"Age 36.063750\n",
"dtype: float64"
]
},
"execution_count": 73,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df.query('(Sex == \"male\") and (Pclass == [1, 2])')[['Survived', 'Age']].mean()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We observe a noticeable difference (survival rate 0.44 versus 0.27, mean age 27.9 versus 36.1), so we suspect that this feature has an impact on men in first and second class."
]
},
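{
"cell_type": "markdown",
"metadata": {},
"source": [
"The second query above covers all men in first and second class, including those with Parch equal to 1 or 2. As a hedged sketch (again assuming `df` from earlier in this notebook; `men12` and `has_parch_1_or_2` are illustrative names), the two groups can be separated so that they do not overlap."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"# Men in first and second class\n",
"men12 = df.query('(Sex == \"male\") and (Pclass == [1, 2])')\n",
"# True for Parch in {1, 2}, False otherwise\n",
"has_parch_1_or_2 = men12['Parch'].isin([1, 2]).rename('has_parch_1_or_2')\n",
"# Mean survival and mean age for the two non-overlapping groups\n",
"men12.groupby(has_parch_1_or_2)[['Survived', 'Age']].mean()"
]
},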
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Recap: Filling null values"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Feature Age: null values"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We fill null values of Age with its median."
]
},
{
"cell_type": "code",
"execution_count": 74,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"count 891.000000\n",
"mean 29.361582\n",
"std 13.019697\n",
"min 0.420000\n",
"25% 22.000000\n",
"50% 28.000000\n",
"75% 35.000000\n",
"max 80.000000\n",
"Name: AgeFilled, dtype: float64"
]
},
"execution_count": 74,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# We create a new feature so that the original Age column is preserved\n",
"df['AgeFilled'] = df['Age'].fillna(df['Age'].median())\n",
"df['AgeFilled'].describe()"
]
},
{
"cell_type": "code",
"execution_count": 75,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"<matplotlib.axes._subplots.AxesSubplot at 0x7f2f8f360ba8>"
]
},
"execution_count": 75,
"metadata": {},
"output_type": "execute_result"
},
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAW8AAAEDCAYAAAD6CoU1AAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\nAAALEgAACxIB0t1+/AAAEXpJREFUeJzt3X+QXWV9x/H3koVgku2wMLcxbmu0aL+UdjpDtGiBkpCG\npE5FWxIGgUnR6Mi06NRWLfqHoqnTiTr4Axx/DkhIBfxBS92xQxACOiNWaaGibfM1RhuURLLa1S5G\n0g3Z/nFvdDfJ3h+bPbn7cN+vGebcnHPuOd87A588PM95ntM3MTGBJKksJ3S7AElS5wxvSSqQ4S1J\nBTK8JalAhrckFcjwlqQC9bc6ISIWArcAg8BJwEbgh8BHgIPAI5l5dZVFSpKmaqfl/Upge2auBNYB\nHwTeD7w+M/8AOCUi1lRXoiTpcO2E94+A0xqfTwN+DDw3Mx9q7BsGVlVQmyRpGi3DOzM/DSyNiB3A\n/cCbgdFJp+wFllRSnSTpqFqGd0RcAezKzOcDK4G/P+yUvioKkyRNr+WAJXAusBUgM78ZEc847HtD\nwO5mFzhw4KmJ/v55My5SqsrF1wwzfuAgJ/WfwB3vvqjb5UiHm7Zx3E54fwd4MfCPEbEUGAO+FxHn\nZuZXgIuB65tdYHR0Xwe1SsfPBWcNcf/Dj7HirCFGRsa6XY40Ra02MO2xvlarCjYeFbwJWAzMA95G\n/VHBj1P/W+FrmfmmZtcYGRlz6ULNWbXagMGtOalWG5i25d0yvGeD4a25zPDWXNUsvJ1hKUkFMrwl\nqUCGtyQVyPCWpAIZ3pJUIMNbkgpkeEtSgQxvSSqQ4S1JBTK8JalAhrckFcjwVk+7/d4dXHzNMLff\nu6PbpUgdMbzV07Y99BjjBw5y38OPdbsUqSOGt3raymVDnNR/AhecNdTtUqSOuCSsep5LwmqucklY\nSXqaMbwlqUCGtyQVyPCWpAK1fHt8RGwA1gMT1F84/ALgPOAjwEHgkcy8usoiJUlTdfS0SUScD1wC\n/Dbwpsx8KCI+BdySmVun+55Pm2iueucnv86ux59g6eJFXPuqs7tdjjTFbD5t8nbg3cBzMvOhxr5h\nYNUMa5O6atfjT0zZSqVoO7wj4oXAo8BTwOikQ3uBJbNclySpiU5a3q8Bbm58ntyUn7ZZL811AwtO\nnLKVStFywHKSFcDrGp9Pm7R/CNjd7IuDgwvo75/XWWXScTC2b/wX21ptoMvVSO1rK7wjYgkwlpkH\nGn/+r4g4JzMfAC4Grm/2/dHRfcdcqFSFpYsX/WLA0inymmuaNSjabXkvod63fchfAR+LiD7ga5m5\nbeblSd1z7avOdm0TFcmFqdTzDG/NVS5MJUlPM4a3JBXI8JakAhneklQgw1uSCmR4S1KBDG/1tNvv\n3cHF1wxz+707ul2K1BHDWz1t20OPMX7gIPc9/Fi3S5E6Ynirp/W5rJoKZXirp40fODhlK5XC8FZP\nc0lYlcrwVk87dWD+lK1UCsNbPc3XoKlUhrd62tLFi6ZspVIY3pJUIMNbPc1uE5XK8FZPs9tEpTK8\n1dN2/3jflK1UCsNbPc1JOipVu2+PvwJ4MzAOvB34JrCFevjvAdZn5nhVRUpVObH/BMYPHOTEftsx\nKkvLf2Mj4lTqgX0O8FLgT4CNwA2ZuRzYCWyoskipKiefNG/KVipFO82NVcAXM3NfZj6emVcBK4Dh\nxvHhxjlSccb2jU/ZSqVop9vkOcDCiPgn4BTgncCCSd0ke4El1ZQnSTqadsK7DzgV+FPqQX5fY9/k\n400NDi6gv9//LdXcVqsNdLsEqW3thPfjwAOZeRD4bkSMAeMRMT8z9wNDwO5mFxgd9TEszU2TByxH\nRsa6XY40RbMGRTt93ncDKyOiLyJOAxYB9wDrGsfXAncda5FSNzzrtAVTtlIpWoZ3Zu4GPgf8C/AF\n4GrgWuDKiPgSMAhsrrJIqSpOj1ep2nrOOzM/
AXzisN2rZ78c6fhaungRux5/wunxKo4zE9TTvr/3\niSlbqRSGt3rawYmpW6kUhrd62gl9U7dSKQxv9bRf/9VFU7ZSKQxv9TSfNlGpDG/1NF/GoFIZ3upp\nvoxBpTK81dN8GYNKZXhLUoEMb0kqkOGtnuaApUrVNzFR/dSykZEx569pzqrVBlwOVnNSrTYw7fSx\nthamkkqxdu1F7Ny5o9J7nH7687njjuHWJ0oVsuWtnrdh0zZuesvKbpchHaFZy9s+b0kqkOEtSQUy\nvCWpQIa3et5lq6PbJUgdM7zV8y5fc0a3S5A61vJRwYhYDnwW+BbQBzwCvBfYQj389wDrM3O8wjol\nSZO02/K+PzNXZuYFmfmXwEbghsxcDuwENlRWoSTpCO2G9+HPGq4ADs1SGAZWzVZBkqTW2p1heWZE\n3AmcSr3VvWBSN8leYEkVxUmSjq6d8N4BvCMzPxsRvwHcd9j3Wr66dXBwAf3982ZYolStW7dud9BS\nxWkZ3pm5m/qAJZn53Yj4IfDCiJifmfuBIWB3s2uMjvqWEs1dt92dXLhsqNtlSEeo1QamPdayzzsi\nLo+INzY+PxNYDHwSWNc4ZS1w17GXKUlqVzvdJp8Hbo2IlwMnAlcB3wBuiYjXAruAzdWVKEk6XDvd\nJk8ALzvKodWzX44kqR3OsJSkAhne6nmubaISGd7qeT4mqBIZ3pJUIMNbkgpkeEtSgQxvSSqQ4a2e\nd+vW7d0uQeqY4a2ed9vd2e0SpI4Z3pJUIMNbkgpkeEtSgQxvSSqQ4a2e59omKpHhrZ7n2iYqkeEt\nSQUyvCWpQIa3JBXI8JakArXzAmIi4mTgW8BGYBuwhXrw7wHWZ+Z4ZRVKFbt163YuXDbU7TKkjrTb\n8n4b8OPG543ADZm5HNgJbKiiMOl4cW0TlahleEdEAGcAXwD6gOXAcOPwMLCqsuokSUfVTsv7OuCv\nqQc3wMJJ3SR7gSVVFCZJml7TPu+IWA88kJm76g3wI/QdbefhBgcX0N8/bwblScdHrTbQ7RKkjrQa\nsPxj4LkRcREwBPwf8EREzM/M/Y19u1vdZHR03zEXKlVpZGSs2yVIR2jWqGga3pn5ikOfI+LtwH8D\n5wDrgE8Ba4G7ZqNIqVtc20Ql6uQ570NdJNcCV0bEl4BBYPOsVyUdR65tohL1TUxMVH6TkZGx6m8i\nzVCtNmC3ieakWm1g2nFFZ1hKUoEMb0kqkOEtSQUyvNXzbt26vdslSB0zvNXzXNtEJTK8JalAhrck\nFcjwlqQCGd6SVCBnWGpOe/0HvszPnjzQ7TKO2cKT+7nhDed3uwwVptkMy7ZegyZ1y8+ePMBNb1lZ\n6T2Ox/T4DZu2VXp99R67TSSpQIa3JBXI8JakAhneklQgw1uSCmR4S1KBDG9JKpDhLUkFajlJJyKe\nAdwMLAbmA+8CvgFsoR7+e4D1mTleXZmSpMnaaXlfBDyYmSuAS4H3ARuBD2XmcmAnsKGyCiVJR2jZ\n8s7Mz0z647OB7wPLgasa+4aBNwIfm/XqJElH1fbaJhHxFWCIekv8i5O6SfYCSyqoTZI0jbbDOzPP\njYjfBT4FTF7patpVrw4ZHFxAf/+8GZQn1ReO8h7SVO0MWC4D9mbmDzLzkYiYB4xFxPzM3E+9Nb67\n2TVGR/fNTrXqSVWv+Hc8VhWE6n+Hnn6a/YXfzoDl+dT7tImIxcAi4B5gXeP4WuCuYytRktSJdrpN\nPgrcGBFfBk4G/hz4N2BLRLwW2AVsrq5ESdLh2nna5EngiqMcWj375UiS2uEMS0kqkK9B05z26kc/\nz7dfc0ul9/h2pVeve/VJpwDVvs5NvcXw1px247Nf9rR4h+WmTds4t9I7qNfYbSJJBTK8JalAhrck\nFcjwlqQCGd6SVCDDW5IKZHhLUoEMb0kqkOEtSQUyvCWpQIa3JBXI8JakAhneklQgw1uSCuSSsJrz\nNmza1u0S
jtnCk/1PTbOrb2JiovKbjIyMVX8TaYY2bNpW+Zrh0kzUagN90x1rqzkQEe8BzgPmAZuA\nB4Et1Ltd9gDrM3P82Eu
"text/plain": [
"<matplotlib.figure.Figure at 0x7f2f8f9461d0>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"# Workaround: when Seaborn is imported, pass sym='k.' so that the outliers are shown\n",
"df.boxplot(column='AgeFilled', return_type='axes', sym='k.')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"An alternative is to use the method interpolate(), which by default fills each missing value by linear interpolation between the neighbouring rows."
]
},
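{
"cell_type": "markdown",
"metadata": {},
"source": [
"To see how the two strategies differ, here is a small sketch on a made-up Series (the values are illustrative, not taken from the dataset). Filling with the median puts the same value in every gap, while `interpolate()` uses the neighbouring non-null values, so its result depends on the row order."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"import numpy as np\n",
"import pandas as pd\n",
"\n",
"# Toy ages with gaps (made-up values)\n",
"ages = pd.Series([22.0, np.nan, 30.0, np.nan, np.nan, 54.0])\n",
"\n",
"# Side-by-side comparison of the two filling strategies\n",
"pd.DataFrame({\n",
"    'original': ages,\n",
"    'median_fill': ages.fillna(ages.median()),\n",
"    'interpolated': ages.interpolate(),\n",
"})"
]
},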
{
"cell_type": "code",
"execution_count": 76,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"count 891.000000\n",
"mean 29.726061\n",
"std 13.902353\n",
"min 0.420000\n",
"25% 21.000000\n",
"50% 28.500000\n",
"75% 38.000000\n",
"max 80.000000\n",
"Name: AgeFilled, dtype: float64"
]
},
"execution_count": 76,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df['AgeFilled'] = df['Age'].interpolate()\n",
"df['AgeFilled'].describe()"
]
},
{
"cell_type": "code",
"execution_count": 77,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"<matplotlib.axes._subplots.AxesSubplot at 0x7f2f8f2c04a8>"
]
},
"execution_count": 77,
"metadata": {},
"output_type": "execute_result"
},
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAW8AAAEDCAYAAAD6CoU1AAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\nAAALEgAACxIB0t1+/AAAEQ9JREFUeJzt3X+QXWV9x/H3sivBhO2wMHdi3I7Rov1S2ukM0aIDlIQ0\nkHYq2pJYJEyqpFamFae0atE/lBptB3W0Fhx/dVCQiqhNS7tjh/AjIDPiIBYq2jbfxmhjTSJZ7WJX\nozHJbv+4N7rZZO+PzZ69++S+XzPMvXvuj/PZmcxnH55zznP6JicnkSSV5ZRuB5Akdc7ylqQCWd6S\nVCDLW5IKZHlLUoEsb0kq0ECrN0TEEuATwBBwKrAZ+A7wIWACeDIzX1dlSEnS0doZeb8a2J6Zq4H1\nwN8Afw28PjN/HTgjItZWF1GSNF075f1d4KzG87OA7wHPy8zHG9tGgDUVZJMkzaBleWfmp4HlEbED\neAh4EzA25S37gGWVpJMkHVfL8o6Iq4FdmfkCYDXwd9Pe0ldFMEnSzFoesAQuBLYCZOZXI+KZ0z43\nDOxp9gWHDh2eHBjon3VIqSpX3DDCwUMTnDpwClvedXm340jTzTg4bqe8vw68BPjHiFgOjAPfjIgL\nM/MLwBXAzc2+YGxsfwdZpflzyXnDPPTEbladN8zo6Hi340hHqdUGZ3ytr9Wqgo1TBT8GLAX6gbdS\nP1Xwo9T/KjyamW9s9h2jo+MuXagFq1YbtLi1INVqgzOOvFuW91ywvLWQWd5aqJqVt1dYSlKBLG9J\nKpDlLUkFsrwlqUCWtyQVyPKWpAJZ3pJUIMtbkgpkeUtSgSxvSSqQ5S1JBbK81dPuemAHV9wwwl0P\n7Oh2FKkjlrd62rbHd3Pw0AQPPrG721Gkjlje6mmrVwxz6sApXHLecLejSB1xSVj1PJeE1ULlkrCS\ndJKxvCWpQJa3JBXI8pakArW8e3xEbAI2ApPUbzj8QuAi4EPABPBkZr6uypCSpKN1dLZJRFwMvAL4\nZeCNmfl4RHwS+ERmbp3pc55tooXq7R//Erue+gHLl57Ojdec3+040lHm8myTtwHvAp6bmY83to0A\na2aZTeqqXU/94KhHqRRtl3dEvAj4FnAYGJvy0j5g2RznkiQ10cnI+zXAbY3nU4fyMw7rpYVucPEz\njnqUStHygOUUq4DrGs/PmrJ9GNjT7INDQ4sZGOjvLJk0D8b3H/zpY6022OU0UvvaKu+IWAaMZ+ah\nxs//GREXZOYjwBXAzc0+Pza2/4SDSlVYvvT0nx6w9BJ5LTTNBhTtjryXUZ/bPuJPgY9ERB/waGZu\nm308qXtuvOZ81zZRkVyYSj3P8tZC5cJUknSSsbwlqUCWtyQVyPKWpAJZ3pJUIMtbkgpkeaun3fXA\nDq64YYS7HtjR7ShSRyxv9bRtj+/m4KEJHnxid7ejSB2xvNXTJiYmADh8eKLLSaTOWN7qaROTRz9K\npbC81dOeMXDKUY9SKfwXq5727LMWH/UolcLyVk/zNmgqleWtnrZ86elHPUqlcElY9TyXhNVC1WxJ\n2E5ugyYteOvWXc7OndVecHP22S9gy5aRSvchteLIWz3PkbcWKm/GIEknGctbkgrU7t3jrwbeBBwE\n3gZ8FbiDevnvBTZm5sGqQkpVunPrdi5dMdztGFJHWo68I+JM6oV9AfBS4HeAzcAtmbkS2AlsqjKk\nVKVP3ZvdjiB1rJ1pkzXAfZm5PzOfysxrgVXAkcPtI433SJLmSTvTJs8FlkTEPwFnAG8HFk+ZJtkH\nLKsmniTpeNop7z7gTOB3qRf5g41tU19vamhoMQMD/bPJJ82LWm2w2xGkjrRT3k8Bj2TmBPCNiBgH\nDkbEosw8AAwDe5p9wdjY/hNPKlXI87y1EDUbVLQz530vsDoi+iLiLOB04H5gfeP1dcA9JxpS6par\nLotuR5A61tYVlhHxh8BrgEngHcCXqZ8quAjY
BVyTmYdn+rxXWGoh8wpLLVTNrrD08nj1PMtbC5WX\nx0vSScbylqQCWd6SVCDLWz3vzq3bux1B6pjlrZ7n2iYqkeUtSQWyvCWpQJa3JBXI8pakAlne6nmu\nbaISWd7qeRvWntPtCFLHLG9JKpDlLUkFsrwlqUCWtyQVyPJWz3NtE5XI8lbPc20TlcjylqQCWd6S\nVKCBVm+IiJXAZ4GvAX3Ak8B7qN+A+BRgL7AxMw9WmFOSNEW7I++HMnN1Zl6SmX8CbAZuycyVwE5g\nU2UJJUnHaLe8p9/BeBUw0ng+AqyZq0DSfHNtE5Wo5bRJw7kRcTdwJvVR9+Ip0yT7gGVVhJPmw4a1\n5zA6Ot7tGFJH2invHcBfZOZnI+IXgAenfW76qPwYQ0OLGRjon2VEqXq12mC3I0gd6ZucnOzoAxHx\nKPAi6qPvAxFxMXBdZv7eTJ8ZHR3vbCfSPKrVBh15a0Gq1QZnHBy3nPOOiA0R8YbG82cBS4GPA+sb\nb1kH3DMHOSVJbWpn2uSfgTsj4uXAM4Brga8An4iI1wK7gNuriyhJmq7jaZPZcNpEC9l9j+/m0hXD\n3Y4hHeOEpk2kk51rm6hElrckFcjylqQCWd6SVCDLW5IKZHmr57m2iUpkeavnbVh7TrcjSB2zvCWp\nQJa3JBXI8pakAlneklQgy1s9786t27sdQeqY5a2e59omKpHlLUkFsrwlqUCWtyQVyPKWpAJZ3up5\nrm2iElne6nmubaIStXMDYiLiNOBrwGZgG3AH9eLfC2zMzIOVJZQkHaPdkfdbge81nm8GbsnMlcBO\nYFMVwSRJM2tZ3hERwDnA54A+YCUw0nh5BFhTWTpJ0nG1M/J+L/Bn1IsbYMmUaZJ9wLIqgkmSZtZ0\nzjsiNgKPZOau+gD8GH3H2zjd0NBiBgb6ZxFPqt6dW7d70FLFaXXA8reB50XE5cAw8BPgBxGxKDMP\nNLbtabWTsbH9JxxUqsqn7k0uXTHc7RjSMWq1wRlfa1remfnKI88j4m3AfwMXAOuBTwLrgHvmIqQk\nqX2dnOd9ZIrkRuBVEfF5YAi4fc5TSZKaaus8b4DMfPuUHy+rIIskqU1eYSlJBeqbnJysfCejo+PV\n70Qnpde//2F++OND3Y5xwpacNsAt11/c7RgqTK02OOMZfW1Pm0jd8MMfH+Jjb15d6T5qtUFGR8cr\n3cemm7ZV+v3qPU6bSFKBLG9JKpDlLUkFsrwlqUCWtyQVyPKWpAJZ3pJUIMtbkgpkeUtSgSxvSSqQ\n5S1JBbK8JalAlrckFcjylqQCWd6SVCDLW5IK1PJmDBHxTOA2YCmwCHgn8BXgDurlvxfYmJkHq4sp\nSZqqnZH35cBjmbkKuBJ4H7AZ+EBmrgR2ApsqSyhJOkbLkXdmfmbKj88B/gdYCVzb2DYCvAH4yJyn\nkyQdV9v3sIyILwDD1Efi902ZJtkHLKsgmyRpBm2Xd2ZeGBG/CnwSmHpH4xnvbnzE0NBiBgb6ZxFP\nqt8g2H1IR2vngOUKYF9mfjszn4yIfmA8IhZl5gHqo/E9zb5jbGz/3KRVT6r6zu7zcfd4qP730Mmn\n2R/8dg5YXkx9TpuIWAqcDtwPrG+8vg6458QiSpI60c60yYeBWyPiYeA04I+AfwXuiIjXAruA26uL\nKEmarm9ycrLynYyOjle/E52UvvDH11P7ydPdjnHCRk89gws/+P5ux1BharXBGY8ptn3AUuqGW5/z\nMj725tWV7mM+5rxvumkbF1a6B/UaL4+XpAJZ3pJUIMtbkgpkeUtSgSxvSSqQ5S1JBbK8JalAlrck\nFcjylqQCWd6SVCDLW5IKZHlLUoEsb0kqkOUtSQWyvCWpQJa3JBXI8pakAlneklSgtm6DFhHvBi4C\n+oGbgMeAO6iX/15gY2YerCqkJOloLcs7IlYB52bmBRFxJvAE8ADwgczcEhF/CWwCPlJpUvWsTTdt\n63aEE7bk
NG8Xq7nVzr+ozwOPNp4/DSwBVgLXNraNAG/A8lYFqr75MNT/OMzHfqS51LK8M3MS+FHj\nxz8APgesnTJNsg9YVk0
"text/plain": [
"<matplotlib.figure.Figure at 0x7f2f8f299dd8>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"# Workaround: if Seaborn has been imported, pass sym='k.' so that the outliers are drawn\n",
"df.boxplot(column='AgeFilled', return_type='axes', sym='k.')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Feature Embarking: null values"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Most passengers embarked at 'S', and the column also contains missing values."
]
},
{
"cell_type": "code",
"execution_count": 78,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"2"
]
},
"execution_count": 78,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df['Embarked'].isnull().sum()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"As discussed previously, we will replace these missing values with the most frequent value (the mode): S."
]
},
{
"cell_type": "code",
"execution_count": 79,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"False"
]
},
"execution_count": 79,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Replace nulls with the most common value\n",
"df['Embarked'] = df['Embarked'].fillna('S')\n",
"df['Embarked'].isnull().any()"
]
},
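{
"cell_type": "markdown",
"metadata": {},
"source": [
"Rather than hard-coding 'S', the mode can also be computed from the column itself. The cell below is a minimal sketch on a small made-up DataFrame (`toy` and its values are illustrative assumptions, not the Titanic data); it relies only on `Series.mode` and `Series.fillna`."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"import pandas as pd\n",
"\n",
"# Hypothetical toy data standing in for the 'Embarked' column\n",
"toy = pd.DataFrame({'Embarked': ['S', 'C', 'S', None, 'Q', 'S', None]})\n",
"\n",
"# mode() ignores nulls and returns a Series (ties are possible); take the first entry\n",
"most_common = toy['Embarked'].mode().iloc[0]\n",
"\n",
"# Assign the result back instead of filling in place\n",
"toy['Embarked'] = toy['Embarked'].fillna(most_common)\n",
"\n",
"print(most_common)\n",
"print(toy['Embarked'].isnull().any())"
]
},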
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Feature Cabin: null values"
]
},
{
"cell_type": "markdown",
"metadata": {
"collapsed": true
},
"source": [
"We are going to analyse *Cabin* in the exercise."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Encoding categorical features"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Recap: encoding categorical features"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"In the previous notebook we saw how to encode categorical features. We are going to explore an alternative way."
]
},
{
"cell_type": "code",
"execution_count": 80,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"#df = df_original.copy()\n",
"#df['SexEncoded'] = df.Sex\n",
"#\n",
"#df.loc[df[\"SexEncoded\"] == 'male', \"SexEncoded\"] = 0\n",
"#df.loc[df[\"SexEncoded\"] == \"female\", \"SexEncoded\"] = 1\n",
"#\n",
"#df['EmbarkedEncoded'] = df.Embarked\n",
"#df.loc[df[\"EmbarkedEncoded\"] == \"S\", \"EmbarkedEncoded\"] = 0\n",
"#df.loc[df[\"EmbarkedEncoded\"] == \"C\", \"EmbarkedEncoded\"] = 1\n",
"#df.loc[df[\"EmbarkedEncoded\"] == \"Q\", \"EmbarkedEncoded\"] = 2\n",
"#df.head()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Encoding Categorical Variables as Binary ones"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"As we saw previously, translating categorical variables into integers can introduce an artificial order. In our case, this is not a problem, since *Sex* is a binary variable, and we can consider that an order exists in *Pclass*.\n",
"\n",
"Nevertheless, we are going to introduce a general approach to encode categorical variables using some facilities provided by scikit-learn."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**LabelEncoder** transforms categories into integers (0, 1, ...). We are going to use it for *Sex*."
]
},
{
"cell_type": "code",
"execution_count": 81,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>PassengerId</th>\n",
" <th>Survived</th>\n",
" <th>Pclass</th>\n",
" <th>Name</th>\n",
" <th>Sex</th>\n",
" <th>Age</th>\n",
" <th>SibSp</th>\n",
" <th>Parch</th>\n",
" <th>Ticket</th>\n",
" <th>Fare</th>\n",
" <th>Cabin</th>\n",
" <th>Embarked</th>\n",
" <th>SexCoded</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" <td>3</td>\n",
" <td>Braund, Mr. Owen Harris</td>\n",
" <td>male</td>\n",
" <td>22.0</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" <td>A/5 21171</td>\n",
" <td>7.2500</td>\n",
" <td>NaN</td>\n",
" <td>S</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>2</td>\n",
" <td>1</td>\n",
" <td>1</td>\n",
" <td>Cumings, Mrs. John Bradley (Florence Briggs Th...</td>\n",
" <td>female</td>\n",
" <td>38.0</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" <td>PC 17599</td>\n",
" <td>71.2833</td>\n",
" <td>C85</td>\n",
" <td>C</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>3</td>\n",
" <td>1</td>\n",
" <td>3</td>\n",
" <td>Heikkinen, Miss. Laina</td>\n",
" <td>female</td>\n",
" <td>26.0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>STON/O2. 3101282</td>\n",
" <td>7.9250</td>\n",
" <td>NaN</td>\n",
" <td>S</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>4</td>\n",
" <td>1</td>\n",
" <td>1</td>\n",
" <td>Futrelle, Mrs. Jacques Heath (Lily May Peel)</td>\n",
" <td>female</td>\n",
" <td>35.0</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" <td>113803</td>\n",
" <td>53.1000</td>\n",
" <td>C123</td>\n",
" <td>S</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>5</td>\n",
" <td>0</td>\n",
" <td>3</td>\n",
" <td>Allen, Mr. William Henry</td>\n",
" <td>male</td>\n",
" <td>35.0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>373450</td>\n",
" <td>8.0500</td>\n",
" <td>NaN</td>\n",
" <td>S</td>\n",
" <td>1</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" PassengerId Survived Pclass \\\n",
"0 1 0 3 \n",
"1 2 1 1 \n",
"2 3 1 3 \n",
"3 4 1 1 \n",
"4 5 0 3 \n",
"\n",
" Name Sex Age SibSp \\\n",
"0 Braund, Mr. Owen Harris male 22.0 1 \n",
"1 Cumings, Mrs. John Bradley (Florence Briggs Th... female 38.0 1 \n",
"2 Heikkinen, Miss. Laina female 26.0 0 \n",
"3 Futrelle, Mrs. Jacques Heath (Lily May Peel) female 35.0 1 \n",
"4 Allen, Mr. William Henry male 35.0 0 \n",
"\n",
" Parch Ticket Fare Cabin Embarked SexCoded \n",
"0 0 A/5 21171 7.2500 NaN S 1 \n",
"1 0 PC 17599 71.2833 C85 C 0 \n",
"2 0 STON/O2. 3101282 7.9250 NaN S 0 \n",
"3 0 113803 53.1000 C123 S 0 \n",
"4 0 373450 8.0500 NaN S 1 "
]
},
"execution_count": 81,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"from sklearn.preprocessing import LabelEncoder, OneHotEncoder\n",
"\n",
"df = df_original.copy() # take original df\n",
"\n",
"# The categorical columns hold non-integer values, so we first convert them\n",
"# into integers with LabelEncoder. This step can be omitted if they are already integers.\n",
"\n",
"label_enc = LabelEncoder()\n",
"label_sex = label_enc.fit_transform(df['Sex'])\n",
"df['SexCoded'] = label_sex\n",
"\n",
"df.head()"
]
},
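{
"cell_type": "markdown",
"metadata": {},
"source": [
"To check which integer was assigned to each label, the fitted encoder can be inspected. The cell below is a minimal sketch on a small made-up list of values (`sex`, `enc` and `codes` are illustrative names): `classes_` holds the labels in sorted order, and the position of each label is its integer code."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"from sklearn.preprocessing import LabelEncoder\n",
"\n",
"# Hypothetical toy values standing in for the 'Sex' column\n",
"sex = ['male', 'female', 'female', 'male']\n",
"\n",
"enc = LabelEncoder()\n",
"codes = enc.fit_transform(sex)\n",
"\n",
"# classes_ is sorted; the index of each label is the code it was given\n",
"print(list(enc.classes_))\n",
"print(list(codes))\n",
"\n",
"# inverse_transform maps the integer codes back to the original labels\n",
"print(list(enc.inverse_transform(codes)))"
]
},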
{
"cell_type": "markdown",
"metadata": {},
"source": [
"*Sex* is now encoded as a binary variable: LabelEncoder sorts the labels, so *female* becomes 0 and *male* becomes 1.\n",
"\n",
"Now we are going to do the same with *Embarked* and *Pclass*. There are several alternatives in scikit-learn, such as *DictVectorizer* or *OneHotEncoder*.\n",
"\n",
"Here we use *pd.get_dummies* from pandas, which offers a simple way to encode categorical variables."
]
},
{
"cell_type": "code",
"execution_count": 82,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>PassengerId</th>\n",
" <th>Survived</th>\n",
" <th>Name</th>\n",
" <th>Sex</th>\n",
" <th>Age</th>\n",
" <th>SibSp</th>\n",
" <th>Parch</th>\n",
" <th>Ticket</th>\n",
" <th>Fare</th>\n",
" <th>Cabin</th>\n",
" <th>SexCoded</th>\n",
" <th>Embarked_C</th>\n",
" <th>Embarked_Q</th>\n",
" <th>Embarked_S</th>\n",
" <th>Pclass_1</th>\n",
" <th>Pclass_2</th>\n",
" <th>Pclass_3</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" <td>Braund, Mr. Owen Harris</td>\n",
" <td>male</td>\n",
" <td>22.0</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" <td>A/5 21171</td>\n",
" <td>7.2500</td>\n",
" <td>NaN</td>\n",
" <td>1</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>1.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>1.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>2</td>\n",
" <td>1</td>\n",
" <td>Cumings, Mrs. John Bradley (Florence Briggs Th...</td>\n",
" <td>female</td>\n",
" <td>38.0</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" <td>PC 17599</td>\n",
" <td>71.2833</td>\n",
" <td>C85</td>\n",
" <td>0</td>\n",
" <td>1.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>1.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>3</td>\n",
" <td>1</td>\n",
" <td>Heikkinen, Miss. Laina</td>\n",
" <td>female</td>\n",
" <td>26.0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>STON/O2. 3101282</td>\n",
" <td>7.9250</td>\n",
" <td>NaN</td>\n",
" <td>0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>1.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>1.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>4</td>\n",
" <td>1</td>\n",
" <td>Futrelle, Mrs. Jacques Heath (Lily May Peel)</td>\n",
" <td>female</td>\n",
" <td>35.0</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" <td>113803</td>\n",
" <td>53.1000</td>\n",
" <td>C123</td>\n",
" <td>0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>1.0</td>\n",
" <td>1.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>5</td>\n",
" <td>0</td>\n",
" <td>Allen, Mr. William Henry</td>\n",
" <td>male</td>\n",
" <td>35.0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>373450</td>\n",
" <td>8.0500</td>\n",
" <td>NaN</td>\n",
" <td>1</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>1.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>1.0</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" PassengerId Survived Name \\\n",
"0 1 0 Braund, Mr. Owen Harris \n",
"1 2 1 Cumings, Mrs. John Bradley (Florence Briggs Th... \n",
"2 3 1 Heikkinen, Miss. Laina \n",
"3 4 1 Futrelle, Mrs. Jacques Heath (Lily May Peel) \n",
"4 5 0 Allen, Mr. William Henry \n",
"\n",
" Sex Age SibSp Parch Ticket Fare Cabin SexCoded \\\n",
"0 male 22.0 1 0 A/5 21171 7.2500 NaN 1 \n",
"1 female 38.0 1 0 PC 17599 71.2833 C85 0 \n",
"2 female 26.0 0 0 STON/O2. 3101282 7.9250 NaN 0 \n",
"3 female 35.0 1 0 113803 53.1000 C123 0 \n",
"4 male 35.0 0 0 373450 8.0500 NaN 1 \n",
"\n",
" Embarked_C Embarked_Q Embarked_S Pclass_1 Pclass_2 Pclass_3 \n",
"0 0.0 0.0 1.0 0.0 0.0 1.0 \n",
"1 1.0 0.0 0.0 1.0 0.0 0.0 \n",
"2 0.0 0.0 1.0 0.0 0.0 1.0 \n",
"3 0.0 0.0 1.0 1.0 0.0 0.0 \n",
"4 0.0 0.0 1.0 0.0 0.0 1.0 "
]
},
"execution_count": 82,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Fill nulls with the most common value\n",
"df['Embarked'] = df['Embarked'].fillna('S')\n",
"df = pd.get_dummies(df, columns=['Embarked', 'Pclass'])\n",
"df.head()"
]
},
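{
"cell_type": "markdown",
"metadata": {},
"source": [
"The encoding above yields one indicator column per category, so one column of each feature is redundant (it can be deduced from the others). As a hedged sketch on made-up data (`toy`, `full` and `reduced` are illustrative names), `pd.get_dummies` accepts `drop_first=True` to drop the first category of each encoded feature, which can help with models that are sensitive to collinear inputs."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"import pandas as pd\n",
"\n",
"# Hypothetical toy data with the same two categorical columns\n",
"toy = pd.DataFrame({'Embarked': ['S', 'C', 'Q', 'S'], 'Pclass': [3, 1, 2, 3]})\n",
"\n",
"# One indicator column per category\n",
"full = pd.get_dummies(toy, columns=['Embarked', 'Pclass'])\n",
"\n",
"# drop_first=True removes the first category of each encoded feature\n",
"reduced = pd.get_dummies(toy, columns=['Embarked', 'Pclass'], drop_first=True)\n",
"\n",
"print(list(full.columns))\n",
"print(list(reduced.columns))"
]
},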
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Cleaning: dropping"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We should drop columns we will not use. In the exercise, you will need to use 'Cabin'."
]
},
{
"cell_type": "code",
"execution_count": 83,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>PassengerId</th>\n",
" <th>Survived</th>\n",
" <th>Name</th>\n",
" <th>Sex</th>\n",
" <th>Age</th>\n",
" <th>SibSp</th>\n",
" <th>Parch</th>\n",
" <th>Fare</th>\n",
" <th>SexCoded</th>\n",
" <th>Embarked_C</th>\n",
" <th>Embarked_Q</th>\n",
" <th>Embarked_S</th>\n",
" <th>Pclass_1</th>\n",
" <th>Pclass_2</th>\n",
" <th>Pclass_3</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" <td>Braund, Mr. Owen Harris</td>\n",
" <td>male</td>\n",
" <td>22.0</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" <td>7.2500</td>\n",
" <td>1</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>1.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>1.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>2</td>\n",
" <td>1</td>\n",
" <td>Cumings, Mrs. John Bradley (Florence Briggs Th...</td>\n",
" <td>female</td>\n",
" <td>38.0</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" <td>71.2833</td>\n",
" <td>0</td>\n",
" <td>1.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>1.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>3</td>\n",
" <td>1</td>\n",
" <td>Heikkinen, Miss. Laina</td>\n",
" <td>female</td>\n",
" <td>26.0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>7.9250</td>\n",
" <td>0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>1.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>1.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>4</td>\n",
" <td>1</td>\n",
" <td>Futrelle, Mrs. Jacques Heath (Lily May Peel)</td>\n",
" <td>female</td>\n",
" <td>35.0</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" <td>53.1000</td>\n",
" <td>0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>1.0</td>\n",
" <td>1.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>5</td>\n",
" <td>0</td>\n",
" <td>Allen, Mr. William Henry</td>\n",
" <td>male</td>\n",
" <td>35.0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>8.0500</td>\n",
" <td>1</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>1.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>1.0</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" PassengerId Survived Name \\\n",
"0 1 0 Braund, Mr. Owen Harris \n",
"1 2 1 Cumings, Mrs. John Bradley (Florence Briggs Th... \n",
"2 3 1 Heikkinen, Miss. Laina \n",
"3 4 1 Futrelle, Mrs. Jacques Heath (Lily May Peel) \n",
"4 5 0 Allen, Mr. William Henry \n",
"\n",
" Sex Age SibSp Parch Fare SexCoded Embarked_C Embarked_Q \\\n",
"0 male 22.0 1 0 7.2500 1 0.0 0.0 \n",
"1 female 38.0 1 0 71.2833 0 1.0 0.0 \n",
"2 female 26.0 0 0 7.9250 0 0.0 0.0 \n",
"3 female 35.0 1 0 53.1000 0 0.0 0.0 \n",
"4 male 35.0 0 0 8.0500 1 0.0 0.0 \n",
"\n",
" Embarked_S Pclass_1 Pclass_2 Pclass_3 \n",
"0 1.0 0.0 0.0 1.0 \n",
"1 0.0 1.0 0.0 0.0 \n",
"2 1.0 0.0 0.0 1.0 \n",
"3 1.0 1.0 0.0 0.0 \n",
"4 1.0 0.0 0.0 1.0 "
]
},
"execution_count": 83,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df.drop(['Cabin', 'Ticket'], axis=1, inplace=True)\n",
"df.head()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Feature Engineering"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Feature Engineering is the process of using domain/expert knowledge of the data to create features that make machine learning algorithms work better. We are going to define several [new ones](https://triangleinequality.wordpress.com/2013/09/08/basic-feature-engineering-with-the-titanic-data/)."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# References"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"* [Basic Feature Engineering with the Titanic Data](https://triangleinequality.wordpress.com/2013/09/08/basic-feature-engineering-with-the-titanic-data/)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Licence"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The notebook is freely licensed under the [Creative Commons Attribution Share-Alike license](https://creativecommons.org/licenses/by/2.0/). \n",
"\n",
"© 2016 Carlos A. Iglesias, Universidad Politécnica de Madrid."
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.5.1+"
}
},
"nbformat": 4,
"nbformat_minor": 0
}