sitc/ml2/3_4_Visualisation_Pandas.ipynb

{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"![](images/EscUpmPolit_p.gif \"UPM\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Course Notes for Learning Intelligent Systems"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Department of Telematic Engineering Systems, Universidad Politécnica de Madrid, © 2016 Carlos A. Iglesias"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## [Introduction to Machine Learning II](3_0_0_Intro_ML_2.ipynb)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Table of Contents\n",
"* [Introduction: preprocessing](#Introduction:-preprocessing)\n",
"* [Visualisation with Pandas](#Visualisation-with-Pandas)\n",
"* [Loading and Cleaning](#Loading-and-Cleaning)\n",
"* [General exploration](#General-exploration)\n",
"* [Feature Age](#Feature-Age)\n",
"* [Feature Sex](#Feature-Sex)\n",
"* [Feature Pclass](#Feature-Pclass)\n",
"* [Feature Fare](#Feature-Fare)\n",
"* [Feature Embarked](#Feature-Embarked)\n",
"* [Features SibSp](#Features-SibSp)\n",
"* [Feature ParCh](#Feature-ParCh)\n",
"* [Recap: Filling null values](#Recap:-Filling-null-values)\n",
"\t* [Feature Age: null values](#Feature-Age:-null-values)\n",
"\t* [Feature Embarking: null values](#Feature-Embarking:-null-values)\n",
"\t* [Feature Cabin: null values](#Feature-Cabin:-null-values)\n",
"* [Encoding categorical features](#Encoding-categorical-features)\n",
"\t* [Recap: encoding categorical features](#Recap:-encoding-categorical-features)\n",
"\t* [Encoding Categorical Variables as Binary ones](#Encoding-Categorical-Variables-as-Binary-ones)\n",
"* [Cleaning: dropping](#Cleaning:-dropping)\n",
"* [Feature Engineering](#Feature-Engineering)\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Introduction: preprocessing"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"In the previous session, we introduced two libraries for visualisation: *matplotlib* and *seaborn*. We are going to review new functionalities in this notebook, as well as the integration of *pandas* with *matplotlib*.\n",
"\n",
"Visualisation is usually combined with munging. We have done this in separated notebooks for learning purposes. We we are going to examine again the dataset, combinging both techniques, and applying the knowledge we got in the previous notebook."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Visualisation with Pandas"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Pandas provides a very good integration with matplotlib. DataFrames have the following methods:\n",
"* **plot()**, for a number of charts, that can be selected with the argument *kind*:\n",
" * 'bar' for bar plots\n",
" * 'hist' for histograms\n",
" * 'box' for boxplots\n",
" * 'kde' for density plots\n",
" * 'area' for area plots\n",
" * 'scatter' for scatter plots\n",
" * 'hexbin' for hexagonal bin plots\n",
" * 'pie' for pie charts\n",
" \n",
"Every plot kind has an equivalent on Dataframe.plot accessor. This means, you can use **df.plot(kind='line')** or **df.plot.line**. Check the [plot documentation](http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.plot.html#pandas.DataFrame.plot) to learn the rest of parameters.\n",
"\n",
"In addition, the module *pandas.tools.plotting* provides: **scatter_matrix**.\n",
"\n",
"You can consult more details in the [documentation](http://pandas.pydata.org/pandas-docs/stable/visualization.html)."
]
},
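As a minimal sketch of the two equivalent calling styles described above, the following uses a small made-up DataFrame (`toy` and its columns are assumptions for illustration, not part of the Titanic data). It imports `scatter_matrix` from *pandas.plotting*, the module used by recent pandas releases, and selects the `Agg` backend only so that the sketch runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; not needed with %matplotlib inline
import pandas as pd
from pandas.plotting import scatter_matrix

# Invented data, only to have something to draw
toy = pd.DataFrame({"x": [1, 2, 3, 4], "y": [2.0, 3.5, 1.0, 4.2]})

# The same scatter plot, requested in the two equivalent ways
ax_kind = toy.plot(kind="scatter", x="x", y="y")
ax_accessor = toy.plot.scatter(x="x", y="y")

# A histogram of both columns through the accessor
ax_hist = toy.plot.hist(bins=4)

# One scatter plot per pair of columns, histograms on the diagonal
axes = scatter_matrix(toy)
```

Each call returns matplotlib axes, so the usual matplotlib methods (titles, labels, limits) can be applied to the result.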
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Loading and Cleaning"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"# General import and load data\n",
"import pandas as pd\n",
"\n",
"import seaborn as sns\n",
"import matplotlib.pyplot as plt\n",
"import numpy as np\n",
"sns.set(color_codes=True)\n",
"\n",
"# if matplotlib is not set inline, you will not see plots\n",
"\n",
"#alternatives auto gtk gtk2 inline osx qt qt5 wx tk\n",
"#%matplotlib auto\n",
"#%matplotlib qt\n",
"%matplotlib inline"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>PassengerId</th>\n",
" <th>Survived</th>\n",
" <th>Pclass</th>\n",
" <th>Name</th>\n",
" <th>Sex</th>\n",
" <th>Age</th>\n",
" <th>SibSp</th>\n",
" <th>Parch</th>\n",
" <th>Ticket</th>\n",
" <th>Fare</th>\n",
" <th>Cabin</th>\n",
" <th>Embarked</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" <td>3</td>\n",
" <td>Braund, Mr. Owen Harris</td>\n",
" <td>male</td>\n",
" <td>22.0</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" <td>A/5 21171</td>\n",
" <td>7.2500</td>\n",
" <td>NaN</td>\n",
" <td>S</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>2</td>\n",
" <td>1</td>\n",
" <td>1</td>\n",
" <td>Cumings, Mrs. John Bradley (Florence Briggs Th...</td>\n",
" <td>female</td>\n",
" <td>38.0</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" <td>PC 17599</td>\n",
" <td>71.2833</td>\n",
" <td>C85</td>\n",
" <td>C</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>3</td>\n",
" <td>1</td>\n",
" <td>3</td>\n",
" <td>Heikkinen, Miss. Laina</td>\n",
" <td>female</td>\n",
" <td>26.0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>STON/O2. 3101282</td>\n",
" <td>7.9250</td>\n",
" <td>NaN</td>\n",
" <td>S</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>4</td>\n",
" <td>1</td>\n",
" <td>1</td>\n",
" <td>Futrelle, Mrs. Jacques Heath (Lily May Peel)</td>\n",
" <td>female</td>\n",
" <td>35.0</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" <td>113803</td>\n",
" <td>53.1000</td>\n",
" <td>C123</td>\n",
" <td>S</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>5</td>\n",
" <td>0</td>\n",
" <td>3</td>\n",
" <td>Allen, Mr. William Henry</td>\n",
" <td>male</td>\n",
" <td>35.0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>373450</td>\n",
" <td>8.0500</td>\n",
" <td>NaN</td>\n",
" <td>S</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" PassengerId Survived Pclass \\\n",
"0 1 0 3 \n",
"1 2 1 1 \n",
"2 3 1 3 \n",
"3 4 1 1 \n",
"4 5 0 3 \n",
"\n",
" Name Sex Age SibSp \\\n",
"0 Braund, Mr. Owen Harris male 22.0 1 \n",
"1 Cumings, Mrs. John Bradley (Florence Briggs Th... female 38.0 1 \n",
"2 Heikkinen, Miss. Laina female 26.0 0 \n",
"3 Futrelle, Mrs. Jacques Heath (Lily May Peel) female 35.0 1 \n",
"4 Allen, Mr. William Henry male 35.0 0 \n",
"\n",
" Parch Ticket Fare Cabin Embarked \n",
"0 0 A/5 21171 7.2500 NaN S \n",
"1 0 PC 17599 71.2833 C85 C \n",
"2 0 STON/O2. 3101282 7.9250 NaN S \n",
"3 0 113803 53.1000 C123 S \n",
"4 0 373450 8.0500 NaN S "
]
},
"execution_count": 3,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"#We get a URL with raw content (not HTML one)\n",
"url=\"https://raw.githubusercontent.com/gsi-upm/sitc/master/ml2/data-titanic/train.csv\"\n",
"df = pd.read_csv(url)\n",
"df_original = df.copy() # Copy to have a version of df without modifications\n",
"df.head()"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>PassengerId</th>\n",
" <th>Survived</th>\n",
" <th>Pclass</th>\n",
" <th>Name</th>\n",
" <th>Sex</th>\n",
" <th>Age</th>\n",
" <th>SibSp</th>\n",
" <th>Parch</th>\n",
" <th>Fare</th>\n",
" <th>Embarked</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" <td>3</td>\n",
" <td>Braund, Mr. Owen Harris</td>\n",
" <td>0</td>\n",
" <td>22.0</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" <td>7.2500</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>2</td>\n",
" <td>1</td>\n",
" <td>1</td>\n",
" <td>Cumings, Mrs. John Bradley (Florence Briggs Th...</td>\n",
" <td>1</td>\n",
" <td>38.0</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" <td>71.2833</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>3</td>\n",
" <td>1</td>\n",
" <td>3</td>\n",
" <td>Heikkinen, Miss. Laina</td>\n",
" <td>1</td>\n",
" <td>26.0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>7.9250</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>4</td>\n",
" <td>1</td>\n",
" <td>1</td>\n",
" <td>Futrelle, Mrs. Jacques Heath (Lily May Peel)</td>\n",
" <td>1</td>\n",
" <td>35.0</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" <td>53.1000</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>5</td>\n",
" <td>0</td>\n",
" <td>3</td>\n",
" <td>Allen, Mr. William Henry</td>\n",
" <td>0</td>\n",
" <td>35.0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>8.0500</td>\n",
" <td>0</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" PassengerId Survived Pclass \\\n",
"0 1 0 3 \n",
"1 2 1 1 \n",
"2 3 1 3 \n",
"3 4 1 1 \n",
"4 5 0 3 \n",
"\n",
" Name Sex Age SibSp Parch \\\n",
"0 Braund, Mr. Owen Harris 0 22.0 1 0 \n",
"1 Cumings, Mrs. John Bradley (Florence Briggs Th... 1 38.0 1 0 \n",
"2 Heikkinen, Miss. Laina 1 26.0 0 0 \n",
"3 Futrelle, Mrs. Jacques Heath (Lily May Peel) 1 35.0 1 0 \n",
"4 Allen, Mr. William Henry 0 35.0 0 0 \n",
"\n",
" Fare Embarked \n",
"0 7.2500 0 \n",
"1 71.2833 1 \n",
"2 7.9250 0 \n",
"3 53.1000 0 \n",
"4 8.0500 0 "
]
},
"execution_count": 4,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Cleaning\n",
"df_clean = df.copy() # We copy to see what happens with na values\n",
"df_clean['Age'] = df['Age'].fillna(df['Age'].median())\n",
"df_clean.loc[df[\"Sex\"] == \"male\", \"Sex\"] = 0\n",
"df_clean.loc[df[\"Sex\"] == \"female\", \"Sex\"] = 1\n",
"df_clean.drop(['Cabin', 'Ticket'], axis=1, inplace=True)\n",
"df_clean.loc[df[\"Embarked\"] == \"S\", \"Embarked\"] = 0\n",
"df_clean.loc[df[\"Embarked\"] == \"C\", \"Embarked\"] = 1\n",
"df_clean.loc[df[\"Embarked\"] == \"Q\", \"Embarked\"] = 2\n",
"df_clean.head()"
]
},
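The cleaning cell above encodes each category with one `.loc` assignment per value. An alternative sketch, assuming the same category labels, uses `Series.map` with a dictionary. The names `toy` and `clean` and the three rows are invented for illustration. Note that a value that is not a key of the dictionary, including a null, stays null after mapping.

```python
import numpy as np
import pandas as pd

# Invented rows that mimic the three columns being cleaned
toy = pd.DataFrame({
    "Sex": ["male", "female", "female"],
    "Age": [22.0, np.nan, 26.0],
    "Embarked": ["S", "C", np.nan],
})

clean = toy.copy()
# Fill missing ages with the median of the known ages
clean["Age"] = toy["Age"].fillna(toy["Age"].median())
# Encode the categories with a dictionary instead of one .loc line per value
clean["Sex"] = toy["Sex"].map({"male": 0, "female": 1})
clean["Embarked"] = toy["Embarked"].map({"S": 0, "C": 1, "Q": 2})

print(clean)
```

With this approach the encoding is kept in one place, which makes it easier to reuse on a test set.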
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# General exploration"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"In the previous session we saw that *Seaborn* provides several facilities for working with DataFrames. We are going to review some of them."
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>PassengerId</th>\n",
" <th>Survived</th>\n",
" <th>Pclass</th>\n",
" <th>Age</th>\n",
" <th>SibSp</th>\n",
" <th>Parch</th>\n",
" <th>Fare</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>count</th>\n",
" <td>891.000000</td>\n",
" <td>891.000000</td>\n",
" <td>891.000000</td>\n",
" <td>714.000000</td>\n",
" <td>891.000000</td>\n",
" <td>891.000000</td>\n",
" <td>891.000000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>mean</th>\n",
" <td>446.000000</td>\n",
" <td>0.383838</td>\n",
" <td>2.308642</td>\n",
" <td>29.699118</td>\n",
" <td>0.523008</td>\n",
" <td>0.381594</td>\n",
" <td>32.204208</td>\n",
" </tr>\n",
" <tr>\n",
" <th>std</th>\n",
" <td>257.353842</td>\n",
" <td>0.486592</td>\n",
" <td>0.836071</td>\n",
" <td>14.526497</td>\n",
" <td>1.102743</td>\n",
" <td>0.806057</td>\n",
" <td>49.693429</td>\n",
" </tr>\n",
" <tr>\n",
" <th>min</th>\n",
" <td>1.000000</td>\n",
" <td>0.000000</td>\n",
" <td>1.000000</td>\n",
" <td>0.420000</td>\n",
" <td>0.000000</td>\n",
" <td>0.000000</td>\n",
" <td>0.000000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>25%</th>\n",
" <td>223.500000</td>\n",
" <td>0.000000</td>\n",
" <td>2.000000</td>\n",
" <td>20.125000</td>\n",
" <td>0.000000</td>\n",
" <td>0.000000</td>\n",
" <td>7.910400</td>\n",
" </tr>\n",
" <tr>\n",
" <th>50%</th>\n",
" <td>446.000000</td>\n",
" <td>0.000000</td>\n",
" <td>3.000000</td>\n",
" <td>28.000000</td>\n",
" <td>0.000000</td>\n",
" <td>0.000000</td>\n",
" <td>14.454200</td>\n",
" </tr>\n",
" <tr>\n",
" <th>75%</th>\n",
" <td>668.500000</td>\n",
" <td>1.000000</td>\n",
" <td>3.000000</td>\n",
" <td>38.000000</td>\n",
" <td>1.000000</td>\n",
" <td>0.000000</td>\n",
" <td>31.000000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>max</th>\n",
" <td>891.000000</td>\n",
" <td>1.000000</td>\n",
" <td>3.000000</td>\n",
" <td>80.000000</td>\n",
" <td>8.000000</td>\n",
" <td>6.000000</td>\n",
" <td>512.329200</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" PassengerId Survived Pclass Age SibSp \\\n",
"count 891.000000 891.000000 891.000000 714.000000 891.000000 \n",
"mean 446.000000 0.383838 2.308642 29.699118 0.523008 \n",
"std 257.353842 0.486592 0.836071 14.526497 1.102743 \n",
"min 1.000000 0.000000 1.000000 0.420000 0.000000 \n",
"25% 223.500000 0.000000 2.000000 20.125000 0.000000 \n",
"50% 446.000000 0.000000 3.000000 28.000000 0.000000 \n",
"75% 668.500000 1.000000 3.000000 38.000000 1.000000 \n",
"max 891.000000 1.000000 3.000000 80.000000 8.000000 \n",
"\n",
" Parch Fare \n",
"count 891.000000 891.000000 \n",
"mean 0.381594 32.204208 \n",
"std 0.806057 49.693429 \n",
"min 0.000000 0.000000 \n",
"25% 0.000000 7.910400 \n",
"50% 0.000000 14.454200 \n",
"75% 0.000000 31.000000 \n",
"max 6.000000 512.329200 "
]
},
"execution_count": 5,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# General description of the dataset\n",
"df.describe()"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"PassengerId int64\n",
"Survived int64\n",
"Pclass int64\n",
"Name object\n",
"Sex object\n",
"Age float64\n",
"SibSp int64\n",
"Parch int64\n",
"Ticket object\n",
"Fare float64\n",
"Cabin object\n",
"Embarked object\n",
"dtype: object"
]
},
"execution_count": 6,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Column types\n",
"df.dtypes"
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"Name object\n",
"Sex object\n",
"Ticket object\n",
"Cabin object\n",
"Embarked object\n",
"dtype: object"
]
},
"execution_count": 7,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Columns non numeric\n",
"df.dtypes[df.dtypes == object]"
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"PassengerId 0\n",
"Survived 0\n",
"Pclass 0\n",
"Name 0\n",
"Sex 0\n",
"Age 177\n",
"SibSp 0\n",
"Parch 0\n",
"Ticket 0\n",
"Fare 0\n",
"Cabin 687\n",
"Embarked 2\n",
"dtype: int64"
]
},
"execution_count": 8,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Number of null values\n",
"df.isnull().sum()"
]
},
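The count of null values is easier to judge next to the fraction of rows it represents. A minimal sketch, on an invented frame (`toy` is an assumption, not the Titanic data), computes both with `isnull()`:

```python
import numpy as np
import pandas as pd

# Invented frame with some missing values
toy = pd.DataFrame({
    "Age": [22.0, np.nan, 26.0, np.nan],
    "Cabin": [np.nan, "C85", np.nan, np.nan],
    "Fare": [7.25, 71.28, 7.92, 8.05],
})

null_counts = toy.isnull().sum()      # number of nulls per column
null_fraction = toy.isnull().mean()   # share of rows that are null per column

print(pd.DataFrame({"nulls": null_counts, "fraction": null_fraction}))
```

A column with a high fraction of nulls is a candidate for dropping, while a column with a low fraction can usually be filled.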
{
"cell_type": "code",
"execution_count": 9,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"array([[<matplotlib.axes._subplots.AxesSubplot object at 0x7fd12d9c0080>,\n",
" <matplotlib.axes._subplots.AxesSubplot object at 0x7fd12b505198>,\n",
" <matplotlib.axes._subplots.AxesSubplot object at 0x7fd12b4c2828>],\n",
" [<matplotlib.axes._subplots.AxesSubplot object at 0x7fd12b48ae80>,\n",
" <matplotlib.axes._subplots.AxesSubplot object at 0x7fd12b44d2e8>,\n",
" <matplotlib.axes._subplots.AxesSubplot object at 0x7fd12b3992e8>],\n",
" [<matplotlib.axes._subplots.AxesSubplot object at 0x7fd12b360d30>,\n",
" <matplotlib.axes._subplots.AxesSubplot object at 0x7fd12b31fcc0>,\n",
" <matplotlib.axes._subplots.AxesSubplot object at 0x7fd12b2eb438>]], dtype=object)"
]
},
"execution_count": 9,
"metadata": {},
"output_type": "execute_result"
},
{
"data": {
"image/png": "<base64-encoded PNG omitted: grid of histograms, one per numeric column, produced by df.hist()>",
"text/plain": [
"<matplotlib.figure.Figure at 0x7fd12d9acc18>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"# Analise distributon\n",
"df.hist()"
]
},
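`DataFrame.hist()` draws one histogram per numeric column and returns the array of axes shown in the output above. A minimal sketch on an invented frame follows (`toy` is an assumption; the `Agg` backend is only there so that it runs without a display):

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; not needed with %matplotlib inline
import pandas as pd

# Invented rows; "Name" is text, so no histogram is drawn for it
toy = pd.DataFrame({
    "Age": [22.0, 38.0, 26.0, 35.0, 54.0],
    "Fare": [7.25, 71.28, 7.92, 53.10, 51.86],
    "Pclass": [3, 1, 3, 1, 1],
    "Name": ["a", "b", "c", "d", "e"],
})

# bins and figsize are optional; hist() returns a NumPy array of axes
axes = toy.hist(bins=5, figsize=(6, 4))

# Each used subplot is titled with the column it represents
titles = sorted(ax.get_title() for ax in axes.ravel() if ax.get_title())
print(titles)
```

Ending the cell with a semicolon (`df.hist();`) is a common way to hide the printed array of axes in a notebook.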
{
"cell_type": "code",
"execution_count": 10,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>PassengerId</th>\n",
" <th>Survived</th>\n",
" <th>Pclass</th>\n",
" <th>Age</th>\n",
" <th>SibSp</th>\n",
" <th>Parch</th>\n",
" <th>Fare</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>PassengerId</th>\n",
" <td>1.000000</td>\n",
" <td>-0.005007</td>\n",
" <td>-0.035144</td>\n",
" <td>0.036847</td>\n",
" <td>-0.057527</td>\n",
" <td>-0.001652</td>\n",
" <td>0.012658</td>\n",
" </tr>\n",
" <tr>\n",
" <th>Survived</th>\n",
" <td>-0.005007</td>\n",
" <td>1.000000</td>\n",
" <td>-0.338481</td>\n",
" <td>-0.077221</td>\n",
" <td>-0.035322</td>\n",
" <td>0.081629</td>\n",
" <td>0.257307</td>\n",
" </tr>\n",
" <tr>\n",
" <th>Pclass</th>\n",
" <td>-0.035144</td>\n",
" <td>-0.338481</td>\n",
" <td>1.000000</td>\n",
" <td>-0.369226</td>\n",
" <td>0.083081</td>\n",
" <td>0.018443</td>\n",
" <td>-0.549500</td>\n",
" </tr>\n",
" <tr>\n",
" <th>Age</th>\n",
" <td>0.036847</td>\n",
" <td>-0.077221</td>\n",
" <td>-0.369226</td>\n",
" <td>1.000000</td>\n",
" <td>-0.308247</td>\n",
" <td>-0.189119</td>\n",
" <td>0.096067</td>\n",
" </tr>\n",
" <tr>\n",
" <th>SibSp</th>\n",
" <td>-0.057527</td>\n",
" <td>-0.035322</td>\n",
" <td>0.083081</td>\n",
" <td>-0.308247</td>\n",
" <td>1.000000</td>\n",
" <td>0.414838</td>\n",
" <td>0.159651</td>\n",
" </tr>\n",
" <tr>\n",
" <th>Parch</th>\n",
" <td>-0.001652</td>\n",
" <td>0.081629</td>\n",
" <td>0.018443</td>\n",
" <td>-0.189119</td>\n",
" <td>0.414838</td>\n",
" <td>1.000000</td>\n",
" <td>0.216225</td>\n",
" </tr>\n",
" <tr>\n",
" <th>Fare</th>\n",
" <td>0.012658</td>\n",
" <td>0.257307</td>\n",
" <td>-0.549500</td>\n",
" <td>0.096067</td>\n",
" <td>0.159651</td>\n",
" <td>0.216225</td>\n",
" <td>1.000000</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" PassengerId Survived Pclass Age SibSp Parch \\\n",
"PassengerId 1.000000 -0.005007 -0.035144 0.036847 -0.057527 -0.001652 \n",
"Survived -0.005007 1.000000 -0.338481 -0.077221 -0.035322 0.081629 \n",
"Pclass -0.035144 -0.338481 1.000000 -0.369226 0.083081 0.018443 \n",
"Age 0.036847 -0.077221 -0.369226 1.000000 -0.308247 -0.189119 \n",
"SibSp -0.057527 -0.035322 0.083081 -0.308247 1.000000 0.414838 \n",
"Parch -0.001652 0.081629 0.018443 -0.189119 0.414838 1.000000 \n",
"Fare 0.012658 0.257307 -0.549500 0.096067 0.159651 0.216225 \n",
"\n",
" Fare \n",
"PassengerId 0.012658 \n",
"Survived 0.257307 \n",
"Pclass -0.549500 \n",
"Age 0.096067 \n",
"SibSp 0.159651 \n",
"Parch 0.216225 \n",
"Fare 1.000000 "
]
},
"execution_count": 10,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# We can see the pairwise correlation between variables. A value near 0 means low correlation\n",
"# while a value near -1 or 1 indicates strong correlation.\n",
"df.corr()"
]
},
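To read a correlation matrix it helps to look at one column at a time, sorted by absolute value. The sketch below does this on an invented frame (`toy` and its columns are assumptions). It selects the numeric columns explicitly before calling `corr()`, because, as far as we know, recent pandas releases may refuse to compute correlations when text columns are present.

```python
import pandas as pd

# Invented data with known relationships
toy = pd.DataFrame({
    "a": [1, 2, 3, 4],
    "b": [2, 4, 6, 8],               # b = 2 * a, perfectly correlated with a
    "c": [4, 3, 2, 1],               # c = 5 - a, perfectly anti-correlated with a
    "label": ["w", "x", "y", "z"],   # text column, excluded below
})

# Pearson correlation between the numeric columns only
corr = toy.select_dtypes(include="number").corr()

# Strength of the relationship of every other column with "a"
strength = corr["a"].drop("a").abs().sort_values(ascending=False)
print(strength)
```

On the Titanic data the same idea would be applied to the column `Survived` instead of `a`.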
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We do not find any relevant correlation. We could also represent this with a scatterplot."
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"<seaborn.axisgrid.PairGrid at 0x7fd12b267ef0>"
]
},
"execution_count": 11,
"metadata": {},
"output_type": "execute_result"
},
{
"data": {
"image/png": "<base64-encoded PNG omitted: PairGrid of all df_clean columns coloured by Survived, with histograms on the diagonal and scatter plots elsewhere>",
"text/plain": [
"<matplotlib.figure.Figure at 0x7fd12b26ba58>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"# General description of relationship betweek variables uwing Seaborn PairGrid\n",
"# We use df_clean, since the null values of df would gives us an error, you can check it.\n",
"g = sns.PairGrid(df_clean, hue=\"Survived\")\n",
"g.map_diag(plt.hist)\n",
"g.map_offdiag(plt.scatter)\n",
"g.add_legend()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"There are two many variables, we are going to represent only a subset."
]
},
{
"cell_type": "code",
"execution_count": 12,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"<seaborn.axisgrid.PairGrid at 0x7fd128874ac8>"
]
},
"execution_count": 12,
"metadata": {},
"output_type": "execute_result"
},
{
"data": {
"image/png": "<base64-encoded PNG omitted: PairGrid of Pclass, Sex and Age coloured by Survived, with histograms on the diagonal and scatter plots elsewhere>",
"text/plain": [
"<matplotlib.figure.Figure at 0x7fd128874208>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"# PairGrid of variables\n",
"g = sns.PairGrid(df_clean, hue=\"Survived\", vars=['Pclass', 'Sex', 'Age'])\n",
"g.map_diag(plt.hist)\n",
"g.map_offdiag(plt.scatter)\n",
"g.add_legend()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We can observe, for example, that more women survived as well as more people in 3rd class. \n",
"\n",
"We can represent these findings."
]
},
{
"cell_type": "code",
"execution_count": 13,
"metadata": {
"collapsed": false,
"scrolled": true
},
"outputs": [
{
"data": {
"text/plain": [
"<matplotlib.axes._subplots.AxesSubplot at 0x7fd126d16b70>"
]
},
"execution_count": 13,
"metadata": {},
"output_type": "execute_result"
},
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAYEAAAERCAYAAACdPxtnAAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\nAAALEgAACxIB0t1+/AAAG/1JREFUeJzt3Xt8VOW97/HPJAEScsGAsed4NKFS+Hmtr1oV8eVu1ep2\neztSrbtI1YqyqXipivYcr/XSCrYqolgURLTV3QtFKRUVG6tVoT142a2y694/FVQIXsolhSQkMEnm\n/DETOgmErMismUzW9/16+TLPrFnP/HDhfLPWs9bzxBKJBCIiEk0FuS5ARERyRyEgIhJhCgERkQhT\nCIiIRJhCQEQkwhQCIiIRFnoImNloM3txJ6+fbmavmtkyM5sYdh0iIrKjUEPAzL4HPAQM6vJ6ETAd\nOAE4FphkZlVh1iIiIjsK+0zgPeDrO3n9AOBdd9/s7nFgKfCVkGsREZEuQg0Bd18ItO5kUwWwKa3d\nAAwJsxYREdlRrgaGN5MMgg7lwN9zVIuISGQVZelzYl3a/wV8wcz2ALaQvBR0Z0+dtLa2JYqKCkMo\nr39qbm7m9NNPJ5FIEIvFeOqppygpKcl1WSKSfV2/g7fLVggkAMzsHKDU3eea2RTgd6ni5rr7xz11\nUl+/Jdwq+5mGhs10TBCYSCT4+OONlJdX9LCXiPQ3VVXl3W4LPQTc/UPg6NTPv0h7/Wng6bA/X0RE\nuqeHxUREIkwhINKHzZs3h3HjxjJv3pxclyL9lEJApI9qaWmmtvZZAGprl9DS0pzjiqQ/ytbAsPSg\nra2NurrVGe2zqampU3vNmtWUlpZm9DP22aeawkLdsRWGeDyeNrDfTjwep7hYd3dJZikE+oi6utXc\n8uTtlFRm7ku6fVtbp/Z9Sx+kYGDmvrCb65u45cwbqKn5fMb6FJHsUgj0ISWVpZTu2f2tXL3VtrW1\n0xN4g4eVUThIh1xE/kHfCCIiAbW3tzNjxp3U1a2hpaWF6uoarrnmOoqK8verVAPDIiIBLV/+RwCm\nT7+fWbPmMmTIHjz99G9zXNXuUQiIiAS055578eabf2bp0pdpbm5m0qRLOOOMM3n88UeZPPkiJk++\niNdeW05jYyPnnvuvrF+/nueff46pU2/Ndendyt9zGBGRLBs5chSXXnolCxcuYNq0Wzn44C9yzjnn\n8dZbf+GBBx6mubmZSy65iEce+TlXXDGFqVNvpaFhM/fd92CuS++WQkBEJKBVq95j5MhRTJt2F+3t\n7Tz22CPcfvutxGLw3e9eTCKRIB6Ps3nzJo444ihmzryH448/sU9P3KjLQSIiAb322nLmzXsIgIKC\nAkaMGEl1dQ0HHXQI9933IPfc8xOOP/5EyssrWLhwAUcccRR/+tMy1q6ty3Hl3VMIiIgEdNZZ3ySR\nSDBhwnguuWQizzzzFDfddBs1NcO59NJ/Y9KkCxgyZA8++mgtixcv4uKLL+Pyy6cwbdptuS69W7oc\n1I/FCtKmEI91aUtG5eMT33rau/eKioq45pprd3j9ggsmcsEFEzu99vDDjwFw8MGHcP/9fXfuJ4VA\nP1YwoJCyUUNpfGcjZSOHUjBA/8OHJd+e+NbT3tJBIdDPVR65N5VH7p3rMiJBT3xLPtKYgIhIhCkE\nREQiTCEgIhJhusAoIv1OGHdr9de7qRQCItLv1NWt5sYZCyguG5qR/loaN/LDK78Ryt1Uzz67mA8/\n/ICLL74s430HoRAQkX6puGwogyuqcl1GILFY7p7hUQiIiGTIs88uZtmyl9m6dSsbNmzg7LPH8cor\nL/H++yu59NIr+PTTT3n55RdpaWlhyJA9mDr1zk77P/HEr6itfY5YLMYJJ/wzZ531zdBrVgiIiGTQ\nli3NTJ8+k9///nfMn/8LZs9+hP/4j9eZP//n7L//gdx77wMATJlyOf/9329v3++DD97n97+v5YEH\nHiaRSHDVVZdy5JFj2Hff6lDrVQiIiGTQqFEG
QFlZOTU1wwEoL68gHm+lsLCIm2++npKSEtav/xut\nra3b91u1aiWffPIxV1wxmUQiQWNjA3V1qxUCIiL5pLvr+62tcZYufYnZsx9h69YWLrroPBKJxPbt\n1dU17LffCO666z4A5s//OSNGjAy9XoWAiPRLLY0b+1RfhYVFFBeXMHnyRQAMG1bF+vXrtm//whdG\ncthhRzB58kXE43EOPPAgqqr22u3P7YlCQET6nX32qeaHV34j43325OSTT9v+8+jRYxg9egyQXJFs\n+vSZPe4/fvx5jB9/3mcv8jNQCIhIv1NYWKgZUgPStBEifZTWg5BsUAiI9FEd60EAWg9CQqPLQSJ9\nmNaDkLDpTEBEJMJ0JiAi/Y5mEQ1OISAi/U6m13wOsiZzW1sbV155Ca2trdx5572UlZVl5LPPOOMk\nFi16LiN97YxCQET6pUyv+dyTdevW0dzczNy5P8twz+HeFaYQEBHJgLvvnkZd3WqmTr2VLVu20NCw\nGYArrriG/fYbwbhxX+eQQw5lzZrVHHbY4TQ1NfL223+lurqGm266jVWrVnL//ffQ3t7Opk1/5+qr\nr+Pggw/Z3v/Kle9x7713AVBRMYTrr/8+gwfv/plOqCFgZjFgFnAo0AJMdPdVadu/BUwBWoFH3P3B\nMOsREQnL1Vdfy803X8/QocM48MCDGTv2LOrq1jB16q3MmjWXjz/+iJkzZ1NZOZRTTvkac+f+lKuu\nGs7ZZ59BU1Mj77+/issuu4r99htBbe0Snnnmt51C4Mc/vp3rr7+ZmprhLF68iMcf/ymTJl2y23WH\nfSYwFhjk7keb2Whgeuq1DncCBwBbgLfN7BfuvinkmkREQrNy5bu88cZrvPBCLYlEYvsZwZAhe2yf\nC6ikpITq6uEAlJeXsW3bNqqqqnj00bkUFxfT1NRIaWnnMYUPP3yfu+++A4DW1lb22WffjNQbdggc\nAywBcPflZnZ4l+1vApVAx1R6CURE8lhNzec56aQDOOGEk6ivr2fx4kUAdJ5c9B9fdYlEgkQiwYwZ\nd3HLLT+kuno4Dz88m08//aTTe6urh3Pjjbey116fY8WKN9m4cUNG6g07BCqA9N/sW82swN3bU+2/\nAm8AjcCT7r455HpEJCKa65uy3lcsFuP88y9k2rTbWLToSbZs2cKFF07q2Jr+zk77xGIx/uVfTuHG\nG/8vFRVDqKrai02b/t7pvVdffS0/+MH3aWtro6CggGuvvWn3/2BALH0+60wzs7uBP7n7glR7tbtX\np34+BJgPHAE0Af8OPOHuT3TXX2trW6KoqP/dpwuwcuVKrn3y9qzezbC7mtY3cMeZNzBixIhcl5Jz\n+Xb8+vuxa2tr44MPPshon8OHD8/n5wS6vcUo7DOBZcBpwAIzOwpYkbZtE8mxgK3unjCzv5G8NNSt\n+votoRWaa/UZ/K0lm+rrm1i3riHXZeRcPh6//n7sKioyOxf/xo35+/1TVdX9Lydhh8BC4EQzW5Zq\nTzCzc4BSd59rZnOApWa2FVgJPBpyPSIikibUEHD3BDC5y8vvpG2fDcwOswYREemeJpATEYkwhYCI\nSIQpBEREIkwhICISYQoBEZEIUwiIiESYQkBEJMIUAiIiEaYQEBGJMIWAiEiEKQRERCJMISAiEmEK\nARGRCFMIiIhEmEJARCTCFAIiIhGmEBARiTCFgIhIhCkEREQiTCEgIhJhCgERkQhTCIiIRJhCQEQk\nwhQCIiIRphAQEYkwhYCISIQVBXmTmZUBxwEjgXbgPeB5d28JsTYREQnZLkPAzAYDNwNnAm8BHwJx\n4GjgHjN7EviBuzeGXaiIiGReT2cCjwNzgOvcvT19g5kVAKel3jM2nPJERCRMPYXAWe6e2NmGVCj8\n1syeynxZIiKSDT2FwE1m1u1Gd7+tu5AQEZG+r6e7g2Kpf0YDZ5EcFN4GnAocFG5pIiIStl2eCbj7\nrQBmtgwY
4+5bUu0ZwIvhlyciImEK+pxAFZB+2WcAMDTz5YiISDYFek4AeAh43cyeIRkcpwEzQqtK\nRESyItCZgLvfCZwPfAK
"text/plain": [
"<matplotlib.figure.Figure at 0x7fd1296c7dd8>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"sns.barplot(x=\"Pclass\", y='Survived', hue='Sex', data=df)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We can see that more women survived in all the passenger classes."
]
},
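{
"cell_type": "markdown",
"metadata": {},
"source": [
"The bar heights in the plot above are mean values of `Survived`, that is, survival rates. As a minimal sketch, the same kind of table can be computed with a pandas `groupby`. The small `toy` DataFrame below is made up for illustration so that the snippet runs on its own; it is not the Titanic data, and with the notebook's data you would group `df` instead.\n",
"\n",
"```python\n",
"import pandas as pd\n",
"\n",
"# Hypothetical toy data, not the Titanic dataset\n",
"toy = pd.DataFrame({\n",
"    'Pclass':   [1, 1, 1, 3, 3, 3],\n",
"    'Sex':      ['female', 'male', 'male', 'female', 'female', 'male'],\n",
"    'Survived': [1, 1, 0, 1, 0, 0],\n",
"})\n",
"\n",
"# The mean of a 0/1 column is the fraction of ones, i.e. the survival rate\n",
"rates = toy.groupby(['Pclass', 'Sex'])['Survived'].mean().unstack('Sex')\n",
"print(rates)\n",
"```\n",
"\n",
"Each row of the result is a passenger class and each column a sex, which mirrors the grouping of the bars in the plot."
]
},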
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Now we are going to put in practice our knowledge about munging and visualisation. We will analyse every feature of the dataset."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Feature Age"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We saw that there are 177 missing values of age. We are going this feature with more detail."
]
},
{
"cell_type": "code",
"execution_count": 14,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"<matplotlib.axes._subplots.AxesSubplot at 0x7fd126d55588>"
]
},
"execution_count": 14,
"metadata": {},
"output_type": "execute_result"
},
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAXoAAAECCAYAAADw0Rw8AAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\nAAALEgAACxIB0t1+/AAAFEVJREFUeJzt3X+M3PV95/Hneu01aDN2VmZiqXJxuFZ9c5WO9EgFBwk2\nRHCBkJYinRSp4lrSq6NGFknLhTviiqinxgG1wU1p1PRknIP0Jy2qe20QSZOQgk3US0g4qb7QN0ZN\nyZlIZOOM2fVSYpbd+2PGYcI57Mx3vrMz++H5kCzN9zvf/cxL4++89rvf+f6YWF5eRpJUrnWjDiBJ\nGi6LXpIKZ9FLUuEsekkqnEUvSYWz6CWpcOt7WSgiLgbuyMwrIuKngE8ALwJPZuYvd5bZBbynM39v\nZj4wpMySpD6suEUfEbcA+4GNnVkfAn4jM3cAZ0XEtRGxFbgJuAS4Grg9IjYMKbMkqQ+97Lp5Cri+\na/px4JyImAAatLfgLwIOZ+ZiZs4BR4EL6g4rSerfikWfmQeBxa5ZR4G7gP8DvAH4O2AT8FzXMieB\nzbWllCRVVuXL2N8F3pKZPwn8IbCPdslv6lqmAZwYPJ4kaVA9fRn7CseB+c7jbwGXAl8B9kbEFHA2\ncD5wZKWBlpeXlycmJipEkKTXtL6Ks0rR7wLui4gXgVPArsx8NiLuAg53AuzJzFMrJp2YYHZ2fqXF\nRq7ZbJizRmsh51rICOas21rK2Y+eij4zn6a95U5mPgq89QzLHAAO9PXq+r6XXnqJY8e+OdAYc3PT\ntFoLA2fZtu1cJicnBx5H0nioskWvITh27Jv8530PMDW9ZaQ5Ti0c586br2X79vNGmkNSfSz6MTI1\nvYWzNm0ddQxJhfESCJJUOItekgpn0UtS4Sx6SSqcRS9JhbPoJalwFr0kFc7j6PUDlpeXeOaZY0N9\njV7O4PXsXKk+Fr1+wKmFFvvuazE1Pdyyf/UMnp0r1cmi1//HM3SlsriPXpIKZ9FLUuEsekkqnEUv\nSYWz6CWpcD0ddRMRFwN3ZOYVEdEE9gOvByaBX8jMb0TELuA9wIvA3sx8YFihJUm9W3GLPiJuoV3s\nGzuzfgv4o8y8HLgNOD8itgI3AZcAVwO3R8SGoSSWJPWll103TwHXd02/BdgWEZ8Dfh74O+Ai4HBm\nLmbmHHAUuKDmrJKkClYs+sw8CCx2zXoj8N3MvAr4v8CtwCbgua5lTgKb64spSaqqypmxx4G/6Tz+\nG2Av8BXaZX9aAzjRy2DNZqNChNU37Jxzc9NDHX+tmZmZHvm6MerX75U567VWcvajStEfAt4B/DGw\nAzhCu+j3RsQUcDZwfmf+imZn5ytEWF3NZmPoOVe6yNdrTau1MNJ1YzX+z+tgznqtpZz9qHJ45QeA\nX4yIw8DbgY9k5rPAXcBh4PPAnsw8VWFsSVLNetqiz8yngUs7j78J/PszLHMAOFBrOknSwDxhSpIK\nZ9FLUuEsekkqnEUvSYWz6CWpcBa9JBXOopekwln0klQ4i16SCmfRS1LhLHpJKpxFL0mFs+glqXAW\nvSQVzqKXpMJZ9JJUOItekgrXU9FHxMUR8cVXzPv5iPhS1/SuiPhKRHwpIq6tO6gkqZoViz4ibgH2\nAxu75v1b4Je6prcCNwGXAFcDt0fEhtrTSpL61ssW/VPA9acnImIL8GHg/V3LXAQczszFzJwDjgIX\n1BlUklTNikWfmQeBRYCIWAfcDdwMLHQttgl4rmv6JLC5vpiSpKrW97n8hcCPA58Azgb+dUTsA75I\nu+xPawAnehmw2Wz0GWE0hp1zbm56qOOvNTMz0yNfN0b9+r0yZ73WSs5+9FP0E5n5GPBvACJiO/Cn\nmXlzZx/9hyNiivYvgPOBI70MOjs732fk1ddsNoaes9VaWHmh15BWa2Gk68Zq/J/XwZz1Wks5+9HP\n4ZXLP+yJzHwWuAs4DHwe2JOZp/pKIkkaip62
6DPzaeDSV5uXmQeAA7WmkyQNzBOmJKlwFr0kFc6i\nl6TCWfSSVDiLXpIKZ9FLUuEsekkqnEUvSYWz6CWpcBa9JBXOopekwln0klQ4i16SCmfRS1LhLHpJ\nKpxFL0mFs+glqXA93WEqIi4G7sjMKyLip2jfNnAR+B7wC5k5GxG7gPcALwJ7M/OBYYWWJPVuxS36\niLgF2A9s7Mz6GLA7M98GHAT+a+fm4DcBlwBXA7dHxIbhRJYk9aOXXTdPAdd3Tb8rM/+h83g98AJw\nEXA4Mxczcw44ClxQa1JJUiUrFn1mHqS9m+b09LMAEXEpsBv4HWAT8FzXj50ENteaVJJUSU/76F8p\nIt4FfBB4R2Yej4g52mV/WgM40ctYzWajSoRVN+ycc3PTQx1/rZmZmR75ujHq1++VOeu1VnL2o++i\nj4gbaH/penlmni7zLwMfjogp4GzgfOBIL+PNzs73G2HVNZuNoedstRaGOv5asry8xJEjOdL3ZGZm\nmunpLUxOTo4sQy9WY92sgznr1e8vo76KPiLWAb8LPA0cjIhl4OHM/G8RcRdwGJgA9mTmqb6SSB2n\nFlrsu6/F1PSxEWY4zp03X8v27eeNLINUl56KPjOfBi7tTG75IcscAA7UlEuvcVPTWzhr09ZRx5CK\n4AlTklQ4i16SCmfRS1LhLHpJKpxFL0mFs+glqXAWvSQVzqKXpMJZ9JJUOItekgpn0UtS4Sx6SSqc\nRS9JhbPoJalwFr0kFc6il6TC9XTjkYi4GLgjM6+IiB8D7gGWgCOZubuzzC7atxh8EdibmQ8MJ7Ik\nqR8rbtFHxC3AfmBjZ9Y+2rcK3Amsi4jrImIrcBNwCXA1cHtEbBhSZklSH3rZdfMUcH3X9Jsz81Dn\n8YPAVcBFwOHMXMzMOeAocEGtSSVJlaxY9Jl5EFjsmjXR9Xge2AQ0gOe65p8ENtcRUJI0mCpfxi51\nPW4AJ4A52oX/yvmSpBHr6cvYV/haROzIzEeAa4CHgK8AeyNiCjgbOB840stgzWajQoTVN+ycc3PT\nQx1f/ZuZmV4T6+dayAjmHKUqRf8BYH/ny9YngPszczki7gIO0961syczT/Uy2OzsfIUIq6vZbAw9\nZ6u1MNTx1b9Wa2Hs18/VWDfrYM569fvLqKeiz8yngUs7j48Cl59hmQPAgb5eXZI0dJ4wJUmFs+gl\nqXAWvSQVzqKXpMJZ9JJUOItekgpn0UtS4Sx6SSqcRS9JhbPoJalwFr0kFc6il6TCWfSSVDiLXpIK\nZ9FLUuEsekkqnEUvSYWrcitBImI9cC/wRmAR2AW8BNxD++bhRzJzdz0RJUmDqLpF/w5gMjPfAvwm\n8BFgH+17xe4E1kXEdTVllCQNoGrRPwmsj4gJYDPwInBhZh7qPP8gcGUN+SRJA6q06wY4CZwH/COw\nBfgZ4LKu5+dp/wKQJI1Y1S36XwM+k5kBvAn4FDDV9XwDODFgNklSDapu0X+X9u4aaBf6euDxiNiZ\nmQ8D1wAP9TJQs9moGGF1DTvn3Nz0UMdX/2ZmptfE+rkWMoI5R6lq0X8M+GREPAJsAG4FvgrcHREb\ngCeA+3sZaHZ2vmKE1dNsNoaes9VaGOr46l+rtTD26+dqrJt1MGe9+v1lVKnoM3MBeNcZnrq8yniS\npOHxhClJKpxFL0mFs+glqXAWvSQVzqKXpMJZ9JJUOItekgpn0UtS4aqeGSsVbXl5iWeeOTbqGABs\n23Yuk5OTo46hNcyil87g1EKLffe1mJoebdmfWjjOnTdfy/bt5400h9Y2i176Iaamt3DWpq2jjiEN\nzH30klQ4i16SCmfRS1LhLHpJKpxFL0mFs+glqXCVD6+MiFuBn6V9K8HfBx4B7gGWgCOZubuOgJKk\nwVTaoo+IncAlmXkp7dsHngvsA/Zk5k5gXURcV1tKSVJlVXfdvB04EhF/Bfw18Gngwsw81Hn+QeDK\nGvJJkgZU
ddfNObS34t8J/CvaZd/9S2Me2DxYNElSHaoW/XHgicxcBJ6MiBeAbV3PN4ATvQzUbDYq\nRlhdw845Nzc91PG1ds3
"text/plain": [
"<matplotlib.figure.Figure at 0x7fd1283f3198>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"# Histogram of Age\n",
"# For Series, you can use hist(), plot.hist() or plot(kind='hist')\n",
"df['Age'].hist()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We see the histogram is slightly *right skewed* (*sesgada a la derecha*), so we will replace null values with the median instead of the mean.\n",
"\n",
"In case we have a significant *skewed distribution*, the extreme values in the long tail can have a disproportionately large influence on our model. So, it can be good to transform the variable before building our model to reduce skewness.Taking the natural logarithm or the square root of each point are two simple transformations. "
]
},
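{
"cell_type": "markdown",
"metadata": {},
"source": [
"The ideas above can be sketched in a few lines of pandas. The `values` Series below is a made-up, right-skewed sample with one missing entry, used only so that the snippet runs on its own; with the notebook's data the same calls would be applied to `df['Age']`. Whether a transformation is worthwhile depends on the data and the model, so treat this as an illustration and not as a recipe.\n",
"\n",
"```python\n",
"import numpy as np\n",
"import pandas as pd\n",
"\n",
"# Hypothetical right-skewed sample with one missing value\n",
"values = pd.Series([1.0, 2.0, 2.0, 3.0, 4.0, 50.0, np.nan])\n",
"\n",
"# Sample skewness (missing values are skipped); positive means right skewed\n",
"print(values.skew())\n",
"\n",
"# Two simple transformations that compress the long right tail\n",
"print(np.log1p(values).skew())  # log(1 + x), which is also defined at x = 0\n",
"print(np.sqrt(values).skew())   # the square root is a milder transformation\n",
"\n",
"# The median is less sensitive to the extreme value than the mean\n",
"print(values.mean(), values.median())\n",
"\n",
"# Fill the missing entry with the median\n",
"filled = values.fillna(values.median())\n",
"print(filled)\n",
"```\n",
"\n",
"`np.log1p` is used here instead of `np.log` only as a precaution, because the plain logarithm is undefined at zero."
]
},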
{
"cell_type": "code",
"execution_count": 15,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"<matplotlib.axes._subplots.AxesSubplot at 0x7fd126bbcf60>"
]
},
"execution_count": 15,
"metadata": {},
"output_type": "execute_result"
},
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAXUAAAECCAYAAAAB2kexAAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\nAAALEgAACxIB0t1+/AAAE/JJREFUeJzt3W+MXfdd5/H3eOxpu7cTY7m30XZN3GxKvxLSbqQUBJTG\ndqJUxYTWgBA8CSLtblErSwHSsmpdpUKqqKpt462goqySpk7RVpRGLUFEboAaJS5IQENAWHW/aUqp\nNeYBgxkz48k2dmZmH9zr5jZr+55z556Zc3/zfkmR5t5zfO4n989nzvzuOb8ztba2hiSpDNs2O4Ak\naXwsdUkqiKUuSQWx1CWpIJa6JBXEUpekgmwftkJEbAceBl4LvAC8E1gBjgGrwKnMPNxcRElSVVX2\n1H8SmM7MHwc+BHwYOAocycz9wLaIONRgRklSRVVK/Rlge0RMATuBS8AtmXmyv/w4cEdD+SRJNQwd\nfgEuADcCXwd2A28Fbh1YvkSv7CVJm6zKnvqvAV/KzABuBj4DzAwsnwXON5BNklRTlT31f6M35AK9\n8t4OPB0R+zPzCeAgcOJaG1hbW1ubmppaV1BJ2oJqF+fUsAm9IqIDPAT8R2AH8HHgKeDB/u3TwDsz\n81obWpufX6qbbcN1u7OYc3zMOV7mHJ9JyAjQ7c7WLvWhe+qZuQz8whUWHaj7YJKkZnnykSQVxFKX\npIJY6pJUEEtdkgpiqUtSQSx1SSqIpS5JBbHUJakglrokFcRSl6SCWOqSVBBLXZIKYqlLUkEsdUkq\niKUuSQWx1CWpIJa6JBXEUpekglS58LQKs7Kywtzcmcrr79lzA9PT0w0mkjQulvoWNDd3hvccfYyZ\nzu6h615cPsf9997J3r03bkAySes1tNQj4peAu4E14BXAzcCtwMeBVeBUZh5uMKMaMNPZzcuvu36z\nY0gas6Fj6pn5cGbelpm3A08B9wAfBI5k5n5gW0QcajinJKmCyl+URsQPAT+YmQ8Cb8jMk/1Fx4E7\nmggnSaqnztEv7wd+4wr3LwE7x5JGkrQulUo9InYCr8/MJ/t3rQ4sngXOjzuYJKm+qke/7AO+PHD7\n6YjY1y/5g8CJYRvodmdHiLfxtkLOxcVOrfV37eqM/Hhb4fncSOYcn0nIOIqqpR7APw7cfi/wQETs\nAE4DjwzbwPz8Uv10G6zbnd0SORcWlmuvP8rjbZXnc6OYc3wmISOM9ounUqln5sdecvsbwIHajyZJ\napTTBEhSQSx1SSqIpS5JBbHUJakglrokFcRSl6SCWOqSVBBLXZIKYqlLUkG88lGLXe2yc4uLnSue\n6t/EZefW1lY5e3au8vpe+k7aXJZ6i7XhsnMXlxc4+rkFZjrDi91L30mbz1JvuTZcdq4NGSRV45i6\nJBXEUpekgljqklQQS12SCmKpS1JBLHVJKoilLkkFsdQlqSCVTj6KiPcBbwN2AL8DPAkcA1aBU5l5\nuKmAkqTqhu6pR8R+4Mcy843AAeAG4ChwJDP3A9si4lCjKSVJlVQZfnkLcCoi/hD4I+CPgVsy82R/\n+XHgjobySZJqqDL88ip6e+c/BfxnesU++MtgCdg5/miSpLqqlPo54HRmvgA8ExHfAfYMLJ8Fzg/b\nSLc7O1rCDdamnIuLnVrr79rVqZS/7nbXk6FNz+e1mHO8JiHnJGQcRZVS/wpwD/C/IuI1QAf4ckTs\nz8wngIPAiWEbmZ9fWlfQjdDtzrYq55XmTB+2fpX8dbc7aoa2PZ9XY87xmoSck5ARRvvFM7TUM/Ox\niLg1Iv4amALeDfwT8GBE7ABOA4/UfmRJ0thVOqQxM993hbsPjDeKJGm9PPlIkgpiqUtSQSx1SSqI\npS5JBbHUJakglrokFcRSl6SCVDpOXeOxsrLC3NyZyuufPTvXYBpJJbLUN9Dc3Bnec/QxZjq7K61/\nYf5ZXtl9XcOpJJXEUt9gM53dvPy66yut+/yF
cw2nkVQax9QlqSCWuiQVxFKXpIJY6pJUEEtdkgpi\nqUtSQSx1SSqIpS5JBbHUJakglrokFaTSNAER8RTw7/2b3wI+DBwDVoFTmXm4kXSSpFqGlnpEvAwg\nM28fuO9R4EhmnoyIT0bEocx8tMGcKkzdGSsB9uy5genp6YYSSWWosqd+M9CJiMeBaeADwC2ZebK/\n/DjwZsBSV2V1Z6y8uHyO+++9k717b2w4mTTZqpT6c8BHM/NTEfED9Ep8amD5ErCziXAqW50ZKyVV\nU6XUnwGeBcjMb0TEOeCWgeWzwPlhG+l2Z0cKuNGazLm42Gls2wC7dnUq5W8yx0szXC3PKBmq/v+N\nwvfneE1CzknIOIoqpf4O4L8AhyPiNcB1wJ9ExP7MfAI4CJwYtpH5+aV1Bd0I3e5sozkXFpYb2/bl\n7VfJ32SOwQzXej5HyVD1/6+upl/3cTHn+ExCRhjtF0+VUv8U8OmIOEnvaJe7gXPAgxGxAzgNPFL7\nkSVJYze01DPzEnDXFRYdGHsaSdK6ePKRJBXEUpekgljqklQQS12SClJp7hdps62trXL27Fzl9Z1S\nQFuVpa6JcHF5gaOfW2CmM7zYnVJAW5mlronhtALScI6pS1JBLHVJKoilLkkFsdQlqSCWuiQVxKNf\nNDYvPZZ8cbFz1Sl26xxzLqk6S11jU+dY8gvzz/LK7us2IJW0tVjqGquqx5I/f+HcBqSRth5LvRB1\nTqN36EMql6VeCIc+JIGlXhSHPiR5SKMkFcRSl6SCVBp+iYhXA18F7gBWgGPAKnAqMw83lk6SVMvQ\nPfWI2A78LvBc/66jwJHM3A9si4hDDeaTJNVQZfjlY8AngX8GpoBbMvNkf9lxenvvkqQWuGapR8Td\nwL9k5p/SK/SX/pslYGcz0SRJdQ0bU387sBoRbwZuBj4DdAeWzwLnqzxQtzs7UsCN1mTOxcVOY9vW\n99q1q1PrtfT9OV6TkHMSMo7imqXeHzcHICJOAO8CPhoR+zLzSeAgcKLKA83PL60n54bodmcbzXm1\nya00fgsLy5Vfy6Zf93Ex5/hMQkYY7RfPKCcfvRd4ICJ2AKeBR0bYhiSpAZVLPTNvH7h5YPxRJEnr\n5clHklQQS12SCmKpS1JBLHVJKoilLkkFsdQlqSCWuiQVxFKXpIJY6pJUEEtdkgpiqUtSQSx1SSrI\nKLM0SsVYWVnhm9/8ZuVpkffsuYHp6emGU0mjs9S1pc3NneE9Rx9jprN76LoXl89x/713snfvjRuQ\nTBqNpa4tb6azm5dfd/1mx5DGwjF1SSqIpS5JBbHUJakglrokFcRSl6SCDD36JSK2AQ8AAawC7wKe\nB471b5/KzMMNZpQkVVRlT/2twFpmvgm4D/gwcBQ4kpn7gW0RcajBjJKkioaWemY+Cvxy/+ZeYAG4\nJTNP9u87DtzRTDxJUh2VxtQzczUijgG/BXwWmBpYvATsHH80SVJdlc8ozcy7I+LVwN8ArxhYNAuc\nH/bvu93Z+uk2QZM5Fxc7jW1b32vXrk6l17Lua1J1u03xczQ+k5BxFFW+KL0L2JOZHwG+A6wAX42I\n/Zn5BHAQODFsO/PzS+vN2rhud7bRnFUnjdL6LSwsV3ot674mVbfbhKbfn+MyCTknISOM9ounyp76\nF4BPR8QT/fXvAb4OPBgRO4DTwCO1H1mSNHZDSz0znwN+4QqLDow9jSRpXTz5SJIKYqlLUkEsdUkq\niKUuSQWx1CWpIJa6JBXEa5RKFa2trXL27Fzl9ffsuYHp6ekGE0n/P0tdquji8gJHP7fATGd4sV9c\nPsf9997J3r03bkAy6UWWulTDTGc3L7/u+s2OIV2Vpa7i1BkmqTOc0pSVlRXm5s4MXW9xscPCwrLD\nOromS13FqTNMcmH+WV7Zfd0GpLq6ubkzvOfoY8x0dg9d12EdDWOpq0hVh0mev3BuA9IM57COxmVD\nSv0rf/k3
fC2/VWnd61/1Kt74oz/ccCJJKtOGlPqjJ/6ery90K637n3acttQlaUSefCRJBbHUJakg\nlrokFcRSl6SCWOqSVBB
"text/plain": [
"<matplotlib.figure.Figure at 0x7fd126bca358>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"# We see with more bins the distribution\n",
"df['Age'].hist(bins=30, range=(0, df['Age'].max()))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Now we analyse the relationship of Age and Survived."
]
},
{
"cell_type": "code",
"execution_count": 16,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"<seaborn.axisgrid.FacetGrid at 0x7fd126b0f358>"
]
},
"execution_count": 16,
"metadata": {},
"output_type": "execute_result"
},
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAY0AAAFhCAYAAACMIfYoAAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\nAAALEgAACxIB0t1+/AAAIABJREFUeJzs3Xl4XPWd7/n3qVVbaS/tkmVb8s+yDd5YjAFDDAkhQCAr\nCVlu0uGmk05mOslNbnc6k5mevul7+xkaum9uOumZkKU7nYVAgAQIBMJuA44N3pefZEmWte8qlfZa\nzvxRVbZsy1JJqlJJVd/X8/A8Kp3texDoo9/5LccwTRMhhBAiGpZEFyCEEGLlkNAQQggRNQkNIYQQ\nUZPQEEIIETUJDSGEEFGT0BBCCBE121w7KKUM4PvAZmACuF9r3TRt+13AtwEf8BOt9cNKKQvwQ0AB\nQeALWusTSqktwNNAffjwH2itH43lDQkhhIifOUMDuAdwaq13KqWuBR4Kfw+llC38eTswDuxVSv0W\n2AmYWusblFI3Af89fMx24EGt9T/F/laEEELEWzSPp24AngPQWu8Drpq2rQ5o0FoPa619wB5gl9b6\nt8Dnw/tUA4Phr7cDdyilXlVKPayUyozBPQghhFgi0YRGNuCZ9tkffvw00zYvkAOgtQ4qpX4K/E/g\n5+Ht+4BvaK1vApqAv11w5UIIIZZcNI+nhgHXtM8WrXVw2rbsadtcwFDkg9b6M0qpIuBPSqk64Emt\ndSRkngC+O9uFTdM0DcOIokQhhFj2kuKXWTShsRe4E3hMKbUDODpt20mgRimVC4wBNwIPKKU+CVRo\nrf+BUOd5gFCH+B+UUl/WWh8AbgHenu3ChmHQ2+ud7z0tW263K2nuR+5leUqme4Hkuh+32zX3TitA\nNKHxBPBupdTe8OfPKqU+DmSGR0p9DXieUIr+SGvdqZR6HPiJUurV8DX+Ums9qZT6AvA9pdQU0MX5\nfg8hhBArgLHMV7k1k+WvDEi+v5rkXpafZLoXSK77cbtdSfF4Sib3CSGEiJqEhhBCiKhJaAghhIia\nhIYQQoioSWgIIYSImoSGEEKIqEloCCGEiJqEhhBCiKhJaAghhIiahIYQQoioSWgIIYSImoSGEEKI\nqEloCCGEiJqEhhBCiKhJaAghhIiahIYQQoioSWgIIYSImoSGiAvTNDnePMCBUz34/MFElyOEiJFo\n3hEuxLwca+rnVy+dpqNvFICcTAd37qzmlu0VCa5MCLFYEhoipnqHxvmXJ47hDwS5bmMxrgwHrx/p\n4Ocv1GOzGty0pTzRJQohFkFCQ8RM0DT5ye9PMukLcP+ddezcVArA7u0VfOffDvAfz9dTkp+BqspL\ncKVCiIWSPg0RMy+/086ps0NsrS3kuo0l575flJvOlz6wCYAf/PY445P+RJUohFgkCQ0RE8GgyTNv\nniHNYeXTtykMw7hgu6rK466d1QyPTvH8/tbEFCmEWDQJDRETx5oHGBqZYseGYnKynDPu855rKsnO\ndPDcn87iGZ1a4gqFELEgoSFiYu/RTgCuv7L0svukOWy8//pqJqcCPLW3ealKE0LEkISGWLSRcR8H\nG3opLchgTWn2rPvu2lyGOzeN1w53MDwmrQ0hVhoJDbFo+0504w+Y3HBl6SV9GRezWS3cur0Sf8Bk\nz5HOJapQCBErEhpi0faf7MYwuGDE1Gyuv6IEh93CKwfbCQbNOFcnhIglCQ2xKJNTARo7hqkucZF7\nmQ7wi2Wk2dmxoYQ+zwRHmvrjXKEQIpYkNMSiNLQNEQiarF81vwl7u7eFZoa//E57PMoSQsSJhIZY\nlJMtgwDUzXOWd1WxizVl2Rxr7sczMhmP0oQQcSChIRbl1NlBrBaD2orceR977YZiTBP2n+qJQ2VC\niHiQ0BALNjbh40yXlzVl2Tgd1nkff836IgwD/nRSQkOIlUJCQyyYbh3CNKFunv0ZETlZTtZX5XG6\n3UOfZzzG1Qkh4kFCQyzYuf6MBYYGhB5RgbQ2
hFgpJDTEgjW2e7BaDNaUzT4LfDbb1rmxWgz+dKI7\nhpUJIeJlzvdpKKUM4PvAZmACuF9r3TRt+13AtwEf8BOt9cNKKQvwQ0ABQeALWusTSqm1wE/D3zum\ntf5SjO9HLBF/IEhrzyjl7kzstvn3Z0Rkpdupq87jWNMAfZ5xCnPSY1ilECLWomlp3AM4tdY7gW8C\nD0U2KKVs4c+3AjcDn1dKuYG7AFNrfQOhQPn78CEPAX+jtb4JsCil7o7VjYil1dYzgj8QZFWxa9Hn\n2lpTCMDh0zLRT4jlLprQuAF4DkBrvQ+4atq2OqBBaz2stfYBe4BdWuvfAp8P71MNDIW/3q61fj38\n9bOEwkasQE3toR9pVQxCY3M4NA6d7lv0uYQQ8RXN616zAc+0z36llEVrHZxhmxfIAdBaB5VSPyXU\nUvlweLsx076zcbsX/0tpOUmW+2ncewaALeuLF31PbreLNeU56LNDZLrSyEizx6DC+deQLJLpXiD5\n7meliyY0hoHpP7VIYES2Te8FdXG+VYHW+jNKqSLgT0qpDYT6Mmbc93J6e71RlLgyuN2upLmfxnYP\nBpBlt8TknjZV59HU7uHV/We5an3R4guch2T6uSTTvUBy3U+yhF80j6f2Au8DUErtAI5O23YSqFFK\n5SqlHMCNwJtKqU8qpf46vM8EEAj/845Salf4+7cDryNWnKBp0tzhoaQgY0GT+maypVYeUQmxEkQT\nGk8Ak0qpvcCDwFeVUh9XSt2vtfYDXwOeJxQuP9JadwKPA1uVUq8S6rv4S631JPB14O/C57IDj8X+\nlkS89Q6NMzbhj0kneMSqYhe5WQ6ONPYTNGW5dCGWqzkfT2mtTeCLF327ftr2Z4BnLjpmDLh3hnM1\nEBplJVaws90jQGw6wSMMw2Dj6nz2Hu2itXuEVSXJ0ZQXItnI5D4xby1doWfMq4qzYnrejavzATh+\nZiCm5xVCxI6Ehpi31p5QS6Myhi0NgA3V4dBoltAQYrmS0BDz1tk/Sp7LSVZ6bIfGZmc4qCrOoqFt\niElfIKbnFkLEhoSGmJcpX4B+zwQVRfHpc9i4Oh9/wKS+dc7R2EKIBJDQEPPSPTiOCVQUxbY/I2KT\nPKISYlmT0BDz0jUwBkB5nEKjpiIXh80ioSHEMiWhIeals38UiF9Lw26zUFuRQ3vfKMOjU3G5hhBi\n4SQ0xLx09YdaGvHq0wBQVaGXOkm/hhDLj4SGmJfO/jHsNgvu3Pi992J9+E2AJ88Oxu0aQoiFiWbB\nQiGA0JpTnQOjlORnYLEYcx8wC9M0OTFQzyutexiYHMIf8FGSWcSN5dexrrgWp92KPistDSGWGwkN\nEbUh7yRTviClBRmLOk//+AA/OvZzWrytAGTaM7AaVo71n+JY/ykqXeWsrrqaU42jeEanyMl0xKJ8\nIUQMSGiIqHWG+zNK8hceGp2j3fyvgz/EMzXMFvcVvLf6FipdZQC0ett5oeUV3u45TFqhB6N9C/rs\nINfUFcekfiHE4kmfhohaZORUyQJbGl2jPfzTOz/AMzXMB2vu5D9f8alzgQFQ6Srnsxvv4/1r3suE\nOYKzbh9HzrbFpHYhRGxIaIioReZolOZnzvtYX8DHj4//nFHfGPepD3FL1a4Z9zMMg9uqd/PBtXdh\n2H0c8b9I0AzOuK8QYulJaIiodYdDozh//iOnftv0LO0jnVxfdi3Xl1875/67q24gY6KCQHofT59+\ncd7XE0LEh4SGiFrv0ATZmQ7SHPPrCjs10MDLrXsozijiQ7V3RXWMYRhc43o35pST51v/SMdI10JK\nFkLEmISGiEowaNI/PIE7N21+x5lBftPwFAYGn9n4MZzW6EdCbawsZurMRkxMnm5+fr4lCyHiQEJD\nRGXAO0EgaM57Ut/+roN0jHZxTck2qlwV8zp2TVk2eIqwTxVwuPcYLcOt8zpeCBF7EhoiKr1DEwC4\nc6IPDV/Q
z9PNz2MzrNyx+j3zvmaaw0ZlkYvx5rUAPNX0h3mfQwgRWxIaIiq9Q+MAFM7j8dSe9rcY\nmBhkV8VOCtLzFnTd2oo
"text/plain": [
"<matplotlib.figure.Figure at 0x7fd126b0f240>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"# Now we visualise age and survived to see if there is some relationship\n",
"sns.FacetGrid(df, hue=\"Survived\", size=5).map(sns.kdeplot, \"Age\").add_legend()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We do no observe significant differences."
]
},
{
"cell_type": "code",
"execution_count": 17,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"<seaborn.axisgrid.FacetGrid at 0x7fd126a7d748>"
]
},
"execution_count": 17,
"metadata": {},
"output_type": "execute_result"
},
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAagAAADSCAYAAAAffFTTAAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\nAAALEgAACxIB0t1+/AAAFj5JREFUeJzt3X2QXXWZ4PFv541lmk4IpqFAJOqsPlM6pbMwvsAICRaM\nYDmiNdayM8NoXEfASYnjFijE9W0UR0dkXMYVlZclOgirlKglC+4oSoLryogwY9bMQyh5C751TGJ3\nEqCT7t4/zs3k0nTo2+ee23363u+nKlX3nnPv00/OPb96zvmd8/udvomJCSRJqpsFc52AJElTsUBJ\nkmrJAiVJqiULlCSplixQkqRaskBJkmpp0Vwn0Gsi4g3AxRTbvg/4QmZeVkHc84CJzPxcm3G+A7w/\nMzeU+O6zgH8ABoEE/iwz97STj3pLN7ePphh/DezLzL9uJ5de4BnULIqIY4DLgNMy8/eAE4GzI+I1\n7cbOzM+22/gq8GngU5n5AuBu4H1znI/mkW5vHxGxNCKuBv7LXOYxn3gGNbtWUGzzw4CdmbknIt4E\nPA4QEQ8AqzLz4YhYBXwgM09tHLVtB14AXA8clZlvb3zn48CjwLLG39gOPH+K9VcB/x14IbAQ+Fhm\n/s+IWAJcDZwAPAQ8Y3LSEfE64P1A86juzMw/afrMIuAU4KzGouuAOyiOhqVWdG37aDgLuA/4RNkN\n1GssULMoM/8lIr4O/DQi7gG+A3wxM3/a+MjkaT2a3/9zZv5xRAwCd0fEBZk5AbwBeDlwfuPzNwI/\nmmL9fwV+mJlrImIA+D8R8QPgjym6Pl4YEf8e+Jcp8v4q8NVp/nsrgN9k5njj/c+BZ06/VaRCl7cP\nMvMLABHx/ta2iOzim2WZ+ZfASorusJXA9xtHYFD0uR/MDxrfHwLuBU6NiJOLRfnLpvhDwD1TrD8N\nOL/R8DcAh1IcLa4GvtT47v3A9yb/4Yh4XUTcM+nfDZM+NlXu41Mskw6qi9uHSvAMahZFxKuBwzLz\nS8B6YH1E/AXwFoojsAkONMLFk77+WNPrfwD+EzDaeD3Z9VOsXwick5n3NnI5kqK74zyefKAyNjlY\ni0eIQ8DSiOhrHJkeDfxsmu9I/6bL24dK8Axqdu0BPhIRKwEioo+i3/xHjfVDFEdtcOBazlS+TnG9\n5w+Br0yx/mtTrL8d+MvG3z2aoqviWcC3gD+NiL5GXieV+Y9l5j5gI3B2Y9EbgVvLxFLP6tr2oXIs\nULMoM78LfBD4RkRsBn5C8Rt8qPGRDwBXNPq+dzR99Ul975n5OHAn8IOpbuM+yPoPAodGxI8pGt2F\nmfkARVfKSCOXzwI/buO/uBY4LyI2Aa+g6NeXWtID7UMz1OfjNiRJddTSNaiIeBnw0cYtnb8HXAHs\nA54A3piZQxHxVuBcYC9waWbe0qmkJUndb9ouvoi4iGKMwCGNRZ8E1mbmK4GbgXdHxFHA2ykG1p0B\n/E1ETL6IKUlSy1q5BnU/8Pqm92dn5v5+2EUUg+heCtyZmfsycxjYAryo0kwlST1l2gKVmTdTdOft\nf/9LgIg4ieKi+N8BS4HfNH1tFwdGbkuSNGOlxkFFxNnAJcCrM/PXETFMUaT2GwB2ThdnYmJioq/v\n6cbeSfNOZTu07UNdaEY79IwLVEScQ3EzxOrM3F+E7gI+3Ji36lDgd4BN08Xq6+tjaGhkpim0ZHBw\noCOxOxW3k7HnY86djN3pnKti+5j/sc35qbFnYkYFKiIWAP+NYtLEmyNiArgjMz8YEVdQjC3oA9Zl\n5uiMMpEkqUlLBSozH+LACOqnzObb+Mw1wDUV5SVJ6nHOJCFJqiULlCSplixQkqRaskBJkmrJAiVJ\nqiULlCSplixQkqRa8pHvFRkbG2Pr1ofbinHEES+c/kOS1CMsUBXZuvVhLvjcbSwZWFHq+6Mj2/jC\nu/tZuvTIijOTpPnJAlWhJQMrOOTwo+Y6DUnq
Cl6DkiTVkgVKklRLFihJUi1ZoCRJtWSBkiTVkgVK\nklRLFihJUi21NA4qIl4GfDQzT42I3wauA8aBTZm5tvGZtwLnAnuBSzPzls6kLEnqBdOeQUXERcBV\nwCGNRZcD6zJzFbAgIs6KiKOAtwMnAmcAfxMRizuUsySpB7TSxXc/8Pqm9ydk5sbG61uB04GXAndm\n5r7MHAa2AC+qNFNJUk+ZtkBl5s3AvqZFfU2vR4ClwADwm6blu4BlVSQoSepNZebiG296PQDsBIYp\nCtXk5dMaHBwokUJrWo09NjbGgw8+2NJnh4d/NeXyXbu2t5rW0+rU9qjDdq5T7E7mXCW37fyPbc7l\nlSlQP4qIUzJzA3AmcDvwT8ClEbEEOBT4HWBTK8GGhkZKpDC9wcGBlmM/9NADbc1EDrD751voP/p5\npb+/Xye2x0y2RS/E7nTOVXLbzu/Y5vzU2DNRpkBdCFzVuAliM3BTZk5ExBXAnRRdgOsyc7RE7DnT\n7kzko8PbKsxGktRSgcrMh4CTGq+3AKun+Mw1wDVVJidJ6l0O1JUk1ZIFSpJUSxYoSVItWaAkSbVk\ngZIk1ZIFSpJUSxYoSVItWaAkSbVkgZIk1ZIFSpJUSxYoSVItWaAkSbVkgZIk1ZIFSpJUSxYoSVIt\nWaAkSbVkgZIk1VKZR74TEYuA9cCzgX3AW4Ex4DpgHNiUmWurSVGS1IvKnkG9GliYmX8AfAj4CHA5\nsC4zVwELIuKsinKUJPWgsgXqPmBRRPQBy4C9wPGZubGx/lbgtArykyT1qFJdfMAu4DnAvwLPAP4I\nOLlp/QhF4ZIkqZSyBeqdwG2Z+Z6IeCbwXWBJ0/oBYGcrgQYHB0qmUF3s4eH+juUwU53aHnXYznWK\n3cmcq+S2nf+xzbm8sgVqO0W3HhSFaBFwT0Ssysw7gDOB21sJNDQ0UjKFpzc4ONBy7B07dnckhzI6\nsT1msi16IXanc66S23Z+xzbnp8aeibIF6pPAtRGxAVgMXAzcDVwdEYuBzcBNJWNLklSuQGXmbuDs\nKVatbisbSZIaHKgrSaolC5QkqZYsUJKkWrJASZJqyQIlSaolC5QkqZYsUJKkWrJASZJqyQIlSaol\nC5QkqZYsUJKkWrJASZJqyQIlSaolC5QkqZbKPg9KkkoZGxtj69aHZ/y94eH+pzxc9Nhjj2PhwoVV\npaaasUBJmlVbtz7MBZ+7jSUDK9qKMzqyjSvOPYOVK59TUWaqGwtUTUyMj/PII49w2GHtPX7eI0rN\nB0sGVnDI4UfNdRqqOQtUTezdtZ11X7yrraNKjyjVSa10zU3VDTfZo49urTItdbHSBSoiLgZeCywG\nPg1sAK4DxoFNmbm2igR7iUeVqrOquuZ2/3wL/Uc/r6Ks1M1KFaiIWAWcmJknRUQ/cCFwObAuMzdG\nxJURcVZmfq3KZCXNrSoOokaHt1WUjbpd2dvMXwVsioivAl8HvgEcn5kbG+tvBU6rID9JUo8q28W3\nAjgOeA3wXIoi1VzsRoBl7aUmSeplZQvUr4HNmbkPuC8iHgeObVo/AOxsJdDg4EDJFKqLPTzc37Ec\nZtvy5f1T/r/rsJ3rFLuTOVepTtu2ju3kYPt7GfNxX5uPOc9E2QJ1J3AB8HcRcQzQD3w7IlZl5h3A\nmcDtrQQaGhopmcLTGxwcaDn2dHcdzSc7dux+yv97JttipuZj7E7nXKU6bds6tpOp9vcy5uu+Nh9z\nnolSBSozb4mIkyPiLqAPeBvwIHB1RCwGNgM3lYktSRK0cZt5Zl48xeLV5VORJOkAJ4uVJNWSBUqS\nVEsWKElSLVmgJEm1ZIGSJNWSs5lLmpcmxscrmxn9iCNeWEkcVcsCJWle2rtrOx+7ZTtLBn7RVpzR\nkW184d39LF16ZEWZqSoWKEnzlo+o6W5eg5Ik1ZIFSpJUSxYoSVItWaAkSbVkgZIk1ZIFSpJUS95m\n3kUONnBx
eLi/5YfNHXvscSxcuLDq1CRpxixQXaTdgYujI9u44twzWLnyORVnJkkzZ4HqMg5clNQt\n2ipQEXEk8EPgNGAMuA4
"text/plain": [
"<matplotlib.figure.Figure at 0x7fd126ad2f98>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"# We plot the histogram per age\n",
"g = sns.FacetGrid(df, col='Survived')\n",
"g.map(plt.hist, \"Age\", color=\"steelblue\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We observe that non survived is left skewed. Most children survived."
]
},
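{
"cell_type": "markdown",
"metadata": {},
"source": [
"One way to check a statement such as the one about children numerically is to bin the ages and compute the survival rate per bin. The sketch below uses a made-up `toy` DataFrame so that it runs on its own; the bin edges and the labels `child`, `adult` and `senior` are arbitrary choices for illustration and do not come from the dataset. With the notebook's data the same calls would be applied to `df`.\n",
"\n",
"```python\n",
"import pandas as pd\n",
"\n",
"# Hypothetical toy data, not the Titanic dataset\n",
"toy = pd.DataFrame({\n",
"    'Age':      [2, 5, 9, 25, 30, 41, 60, 70],\n",
"    'Survived': [1, 1, 0, 0, 1, 0, 0, 1],\n",
"})\n",
"\n",
"# Assign each age to an interval: (0, 12], (12, 40], (40, 80]\n",
"bands = pd.cut(toy['Age'], bins=[0, 12, 40, 80],\n",
"               labels=['child', 'adult', 'senior'])\n",
"\n",
"# Survival rate (mean of the 0/1 column) within each age band\n",
"print(toy.groupby(bands, observed=False)['Survived'].mean())\n",
"```\n",
"\n",
"Rows with a missing age fall into no band and are left out of the result, so on real data the rates describe only passengers whose age is known."
]
},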
{
"cell_type": "code",
"execution_count": 18,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"array([<matplotlib.axes._subplots.AxesSubplot object at 0x7fd126a9b9e8>,\n",
" <matplotlib.axes._subplots.AxesSubplot object at 0x7fd126974a90>], dtype=object)"
]
},
"execution_count": 18,
"metadata": {},
"output_type": "execute_result"
},
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAYQAAAEFCAYAAADjUZCuAAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\nAAALEgAACxIB0t1+/AAAFARJREFUeJzt3WuUXXV5x/HvZJIBOkxCjCMujUQQ+9DWhS60XlAJsqAK\nVFPa1fqiLosXbCvVqmgLuLQXi1or0dJWXOUi2uUFpUWQLLwRC8FltSpV0tIHKEIYrGWMEyYzXIbM\nTF+cHRxCwpy9Z59rvp83mX3O7Oc8Z5/9z2/Ovg7Mz88jSdKyTjcgSeoOBoIkCTAQJEkFA0GSBBgI\nkqSCgSBJAmB5pxvQ44uIU4H3A0PAD4E3ZOZUZ7uSOiciPgHcnJkbO91Lv/EbQheLiCcClwKnZeYv\nAT8C/rqzXUmdERFHRcR1wG93upd+ZSB0t18DvpOZdxTTFwK/28F+pE46k8YfSJ/vdCP9ykDobk8D\n7l4wPQaMRMTBHepH6pjMfEtmfhoY6HQv/cpA6G77+nxm29qFpP2CgdDdtgFPWTC9FpjIzAc61I+k\nPmYgdLevAi+IiGcU078PXNXBfiT1MQOhi2XmOPA64J8j4j+BZwFndbYrqeO8RHOLDHj5a0kSNHli\nWkS8APhgZr4sIp4DXADsAh4CXpuZ4xFxBvAm4GHgvMzc1KqmJUn1W3STUUS8C7gIOKB46KPAmZl5\nAnAl8KcRcSjwFuBFwCuAD0TEita0LElqhWb2IdwOnLZg+tWZeXPx83LgQeD5wI2ZuSszJ4HbgKNr\n7VSS1FKLBkJmXklj89Du6f8DiIhjaZw5+BFgJXDfgtmmgFW1dipJaqlKF7eLiFcD5wCnZOb2iJik\nEQq7jQA7FqszPz8/PzDgSYeqXc+tVI4FtVDTK1bpQIiI19DYeXx8Zu7+T/87wF9FxBBwEHAUsHXR\nLgcGGB/fWbaFvRodHem6Wt3Y0/5Qa3R0pIZu2qvfx0K31urGnlpRq1mlAiEilgF/C9wFXBkR88D1\nmfkXEXEBcCONNDo3M2fK1JYkdVZTgZCZdwHHFpNr9vE7lwCX1NSXJKnNPFNZkgQYCJKkgoEgSQIM\nBElSwUCQJAEGgiSpYCBIkgADQZJUMBAkSYCBIEkqGAiSJMBAkCQVDARJElDxBjn7m9nZWcbGtpWe\nb3JymOHhNQwODragK0mql4HQhLGxbZy1cRNDw3u98vc+zUxv5/x3nMq6dYe3qDNJqo+B0KSh4TUc\nuPLQTrchSS3jPgRJEmAgSJIKBoIkCTAQJEkFA0GSBBgIkqSCgSBJAgwESVLBQJAkAQaCJKlgIEiS\nAANBklRo6uJ2EfEC4IOZ+bKIeAZwGTAHbM3MM4vfOQN4E/AwcF5mbmpNy5KkVlj0G0JEvAu4CDig\neGgjcG5mrgeWRcSGiDgUeAvwIuAVwAciYkWLepYktUAzm4xuB05bMP3czNxS/HwtcBLwfODGzNyV\nmZPAbcDRtXYqSWqpRQMhM68Edi14aGDBzzuBlcAIcN+Cx6eAVXU0KElqjyo3yJlb8PMIsAOYpBEM\nez6+qNHRkQotVKs1OzvLnXfeuWidycl7HzU9NfWzyj2tXj1c23ts57Lqh1q9pluXY7/X6sae6q7V\nrCqB8P2IOC4zbwBOBjYD/w6cFxFDwEHAUcDWZoqNj++s0MJjjY6OLFrrrrt+VOlWmFPjt3Pw6JGV\n+pqYmK7lPTbz/qz18zq9qNuW4/5Qqxt7akWtZlUJhHcCFxU7jW8BrsjM+Yi4ALiRxialczNzpkLt\nlqtyK8yHpra3qBtJ6h5NBUJm3gUcW/x8G3D8Xn7nEuCSOpuTJLWPJ6ZJkgADQZJUMBAkSYCBIEkq\nGAiSJMBAkCQVDARJEmAgSJIKBoIkCTAQJEkFA0GSBBgIkqSCgSBJAgwESVLBQJAkAQaCJKlgIEiS\nAANBklQwECRJgIEgSSoYCJIkwECQJBUMBEkS
YCBIkgoGgiQJMBAkSQUDQZIEGAiSpMLyKjNFxHLg\nk8DTgV3AGcAscBkwB2zNzDPraVGS1A5VvyGcAgxm5ouB9wHvBzYC52bmemBZRGyoqUdJUhtUDYRb\ngeURMQCsAh4GjsnMLcXz1wIn1tCfJKlNKm0yAqaAw4H/BtYArwReuuD5nTSCQpLUI6oGwtuBL2fm\nuyPiqcC/AkMLnh8BdjRTaHR0pGIL5WtNTg7X9lrNWr16uLb32M5l1Q+1ek23Lsd+r9WNPdVdq1lV\nA+FnNDYTQeM//uXATRGxPjOvB04GNjdTaHx8Z8UWHm10dGTRWhMT07W8VhkTE9O1vMdm3p+1fl6n\nF3XbctwfanVjT62o1ayqgfBR4NKIuAFYAZwNfA+4OCJWALcAV1SsLUnqgEqBkJnTwKv38tTxS+pG\nktQxnpgmSQIMBElSwUCQJAEGgiSpYCBIkgADQZJUMBAkSYCBIEkqGAiSJMBAkCQVDARJEmAgSJIK\nBoIkCTAQJEkFA0GSBBgIkqSCgSBJAgwESVLBQJAkAQaCJKlgIEiSAANBklRY3ukGJPWm2dlZxsa2\nNf37k5PDTExMPzK9du1hDA4OtqI1VWQgtND8/Bz33DNWaV4Hi7rd2Ng2ztq4iaHhNaXnnZnezvnv\nOJV16w5vQWeqykBooZnpCTZePsHQcLlQcLCoVwwNr+HAlYd2ug3VxEBoMQeMutlim3323MyzUNVv\nv+peBoK0H1vKZp+p8ds5ePTIFnSlTjEQpP1c1W+xD01tb0E36qTKgRARZwOvAlYAHwNuAC4D5oCt\nmXlmHQ1Kktqj0nkIEbEeeFFmHgscDxwGbATOzcz1wLKI2FBbl5Kklqt6YtrLga0R8UXgauAa4JjM\n3FI8fy1wYg39SZLapOomoyfS+Fbw68ARNEJhYbjsBFYtrTVJUjtVDYTtwC2ZuQu4NSIeBNYueH4E\n2NFModHRkYotlK81OTlc22u12urVw495P+1cVv1Qq9d0Yjl2ckzsbR0vo67l1a3rbyfGQtVAuBF4\nK/CRiHgKMAxcFxHrM/N64GRgczOFxsd3Vmzh0UZHRxatta/jqbvRxMT0o95PM++vWf1eq1dDpRPL\nsZNjYs91vIw615VuW39bUatZlQIhMzdFxEsj4jvAAPCHwJ3AxRGxArgFuKJKbUlSZ1Q+7DQzz97L\nw8dXb0WS1Ele/lqSBBgIkqSCgSBJAgwESVLBQJAkAQaCJKlgIEiSAANBklQwECRJgHdMk9QB8/Nz\nS7on8xOe8Cs1dqPdDARJbTczPcHGyycYGi4fCjPT27n0fcOsXPmkFnS2fzMQJHVE1Xs5q3XchyBJ\nAgwESVLBQJAkAQaCJKlgIEiSAANBklQwECRJgIEgSSoYCJIkwECQJBW8dEUX2tuFvyYnh5mYmH7c\n+dauPYzBwcFWtiapjxkIXajKhb9mprdz/jtOZd26w1vYmaR+ZiB0KS/8Jand3IcgSQIMBElSwUCQ\nJAFL3IcQEU8CvgucCMwClwFzwNbMPHPJ3UmS2qbyN4SIWA58HLi/eGgjcG5mrgeWRcSGGvqTJLXJ\nUjYZfRi4EPgxMAAck5lbiueupfGtQZLUIyoFQkScDtybmV+jEQZ71toJrFpaa5Kkdqq6D+F1wFxE\nnAQ8G/gUMLrg+RFgRzOFRkdHKrZQvtbk5HBtr9WNVq8ebnp5tnO5d6pWr+nEcuzlMVHX8urW9bcT\nY6FSIBT7CQCIiM3AHwB/ExHHZeYNwMnA5mZqjY/vrNLCY4yOjixaa7FLP/S6iYnpppZnM8uqWd1Y\nq1dDpRPLsZfHRF3rSretv62o1aw6z1R+J3BRRKwAbgGuqLG2FrG36x/ty57XRfIaSJKghkDIzBMW\nTB6/1Hqqpsr1jxrzeQ0kSQ1ey6iPeP0jSUvhmcqSJMBvCFLX+PwXN/HwrrlK8x73wmN42tqn1tyR\n9jcGgtQl
Nn//HmZ+4chK8w4f+EMDQUvmJiNJEmAgSJIKBoIkCTAQJEkFA0GSBBgIkqSCgSBJAgwE\nSVLBQJAkAQaCJKlgIEi
"text/plain": [
"<matplotlib.figure.Figure at 0x7fd127eceb00>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"#Alternative to Seaborn with matplotlib integrated in pandas\n",
"df.hist(column='Age', by='Survived', sharey=True)"
]
},
{
"cell_type": "code",
"execution_count": 19,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"array([<matplotlib.axes._subplots.AxesSubplot object at 0x7fd12688a2b0>,\n",
" <matplotlib.axes._subplots.AxesSubplot object at 0x7fd12681c9e8>], dtype=object)"
]
},
"execution_count": 19,
"metadata": {},
"output_type": "execute_result"
},
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAX4AAAEFCAYAAADkP4z+AAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\nAAALEgAACxIB0t1+/AAAEyRJREFUeJzt3X1wZXV9x/F3sksAQ5YN9MrUrqwI+tXWVivOYC0FFGxF\n6iCdsc60tEWxVIvKk8wIitIHO/Zh16pUqjz6h9UCQm1hKJ0pjDKOgCCt2sIXEMoSqp2wzbLZLGxg\nk/6RuzXj7Cb3ntyTe5Lf+/XP5t6b3z2f+/D77Mm555w7MDs7iySpHIP9DiBJWl4WvyQVxuKXpMJY\n/JJUGItfkgpj8UtSYdb2O4B+LCJOAf4MGAK+C5yZmTv6m0rqn4i4BvheZm7ud5bVxDX+hoiInwKu\nBk7LzFcCjwF/3t9UUn9ExCsi4l+Bd/Q7y2pk8TfHrwL3ZOaj7cuXA7/dxzxSP53N3IrQdf0OshpZ\n/M3xYuCJeZfHgJGIOKhPeaS+ycwPZOaXgIF+Z1mNLP7m2NdrsXtZU0ha9Sz+5tgCvGje5Q3ARGY+\n06c8klYpi785/gU4JiKObF/+A+BrfcwjaZWy+BsiM8eBdwFfjYj/AF4FXNDfVFLfefrgGgx4WmZJ\nKsuiB3BFxCBwBRDADPBe5g4wuhl4qP1rl2fm9XWFlCT1TidH7r4NmM3MYyPieOaOLP0nYFNmfqrW\ndJKknutoU09EDGbmTET8HnAC8AxzfwGsBR4GzsnMqTqDSpJ6o6MPd9ulfy3waeBLwN3AhzLzeOBR\n4NK6AkqSeqvjk7Rl5hkR8ULgHuCXMvOH7ZtuAj6z0NjZ2dnZgQEPwFMtVtwby/mgmnT8purkw93T\ngQ2Z+UngWeY+4L0xIj6Ymd8GTgTuWzDNwADj45OdZlo2rdaIuTrUxEwwl2ulaeJ8aPLr27RcTcwE\n3c2FTtb4bwSuiYivt3//HObOKXNZREwDPwLOqpBTktQHixZ/Zu4E3rmXm47tfRxJUt08cleSCmPx\nS1JhLH5JKozFL0mFsfglqTAWvyQVxuKXpMJ0fMoGaSl2797N2NiWrsdt2HA4a9asqSGRVC6LX8ti\nbGwLF2y+haHhQzseMz21lU3nn8LGjUfUmEwqj8WvZTM0fCgHrDus3zGk4rmNX5IKY/FLUmEsfkkq\njMUvSYWx+CWpMBa/JBXG4pekwlj8klQYi1+SCmPxS1JhLH5JKozFL0mFsfglqTCLnp0zIgaBK4AA\nZoD3AruAa9uXv5+ZZ9eYUZLUQ52s8b8NmM3MY4FLgD8DNgMXZ+bxwGBEnFpjRklSDy1a/Jn5NeCs\n9sWNwATw2sy8s33drcBJ9cSTJPVaR1/EkpkzEXEt8HbgHcCb5908CRy82H20WiNV8tXOXJ1bSqbt\n24crjRsdHW7kc7FUTXxMTcwEzczVxEzd6PgbuDLzjIh4IfBt4MB5N40A2xYbPz4+2X26mrVaI+bq\n0FIzTUxMVR630HJX6gRcba9vXZqYq4mZoLu5sOimnog4PSI+3L74LLAbuDcijm9fdzJw514HS5Ia\np5M1/huBayLi6+3f/yDwIHBlROwHPADcUF9ESVIvLVr8mbkTeOdebjqh52kkSbXzAC5JKozFL0mF\nsfglqTAWvyQVxuKXpMJY/JJUGItfkgpj8UtSYSx+SSqMxS9JhbH4JakwFr8kFcbil6TCWPySVBiL\nX5IKY/FLUmEsfkkqjMUvSYWx+CWpMBa/JBXG4pekwlj8klSYtQvdGBFrgauBlwBDwCeAJ4CbgYfa\nv3Z5Zl5fY0ZJUg8tWPzA6cBTmfm7ETEK/BvwR8CmzPxU7ekkST23WPFfB+xZmx8EngOOBl4REW8H\nHgbOycyp+iJKknppwW38mbkzM6ciYoS5/wA+CtwDfCgzjwceBS6tPaUkqWcWW+MnIl4M3Ahclplf\niYiDM/Pp9s03AZ/pZEGt1kj1lDUyV+eWkmn7
9uFK40ZHhxv5XCxVEx9TEzNBM3M1MVM3Fvtw9zDg\nNuDszLyjffVtEfH+zLwXOBG4r5MFjY9PLiloHVqtEXN1aKmZJiaqbQ2cmJhacLkrdQKutte3Lk3M\n1cRM0N1cWGyN/yJgPXBJRHwMmAXOA/46IqaBHwFnVcwpSeqDBYs/M88Fzt3LTcfWE0eSVDcP4JKk\nwlj8klQYi1+SCmPxS1JhLH5JKozFL0mFsfglqTAWvyQVxuKXpMJY/JJUGItfkgpj8UtSYSx+SSqM\nxS9JhbH4JakwFr8kFcbil6TCWPySVBiLX5IKY/FLUmEsfkkqjMUvSYWx+CWpMGsXujEi1gJXAy8B\nhoBPAP8JXAvMAN/PzLPrjShJ6qXF1vhPB57KzOOAtwCXAZuBizPzeGAwIk6tOaMkqYcWK/7rgEva\nP68Bngdem5l3tq+7FTippmySpBosuKknM3cCRMQIcD3wEeCv5v3KJHBwbekkST23YPEDRMSLgRuB\nyzLzKxHxF/NuHgG2dbKgVmukWsKamatzS8m0fftwpXGjo8ONfC6WqomPqYmZoJm5mpipG4t9uHsY\ncBtwdmbe0b76/og4LjO/AZwM3N7JgsbHJ5cUtA6t1oi5OrTUTBMTU5XHLbTclToBV9vrW5cm5mpi\nJuhuLiy2xn8RsB64JCI+BswC5wCfjYj9gAeAGyrmlCT1wWLb+M8Fzt3LTSfUkkaSVDsP4JKkwlj8\nklQYi1+SCmPxS1JhLH5JKozFL0mFsfglqTCLnrJBmm/37t2MjW3petyTT47VkEZaearOIYANGw5n\nzZo1S85g8asrY2NbuGDzLQwNH9rVuB3jj3BQ66iaUkkrR9U5ND21lU3nn8LGjUcsOYPFr64NDR/K\nAesO62rMrh1ba0ojrTxV5lAvuY1fkgpj8UtSYSx+SSqMxS9JhbH4JakwFr8kFcbil6TCWPySVBiL\nX5IKY/FLUmEsfkkqjMUvSYWx+CWpMB2dnTMijgE+mZlvjIjXADcDD7Vvvjwzr68roCSptxYt/oi4\nEPgdYEf7qqOBTZn5qTqDSZLq0cmmnkeA0+ZdPho4JSK+HhFXRsRwPdEkSXVYdI0/M2+KiI3zrrob\nuCIz74+Ii4FLgQtryietWtPT0zz62GNdjxschJcd9XIGBgZqSKUSVPkGrn/IzKfbP98EfKaTQa3W\nSIVF1c9cnWu1Rti+fXn/wBsdHW7kc7FUrdYId91zLx+/8pus3f+g7gbvHOOrnwtGRnr7vDT1eW5i\nrqVkWsoc6tV8qFL8t0XE+zPzXuBE4L5OBo2PT1ZYVL1arRFzdWhPpomJqWVd7sTE1ILPRRNLoRPj\n45Ns27aToReMst+BB3c1dmZmkqee2sGzz/YuTxPfc9DMXEvNtJQ5tNB86GYuVCn+9wGfjYhp4EfA\nWRXuQ5LUJx0Vf2Y+Dryh/fP9wLF1hpIk1ccDuCSpMBa/JBXG4pekwlj8klQYi1+SCmPxS1JhLH5J\nKozFL0mFsfglqTAWvyQVxuKXpMJY/JJUGItfkgpT5bTMkvpodnaGLVseZ3j4BV2P3bDhcNasWVND\nqvLs3r2bsbEtXY978smxGtJ0x+KXVphdU9v42OfvYGj40K7GTU9tZdP5p7Bx4xE1JSvL2NgWLth8\nS9evw47xRziodVRNqTpj8Usr0NDwoRyw7rB+xyhelddh146tNaXpnNv4JakwFr8kFcbil6TCWPyS\nVBiLX5IKY/FLUmEsfkkqTEf78UfEMcAnM/ONEXEkcC0wA3w/M8+uMZ8kqccWXeOPiAuBK4D921dt\nBi7OzOOBwYg4tcZ8kqQe62RTzyPAafMuH52Zd7Z/vhU4qeepJEm1WbT4M/Mm4Pl5Vw3M+3kSOLjX\noSRJ9alyrp6ZeT+PANs6GdRqjVRYVP3M1blWa4Tt24eXdZmjo8ONfC6WqtUaYf367s+uCTA4OLD4\nL+3DQs9n
U5/nJubqx1yA3s2HKsX/nYg4LjO/AZwM3N7JoPHxyQqLqlerNWKuDu3JNDExtazLnZiY\nWvC5aGIpdGJ8fJJt23Z
"text/plain": [
"<matplotlib.figure.Figure at 0x7fd1268b0d30>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"# We can observe the detail for children\n",
"df[df.Age < 20].hist(column='Age', by='Survived', sharey=True)"
]
},
{
"cell_type": "code",
"execution_count": 20,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"0.4817073170731707"
]
},
"execution_count": 20,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"#Mean of survival for young\n",
"df[df.Age < 20]['Survived'].mean()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"There were null values, we will recap at the end of this notebook how to manage them.\n",
"\n",
"We are going now to see the distribution of passengers younger than 20 that survived."
]
},
{
"cell_type": "code",
"execution_count": 21,
"metadata": {
"collapsed": false,
"scrolled": true
},
"outputs": [
{
"data": {
"text/plain": [
"<matplotlib.axes._subplots.AxesSubplot at 0x7fd1267abcc0>"
]
},
"execution_count": 21,
"metadata": {},
"output_type": "execute_result"
},
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAW8AAAEmCAYAAACtaxGwAAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\nAAALEgAACxIB0t1+/AAAFAFJREFUeJzt3Xtw1eWdx/F3LpJAjBogoIhQ6jiPbd3VFTu2SkVFrBbH\nXndsXbXVeqn3C7AVHVxwK9gWpYotaL3W6rrV9bKCSKdTd+24Hbe1di1u9/FaaFBKhECQyC1k/8hB\n0xYSODk5vzznvF8zDCfnnN/zfGfyyydPnt/ze05FR0cHkqS0VGZdgCRp9xnekpQgw1uSEmR4S1KC\nDG9JSpDhLUkJqu7uxRBCNXA38CFgAHAD8EdgIfBK7m3zY4wP92GNkqS/0G14A2cA78QYzwohNAC/\nBWYCN8UY5/Z5dZKkHeopvH8CbB9VVwJbgLHAwSGEzwGvApfHGDf0XYmSpL9UsSt3WIYQ6oEngDuA\nGuClGOOLIYRrgIYY49S+LVOS1FVPI29CCAcAjwK3xRgfCiHsHWNcl3v5MeDWntrYurW9o7q6qneV\nSlL5qdjZCz1dsBwOLAEujjE+k3t6SQjhkhjjr4EJwAs99d7S0rYbtaonjY31NDevz7oMaYc8Pwun\nsbF+p6/1NPKeBuwDTA8hXAd0AFcC3wshbAZWAucXqE5J0i7apTnv3mpuXu/WhQXkyEb9medn4TQ2\n1u902sSbdCQpQYa3JCXI8JakBBnekpSgHtd5qzS0t7fT1LS8qH2OHDmKqirX90t9wfAuE01Ny3l+\nypUMq6kpSn+rNm2COXMZPXpMUfqTyo3hXUaG1dQwonZg1mVI/caLL77AdddNY8yYDwOwefNmJk78\nNF/84ml/9d5LL72AqVOvYdSo0cUuc4cMb0llbezYjzNjxg0AbNmyhdNP/yInnTSJuro9M66se4a3\npLLW9UbFDRs2UFVVxauvvsKCBbfR0dFBY2Mj06f/8/vvaW5exZw5s9myZQurV7/DeeddyLhx47n9\n9u/z29++QHv7No499nhOP/0sHn30YZ5+ehFVVZUcfPDHuPzyyQWr2/CWVNZ+85tfc9ll36CiooLq\n6j244oqp3HLLHGbOnM2oUaNZtOjfWbbsTSoqOm92XLbsD3zlK2dy2GGHs3TpS9x99x2MGzeen/3s\np8ybdztDhgxh8eKFACxevJDJk6/m4IM/wuOP/xvbtm2jsrIwi/wMb0llreu0yXazZ1///tz2pEmn\nAh+M0IcMGcp9993FwoVPALB161YArrvueubPv5WWljV84hNHATBt2nU89NCPefvttzjkkL+lkNuR\nGN6S9BeGDm1kxYom9t9/JA88cB8HHDA6N/Lu4M4753PqqV/gyCM/yVNPPcnixQvZunUrzzzzM2bO\nnAXAGWf8PRMmnMiTTz7O1KnXsMcee3DVVZeydOlLHHro3xWkRsNbkv7C1KnXMGvWTCorKxkyZCin\nnfYPPPLIQ0AFxx13ArfdNpf777+HYcOGs27dWqqrq9lrr705//yvUVNTw5FHfpLhw/flwAMP5KKL\nvs6gQXU0Ng7jox89pGA1uqtggvLZtW3Zsjd589qri7ZU8K2N7zHmhhtd512G3FWwcNxVUJJKjOEt\nSQkyvCUpQYa3JCXI1SaSykpf7LCZxQ6ahrekstLUtJzJNy9iQN2QgrS3ecNqbrpqUtFXVhneksrO\ngLoh1O41vOj9vvzyUhYsmMe8ebf3ui3DW5KK4MEHf8SSJU8xcOCggrTnBUtJKoL99z+AWbPmFKw9\nw1uSimD8+OMKelHT8JakBDnnLansbN6wOrO2CrWflOEtqayMHDmKm66aVPA2d9X2D3XoLcNbUlmp\nqqrKbLfLfffdjwUL7i5IW855S1KCDG9JSpDhLUkJMrwlKUFesJRUVtxVUJIS1NS0nGmLZlI7uDB7\njGxc08bsSf/kroKS1NdqBw9i4LA9i9bf1q1b
mT37elaufJstW7Zw1lnnMG7cMb1q0/CWpD72058u\nZp999mH69OtpbW3l7LNPN7wlqb87/viJHHfcCQB0dGyjurr30dttCyGEauBu4EPAAOAG4H+Be4Ft\nwNIY48W9rkKSSlhtbS0AbW0bmD79as4//6Jet9nTUsEzgHdijMcAJwG3ATcD18QYxwOVIYTP9roK\nSSpxf/rTSi677EJOPvkUJkw4sdft9TR2/wnwcO5xFbAVODzG+Ivcc4uBicATva5Ekopk45q2ora1\nZs1qJk++lKuu+iaHH35EQfrtNrxjjG0AIYR6OkP8WqDrR0GsB/YuSCWSVAQjR45i9qR/Knib3bn/\n/ntZv3499957J/fc80MqKiqYM+dWBgwYkHefPc6ahxAOAB4FbosxPhRC+E6Xl+uBtT210dAwiOrq\n4i5gL3WNjfW79f7W1jre7KNadqahoW6361Rp6O/f93333aeo/X3rWzOAGQVts6cLlsOBJcDFMcZn\nck+/GEI4Jsb4LHAy8POeOmlpKdyfKOr8wWhuXr9bx7S0bOijarrvc3frVPryOT+1Y939Euxp5D0N\n2AeYHkK4DugALgfmhRD2AH4PPFKgOiVJu6inOe8rgCt28NKxfVKNJGmXuKugJCXIOywllRV3FZSk\nBDU1Lef5KVcyrKamIO2t2rQJ5sx1V0FJ6mvDamoYUTuwaP1t27aNb3/7WyxfvozKykqmTJnGmDEf\n7lWbznlLUh977rlnqaioYP78uzj33G9wxx3f73WbjrwlqY996lPHcvTRnVvArlz5NvX1e/W6TUfe\nklQElZWV3HDDDG65ZQ4TJ57U6/YceUtSkVx77QxaWtZw3nlf5YEHHqampjbvtgxvSWVn1aZNBW2r\np3UmS5Y8xapVqzjzzK8xYMAAKisrqajo3cSH4S2prIwcOQrmzC1Ye2PoeVfB8eOPZ9asmVxyyfm0\nt2/l8sun9GpHQTC8JZWZqqqqoq/Jrq2t5frrZxe0TS9YSlKCDG9JSpDhLUkJMrwlKUGGtyQlyPCW\npAQZ3pKUIMNbkhJkeEtSggxvSUqQ4S1JCTK8JSlBhrckJcjwlqQEGd6SlCDDW5ISZHhLUoIMb0lK\nkOEtSQkyvCUpQX4AcUba29tpalqe17GtrXW0tGzYrWNWrGjKqy9J/ZPhnZGmpuVMWzST2sGDitLf\nujdWc2lRepJUDIZ3hmoHD2LgsD2L0tfGNW3Ae0XpS1Lfc85bkhJkeEtSgnZp2iSEcCRwY4zxuBDC\nYcBC4JXcy/NjjA/3VYGSpL/WY3iHEKYCZwLv5p4aC9wUY5zbl4VJknZuV6ZNXgM+3+XrscCkEMJ/\nhhDuDCHU9U1pkqSd6TG8Y4yPAVu7PPU8MDXGOB54A5jRN6VJknYmnwuWj8cYX8w9fgw4rID1SJJ2\nQT7rvJeEEC6JMf4amAC80NMBDQ2DqK6uyqOr0tXaWvqzTQ0NdTQ21mddhjLg973v5RPeFwLzQgib\ngZXA+T0d0NLSlkc3pW13b29PUUvLBpqb12ddhoqssbHe73uBdPdLcJfCO8a4DDgq9/hFYFxBKpMk\n5cWbdCQpQYa3JCXI8JakBBnekpQgw1uSEmR4S1KCDG9JSpDhLUkJMrwlKUGGtyQlyPCWpAQZ3pKU\nIMNbkhJkeEtSgvLZz7sktbe309S0vGj9rVjRVLS+JJUewzunqWk5k29exIC6IUXp793m1xg6oShd\nSSpBhncXA+qGULvX8KL0tend1cDbRelLUulxzluSEmR4S1KCDG9JSpDhLUkJMrwlKUGGtyQlyKWC\nkjJX7JvkRo4cRVVVVdH66wuGt6TMNTUt5/kpVzKspqbP+1q1aRPMmcvo0WP6vK++ZHhL6heG1dQw\nonZg1mUkwzlvSUqQ4S1JCTK8JSlBhrckJcjwlqQEGd6SlCDDW5ISZHhLUoIMb0lKkOEtSQkyvCUp\nQbu0t0kI
4UjgxhjjcSGEA4F7gW3A0hjjxX1YnyRpB3oceYcQpgI/BLZv93UzcE2McTxQGUL4bB/W\nJ0nagV2ZNnkN+HyXr8f
"text/plain": [
"<matplotlib.figure.Figure at 0x7fd126a9e2b0>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"df.query('Age < 20 and Survived == 1').groupby(['Sex','Pclass']).size().unstack(['Pclass']).plot(kind='bar')"
]
},
{
"cell_type": "code",
"execution_count": 22,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"<matplotlib.axes._subplots.AxesSubplot at 0x7fd12672e4a8>"
]
},
"execution_count": 22,
"metadata": {},
"output_type": "execute_result"
},
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAW8AAAE4CAYAAACUt3JbAAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\nAAALEgAACxIB0t1+/AAAFchJREFUeJzt3XuUXWV5x/FvMmOCjIOOcRaVpkSL+nhrRbHVZakI3sVr\nL7Z2KW3RQhGrFqRVqLd6Ka0mtGpFFPFuLWjRKkXaChW1Ld6waIsPoi1hKmjAwQlJNTJJ/9gnrjFk\nMsmZfc4+75vvZy0Wc67v88xMfmfPu/e794odO3YgSSrLyq4LkCTtO8NbkgpkeEtSgQxvSSqQ4S1J\nBTK8JalA43t6MCLGgfOAewCrgNcB1wOfAK7pPe3szLxggDVKknaxx/AGng3clJnHRcQU8BXg1cD6\nzDxr4NVJknZrqfA+H9i5Vb0S+BFwBHDfiHg68A3gRZm5ZXAlSpJ2tWJvVlhGxCTwMeDtwGrgqsy8\nMiJOB6Yy87TBlilJWmjJHZYR8TPApcB7MvNDwEcz88rewxcChw+wPknSbiy1w/Jg4BLg5My8rHf3\nJRHxgsz8IvBo4EtLDXLbbfM7xsfHll2sJO1nViz6wJ6mTSLiL4FnAl/vvckO4AzgDcA24EbghMy8\ndU+jb9q0eahnv5qenmTTps3DHHKo7K9s9leuYfc2PT25aHjvccs7M18MvHg3Dx253KIkSf1zkY4k\nFcjwlqQCGd6SVCDDW5IKZHhLUoGWWh4v9WV+fp6ZmY19vXZuboLZ2X0/48LatYcyNuZ6Au0fDG8N\nxMzMRk7dcBGrJtYMZbxtW25m/SnHsm7dPYcyntQ1w1sDs2piDQccdHDXZUhVcs5bkgpkeEtSgQxv\nSSqQ4S1JBTK8JalAhrckFcjwlqQCGd6SVCDDW5IKZHhLUoEMb0kqkOEtSQUyvCWpQIa3JBXI8Jak\nAhneklQgw1uSCmR4S1KBDG9JKpDhLUkFMrwlqUCGtyQVyPCWpAIZ3pJUIMNbkgpkeEtSgQxvSSqQ\n4S1JBTK8JalA43t6MCLGgfOAewCrgNcB/wW8G9gOfC0zTx5siZKkXS215f1s4KbMfCTwBOAtwAbg\n9Mw8ClgZEU8bcI2SpF0sFd7nAy/vfT0G3AY8JDM/07vvYuAxA6pNkrSIPU6bZOZWgIiYBC4AzgDe\nuOApm4E7D6w6SdJu7TG8ASLiZ4C/A96SmR+KiL9Y8PAkcMtS7zE1dSDj42P9V9mH6enJoY43bKPe\n39zcxNDHnJqaGPnvy06l1Nmvmvsbld6W2mF5MHAJcHJmXta7+8qIeGRmXg48Ebh0qUFmZ7cuu9B9\nMT09yaZNm4c65jCV0N/s7JZOxhz17wuU8fNbjpr7G3Zve/qgWGrL+2XAXYCXR8QrgB3Ai4A3R8Qd\ngKuBD7dUpyRpLy015/1i4MW7eehRA6lGkrRXXKQjSQUyvCWpQIa3JBXI8JakAhneklQgw1uSCmR4\nS1KBDG9JKpDhLUkFMrwlqUCGtyQVyPCWpAIZ3pJUIMNbkgpkeEtSgQxvSSqQ4S1JBTK8JalAhrck\nFcjwlqQCGd6SVCDDW5IKZHhLUoEMb0kqkOEtSQUyvCWpQIa3JBXI8JakAhneklQgw1uSCmR4S1KB\nxrsuYH81Pz/PzMzGvl47NzfB7OyWfX7d2rWHMjY21teYkkaL4d2RmZmNnLrhIlZNrBnKeNu23Mz6\nU45l3bp7DmU8SYNleHdo1cQaDjjo4K7LkFQg57wlqUCGtyQVaK+mTSLiYcCZmXl0RBwOfAK4pvfw\n2Zl5waAKlCTd3pLhHRGnAc8Bbu3ddQSwPjPPGmRhkqTF7c20ybXAMxbcPgI4NiI+HRHnRsTEYEqT\nJC1myfDOzAuB2xbcdQVwWmYeBXwLeNVgSpMkLaafHZYfzcwre19fCBzeYj2SpL3Qz3Hel0TECzLz\ni8CjgS8t9YKpqQMZHx/uyr7p6cmhjrev5uaG
P9s0NTUxtO9L7f0tVyl19qvm/kalt37C+yTgzRGx\nDbgROGGpF8zObu1jmP5NT0+yadPmoY65r/pZ3t7GmMP6vtTe33KU8Pu5HDX3N+ze9vRBsVfhnZnX\nAY/ofX0lcGQrlUmS+uIiHUkqkOEtSQUyvCWpQIa3JBXI8JakAhneklQgw1uSCmR4S1KBDG9JKpDh\nLUkFMrwlqUCGtyQVyPCWpAIZ3pJUoH7O5z0U8/PzzMxs7Ou1c3MTfZ1Peu3aQxkbG+5FIySpHyMb\n3jMzGzl1w0WsmlgzlPG2bbmZ9accy7p19xzKeJK0HCMb3gCrJtZwwEEHd12GJI0c57wlqUCGtyQV\nyPCWpAIZ3pJUIMNbkgpkeEtSgUb6UEFJalstCwANb0n7lVoWABrekvY7NSwAdM5bkgpkeEtSgQxv\nSSqQ4S1JBTK8JalAhrckFcjwlqQCGd6SVCDDW5IKZHhLUoEMb0kq0F6d2yQiHgacmZlHR8RhwLuB\n7cDXMvPkAdYnSdqNJbe8I+I04B3A6t5dG4DTM/MoYGVEPG2A9UmSdmNvpk2uBZ6x4PYRmfmZ3tcX\nA49pvSpJ0h4tGd6ZeSFw24K7Viz4ejNw57aLkiTtWT/n896+4OtJ4JaWapE0Imq52kzN+gnvL0fE\nIzPzcuCJwKVLvWBq6kDGx/fthzI3N9FHacszNTXB9PTkUMayv/YNs7/lGvU6v/nNbw79ajPnveaZ\nHHbYYQMfq5bfzX7C+yXAOyLiDsDVwIeXesHs7NZ9HqSfT+7lmp3dwqZNm4c21rDZ32iYnp4c+Tpn\nZ7cM/Wozw/r5lfS7uafA36vwzszrgEf0vv4G8Kh9rkKS1BoX6UhSgQxvSSqQ4S1JBTK8JalAhrck\nFcjwlqQCGd6SVCDDW5IKZHhLUoEMb0kqkOEtSQUyvCWpQIa3JBXI8JakAhneklQgw1uSCmR4S1KB\nDG9JKpDhLUkFMrwlqUCGtyQVyPCWpAIZ3pJUIMNbkgpkeEtSgQxvSSqQ4S1JBTK8JalAhrckFcjw\nlqQCGd6SVCDDW5IKZHhLUoEMb0kqkOEtSQUyvCWpQIa3JBVovN8XRsSXgO/3bv53Zj63nZIkSUvp\nK7wjYjVAZh7TbjmSpL3R75b3g4CJiLgEGAPOyMwr2itLkrQn/c55bwXekJmPB04CPhARzp9L0pD0\nu+V9DXAtQGZ+IyJuBu4O/O/unjw1dSDj42P7NMDc3ESfpfVvamqC6enJoYxlf+0bZn/LNep11vzz\nq6W3fsP7eODngJMj4hBgErhhsSfPzm7d5wFmZ7f0WVr/Zme3sGnT5qGNNWz2NxqmpydHvs6af34l\n9banwO83vN8JvCsiPgNsB47PzO19vpckaR/1Fd6Z+SPg2S3XIknaS+5klKQCGd6SVCDDW5IKZHhL\nUoH6PreJtD+bn59nZmZjX6+dm5vo63C1tWsPZWxs39ZLqF6Gt9SHmZmNnLrhIlZNrBnKeNu23Mz6\nU45l3bp7DmU8jT7DW+rTqok1HHDQwV2Xof2Uc96SVCDDW5IKZHhLUoEMb0kqkOEtSQUyvCWpQIa3\nJBXI8JakAhneklQgw1uSCmR4S1KBDG9JKpDhLUkFMrwlqUCGtyQVyPCWpAIZ3pJUIMNbkgpkeEtS\ngQxvSSqQ4S1JBTK8JalAhrckFcjwlqQCGd6SVCDDW5IKZHhLUoEMb0kqkOEtSQUa7+dFEbECeCvw\nIOAHwPMy81ttFiZJWly/W95PB1Zn5iOAlwEb2itJkrSUfsP7SOCTAJl5BfDQ1iqSJC2p3/A+CPj+\ngtu3RYTz55I0JH3NeQNzwOSC2yszc3sL9fyEbVtubvstR2KsLsa0v7LHtL9yxxvUWCt27Nixzy+K\niF8BnpyZx0fEw4GXZ+axrVcnSdqtfre8LwQeGxGf693+3ZbqkSTthb62vCVJ3XInoyQVyPCWpAIZ\n3pJUIMNb
kgpkeEtSgfo9VHCk9E6UdSzwKGAN8F3gU8A/ZWbxh9PU3F/NvS0UEQ+k119mXt11PW2z\nv+Er/lDBiDgGOAP4MvB
"text/plain": [
"<matplotlib.figure.Figure at 0x7fd12668b320>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"# Passengers older than 25 that survived grouped by Sex\n",
"\n",
"df.query('Age < 20 and Survived == 1').groupby(['Sex','Pclass']).size().plot(kind='bar')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We are going to improve it a bit."
]
},
{
"cell_type": "code",
"execution_count": 23,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"<matplotlib.axes._subplots.AxesSubplot at 0x7fd126985cc0>"
]
},
"execution_count": 23,
"metadata": {},
"output_type": "execute_result"
},
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAW8AAAENCAYAAADAAORFAAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\nAAALEgAACxIB0t1+/AAAE2ZJREFUeJzt3Xtw3WWdx/F3mljaJik0JWXcKQ1Y5YsXvAALLOOILDqo\nLIuIrg6LdeUmpUiRygrlUpVbkbZy01ILFPGyCtVap8hWBlhA3AWLF1D0gV5oG4SalkhrQmmTZv9I\nwABtk56enJPn9P2a6cw553d5vmee5NMnz/n9nlPV1dWFJCkvQ8pdgCRpxxnekpQhw1uSMmR4S1KG\nDG9JypDhLUkZqtnexoioAW4B9gGGApcDq4FFwJM9u81OKd0xgDVKkl5ju+ENnASsTSlNiIhRwG+B\nrwAzU0pfH/DqJElb1Vd43w68PKoeAmwGDgL2j4iPAk8Bk1NKbQNXoiTptar6c4dlRNQDC4FvAbsB\nj6WUfhMRU4FRKaXzBrZMSVJvfY28iYi9gR8DN6SUfhARu6eUXujZvAC4rq9zdHR0dtXUVO9cpZK0\n66na1oa+PrDcC1gMTEop3dfz8uKIOCultAQ4Cni0r9ZbW9t3oNb8NDbW09KyodxlqED2X74qve8a\nG+u3ua2vkfcFwB7AxRFxCdAFfAG4JiI2Ac8BpxepTklSP203vFNK5wDnbGXTewemHElSf3iTjiRl\nyPCWpAwZ3pKUIcNbkjLU53XektRfnZ2dNDevKll7DQ1vL1lbg43hLalomptXMWXWnQytHT3gbW1q\nW8ctl9YycuSYAW9rMKr48P7ud29lyZJH6OjooLq6mjPPnEzE/uUuS6pYQ2tHM2zkXuUuo+JVdHg/\n/fQKHnroAWbPvgWApUuf4vLLpzFv3vfLXJkk7ZyK/sCyrq6ONWvWsGjRQtaubeHNb34Lc+fexvLl\nSzn77DM4++wzuOiiL9He3sYvf/kLJk06ja6uLm6+eQ6zZ19f7vIlaZsqeuS9556NXHXVLObP/yHz\n5s1l+PDhnHbaRL7//e8wdeo0mpr2YdGihXzve7dx2mkTWbLkYS67bBotLX/hmmu+We7yJWmbKjq8\nn3mmmREjarnggksASOlPTJnyeTZv3sTMmdMB6OjoYOzYvQE48cQJfPzjx/LVr05nyJCK/qNEUuYq\nOryXLn2Kn/50AVddNYuamhrGjh1LXV09tbW1XHTRVxgzZi8ef/x3PP/8OgCuvvoKJk/+IjffPIcD\nDzyYurq6Mr8DSdq6ig7vI444klWrnubUUycwYsQIurq2cNZZkxkzZi8uvfQSOjs7GTJkCOeffzF3\n3PEDGhr25PjjP86wYcOYPv1SLrvsqnK/BUnaqn59k87OamnZMPCNlFGlrylc6ey/4lm5cgUXzPm/\nklwquHH9Gm48/6iKvs67sbF+m1/G4MSuJGXI8JakDBnekpQhw1uSMjQorjYZiJXIxo4dR3W131gv\nqTINivAu9kpkm9rWMfPcY2hq2rco55OkwWZQhDeUZyWyzs5OzjnnTDo6Orj66muLdlPOcccdzcKF\ni4tyLknamkET3uXQ0tLCiy++yE033VbkM2/z0kxJKopdOrxnzryS5uZVXHHFV2hvb2fDhvUATJ78\nRd70pvF86lPHc8AB72L16lUceODBtLX9jSee+APjxjVx8cVfZfnyZdxww9epqRlCS8tapky5gHe8\n44BXzr9s2VKuvXYGACNH7s7UqZcwYkRtWd6rpMqyS19tMmXK+TQ17UtDw2gOPvgQrr12NuedN5UZ\nM64E4Nln/8zpp5/JDTd8i/nzf8gJJ/wbc+d+m8ce+x1tbX9jxYrlnHXWF5g3bx4nnjiBn/3sp686\n/9e+djlTppzPddfdyGGHHc53v/vtcrxNSRVolx55v2zZsqd49NFfce+9d9PV1fXKCHz33fegsbH7\n1tvhw4czbtw+ANTX17Fp0yYaGxu59dab+MlP
6lm3rpXa2lfPma9cuWKrqxdK0s4aNOG9qW1d2c7V\n1LQvRx/9Vj7wgaNpbW1l0aKFAFS9aur678uzdHV10dXVxTXXzODLX76Mgw46gOnTZ7BmzXOv2nfc\nuH22unqhJO2sQRHeY8eOY+a5xxT9nP1RVVXFhAknc+WVX2Xhwh/T3t7OySef/vLW3nu+6piqqio+\n9KGPcNFFX2L06Ab22GM0L7zw11ftO2XK+a9bvVCSisFVBYvAVenyZv8Vj6sKFperCkpShTG8JSlD\nhrckZcjwlqQMDYqrTVxVUJJ2zKAI7+bmVVxw51cY1jCiKOfb+Hw7Vx4zzVUFJVWsQRHeAMMaRjB8\nTHFW9RtId921iJUrn+aMM84qdymSdmHOeRegqspVAyWV16AZeZfDXXct4qGHHuCll15i3bp1fOIT\nn+LBB+9nxYplTJo0mTVr1vDAA/exceNGdt99D6644upXHf+jH/2Qu+9ezNChNRxxxFGccMIny/RO\nJO1qthveEVED3ALsAwwFLgeeAG4FtgC/TylNGtgSB1Z7+4vMmnU999zzc26//b+YM2cev/71Em6/\n/fvsv//buPba2QCce+7n+dOfnnjluKefXsE999zN7Nk3s+eedZx00gQOOeSf2Hvv/t2WL0k7o6+R\n90nA2pTShIjYA/gd8FtgakrpwYiYHRHHpZQWDnilA2S//QKAurp6mpr2AaC+fiSbN3dQXV3DtGlT\nGT58OGvX/oWOjo5Xjlu+fBnPPfcskydPpKZmCOvXv0Bz8yrDW1JJ9BXetwN39DyuBjqAA1NKD/a8\ndhfwQWCnw3vj8+07e4qCzrWt+euOjs384hf3M2fOPF56aSOnnPJpeq8DM25cE29603hmzLiOxsZ6\nvvGNOYwf/5adrl2S+mO74Z1SageIiHq6Q/xCYEavXTYAu+9sEWPHjuPKY6bt7Gled86dUV1dw7Bh\nw5k48RQARo9uZO3alle2v/nNb+HAA/+RiRNPoaurk/32e+sra39L0kDrc1XBiNgb+DFwQ0rp2xGx\nKqU0rmfbvwIfSCmdvb1zdHR0dtXUeMOMVOmWLVvGGdPvKemqguPHjx/wtspom5e29fWB5V7AYmBS\nSum+npd/ExHvSyk9AHwYuLev1ltbizclMhi5pGje7L/iaW1tK3mbldx3jY3129zW15z3BcAewMUR\ncQndXxEzGbg+It4A/BGYX6Q6JUn91Nec9znAOVvZ9P4BqUaS1C/eYSlJGTK8JSlDhrckZcjwlqQM\nGd6SlCHDW5IyZHhLUoYMb0nKkOEtSRkyvCUpQ4a3JGXI8JakDBnekpQhw1uSMmR4S1KGDG9JypDh\nLUkZMrwlKUOGtyRlyPCWpAwZ3pKUIcNbkjJkeEtShgxvScqQ4S1JGTK8JSlDhrckZcjwlqQMGd6S\nlCHDW5IyVFPuAqTeOjs7aW5eVdI2GxreXtL2pGIwvDWoNDevYsqsOxlaO7ok7W1qW8ctl9YycuSY\nkrQnFYvhrUFnaO1oho3cq9xlSIOac96SlCHDW5IyZHhLUob6NecdEYcC01NKR0bEu4FFwJM9m2en\nlO4YqAIlSa/XZ3hHxHnAp4G/9bx0EDAzpfT1gSxMkrRt/Zk2WQoc3+v5QcAxEXF/RNwUEbUDU5ok\naVv6DO+U0gKgo9dLDwPnpZSOAJYDXx6Y0iRJ21LIdd4/SSm90PN4AXBdXweMGjWCmprqAprKR2Nj\nfblLqAjr15fnDzn7rzjK0X+7at8VEt6LI+KslNIS4Cjg0b4OaG1tL6CZfDQ21tPSsqHcZVSE1ta2\nsrRr/xVHOfqvkvtue/8xFRLeE4HrI2IT8BxweoF1SZIK1K/wTimtBA7vefwb4L0DWZQkafu8SUeS\nMmR4S1KGDG9JypDhLUkZMrwlKUOGtyRlyPCWpAwZ3pKUIcNbkjJkeEtShgxvScqQ4S1JGTK8JSlD\nhSwJO6h1
dnbS3LyqpG02NLy9pO1JUsWFd3PzKqbMupOhtaNL0t6mtnXccmktI0eOKUl7kgQVGN4A\nQ2tHM2zkXuUuQ5IGjHP
"text/plain": [
"<matplotlib.figure.Figure at 0x7fd12663a0f0>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"# We pass 'Sex' from columns to rows with unstack, so that now Pclass is in the columns\n",
"df.query('Age < 20 and Survived == 1').groupby(['Sex','Pclass']).size().unstack(['Sex']).plot(kind='bar')"
]
},
{
"cell_type": "code",
"execution_count": 24,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"<matplotlib.axes._subplots.AxesSubplot at 0x7fd126629908>"
]
},
"execution_count": 24,
"metadata": {},
"output_type": "execute_result"
},
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAW8AAAENCAYAAADAAORFAAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\nAAALEgAACxIB0t1+/AAAFnxJREFUeJzt3X+QXGWd7/H3ZAZMZjLABCaUt0IG5ceXtRbZTbigFmVA\n2CvIpZBF71pcRAVhiWEJErhC5LcSoiFBAhrZgGFFuQoRjBuXy1rChYhbYBBFF3z4ESAM8mNCmkyY\nISQzmfvHDNwhJtPNpHt6ns77VZWq7j6nz/l2ns6nnzznnOfU9fX1IUnKy5hqFyBJevcMb0nKkOEt\nSRkyvCUpQ4a3JGXI8JakDDWUslJETARWAkcBvcDNwGbgjymlGRWrTpK0VUV73hHRAHwX6B54aQEw\nO6U0DRgTEcdXsD5J0laUMmxyNbAI+DNQB0xJKa0YWHYX/b1xSdIIGjK8I+LzwCsppV/QH9xbvmc9\nsGtlSpMkbUuxMe8vAJsj4u+Ag4DvA62DljcDrxXbSU9Pb19DQ/2wi5SkHVTdthYMGd4D49oARMQ9\nwJnAvIj4aErpfuAY4J5iey8UuoutkrXW1mY6OtZXuwwNk+2Xr1pvu9bW5m0uK+lsky2cByyOiJ2A\nx4Glw6xLkjRMJYd3Suljg54eXv5SJEml8iIdScqQ4S1JGTK8JSlDhrckZcjwlqQMGd6SlKHhnOed\nlR/84GZWrnyInp4e6uvr+dKXZhJxQLXLkqTtUtPh/eyzz/DAA/ezaNH3AHjqqSe58spLWbLk1ipX\nJknbp6aHTcaPH8/LL7/M8uXLWLOmg3333Y/Fi7/PqlVPcfbZZ3L22Wdy0UVfobu7i1//+lfMmHE6\nfX193HTTDSxadF21y5ekbarr6+ur+E46OtZXfifb8OSTiaVLf8zKlQ8xbtw4Tj99OrfeeguzZ19K\nW9veLF++jBdf/DOnnz6dhQvns27dOjo6XuFb3/oOY8aU9ttW6/Mr1DrbL1+13natrc3Dm5gqdy+8\n0E5jYxMXXngJACn9iVmz/olNmzYyf/5cAHp6epg0aS8ATjrpFD71qeO44oq5JQe3JFVDTYf3U089\nyc9+diff+MYCGhoamDRpEuPHN9PU1MRFF13OxIl78oc//J61a18FYN68OcyceR433XQDU6YczPjx\n46v8CSRp62o6vKdNO4LVq5/li188hcbGRvr6NnPWWTOZOHFPvva1S+jt7WXMmDFccMHF3H77j5gw\nYQ9OOOFTjB07lrlzv8bXv/6Nan8ESdqqmh/zHgm1Pu5W62y/fNV62w015u3AriRlyPCWpAwZ3pKU\noZo+YClpZPX29tLevnrE9tfZ2USh0DVi+5s0aTL19aPjZuqjIrwr0eCj6S9Z2lG0t6/mwp9fztgJ\njdUupew2rO3mqmMvpa3tfdUuBRgl4d3evppZC37Ozk27l2V7G7teZf65x46av2RpRzJ2QiPjJnqN\nRKUVDe+IGAMsBgLYDJwJ7AwsB54YWG1RSun27Slk56bdGbvLntuziXett7eXc875Ej09Pcybd23Z\nLso5/viPs2zZ3WXZliRtTSk97+OAvpTSYRExDZgD/CswP6V0TUWrq7COjg7eeOMNbrzx+2Xe8jZP\nzZSksiga3imlZRHxrwNP9wYKwFQgIuKTwJPAzJTSyB01KJP586+ivX01c+ZcTnd3N+vXdwIwc+Z5\nvP/9+/CZz5zAgQcexPPPr2bKlIPp6nqdxx77TyZPbuPii69g1aqnuf76a2hoGENHxxpmzbqQv/7r\nA9/e/tNPP8W1114NwC677Mrs2ZfQ2NhUlc8qqbaUdKpgSmlzRNwMXAv8EHgQOC+lNA1YBVxWqQIr\nadasC2hrex8TJuzOwQcfwrXXLuL882dz9dVXAfDii3/mjDO+xPXX/zNLl/6YE0/8Hyxe/C88+ujv\n6ep6nWeeWcVZZ32ZJUuWcNJJp/Bv//azd2z/
m9+8klmzLmDhwu/yoQ99hB/84F+q8TEl1aCSD1im\nlD4fEROBh4APp5ReHFh0J7BwqPe2tDTS0LDtMz86O8vfG21paaK1tXnIdTZu7GSnnep5/vlnePTR\n37JixT309fXxxhtdtLY209LSwgc+sA8ATU2NTJ164MC2d6W5eWf226+NW265mZ/+dByvv/4648eP\np7W1mTFj6mhtbWb16mdZuHAe0D97YVtbW9GaVB22S3lU4t/yaFJKroyUUg5YngxMSinNBTbQf9Dy\njog4O6X0G+BI4OGhtlEodA+5j0Khi41dr5ZcdDEbu16lUOgqOufB2rVdbNrUy3vfuxdHHPHfOOqo\nj1MoFFi+fBkdHevp6+t7exubN29++/GmTT2sWbOeyy67gssu+zpTpx7I3LlX8/LLL9HRsf7tdffa\nq42vfOWSd8xeWMvzMOSq1ufHGEkjec51NZSSK+U01A9FKT3vO4AlEXHfwPozgeeB6yNiI/AScMb2\nFDhp0mTmn3vs9mxiq9ssRV1dHaeccipXXXUFy5bdQXd3N6ee+tbHGXzgse4d76mrq+Pooz/BRRd9\nhd13n8Buu+3OunWvvWPdWbMu+IvZCyWpHJxVsAzsueXN9iuf5557hsv/Y15Nnuf9xiuvc+mHzx/R\n60ecVVCSaozhLUkZMrwlKUOGtyRlaFRMTOWsgpL07oyK8C73NJKjbepGSSq3URHekM80knfdtZzn\nnnuWM888q9qlSNqBOeY9DHV1zhooqbpGTc+7Gu66azkPPHA/b775Jq+++iqf/vRnWLHiPp555mlm\nzJjJyy+/zP3338uGDRvYddfdmDNn3jve/5Of/Jhf/OJudt65gWnTjuTEE/+hSp9E0o5mhw5vgO7u\nN1iw4Dp++ct/57bb/jc33LCE3/52JbfddisHHPABrr12EQDnnvtP/OlPj739vmeffYZf/vIXLFp0\nE3vsMZ6TTz6FQw75MHvtVdpl+ZK0PXb48N5//wBg/Phm2tr2BqC5eRc2beqhvr6BSy+dzbhx41iz\n5hV6enreft+qVU/z0ksvMnPmdBoaxtDZuY729tWGt6QRMWrCe8PaoWcerNS2tjV+3dOziV/96j5u\nuGEJb765gdNO+yyD54GZPLmN979/H66+eiGtrc18+9s3sM8++2137ZJUilER3pMmTeaqYy8t+za3\nR319A2PHjmP69NMA2H33Vtas6Xh7+b777seUKf+V6dNPo6+vl/33/ytaWydu1z4lqVTOKlgGzkqX\nN9uvfJxVsLycVVCSaozhLUkZMrwlKUOGtyRlyPCWpAwZ3pKUoaLneUfEGGAxEMBm4EzgTeDmged/\nTCnNqGCNkqQtlNLzPg7oSykdBlwMzAEWALNTStOAMRFxfAVrlCRtoWh4p5SWAWcMPG0DCsCUlNKK\ngdfuAo6qTHmSpK0pacw7pbQ5Im4GFgK3AoOv+lkP7Fr+0iRJ21Ly3CYppc9HxETgN8C4QYuagdeG\nem9LSyMNDbV9P8nW1uZql6DtYPuVR2dnU7VLqKiWlqZR810p5YDlycCklNJcYAPQC6yMiGkppfuA\nY4B7htpGoVC+GQNHI+fGyJvtVz6FQle1S6ioQqFrRL8rQ/1QlNLzvgNYEhH3Dax/NvAn4MaI2Al4\nHFhahjolSSUqGt4ppW5ga/f3Orzs1UiSSuJFOpKUIcNbkjJkeEtShgxvScqQ4S1JGTK8JSlDhrck\nZcjwlqQMGd6SlCHDW5IyZHhLUoYMb0nKkOEtSRkyvCUpQ4a3JGXI8JakDBnekpShkm9ALI2E3t5e\n2ttXj+g+OzubRvTei5MmTaa+vrZvyK3KM7w1qrS3r+bCn1/O2AmN1S6lIjas7eaqYy+lre191S5F\nmTO8NeqMndDIuInjq12GNKoNGd4R0QB8D9gb2Bm4EngeWA48MbDaopTS7RWsUZK0hWI975OBNSml\nUyKiBfgd
cDkwP6V0TcWrkyRtVbHwvg14q1c9BtgETAUOiIhPAk8CM1NKI3e0R5I09KmCKaXulFJX\nRDTTH+IXAQ8B56WUpgG
"text/plain": [
"<matplotlib.figure.Figure at 0x7fd129692d68>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"# Now we make that the plot shows both values combined, and change the labels\n",
"df.query('Age < 20 and Survived == 1').groupby(['Sex','Pclass']).size().unstack(['Sex']).plot(kind='bar', \\\n",
" \n",
" stacked=True) "
]
},
{
"cell_type": "code",
"execution_count": 25,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"<matplotlib.text.Text at 0x7fd126537be0>"
]
},
"execution_count": 25,
"metadata": {},
"output_type": "execute_result"
},
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAATcAAAJoCAYAAAADEqaNAAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\nAAALEgAACxIB0t1+/AAAIABJREFUeJzt3XmcXHWZqPEn6Qayig3T4IIEBHldQRaX0QygLMqgI4zX\nYVQGQRRh2Ny4DigC47AoCC4gIigMjuggiIwwEBG4LKMiq4jgCxoVguIkUJilCSSdvn+cE2hieqt0\ndXX/8nw/Hz7p2k69VU0/fc6pqtOT+vr6kKTSTG73AJLUCsZNUpGMm6QiGTdJRTJukopk3CQVqbPd\nA+jZImIW8Bvg7vqsSfW/X8rM89szVXtExPnALzLz9HbPoonHuI1PPZm53coTEfEC4J6IuDUz72nj\nXNKEYdwmgMz8Q0Q8AGwVEXOBs4GXABsAi4D3ZOYDEfH3wCeB3vq/ozLz5kHOfw7wReCVwDrAtfVl\nKyLiCeAUYDfg+VRrjl+MiMnAacDbgceBnwEvz8w3DbG8pcDlwNbAezPzjpWPLyKmA18G3ggsA76f\nmZ/q/xxExPuBg+rlbgB8NjO/GhEbAxcCG9ZX/e/M/PRA56/63EbEMuALwJuAacAnM/Oyfvf5z1Rr\nz48Ch2Xm/fUa5QbAi4ErMvPoVR7L+cCWwArg9sz8UH3Z24BP1Y+hB/h4Zt4SEd8ApmfmPhHxCuA6\nYKfM/NWq82r43Oc2AUTEXwNbALcAewCNzHxDZr4UuA04rL7q54BDMvO1wLHAzkOcfwZwW2a+BtgO\n6AY+Wl+2HvC/mTkbeBdwSkSsC3wQ2BZ4ObByrr5hLG9d4PLMfFn/sNX+FVgvM6Ne9hsjYsd+j386\ncCCwR2ZuD/xj/Zio5/lNZu4A7AhsGREzBzl/VR3Agvp6+wDfiIgN6/t/HzC7vs9Tgcv63W5qZr6q\nf9hqewMz6jXv19bzvzgitgRO6vcYPgRcFhFTqb5/W0fEfsB3gCMM25pzzW18mhYRd1CtMXQC86nW\nzh4GLo2IuRFxGNXawc7Aj+vbfRv4fkRcCVzDMwEY6Py3Aa+JiA/Up6dQrW2s9F8AmXlHHbbpVHG9\nMDOXAUTEOcDhw1zezQM83l2Bj9T3tYxqLYqIOKA+b0lEvB14W0S8BHh1PQvA1cCV9b7KHwH/kpmL\nImK15w9w/2fW9/OLiLibKoavpwr3jyNi5X7P50bEc4d4LDcDJ0bE9VTP9Rcyc25EHAI8D7i23/KW\nA1vW9/tuql9eF2bmfw6wbI2AcRufnrXPrb/6h+SDVJtx3wIeAzYDyMxjI+LrwO7A/sC/ANut7vyI\n2J5qreVdmZn1stfn2TF6YpW7n0T1Azmp33m9/b4eanmLB3i8y3lm7Y+I2IRqs23l6RcCPwHOAW4C\nLgH2rB/zbRGxOVUg3wzcGhHvyMyfDnT+au5/1cfQW//7zVU2OV+QmY9HxICPJTN/V6+l7Vzf77UR\ncXi9vGsz892rPM6H65MvBRYA20ZEZ2YuH+C50jC5WTo+TRrkst2B8+tXTh+g2vfVEREdEfFbqk2i\nr1HtK3ppRKyzuvOpfrHNod5sjIj1qNbUDvuLe3z2TFcC+0bEuhHRSRXLlWEayfL6+xHwvoiYVN/u\nEqq1p5V2oNpEPjEzr6kfM/X1TwY+nZn/lZkfBn5JtW9ytecPcP/71cvbDgjgBuCHwLsj4nn1Zf9M\ntQ9xUBFxMHBBZl5Th3EOsHI/2u5RlzEi/hb4OTAlIjaj2u+3G/Arnlmz1howbuPTYIdqOQ04uN5s\nvQa4nWrTphc4ErgoIm4HLgYOqDfzBjr/CKpN4F8Ad1H9sK38wVp1hpWnL6B6EeEOqk2wJ3lmLWsk\ny+vvBKoXEn5eP54rMvP7/S6fAzwcEVk/hk2oNtW3pIrCqyPi7oi4FZhLtRk+0Pmr88Z6uecB/5CZ\nf87MHwKfBa6JiLuo9vPtPYzHciEwOSLure93
JvDFzLyX6gWR70TEnfVjfjvwFHAR1Qsk91L9Mvg/\nEbHHIPehYZjkIY80EhGxG7BRZn6rPv0F4InV7FifECJiBbBhZjbaPYtG17D2uUXERlSvyu1KtT/i\nAqp9Kfdk5qEtm07j0S+BoyLiKKr/f+4CDmnvSGukj8F3A2iCGnLNrd6vcjHVS/9/R/WS+GmZeVNE\nnA1cnZmXt3xSSRqB4exzO43qTaN/oPoNt11m3lRfdhXV2pwkjSuDxi0i9qd6leoanll173+bRcD6\nrRlNkpo31D63A4AV9U7kbaheCerud/lMqo/gDGr58t6+zs6OpoeUpAEMuL900Lhl5k4rv46I64CD\ngVMjYsfMvJHq3erXDXXvjUbPUFcZt7q7ZzJ//kBvbFcr+JyPvYn6nHd3r+4TdZVmPqHwceDciFgH\nuI/qDZeSNK4MO26Z+eZ+J3ce/VEkafT4CQVJRTJukopk3CQVybhJKpLHc5PGud7eXubNe3BUl7nJ\nJpvS0VH2e0+NmzTOzZv3IB87/UrWnb7h0FcehqeWPMrnP7ons2ZtPuj17rzzdo444mCOP/4kdtll\nt6fPf9/7/pGIl3HMMcf9xW2uuuoKfv/733HwwcM5jF9rGTdpAlh3+oZMec7GY36/s2ZtxrXX/vDp\nuM2d+2uWLl066G0mTRofB1kxbpIGtMUWL+Ghhx6kp2cJ06ZNZ86cq9h99z34058e4dJLL+bGG69n\n6dKlrL/+cznppFOfddtLL/1PrrlmDpMmTWLXXXfnne/cZ0xn9wUFSYPaeec3c8MN1wNw332/5FWv\n2oYVK1awaNFCvvjFsznnnPNZvnw5v/rVvU/f5ne/+y3XXnsNZ5/9dc4661xuvPH/8dBDo7vfcCiu\nuUka0KRJk9htt7dy6qkn8/znv4BtttmWvr4+Jk+eTEdHJ8cddwxTp05lwYL/ZfnyZ/6mzdy5v+GR\nR/7IkUceQl9fH4sXL2LevAd50Ys2HbPZjZukQT3/+S9g6dInuOSS/+Tggw/j4YfnsWTJYm6++QbO\nOed8nnxyKQce+E/0P/DtppvO4sUv3oLTTvsSABdffBFbbPGSMZ3buEkTwFNLHm3rsnbZZTfmzLmK\nTTZ5EQ8/PI/Ozk6mTJnKIYccCMCGG3azYMH8p6+/5ZYvYbvtXsMhhxzIsmXLePnLX0F390aj9hiG\nY0z+QMz8+Ysm7F+hmaiHgpnIfM6fbSze5zZRn/Pu7pnNHc9NUvt1dHQM+Z40/SVfLZVUJOMmqUjG\nTVKRjJukIvmCgjTOeVSQ5hg3aZybN+9Bjr7yBKZsMG1Ulrf0sR5O3vO4IV+BfeSRPz59BJC+vj4m\nTZrEdtvtwP77f2BU5gA4/PAPcdRRx7DpprNGbZkrGTdpApiywTSmbjRjzO9388234Etf+uqY3+9o\nMG6SBrS6N/mfc85Z3H33XaxY0cs++7yXnXfehcMP/xBbbrkVc+f+hmnTprL11tvys5/9hMWLF3PG\nGWcxefIkTjnl31i8eDGPPjqfvfd+F3vt9c6nl7lkyWJOPvkzLFq0EIAjj/wYL37xlms0u3GTNKDf\n/W4uRxxx8NObpW9721784Q8Pc9ZZ5/LUU0/xoQ/tzw47vA6AV7zilRx55Mf42MeOYOrUKZxxxlmc\neOLx3HXX7Wy00cbsuutb2HHHnVmwYAGHH37Qs+J24YXns8MOr2Wvvd7JvHkPcdJJJ/CVr5y3RrMb\nN0kDWnWz9KKLLiTzV08Hr7e3lz/+8Q8AbLVVADBjxgw22+zFAMycOZMnn3yKDTbYkIsv/jY33HAd\n06ZNZ/ny3mfdz9y5v+aOO27juuuuoa+v7+k1uDVh3CQNaNXN0k033Yztt9+Bo446hr6+Pv7937/O\nC1+4SX3pwEfg/fa3/4NXvnJr9trrndxxx2389Kf/86zLZ83anLe85WXsuutbaDQaXHHF5Ws8u3GT\nJoClj/W0
ZVmrHjJ89uwdufPO2zn00A/yxBNPsOOOOzNt2rRnXW91X8+evSNnnPE5rr32h8yYMYOO\njk6WLVv29OX77XcAJ5/
"text/plain": [
"<matplotlib.figure.Figure at 0x7fd1265e2a20>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"#Small touches\n",
"\n",
"pclass_labels = ['First', 'Second', 'Third']\n",
"sex_labels = {'Female': 0, 'Male': 1}\n",
"\n",
"plt = df.query('Age < 20 and Survived == 1').groupby(['Sex','Pclass']).size().unstack(['Sex']).plot(kind='bar', \n",
" stacked=True, rot=0, subplots=False, figsize=(5,10))\n",
"plt.set_xticklabels(pclass_labels)\n",
"plt.legend(labels=sex_labels)\n",
"plt.set_xlabel('Passenger class')\n",
"plt.set_title('Passenger class per sex')"
]
},
{
"cell_type": "code",
"execution_count": 26,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"<matplotlib.text.Text at 0x7fd1264403c8>"
]
},
"execution_count": 26,
"metadata": {},
"output_type": "execute_result"
},
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAZoAAAEKCAYAAAArYJMgAAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\nAAALEgAACxIB0t1+/AAAHfFJREFUeJzt3Xl8VPW9//FXCCIkoAI3rggCyqcutVcFV0TErV6ty8+q\n1bqCC1TBW5WfigsuVax63a2lLrhUrVTrBldR0YK2dcENLfUjhqJStQIGgYQtYe4f5wQnSMIJ5ptz\nMnk/Hw8emTkzc+Z9JmHec77nzDlFuVwOERGRUNqkHUBERAqbikZERIJS0YiISFAqGhERCUpFIyIi\nQaloREQkqLZpBxBZnZn1AMqB6fGkovjnre4+Lp1U6TCzccD77n5j2llE1pWKRrKqyt13rr1iZpsD\nH5jZm+7+QYq5RKSRVDTSIrj752Y2E+hjZrOAO4FtgC7AIuB4d59pZv8PuBioif+NdPdXG5i+AXAL\nsAOwHjA5vm2lmS0BrgUOADYjWqO6xczaADcAPwEWAG8A27n7vmuZ31LgKWBH4Ofu/nbt8plZKXAb\nsBewAnjS3S/Jfw3MbDBwRjzfLsCv3f23ZrYJ8ADQNb7r/7r7ZfVNX/21NbMVwM3AvkAJcLG7P5H3\nnL8gWqucD5zt7h/Fa1pdgF7ABHe/aLVlGQdsDawE3nL3M+PbDgUuiZehCjjf3V83s3uBUnc/1sy2\nB14C9nH3D1fPKy2PttFIi2BmewC9gdeBg4EKd9/T3X8ATAPOju96HTDM3XcFLgUGrmX6TcA0d+8H\n7AyUAefGt60PfOXu/YGjgWvNrB1wOrATsB1QmyuXYH7tgKfcfdv8koldCazv7hbPey8zG5C3/KXA\nEOBgd98F+Fm8TMR5yt29LzAA2NrMOjUwfXXFwLz4fscC95pZ1/j5Twb6x895PfBE3uM6uPsP80sm\ndiTQMV4j3TXO38vMtgauyVuGM4EnzKwD0e9vRzM7CfgDMEIlUzi0RiNZVWJmbxN9km4LzCVaa/kX\n8LiZzTKzs4k+NQ8E/ho/7hHgSTObCLzAt2/G9U0/FOhnZqfF19sTfQqv9TSAu78dl0wpUdE94O4r\nAMxsLDA84fxerWd59wd+GT/XCqK1C8zs1HhapZn9BDjUzLYB/jPOAvAcMDHetvUicKG7LzKzNU6v\n5/lvj5/nfTObTlRMuxOV6F/NrHY72UZmttFaluVV4Goze5notb7Z3WeZ2TBgU2By3vyqga3j5z2O\n6IPEA+7+aD3zlhZIRSNZVWcbTb74Det0oqGmh4Cvga0A3P1SM7sHOBA4BbgQ2HlN081sF6JP80e7\nu8fz3pC6xbBktacvInpzLMqbVpN3eW3zW1zP8lbz7VoRZtaNaGip9voWwN+AscArwGPAIfEyTzOz\nnkRlNQh408wOd/fX6pu+hudffRlq4p8PrjYstrm7LzCzepfF3WfHay8D4+edbGbD4/lNdvfjVlvO\nf8VXfwDMA3Yys7buXl3PayUtjIbOJKuKGrjtQGBcvAfaTKJtJcVmVmxm/yQatvkd0baFH5jZemua\nTvRBaxLx0JaZrU+0BnP2d56xbqaJwAlm1s7M2hIVV21JNGZ++V4ETjazovhxjxGtVdTqSzSMd7W7\nvxAvM/H9xwCXufvT7v7fwN+JtmWtcXo9z39SPL+dAQOmAM8Dx5nZpvFtvyDa5tQgMxsK3OfuL8Ql\nNQmo3e5yoMUtZWb/BbwHtDezrYi2Ex0AfMi3a5xSAFQ0klUNHVb8BmBoPLT2AvAW0fBLDXAO8LCZ\nvQWMB06Nh6Lqmz6CaJjufeBdoje+2je51TPUXr+PaAeAt4mGiZbx7dpHY+aX7wqinQDei5dngrs/\nmXf7JOBfZubxMnQjGk7cmugN+j/NbLqZvQnMIhoqrG/6muwVz/du4Bh3/8bdnwd+DbxgZu8SbRc6\nMsGyPAC0MbMZ8fN2Am5x9xlEOzP8wczeiZf5
J8By4GGinRtmEBXzT83s4AaeQ1qQIp0mQKRxzOwA\nYGN3fyi+fjOwZA0bxVsEM1sJdHX3irSzSGHSNhqRxvs7MNLMRhL9H3oXGJZupO8lR8NDlSLfi9Zo\nREQkKG2jERGRoFQ0IiISVKveRlNdXZOrqKha+x2bUefOJWQtE2QzlzIlo0zJZTFXFjOVlXVq1Da9\nVr1G07ZtcdoRviOLmSCbuZQpGWVKLou5spipsVp10YiISHgqGhERCUpFIyIiQaloREQkKBWNiIgE\npaIREZGgWvX3aERE6lNTU8OcOZ826Ty7detOcXHL3125sVQ0IiJrMGfOp5x340TalXZtkvktr5zP\n/5x7CD169Kz3Pu+88xYjRgzl8suvYb/9Dlg1/eSTf4bZtowaNfo7j3n22Ql88slshg5NctqjdKho\nRETq0a60K+032KRZn7NHj62YPPn5VUXz0UcfsXTp0gYfU1SU7YNvq2hERDKkd+9t+OyzT6mqqqSk\npJSnn36aAw88mH//+0sef3w8U6e+zNKlS9lww4245prr6zz28ccf5YUXJlFUVMT++x/IUUcdm9JS\n1KWdAUREMmbgwEFMmfIyANOnT+eHP/wRK1euZNGihdxyy52MHTuO6upqPvxwxqrHzJ79TyZPfoE7\n77yHO+64i6lT/8xnnzXtNqZ1pTUaEZEMKSoq4oADfsz1149hs802p1+/fuRyOdq0aUNxcVtGjx5F\nhw4dmDfvK6qrq1c9btascr788gvOOWcYuVyOxYsXMWfOp2y5ZfcUlyaiohERyZjNNtucpUuX8Nhj\njzJq1AW8/75TWbmYV1+dwtix41i2bClDhpxI/okru3fvQa9evbnhhlsBGD/+YXr33iatRahDRSMi\nUo/llfNTm9d++x3ApEnP0qNHD95/32nbti3t23dg2LAhAHTtWsa8eXNX3X/rrbdh5537MWzYEFas\nWMF2221PWdnGTZb/+2jtp3LOzZ27KO0MdZSVdSJrmSCbuZQpGWVKLj9XVr5Hk8XXqrHno2nVazTl\n5eVUVFSmHaOOhQtL1ylTTU0NUERxcZj9O9Y1V0gNZWqtX4yTplNcXNzgd14kuVZdNIMvHd9kX8ZK\n2+K5H9Nxp3LadylJO0rqln5dxZhDRutNQiQjWnXRpPFlrFCWLZ5P+y4ldNi4Y9pRRETq0PdoREQk\nKBWNiIgE1aqHzkRE6pOVvc4KgYpGRGQN5sz5lIsmXtFkO9gk2Unlyy+/WHWk5lwuR1FREXvvvRdH\nH31ik2QAGD78TEaOHEX37j2abJ5ro6IREalHGjvY9OzZm1tv/e2q61n8Hk1jqWhERDJkTV+iHzv2\nDqZPf5eVK2s49tifM3DgfgwffiZbb92HWbPKKSnpwI477sQbb/yNxYsXc9NNd9CmTRHXXvsrFi9e\nzPz5cznyyKM54oijVs2zsnIxY8ZcxaJFCwE455zz6NVr6yDLpKIREcmQ2bNnMWLE0FVDZ8cf/zM+\n//xf3HHHXSxfvpwzzzyFvn13A2D77XfgnHPO47zzRtChQ3tuuukOrr76ct599y023ngT9t//IAYM\nGMi8efMYPvyMOkXzwAPj6Nt3V4444ijmzPmMa665gt/85u4gy6SiERHJkNWHzp566lHcP1xVPjU1\nNXzxxecA9OljAHTs2JGttuoFQKdOnVi2bDldunRl/PhHmDLlJUpKSqmurqnzPLNmfczbb0/jpZde\nIJfLrVqzCUFFIyKSIasPnfXq1YtddunLyJGjyOVy3H//PWyxRbf41voPOfbII79nhx125IgjjuLt\nt6fx2mt/qXN7jx49Oeigbdl//4OoqKhgwoSnmnpRVlHRiIjUY+nXVc0+r9VPyzxo0CD+/OdXOeus\n01myZAkDBgykpKSkzv3WdLl//wHcdNN1TJ78PB07dqS4uC0rVqxYdftJJ53KmDFX8dRTf6KqqorB\ng8/4votY
/zK15qM3H3D673KFcgiabz6fwYZ7ztAhaIAlXy1m9B4jUznWWRb3EFKm5HT05mR09GYR\nkSagozc3HR2CRkREgsr
"text/plain": [
"<matplotlib.figure.Figure at 0x7fd12648d470>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"#The same horizontal\n",
"pclass_labels = ['First', 'Second', 'Third']\n",
"sex_labels = {'Female': 0, 'Male': 1}\n",
"\n",
"plt = df.query('Age > 25 and Survived == 1').groupby(['Sex','Pclass']).size().unstack(['Sex']).plot(kind='barh', \n",
" stacked=True, rot=0, subplots=False)\n",
"plt.set_yticklabels(pclass_labels)\n",
"plt.legend(labels=sex_labels)\n",
"\n",
"plt.set_ylabel('Passenger class')\n",
"plt.set_title('Passenger class per sex')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Feature Sex"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We are now going to explore the Sex attribute."
]
},
{
"cell_type": "code",
"execution_count": 27,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"Sex\n",
"female 314\n",
"male 577\n",
"dtype: int64"
]
},
"execution_count": 27,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# How many passengers by sex\n",
"df.groupby('Sex').size()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We see that men (577) outnumber women (314)."
]
},
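{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a minimal sketch, the counts above can also be expressed as proportions with `value_counts(normalize=True)`. The `toy` frame below is a hypothetical stand-in so the cell runs on its own; the same calls should work on `df['Sex']`."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"import pandas as pd\n",
"\n",
"# Hypothetical toy data, not the Titanic dataset\n",
"toy = pd.DataFrame({'Sex': ['female', 'male', 'male', 'female', 'male']})\n",
"\n",
"# Absolute counts and relative shares per category\n",
"counts = toy['Sex'].value_counts()\n",
"shares = toy['Sex'].value_counts(normalize=True)\n",
"\n",
"# Put both side by side in one table\n",
"pd.concat([counts, shares], axis=1, keys=['count', 'share'])"
]
},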
{
"cell_type": "code",
"execution_count": 28,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"<matplotlib.axes._subplots.AxesSubplot at 0x7fd1265aaf60>"
]
},
"execution_count": 28,
"metadata": {},
"output_type": "execute_result"
},
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAYQAAAERCAYAAAB7FtAjAAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\nAAALEgAACxIB0t1+/AAAEehJREFUeJzt3XuQnXV9x/H3ZjeBXDZxGdZUisbL0G8sNXRA0SBDAA2S\nigYZW8aOpYIkXmKo09EqoTjacYVRBmtk0ClEQ62toxnFViZqNQxknepE6ozdNn5NL5aJdGqAk+xm\nIyWX7R/PL3BYNrsnl2fPsnm/ZjI5z++5fXfm2f2c3+93nud0jIyMIEnSjHYXIEmaGgwESRJgIEiS\nCgNBkgQYCJKkwkCQJAHQVfcJIuLDwJuBmcCdwIPARuAQMJCZa8p2q4DVwH6gLzPvq7s2SdLTau0h\nRMQyYGlmXgBcDLwIuB1Yl5nLgBkRsTIiFgJrgaXA5cAtETGzztokSc9U95DRG4CBiLgX+HvgW8C5\nmbm1rN8MLAfOB/oz80BmDgI7gCU11yZJalL3kNHpVL2CK4CXUoVCcwgNAfOBbmBPU/teYEHNtUmS\nmtQdCI8B2zPzAPDziHgCOLNpfTewGxikCobR7ZKkSVJ3IPQDNwCfjogzgLnA9yNiWWY+AKwAtgDb\ngL6ImAXMBhYDA+Md+MCBgyNdXZ21Fi9J01DHEVfU/XC7iLgVuLQUcSPwC+Buqk8dbQdWZeZIRLwT\neFfZri8z7x3vuLt2DflUPkk6Sr293e0LhLoYCJJ09MYLBG9MkyQBBoIkqTAQJEmAgSBJKmp/ltFU\ndfDgQXbufLjdZWgKOvPMF9HZ6UeadfI5aQNh586H+fO/3MSp805rdymaQp7Y+zgff/9bWbToJe0u\nRZp0J20gAJw67zTmzO9tdxmSNCU4hyBJAgwESVJhIEiSAANBklQYCJIkwECQJBUGgiQJMBAkSYWB\nIEkCDARJUmEgSJIAA0GSVBgIkiTAQJAkFQaCJAkwECRJhYEgSQIMBElSYSBIkgADQZJUGAiSJMBA\nkCQVXXWfICIeAvaUxf8CPgFsBA4BA5m5pmy3ClgN7Af6MvO+umuTJD2t1kCIiFMAMvPSprZvAusy\nc2tEfC4iVgI/BNYC5wJzgP6I+G5m7q+zPknS0+ruIZwDzI2I7wCdwE3AuZm5tazfDFxG1Vvoz8wD\nwGBE7ACWAA/VXJ8kqah7DmEf8KnMfAPwHuDLQEfT+iFgPtDN08NKAHuBBTXXJklqUncg/JwqBMjM\nHcBjwMKm9d3AbmCQKhhGt0uSJkndQ0bXAa8A1kTEGVR/9L8bEcsy8wFgBbAF2Ab0RcQsYDawGBgY\n78A9PXPo6uo85sIGB+ce876a3np65tLb293uMqRJV3cgbAC+GBFbqeYJ3kHVS7g7ImYC24FNmTkS\nEeuBfqohpXWZ+eR4B2409h1XYY3G8HHtr+mr0Rhm166hdpch1WK8Nzu1BkL5lNDbx1h18RjbbqAK\nEElSG3hjmiQJMBAkSYWBIEkCDARJUmEgSJIAA0GSVBgIkiTAQJAkFQaCJAkwECRJhYEgSQIMBElS\nYSBIkgADQZJUGAiSJMBAkCQVBoIkCTAQJEmFgSBJAgwESVJhIEiSAANBklQYCJIkwECQJBUGgiQJ\nMBAkSYWBIEkCDARJUmEgSJIAA0GSVHTVfYKIeD7wY+D1wEFgI3AIGMjMNWWbVcBqYD/Ql5n31V2X\nJOmZau0hREQX8HlgX2m6HViXmcuAGRGxMiIWAmuBpcDlwC0RMbPOuiRJz1b3kNFtwOeAR4AO4NzM\n3FrWbQaWA+cD/Zl5IDMHgR3AkprrkiSNUlsgRMQ7gF9l5j9ShcHo8w0B84FuYE9T+15gQV11SZLG\nVuccwrXAoYhYDpwD/DXQ27S+G9gNDFIFw+j2cfX0zKGrq/OYixscnHvM+2p66+mZS29vd7vLkCZd\nbYFQ5gkAiIgtwLuBT0XERZn5ILAC2AJsA/oi
YhYwG1gMDEx0/EZj30SbTLD/8HHtr+mr0Rhm166h\ndpch1WK8Nzu1f8polA8Ad5VJ4+3ApswciYj1QD/V0NK6zHxykuuSpJPepARCZl7atHjxGOs3ABsm\noxZJ0ti8MU2SBBgIkqTCQJAkAQaCJKkwECRJgIEgSSoMBEkSYCBIkgoDQZIEGAiSpMJAkCQBBoIk\nqTAQJEmAgSBJKgwESRJgIEiSCgNBkgQYCJKkwkCQJAEGgiSpMBAkSYCBIEkqWgqEiPjsGG33nPhy\nJEnt0jXeyoi4G3gp8MqIOLtp1UxgQZ2FSZIm17iBAHwceDHwGeBjTe0HgO011SRJaoNxAyEzfwH8\nAjgnIuZT9Qo6yup5wON1FidJmjwT9RAAiIgbgRuBx5qaR6iGkyRJ00BLgQBcD7wsM3fVWYwkOHjw\nIDt3PtzuMjQFnXnmi+js7Kzt+K0GwsM4PCRNip07H+ajX+9jds/cdpeiKeTXjWE+etVNLFr0ktrO\n0Wog7AD6I+J+4InDjZn5F+PtFBEzgLuAAA4B7wb+D9hYlgcyc03ZdhWwGtgP9GXmfUf1k0jTyOye\nucw9vbvdZegk0+qNab8Evk31x7yj6d9E3gSMZOaFwM3AJ4DbgXWZuQyYERErI2IhsBZYClwO3BIR\nM4/qJ5EkHZeWegiZ+bGJtxpzv29GxD+UxUVAA3h9Zm4tbZuBy6h6C/2ZeQAYjIgdwBLgoWM5ryTp\n6LX6KaNDVJ8qavZIZr5won0z81BEbASuBH4fWN60egiYD3QDe5ra9+KNb5I0qVrtITw1tFSGcq6k\nGt5pSWa+IyKeD2wDZjet6gZ2A4NUwTC6/Yh6eubQ1XXss+2Dg07YaWw9PXPp7W3f+L3Xpo6k7muz\n1Unlp2TmfuBrEXHTRNtGxNuBMzPzVqrJ6IPAjyNiWWY+AKwAtlAFRV9EzKIKjMXAwHjHbjT2HW3p\no/YfPq79NX01GsPs2jXU1vNLYzkR1+Z4gdLqkNE1TYsdwNnAky3s+nXgixHxQDnXDcDPgLtLT2M7\nsCkzRyJiPdBfjr8uM1s5viTpBGm1h3BJ0+sR4FHg6ol2ysx9R9ju4jG23QBsaLEeSdIJ1uocwrXl\nHX2UfQbKJ4IkSdNEq9+HcB7VzWn3AF8EHo6IV9dZmCRpcrU6ZLQeuDozfwQQEa8BPgucX1dhkqTJ\n1eqdyvMOhwFAZv4QOLWekiRJ7dBqIDweESsPL0TElTzzUdiSpOe4VoeMVgPfiogNVB8LHQEuqK0q\nSdKka7WHsALYR/U8okuAXYzx0VFJ0nNXq4GwGnhtZg5n5k+B86ieTipJmiZaDYSZPPPO5Cd59sPu\nJEnPYa3OIdwLbImIr5blq4Bv1lOSJKkdWuohZOaHqO5FCOClwPrMvLnOwiRJk6vlp51m5iZgU421\nSJLaqNU5BEnSNGcgSJIAA0GSVBgIkiTAQJAkFQaCJAkwECRJhYEgSQIMBElSYSBIkgADQZJUGAiS\nJMBAkCQVBoIkCTAQJEmFgSBJAgwESVLR8jemHa2I6AK+ALwYmAX0Af8GbAQOAQOZuaZsuwpYDewH\n+jLzvrrqkiSNrc4ewtuBRzPzIuBy4A7gdmBdZi4DZkTEyohYCKwFlpbtbomImTXWJUkaQ209BOCr\nwNfK607gAHBuZm4tbZuBy6h6C/2ZeQAYjIgdwBLgoRprkySNUlsgZOY+gIjopgqGm4DbmjYZAuYD\n3cCepva9wIK66pIkja3OHgIR8ULg68AdmfmViPhk0+puYDcwSBUMo9vH1dMzh66uzmOubXBw7jHv\nq+mtp2cuvb3dbTu/16aOpO5rs85J5YXAd4A1mXl/af5JRFyUmQ8CK4AtwDagLyJmAbOBxcDARMdv\nNPYdV32NxvBx7a/pq9EYZteuobaeXxrLibg2xwuUOnsINwLPA26OiI8AI8CfAJ8tk8bbgU2ZORIR\n64F+oINq
0vnJGuuSJI2hzjmE9wPvH2PVxWNsuwHYUFctkqSJeWOaJAkwECRJhYEgSQIMBElSYSBI\nkgADQZJUGAiSJMBAkCQ
"text/plain": [
"<matplotlib.figure.Figure at 0x7fd1265740b8>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"# Plot with seaborn (pass the column as x=; recent seaborn versions reject it as a positional argument)\n",
"sns.countplot(x='Sex', data=df)"
]
},
{
"cell_type": "code",
"execution_count": 29,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"<matplotlib.axes._subplots.AxesSubplot at 0x7fd1264b7080>"
]
},
"execution_count": 29,
"metadata": {},
"output_type": "execute_result"
},
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAXUAAAERCAYAAACEmDeEAAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\nAAALEgAACxIB0t1+/AAAEQZJREFUeJzt3X1sXXd5wPGvYyclcZ3gUpPRtQSGuieMEaYAZWFVUwop\n6VrIQNMqJsagNOUlpEMMBg0C7R/TCrpUy6qC1gbCOhhiFS8TUYBBUBujgQJDYt7CQ/ZaAtNws5tc\nx1aXN++Pc8s8N/a9xNe+9i/fzz/1Pef4+El1/PXxufced01MTCBJKsOSTg8gSWofoy5JBTHqklQQ\noy5JBTHqklQQoy5JBelpZaOIeB/wamApcB/wCLAHOAsMZ+a2xnZbgduAU8BgZu6dg5klSdNoeqYe\nERuBDZn5UuBa4JnATmBHZm4ElkTElohYDWwHNgCbgTsjYumcTS5JepJWLr+8EhiOiC8AfwN8CVif\nmQca6/cBm4CrgKHMPJ2ZdeAwsG4OZpYkTaOVyy+XUp2d3wT8ElXYJ/8wGAVWAn3A8UnLTwCr2jOm\nJKkVrUT9KHAoM08DP4yIx4HLJ63vA44Bdaq4T10uSZonrUR9CLgduCciLgN6ga9HxMbMfBi4AdgP\nHAQGI2IZsBxYCwzPtOPTp89M9PR0z2Z+SboQdU27opUbekXEXcB1jR3dAfw78ADVq2EOAVszcyIi\n3gy8pbHdYGZ+Yab9joyMejexNhoY6GNkZLTTY0hP4rHZXgMDfbOL+lwx6u3lN44WKo/N9pop6r75\nSJIKYtQlqSBGXZIKYtQlqSAt3ftF0sJz5swZjhx5tNNjtKRe76VWG+v0GE1dfvkz6e5e3C+zNurS\nInXkyKP84c69LOt9WqdHKcLJsaP8ybtuZM2aZ3d6lFkx6tIitqz3aTxl5epOj6EFxGvqklQQoy5J\nBTHqklQQoy5JBTHqklQQoy5JBTHqklQQoy5JBTHqklQQoy5JBTHqklQQoy5JBTHqklQQoy5JBTHq\nklQQoy5JBTHqklQQoy5JBTHqklQQoy5JBTHqklQQoy5JBelpZaOI+C5wvPHw34APAXuAs8BwZm5r\nbLcVuA04BQxm5t52DyxJml7TqEfERQCZed2kZV8EdmTmgYj4aERsAb4FbAfWAyuAoYj4amaempvR\nJUlTtXKm/gKgNyK+AnQD7wfWZ+aBxvp9wPVUZ+1DmXkaqEfEYWAd8N32jy1JOpdWrqmPAx/JzFcC\nbwM+BXRNWj8KrAT6+L9LNAAngFVtmlOS1IJWov5DqpCTmYeBo8DqSev7gGNAnSruU5dLkuZJK5df\nbgGeD2yLiMuowv3ViNiYmQ8DNwD7gYPAYEQsA5YDa4HhmXbc37+Cnp7u2cyvKQYG+jo9guZJvd7b\n6RGK09/fu+i/h1qJ+m7gExFxgOq6+RupztYfiIilwCHgocyciIhdwBDV5ZkdmXlyph3XauOzmV1T\nDAz0MTIy2ukxNE9qtbFOj1CcWm1sUXwPzfSDp2nUG69eef05Vl17jm13U/0QkCR1gG8+kqSCGHVJ\nKohRl6SCGHVJKohRl6SCGHVJKohRl6SCGHVJKohRl6SCGHVJKohRl6SCGHVJKohRl6SCGHVJKohR\nl6SCGHVJKohRl6SCGHVJKohRl6SCGHVJKohRl6SCGHVJKohRl6SCGHVJKohRl6SCGHVJKohRl6SC\nGHVJKohRl6SCGHVJKkhPKxtFxNOB7wCvAM4Ae4CzwHBmbmtssxW4DTgFDGbm3rkYWJI0vaZn6hHR\nA3wMGG8s2gnsyMyNwJKI2BIRq4HtwAZgM3BnRCydo5klSdNo5fLL3cBHgZ8AXcD6zDzQWLcP2ARc\nBQxl5unMrAOHgXVzMK8kaQYzRj0i3gj8NDP/liroUz9nFFgJ9AHHJy0/Aaxq35iSpFY0u6b+JuBs\nRGwCXgD8BTAwaX0fcAyoU8V96vIZ9fevoKen
++caWDMbGOjr9AiaJ/V6b6dHKE5/f++i/x6aMeqN\n6+YARMR+4K3ARyLimsx8BLgB2A8cBAYjYhmwHFgLDDf74rXaeLNN9HMYGOhjZGS002NontRqY50e\noTi12tii+B6a6QdPS69+meLdwP2NJ0IPAQ9l5kRE7AKGqC7T7MjMk+czrCTp/LUc9cy8btLDa8+x\nfjewuw0zSZLOk28+kqSCGHVJKohRl6SCGHVJKohRl6SCGHVJKohRl6SCGHVJKohRl6SCGHVJKohR\nl6SCGHVJKohRl6SCGHVJKohRl6SCGHVJKohRl6SCGHVJKohRl6SCGHVJKohRl6SCGHVJKohRl6SC\nGHVJKohRl6SCGHVJKohRl6SC9HR6gIXuzJkzHDnyaKfHaEm93kutNtbpMZq6/PJn0t3d3ekxpCIZ\n9SaOHHmUE3d9mStWXNrpUVrS3+kBmvjR+GMced9m1qx5dqdHkYrUNOoRsQS4HwjgLPBW4H+APY3H\nw5m5rbHtVuA24BQwmJl752bs+XXFikt5zsWrOz1GMWqdHkAqWCvX1F8FTGTm1cAHgA8BO4EdmbkR\nWBIRWyJiNbAd2ABsBu6MiKVzNLck6RyaRj0zv0h19g2whupEa31mHmgs2wdsAq4ChjLzdGbWgcPA\nuvaPLEmaTkuvfsnMsxGxB9gFfBromrR6FFgJ9AHHJy0/Aaxqz5iSpFa0/ERpZr4xIp4OHASWT1rV\nBxwD6lRxn7p8Wv39K+jpWdivgqjXezs9QnH6+3sZGOjr9BiLnsdm+5VwbLbyROnrgcsz8y7gceAM\n8J2I2JiZDwM3APupYj8YEcuoor8WGJ5p37Xa+CzHn3u12tiCf0XJYlOrjTEyMtrpMRa9xfDy1cVm\nsRybM/3gaeVM/XPAJyLi4cb2twM/AB5oPBF6CHgoMyciYhcwRHV5Zkdmnpzt8JKk1jWNemaOAzef\nY9W159h2N7B79mNJks6HtwmQpIIYdUkqiFGXpIIYdUkqiFGXpIIYdUkqiFGXpIIYdUkqiFGXpIIY\ndUkqiFGXpIIYdUkqiFGXpIIYdUkqiFGXpIIYdUkqiFGXpIIYdUkqiFGXpIIYdUkqiFGXpIIYdUkq\niFGXpIIYdUkqiFGXpIIYdUkqiFGXpIIYdUkqiFGXpIL0zLQyInqAjwPPApYBg8A/AXuAs8BwZm5r\nbLsVuA04BQxm5t45m1qSdE7NztRfDzyWmdcAm4F7gZ3AjszcCCyJiC0RsRrYDmxobHdnRCydw7kl\nSecw45k68FngrxsfdwOngfWZeaCxbB9wPdVZ+1BmngbqEXEYWAd8t/0jS5KmM2PUM3McICL6qOL+\nfuDuSZuMAiuBPuD4pOUngFVtnVSS1FSzM3Ui4grgc8C9mfmZiPjwpNV9wDGgThX3qctn1N+/gp6e\n7p9v4nlWr/d2eoTi9Pf3MjDQ1+kxFj2PzfYr4dhs9kTpauArwLbM/EZj8fci4prMfAS4AdgPHAQG\nI2IZsBxYCww3++K12vhsZp8XtdoY/Z0eojC12hgjI6OdHmPRq9XGOj1CcRbLsTnTD55mZ+p3AE8F\nPhARHwQmgD8A/qzxROgh4KHMnIiIXcAQ0EX1ROrJdgwvSWpds2vq7wTeeY5V155j293A7vaMJUk6\nH775SJIKYtQlqSBGXZIKYtQlqSBGXZIKYtQlqSBGXZIKYtQlqSBGXZIKYtQlqSBGXZIKYtQlqSBG\nXZIKYtQlqSBGXZIKYtQlqSBGXZIKYtQlqSBGXZIKYtQlqSBGXZIKYtQlqSBGXZIKYtQlqSBGXZIK\nYtQlqSBGXZIKYtQlqSBGXZIK0tPKRhHxEuCuzHxZRDwH2AOcBYYzc1tjm63AbcApYDAz987NyJKk\n6TQ9U4+I9wD3Axc1Fu0EdmTmRmBJRGyJiNXAdmADsBm4MyKWztHMkqRptHL55Z+B10x6/MLMPND4\neB+wCbgK
GMrM05lZBw4D69o6qSSpqaZRz8zPA6cnLeqa9PEosBLoA45PWn4CWNWOASVJrTufJ0rP\nTvq4DzgG1KniPnW5JGk
"text/plain": [
"<matplotlib.figure.Figure at 0x7fd126481208>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"# Same graph with matplotlib and pandas\n",
"colors_sex = ['#ff69b4', 'b']\n",
"df.groupby('Sex').size().plot(kind='bar', rot=0, color=colors_sex)"
]
},
{
"cell_type": "code",
"execution_count": 30,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"Sex\n",
"female 233\n",
"male 109\n",
"Name: Survived, dtype: int64"
]
},
"execution_count": 30,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# How many passengers survived, by sex\n",
"df.groupby('Sex')['Survived'].sum()"
]
},
{
"cell_type": "code",
"execution_count": 31,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"Sex\n",
"female 0.742038\n",
"male 0.188908\n",
"Name: Survived, dtype: float64"
]
},
"execution_count": 31,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Survival rate by sex (the mean of the 0/1 Survived column)\n",
"df.groupby('Sex')['Survived'].mean()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We see that 74% of women survived, while only 19% of men did."
]
},
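{
"cell_type": "markdown",
"metadata": {},
"source": [
"The three previous results (group size, number of survivors, and survival rate) can be gathered in a single table with `agg`. This is only a sketch: `toy` and the column names `passengers`, `survivors` and `survival_rate` are hypothetical, chosen so the cell runs without the dataset. The same `groupby` call should apply to `df`."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"import pandas as pd\n",
"\n",
"# Hypothetical toy data, not the Titanic dataset\n",
"toy = pd.DataFrame({'Sex': ['female', 'female', 'male', 'male', 'male'],\n",
"                    'Survived': [1, 1, 0, 1, 0]})\n",
"\n",
"# size = passengers per group, sum = survivors, mean = survival rate\n",
"summary = toy.groupby('Sex')['Survived'].agg(['size', 'sum', 'mean'])\n",
"summary.columns = ['passengers', 'survivors', 'survival_rate']\n",
"summary"
]
},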
{
"cell_type": "code",
"execution_count": 32,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"<matplotlib.axes._subplots.AxesSubplot at 0x7fd126396ac8>"
]
},
"execution_count": 32,
"metadata": {},
"output_type": "execute_result"
},
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAYEAAAERCAYAAACdPxtnAAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\nAAALEgAACxIB0t1+/AAAFzhJREFUeJzt3X2YnXV95/H3PEBIJhM6gVkrRaJeF35Rq1SJkGTTFqjg\nXhBqhLYaH+gGUxFWXUvX1ay1WFfdttao4GY1pnCx4EM1l9DWh6yygA+pzarrSnDrFzBXC1G3Bjxk\nwoQhM8nsH/cZPBmSmZtk7jkzc79f/8z53U/nm+s6OZ9zP/x+v47R0VEkSfXU2e4CJEntYwhIUo0Z\nApJUY4aAJNWYISBJNWYISFKNdVd58IjoADYCZwJDwLrM3Nmy/jXANcAIcGNmfqzKeiRJh6r6TGA1\nMC8zVwDrgQ3j1n8AOB9YCfxRRJxYcT2SpBZVh8BKYCtAZm4Hlo5b/32gD5jfbNtzTZKmUdUhsAjY\n09IeiYjW9/wB8F1gB/CFzByouB5JUouqQ2AA6G19v8w8CBARLwAuBpYAzwSeFhGXVVyPJKlFpTeG\ngW3AKmBLRCyj+MU/Zg+wD3g8M0cj4mcUl4aOaGTkwGh3d1dlxUrSHNVxxBVVDiDX8nTQC5uL1gJn\nAT2ZuTkirgSuAB4HfgT8QWaOHOl4u3fv9Z6BNMfdcMMmvvKVL3HhhRdxxRVvaHc5c0J/f297QmCq\nGQLS3DY09Bhr176a0dFROjo6ufHGT3LCCfMn31ETmigE7CwmacYYHh5m7Ifp6OhBhoeH21zR3GcI\nSFKNGQKSVGOGgCTVmCEgSTVmCEhSjRkCklRjhoAk1ZghIEk1ZghIUo0ZApJUY4aAJNVY1UNJSyrh\nwIED7Nr1QLvLaLvBwcFD2g8++AA9PT1tqmZmOPXU0+jqqm4IfUNAmgF27XqAd3/+fczvq/cX3sH9\nBw5pX/fNj9F5fH3nEHmsMci7L30nS5Y8q7L3MASkGWJ+Xw89J/dOvuEcduDxER5paS84aSFd8/ya\nqpL3BCSpxgwBSaoxQ0CSaswQkKQaMwQkqcYqve0eER3ARuBMYAhYl5k7m+ueBnwGGAU6gF8D3p6Z\nm6qsSZL0C1U/e7UamJeZKyLiHGBDcxmZ+S/AeQARsQx4L/CJiuuRJLWo+nLQSmArQGZuB5YeYbvr\ngTdm5mjF9UiSWlQdAouAPS3tkYg45D0j4hLgnsy8v+JaJEnjVB0CA0BrF8jOzDw4bpvXAt4HkKQ2\nqPqewDZgFbCled1/x2G2WZqZ3ypzsL6+BXR313ccEc1dAwP1HjNIR9bX10N/f3XDiVQdArcCF0TE\ntmZ7bUSsAXoyc3NEnMyhl4sm1Gjsq6JGqe0ajcHJN1ItNRqD7N6995iOMVGIVBoCzRu9V41bfG/L\n+oeAF1dZg6TZo6Ozo6Uxrq1K2FlM0ozReVwXC5+zGICFpy+m8zgv/1bNMVolzSh9Z59C39mntLuM\n2vBMQJJqzBCQpBozBCSpxgwBSaoxQ0CSaswQkKQaMwQkqcYMAUmqMUNAkmrMEJCkGjMEJKnGDAFJ\nqjFDQJJqzBCQpBozBCSpxgwBSaoxQ0CSaswQkKQaq3R6yYjoADYCZwJDwLrM3Nmy/iXAB5vN/we8\nNjP3V1mTJOkXqj4TWA3My8wVwHpgw7j1m4B/m5m/AWwFllRcjySpRdUhsJLiy53M3A4sHVsREc8B\nHgauiYi7gMWZeV/F9UiSWlQdAouAPS3tkYgYe8+TgeXAdcBLgZdGxLkV1yNJalHpPQFgAOhtaXdm\n5sHm64eB+zPzXoCI2EpxpnDXkQ7W17eA7u6uikqV2mdgoKfdJWiG6uvrob+/d/INj1LVIbANWAVs\niYhlwI6WdTuBhRHx7ObN4l8HNk90sEZjX2WFSu3UaAy2uwTNUI3GILt37z2mY0wUIlWHwK3ABRGx\nrdleGxFrgJ7M3BwRrwc+HREAf5+ZX664HklS
i0pDIDNHgavGLb63Zf1dwDlV1iBJOjI7i0lSjRkC\nklRjhoAk1ZghIEk1ZghIUo0ZApJUY4aAJNWYISBJNWYISFKNGQKSVGOGgCTVmCEgSTVmCEhSjRkC\nklRjhoAk1Vip+QQiYiFwHnA6cBC4H7g9M4cqrE2SVLEJQyAiFgDXApcCdwP/DAwDK4APRcTngf+c\nmY9WXagkaepNdiZwC7AJWN8yQTwAEdFJMX/wLcDqasqTJFVpshC4rDlF5JM0Q+FvI+Lvpr4sSdJ0\nmCwE3tWcBP6wMvM9RwoJSdLMN1kIdDT/ng2cCnwOGAFeAfzTZAePiA5gI3AmMASsy8ydLevfCqwD\nftZcdGVm3vcU6pckHYMJQyAz/xQgIrYByzNzX7P9YeDOEsdfDczLzBURcQ6wgUPvH5wFvC4zv3c0\nxUuSjk3ZfgL9QOtln+OAxSX2WwlsBcjM7cDScevPAtZHxDci4h0la5EkTZGyIfAJ4DsR8YGI+CDw\nHeDDJfZbBOxpaY80nyoa82ngjRR9EFZGxEUl65EkTYFSncUy8wMRcQdwLsUZwe9l5vdL7DoA9La0\nO8c9avqRzBwAiIgvAi8CvnSkg/X1LaC7u6tMydKsMjDQ0+4SNEP19fXQ3987+YZHqVQINAXFJaD3\nA5cBZUJgG0Vfgi0RsQzY8cTBIhYB90TEGcBjwPnAX010sEZj31MoV5o9Go3BdpegGarRGGT37r3H\ndIyJQqTU5aCI+DPgIoqew13A2uZlocncCjzevLH8QeAPI2JNRKxrngGsB+4Cvgbck5lby9QjSZoa\nZc8EXga8GPjfmTkQERdQDCPxRxPt1OxDcNW4xfe2rP8k8Mny5UqSplLZG8Nj1/HHnhCa17JMkjRL\nlQ2BzwJ/DSxudvD6OvCpyqqSJE2Lsk8H/XlEvIxiFNHTgGsz8wuVViZJqlzZ+QRuoxgt9J2Zub/a\nkiRJ0+WpdBZbDfwoIjZHxLnVlSRJmi6lQiAzv5iZrwWeQzEMxAcj4p8rrUySVLnSncUi4nnAq4Df\nBR6k3LARkqQZrOw9gR0UQ0jfApyfmT+ttCpJ0rQoeybw6szcMflmkqTZZLKJ5jdl5huA6yLiSTOI\nZeb5lVUmSarcZGcCH2/+fXfFdUiS2mCymcW+23x5DXAz8Lf2E5CkuaNsP4FN2E9AkuYc+wlIUo3Z\nT0CSauyp9hO4GfsJSNKcUfZMYFNmXl9pJZKkaVf2xvCVlVYhSWqLsmcCD0bEHcB2iknhAcjM91RS\nlSRpWpQNgX9oed1RRSGSpOlXdmaxPz2ag0dEB7AROBMYAtZl5s7DbPdx4OHM/E9H8z6SpKNT9umg\ng/xikvkxP8nMZ0yy62pgXmauiIhzgA3NZa3HvhL4VeBr5UqWJE2VsmcCT9xAjojjKL7Il5fYdSVF\n5zIyc3tELG1dGRHLgZdQjFF0RsmaJUlTpOzTQU/IzOHM/BxQZgTRRcCelvZIRHQCRMQvA9cCb8L7\nDJLUFmUvB13e0uwAng+UGUhuAOhtaXdm5sHm698FTgK+BDwdmB8RP8zM/36kg/X1LaC7u6tMydKs\nMjDQ0+4SNEP19fXQ3987+YZHqezTQee1vB4FHgJeWWK/bcAqYEtELAOemJim2fnseoCI+H0gJgoA\ngEZjX8lypdml0RhsdwmaoRqNQXbv3ntMx5goRMreE1h7lO99K3BBRGxrttdGxBqgJzM3H+UxJUlT\nZLKZxRYA7wE+m5n/KyI2AH8AfA9Yk5k/nmj/zBwFrhq3+N7DbHfTU6pakjQlJrsx/GFgAfBPEXER\n8BrgRRSPen604tokSRWb7HLQ8sx8AUBEvJzijOB+4P6IeH/l1UmSKjXZmcCBltfnAre3tI+f8mok\nSdNqsjOBhyPibKAH+BWaIdCcXnJXtaVJkqo2WQj8IfAZ4GnA1Zk5GBF/DLwFuLjq4iRJ1ZowBDLz\nbuB54xZ/
Brg+M/ccZhdJ0iwy4T2BiPgvEXFi67LMvH8sACJicUT8eZUFSpKqM9nloM8CfxMRPwG+\nTnEfYARYQjF20CnAWyu
"text/plain": [
"<matplotlib.figure.Figure at 0x7fd1263523c8>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"# Graphical representation\n",
"# By default barplot shows the mean of y per category; pass estimator= to change it (e.g. estimator=np.median)\n",
"# For example, with estimator=np.size you get the same chart as with countplot\n",
"#sns.barplot(x='Sex', y='Survived', data=df, estimator=np.size)\n",
"sns.barplot(x='Sex', y='Survived', data=df)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We can now check whether men and women follow the same age distribution."
]
},
{
"cell_type": "code",
"execution_count": 38,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"array([<matplotlib.axes._subplots.AxesSubplot object at 0x7fd125f6e588>,\n",
" <matplotlib.axes._subplots.AxesSubplot object at 0x7fd125ccaf98>], dtype=object)"
]
},
"execution_count": 38,
"metadata": {},
"output_type": "execute_result"
},
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAX4AAAEFCAYAAADkP4z+AAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\nAAALEgAACxIB0t1+/AAAGPVJREFUeJzt3XuUZWV55/FvdTXVYPUF6CmYmJa21fiYyygBIpco3bBg\nKRrTcTJrnFky2hphjB0VCSy5DDrRoCyFjiFGyHCR6FIgEBEiC4naDjSORlAHZUIewAvaMJGmp6C6\nq4W+VM0f+5QciuquXefsU6dO7e9nLRZ1bu95zul3/2rX3u9+377x8XEkSfWxoNsFSJJml8EvSTVj\n8EtSzRj8klQzBr8k1YzBL0k1Y/BXLCIui4gfRsSHO/geH4yISzvVvjRXRcSnI+LMbtfR6xZ2u4B5\n6HTgBZn5aLcLkaSpGPwViog7gT7gtoh4N3AG8AJgP+C6zLwoIlYCGxv/HUvxb3A28F+BlwH3ZOZ/\narR3HrAWWAQMAmdl5s2T3vP5wCcnv0+nP6vUiohYDXwUeBT4TWAH8EHgPcBLgb8HzgI+AbwSWEKx\nTb0jM785qa1fbzzvYKAfuDQzr5mVD9LjPNRTocw8HhgHTqDozFdl5u8ARwMnR8R/aDx1FfDFzPwt\nil8AnwDeRLEhvDoijomIw4ATgeMz83DgvwEfmuJtP7uP95HmoqOAD2XmrwM/B84BTgGOBP4EOAb4\nt5l5bGMb+UzjOb8UEf3ADcD7G31/DXB2RLxy1j5FD3OPvzMWA6uBgyLizxv3DQKHA3cDOzPz1sb9\nPwT+V2aOAkTEo8DBmfmtiFgHnBoRL6HYGBY3v0lEPG8f73Njpz6c1KYfZ+b3Gz//EHgiM/cAWyNi\nBHgSuCAi3gm8mCLURya18dLGY1dHRF/jvv2B3wa+3eH6e57B3xkTEyAdm5lPA0TEcuAXwBCwc9Lz\nd01uICJ+G7gZ2ADcDtwBfGrS0/r38T7SXPX0pNuT+/9JwLuBi4EvAv8CvHnSc/qB4cw8YuKOiDgE\neKLaUucnD/VUrw8YBb5FcaySiDgQ+AbF8fqJ50zneODuzPwEcCfwRp4JegAyc9s07yP1ot8DbsnM\nvwG+A/wBk/o+kMBTEfFmgIh4AXAfxeEiTcPgr97E3v6bgWMi4vvAN4HPZea1k56zr9dfCwxFxP8B\n7qH4U/fgiBic9Px9vY/Ua8YpBkWsiYh7KXZkHqI4L/ZLmbmLYgfnHY3nfRk4f/IJYE2tz2mZJale\nSh3jj4jvUJxwAfgx8BHgGmAMuC8z13ekOqmLIuJo4KLMPCEiDgcuBXZTHKN+S2ZuiYjTKK7d2AVc\n2HTSXpqzpj3UExGLADLzxMZ/f0RxwvG8zFwNLIgIjylrXomIs4ErKK6hgGLI7frMPBG4CXh/RBxK\ncRLyWOC1wEcjYr9u1CvNRJk9/lcAgxFxO8UJlvOBIzJzU+Px24CTKUagSPPFQxQn1D/buP2mzPx5\n4+eFwFMUFxjdlZm7gZGIeBB4OcUJSWnOKnNydwfw8cx8DfDHwOd49qiUbcCyDtQmdU1m3kRxWGfi\n9s8BIuI4YD3wF8BSnjkECrAdtwX1gDJ7/A9Q7P2QmQ9GxFbgiKbHlzDN2Nnx8fHxvr4yIxilGZu1\njhURbwLOBV6XmRMXGy1tesq02wK4PahjSneqMsH/duDfAesb88IsBf4xIlZn5h0Ul1pv3Gc1fX1s\n2bKtbE2lDQ0tqbzdTrTZqXattWh3NkTEqRQncddk5kS4fxv484gYAA6gmGvpvunaqnJ7qOp7rfLf\nZ763NRdrmmirrDLBfxXw6YjYRDGKZx2wFbiycSLrfpweQPNYRCwA/hJ4GLgpIsaBOzLzzxrTY99F\nsbd1XmZOvipbmnOmDf7GhRKnTvHQmsqrkeaQzHwYOK5xc/lennMVxc6R1DOcq0fSPu3Zs4fNm39a\n+vkjI4MMD4/+8vaKFYfR3z95xgV1k8EvaZ82
b/4pf7rhVgYGp/yjZ592jm7lkjNfz8qVq6Z/smaN\nwS9pWgODy9l/6aHdLkMVcZI2SaoZg1+Sasbgl6SaMfglqWYMfkmqGUf1NEyMVZ48BrkMxylL6iUG\nf0OrY5Udpyyp1xj8TRyrLKkOPMYvSTVj8EtSzRj8klQzHuOXamC6GTb3NZrtkUc2d6osdYnBL9VA\nOzNsbt/yEIuHXtKBqtQtBr9UE62OWnt6+9YOVKNu8hi/JNWMwS9JNWPwS1LNeIy/TePjY6VGPbgO\nqaS5wuBv087RYTZcP8zAYPkhb87vI6mbDP4KOMePpF7iMX5JqhmDX5JqxuCXpJox+CWpZgx+SaoZ\nR/VIexERRwMXZeYJEfFi4BpgDLgvM9c3nnMacDqwC7gwM2/tVr1SWe7xS1OIiLOBK4BFjbs2AOdl\n5mpgQUSsjYhDgXcDxwKvBT4aEft1pWBpBgx+aWoPAW9sun1kZm5q/HwbcDLwSuCuzNydmSPAg8DL\nZ7dMaeYMfmkKmXkTsLvprr6mn7cBS4ElwJNN928HlnW+Oqk9HuOXyhlr+nkJ8AQwQvELYPL90xoa\nWlJZYWXaGhkZrOz9Zuqggwbb+ryz/V3NZjtVt1XWvAz+6ZaZm4rLy2ka342I4zPzTuAUYCNwN3Bh\nRAwABwAvA+4r09iWLdsqKWpoaEmptva2rOJsGB4ebfnzlv18s9nWXKxpoq2y5mXwt7LMnMvLaRpn\nAVc0Tt7eD9yYmeMRcSlwF8WhoPMyc2c3i5TKKBX8EXEIcA9wErCHKYa1zTUznTjN5eU0WWY+DBzX\n+PlBYM0Uz7kKuGp2K5PaM+3J3YhYCFwO7Gjc9ZxhbR2sT5JUsTKjei4GLgMepfhz9ohJw9pO6lBt\nkqQO2GfwR8Q64LHM/ArPDGdrfs02HL4mST1lumP8bwPGIuJk4BXAZ4Chpse7Mnxtuna7OXStrHaH\nuE2Yze91LrYpaeb2GfyN4/gARMRG4J3Ax6cY1jatqoYsNdvbUKhuDl0rq50hbhOqHArW6XY7Wauk\nmWllOOdzhrVVW5IkqZNKB39mnth0c031pUiSZoNz9UhSzRj8klQzBr8k1YzBL0k1Y/BLUs0Y/JJU\nMwa/JNWMwS9JNWPwS1LNGPySVDMGvyTVjMEvSTVj8EtSzRj8klQzBr8k1YzBL0k108oKXGrT+PgY\njzyyecavW7HiMPr7+ztQkaQ6Mfi7YOfoMBuuH2ZgsHz47xzdyiVnvp6VK1d1sDJJdWDwd8nA4HL2\nX3pot8uQVEMe45ekmjH4JalmPNQjlRQRC4G/BV4I7AZOA/YA1wBjwH2Zub5b9Ulluccvlfc6oD8z\nfxf4MPARYANwXmauBhZExNpuFiiVYfBL5T0ALIyIPmAZsAs4IjM3NR6/DTipW8VJZXmoRypvO7AK\n+BdgOfAG4NVNj2+j+IUgzWkGv1Te+4AvZ+b5EfGrwP8EBpoeXwI8UaahoaEllRVVpq2RkcHK3m+m\nDjposK3PO9vf1Wy2U3VbZRn8Unn/j+LwDhQBvxD4XkSszsw7gFOAjWUa2rJlWyUFDQ0tKdXW8PBo\nJe/XiuHh0ZY/b9nPN5ttzcWaJtoqy+CXyvsEcHVE3AnsB5wDfAe4MiL2A+4HbuxifVIpBr9UUmaO\nAm+a4qE1s1yK1BZH9UhSzRj8klQzBr8k1YzBL0k1Y/BLUs04qkdSx7S62tyEgw/+zQqr0YRpgz8i\nFgBXAEExA+E7gadxRkJJ02hltblnXruVqz88yNKlh3Sgsnors8f/BmA8M18VEaspZiTso5iRcFNE\nXBYRazPz5o5WKqknudrc3DPtMf5GoJ/euLkSGMYZCSWpZ5U6uZuZYxFxDXAp8HmKPf4JzkgoST2k\n9MndzFwXEYcAdwMHND1UakbCTs1AN1W73ZyJsJOmmuVwNr/XudimpJkrc3L3VGBFZl4EPEWx1Nw9\nM52RsKoZ
6JrtbWa7bs5E2EmTZzmscma/Zp1ot5O1SpqZMnv8XwA+HRF3NJ7/HoqFKJyRUJJ60LTB\nn5k7cEZCSZo3vHJXkmr
"text/plain": [
"<matplotlib.figure.Figure at 0x7fd125d3ac88>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"df.hist(column='Age', by='Sex')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"It seems they follow a similar distribution. We can also split the ages by passenger class."
]
},
{
"cell_type": "code",
"execution_count": 40,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"array([[<matplotlib.axes._subplots.AxesSubplot object at 0x7fd125d3d8d0>,\n",
" <matplotlib.axes._subplots.AxesSubplot object at 0x7fd1259fc6a0>],\n",
" [<matplotlib.axes._subplots.AxesSubplot object at 0x7fd1259c96d8>,\n",
" <matplotlib.axes._subplots.AxesSubplot object at 0x7fd1259829b0>]], dtype=object)"
]
},
"execution_count": 40,
"metadata": {},
"output_type": "execute_result"
},
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAYQAAAEFCAYAAADjUZCuAAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\nAAALEgAACxIB0t1+/AAAHsdJREFUeJzt3XuUnHWd5/F3dYcO0IQQsy1HiQSI8GVnXESDCyKQyMBy\nG404M7q7h9GIwILhfjlLYHGdYWHZEaLDqHg2EBCPXMRjhtsgCmEhMIoGkEnO6DdchNBBl5jt0EmH\n0Onu2j+ep6DS6aT7+T2/ujxVn9c5HLqq+vnWryv1rW89l+/vVyqXy4iIiHQ0egAiItIcVBBERARQ\nQRARkZQKgoiIACoIIiKSUkEQEREAJjV6ADIxZnYrsNLdFzV6LCKNYmanAZcCI8Bm4AJ3f6axo2od\nKghNzswOBr4NHA6sbPBwRBrGzA4C/hfwEXd/w8xOAn4MzGzsyFqHDhk1vwXAEuCHjR6ISIO9DZzh\n7m+kt58B9jYzfbGNRC9kk3P38wDM7LhGj0Wkkdz9VeDVqrsWAfe6+1CDhtRyVBBEpFDMbHfge8A+\nwIkNHk5L0SEjESkMM9sX+GdgEJjr7v0NHlJL0R6CiBSCmU0DHgeWuPvVjR5PK1JBKA5NSyvt7hxg\nBnCqmX02va8M/Jm79zVuWK2jpOmvRUQEcu4hmNl7gRXAccDuwAPA6vThm9z9nnzDEykG5YK0guCC\nkF77+12SbkGA2cAN7v6NGAMTKQrlgrSKPFcZXQ/cBLye3p4NnGJmj5vZzWbWnXt0IsWgXJCWEFQQ\nzGw+8Ia7/wwopf89DVzm7nOAl4GvRRqjSNNSLkgrCT1k9CVgxMyOBw4laRL5dFVL+VLgxvGClMvl\ncqlUChyCyA7V802lXJBmN+E3VlBBSL/5AGBmy4CzgfvM7Dx3/xXwZyTzjOx8lKUS69ZtDBnCdnp6\npjRdrGYcUzvE6umZEmE0E9OMuVAt5r9PLWPWKq7Gmi0fYvYhnA18y8wGgT8AZ0WMLVIkbZELw8PD\n9PauCd5+xox96ezsjDgiySt3QXD3Y6tuHpU3nkhRtVsu9Pau4ZJFD9LVPT3ztoMD67nh4lOYOXP/\nGoxMQqlTuY7yfKPq7++mu3u6vlFJU+nqns6ue+7d6GFIJDEb04aB20hWMlrl7gtyj67F6BtV61Iu\nSCuI2Zi2CLjC3Zeb2U1mNs/d740xyFaib1StR7mQXbk8wtq1vdvc19/fTV/fwIS21/mH2sizh1Bp\nxllIclnTR919efrYQ8DxgJJA2oFyIaPBgT4W3d1HV3fv+L+83bbaW66VoIJQ3YxjZlekd1c3uW0E\npuYcm0jTUy6E095y84nRmPZh4Hagp+rxKcCGiQSKec14M8aqjtPfn28Gg2nTumsyrlaNVSdNmQu1\njtvTMyX3+zmPLLlQtNe1kWI2pn3dzI5x9yeAk4BlE4nVbE1NMWONjjPR46M70tc3oIa5Ccapl2bM\nhWq1bKDK+37OY6K5oMa0xjWmXQosNrNdgN8AP4oYW6RIlAtSSLEb0+bmjSdSVMoFKbo801+LiEgL\nUUEQEREg/LLTDmAxYCTdmGcDXWjZQGlDygdpFaHnED4FlN39KDObA1wL3I+WDZT2pHyQlhB0yCht\nw69M6bsf0EeybOCfa9lAaTfKB2kVwecQ3H3EzG4D/h74AcmygZdq2UBpR8oHaQW5Ljt19/npLI+/\nBD7u7r9PH5rQsoHQvF2u6lQufqx6y5sPReuoVaeyOpUBMLPTgBnufh2wheRE2o/N7PwsywaCOpWz\nUKfyxOPUU6x8KFpHrTqV1alc8WPgVjN7PI1xAfAabbBsoMgYlA/SEkLnMtoMfH6Mh1p+2UCR0ZQP\n0irUmCYiIkDcxrS30bKB0oaUD9IqQvcQ3mnEAa4iacSpLBs4B+gws3mRxijS7JQP0hJiNKbNJGnE\nGb1s4HH5hyfS/JQP0iqC+xCqGnE+A/wVybqx
FU2/bODw8DC9vWvGfGy8xb61wLeMVvR8EIF4jWm/\nAnareqjpl9B86aWXuGTRg3R1T8/0HIMD61ly9eeYNWtW5jGpMa2+seotbz4UrYFKjWlqTAPGbMQZ\nBlaY2Rx3f5wCLKHZ1zcQtMh3uTzCqlU+oaacadO23dNYu7Y303ONpsa0icepp1j5ULQGKjWmqTGt\nYnQjzvnAb4GbW33ZwMGBPhbd3UdXd/YP903rXmSPng/WYFTSYG2bD9JaYjemzc01moII2bMAeHvT\n+hqMRhqt3fNBWoca00REBMh5Ulnqp1weCT4HMTw8DJTo7Ezq/3hXUY2mq6pE2kPoSeVJwBKSxUC6\ngGtIJvPSkoE1kvfcRdfu0zJfUZU873puuPgUZs7cP/O27UC5IK0kdA/hNOCP7v4FM5sG/Br4G7Rk\nYE3lOXcRuq2MS7kgLSO0IPwQqHzj6QC2kiwZeLCZfQZ4AbjA3Rt3XZpIfSgXpGWETl2x2d0HzGwK\nSTL8N5JVorRkoLQV5YK0kuCTymb2AZLrr7/l7neZ2VR3fzN9uOmX0Gxkl2XRjNcV2u6dys2YC7WO\nq05ldSq/w8z2Bh4GFrj7Y+ndD5vZue6+ggIsodnILsui2VlXaLt3KjdjLlRTp7I6levRqbwQ2Au4\nysy+CpSBi4BvaslAaTPKBWkZoZ3KFwIXjvGQlgyUtqJckFaiTmUREQHUqSwiBZOla3+srnx13u9Y\nzE7lf0VryEqbUS7UX56ufXXe71zoIaNKd+YxwInAt9AastKelAsNUOm8z/pfyPQt7SS0IPyQZDFx\ngE5gCK0hK+1JuSAtI896CFR1Z14JXF/1K1pDVtpCK+TCztYXH0vluHzeFQCl+cTsVP67qoebfk1l\ndSpPnDqVd64ZcyFL3ND1xYu6AmCstcnVqZzaQXfmc2Z2jLs/QUHWVJaJUafyjjVjLlSbyGsaur54\nUVcAjLE2uTqVtzVWd+YFwD9oDVlpM8oFaRmxO5Xn5hqNSMEoF6SVqFNZRESAnJ3KZnY4cJ27f9LM\nDkXLBkqbUi5IK8hzldFlwF8Dm9K7ZqNlA1vOeNMEjDU1QLV2mCZAuSCtIs8ewovAqcD309uzgYO0\nbGBr0TQBE6JckJYQfA7B3ZeSdGVWPA1cpmUDW4+mCdg55YK0ipiznf5jo5cNVGNa88naBNToxpxI\ngnLhiaeWc/s/raJEKfMT7lru467F14752HivabvlghrTdixmQdASmrKdLE1ARWxM24GgXOjbMMDw\nHgdRKmUvCB1bfzfmazfRxrR2osa0HYtZEM4hacbRsoHS7pQLUki5CoK7vwocmf78HFo2UNqUckFa\ngRrTREQE0BKaIiITUj1N+Hj9N2MpQk9OzE7lWWjZQGlTyoXW19u7JmiacChOT07wIaO0O3MxMDm9\nS8sGSltSLrSPVu/JyXMOodKdWTFbywZKm1IuSEuI2alcffF00y8bKBKLckFaRcyTyiNVP2sJTQHa\ntlM5KBf2mLIrMBz0hLtM6tjha6dO5W2FdirnfZ0m8ryNfv/HLAjPaglNGa1NO5WDcmHTxi3ALkFP\nuHVoRJ3KE1Auj7BqlQf9zTub9XcixsuFVutUvhRYrGUDRZQLzSrP7L2b1r3IHj0frMGomkfMTuUX\n0LKB0qaUC8VRuVIoq7c3ra/BaJqLOpVFRASoQaeymT0DVKb+/Z27fzn2c4gUgXJBiiZqQTCzyQDu\nfmzMuDuSp5U87wkikZ2pdy6IxBB7D+HDQLeZPQx0Ale6+9ORn+MdeVrJ2+EEkTRUXXNBJIbYBWEz\n8HV3v8XMDgQeMrOD3H1kvA1D6QRR8yqXRzLtiVXv5RVhIrBx1D0XpHlNJBd2dpSjXvkQuyCsJmnj\nx91fMLP1wPuAtTvaIM814+3WUFM0oZf4DQ6sZ8nVn2PWrFk1GlldZM4FNaa1rjyXu9YzH2IXhNOB\nfwcsMLP3
k3Rp/n5nG+RpxGinhpqiCt2Dy7PMYRM0pkFALuRpTBvcOsSKFf+y3f3Tpo1/bk3n0+oj\nNBegfvkQuyDcAtxqZst
"text/plain": [
"<matplotlib.figure.Figure at 0x7fd125af3a90>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"df.hist(column='Age', by='Pclass')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We see there are more young passengers in third class."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Feature Pclass"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We have already seen how passengers are distributed by Pclass."
]
},
{
"cell_type": "code",
"execution_count": 34,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"Pclass\n",
"1 216\n",
"2 184\n",
"3 491\n",
"dtype: int64"
]
},
"execution_count": 34,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df.groupby('Pclass').size()"
]
},
{
"cell_type": "code",
"execution_count": 35,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"<matplotlib.axes._subplots.AxesSubplot at 0x7f2f9406ba58>"
]
},
"execution_count": 35,
"metadata": {},
"output_type": "execute_result"
},
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAYQAAAESCAYAAAD9gqKNAAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\nAAALEgAACxIB0t1+/AAAETBJREFUeJzt3X+sX3V9x/Hn9V4tvbftepFrZUUalyXvhZAtYUaWVm0p\nAv5iGMp0Wjtm1flrScU5g3HYynAxLBCGzrAVO5BhNnUdsQMRGRFF2Ypx07m4t45F6m3ZepEv9gdt\nKbd3f5zPlXvLvbffW+75ntve5yNp+v2e7/me+7r35t7X/Zwfn9M1MjKCJEnPazqAJGl2sBAkSYCF\nIEkqLARJEmAhSJIKC0GSBEBPnRuPiJXAF4EfAF3A94E/B26jKqNHgXWZeTgi1gIbgGFgc2ZuqTOb\nJGm8rjqvQyiF8P7MfNOYZVuAf8rMrRHxCWAHVUF8F3gZ8DTwEPDKzHyitnCSpHE6scuo66jnq4Bt\n5fE24ALgXGB7Zu7LzIPAA8CKDmSTJBW17jIqzoqIO4BTgauB3sw8XF7bDZwOLAGGxrxnqCyXJHVI\n3SOEHwObMvONwO8Dn2V8CR09ejjWcklSTWodIWTmLqqDymTm/0TE/wIvi4h5mXkIWArsBHYxfkSw\nFHhwqm0//fTwSE9Pdz3BJenkNekf3HWfZfRW4PTMvC4iXky1a+hvgMuA24E1wN3AduDmiFgEHAGW\nU51xNKlW68k6o0vSSWlgYOGkr9V9ltEC4PPAYuD5wCbge8DngHnAI8DbM3M4Ii4FPkxVCDdm5t9N\nte2hob1O0ypJ0zQwsHDSEUKthVAnC0Ga3YaHhxkc3NF0jDnhjDPOpLu7vV3oUxVCJ84ykjQHDQ7u\n4L6NH+W0+fObjnJSe+zAAVZ//BMsW/bS57wtC0FSbU6bP58X9/Y1HUNtci4jSRJgIUiSCgtBkgRY\nCJKkwkKQJAEWgiSpsBAkSYCFIEkqLARJEmAhSJIKC0GSBFgIkqTCQpAkARaCJKmwECRJgIUgSSos\nBEkSYCFIkgoLQZIEWAiSpMJCkCQBFoIkqbAQJEmAhSBJKiwESRJgIUiSCgtBkgRYCJKkwkKQJAEW\ngiSpsBAkSYCFIEkqLARJEmAhSJKKnro/QEScAvwAuBq4D7iNqogeBdZl5uGIWAtsAIaBzZm5pe5c\nkqTxOjFCuAr4WXl8NfCpzFwJPAysj4jess5q4DzgiohY3IFckqQxai2EiAjg14A7gS5gJbCtvLwN\nuAA4F9iemfsy8yDwALCizlySpGere4RwHfBBqjIA6MvMw+XxbuB0YAkwNOY9Q2W5JKmDajuGEBHr\ngG9n5iPVQOFZuiZaOMXycfr7e+np6T7eeJJqtmdPX9MR5oz+/j4GBhY+5+3UeVD59cBLI+JiYCnw\nFLAvIuZl5qGybCewi/EjgqXAg8faeKv15MwnljRjWq39TUeYM1qt/QwN7W1r3amKo7ZCyMzfHX0c\nER8DfgIsBy4DbgfWAHcD24GbI2IRcKSss6GuXJKkiXXqOoTR3UAbgcsj4n6gH7i1HEi+Erin/NuU\nme1VnSRpxtR+HQJAZn58zNMLJ3h9K7C1E1kkSRPzSmVJEmAhSJIKC0GSBFgIkqTCQpAkARaCJKmw\nECRJgIUgSSosBEkSYCFIkgoLQZIEWAiSpMJCkCQBFoIkqbAQJEmAhSBJKiwESRJgIUiSCgtBkgRY\nCJKkwkKQJAEWgiSpsBAkSYCFIEkqLARJEmAhSJIKC0GSBFgIkqTCQpAkARaCJKmwECRJgIUgSSos\nBEkSYCFIkgoLQZIEQE+dG4+I+cAtwBJgHnAN8D3gNqoyehRYl5mHI2ItsAEYBjZn5pY6s0mSxqt7\nhHAx8FBmrgLeDFwPXA18OjNXAg8D6yOiF7gKWA2cB1wREYtrziZJGqPWEUJmfmHM0zOBnwIrgXeX\nZduADwE/ArZn5j6AiHgAWAHcWWc+SdIzai2E
URHxLWAp1Yjha5l5uLy0GzidapfS0Ji3DJXlkqQO\n6chB5cxcAfw2cDvQNealronfMelySVJN6j6ofA6wOzMHM/P7EdEN7I2IeZl5iGrUsBPYxfgRwVLg\nwam23d/fS09Pd13RJT1He/b0NR1hzujv72NgYOFz3k7du4xeBSyjOki8BFgAfAW4jGq0sAa4G9gO\n3BwRi4AjwHKqM44m1Wo9WWNsSc9Vq7W/6QhzRqu1n6GhvW2tO1Vx1L3L6CbgRRHxDaoDyO8FNgKX\nR8T9QD9wa2YeBK4E7in/NmVme5+dJGlG1H2W0UFg7QQvXTjBuluBrXXmkSRNziuVJUmAhSBJKiwE\nSRJgIUiSCgtBkgRYCJKkwkKQJAFtFkJE3DLBsq/OeBpJUmOmvDCt3LTmPcDZ5WrjUS+gmqFUknSS\nmLIQMvP2iPg61bxDG8e8dAT4zxpzSZI67JhTV2TmTmBVRPwScCrPTE29GHi8xmySpA5qay6jiPgL\nYD3VjWtGC2EE+JWackmSOqzdye1WAwNlsjpJ0kmo3dNOf2wZSNLJrd0RwmA5y+gB4OnRhZn5sVpS\nSZI6rt1C+Bnwz3UGkSQ1q91C+NNaU0iSGtduITxNdVbRqBHg58ALZzyRJKkRbRVCZv7i4HNEvAA4\nH/iNukJJkjpv2pPbZeZTmfkV4IIa8kiSGtLuhWnrj1r0EmDpzMeRJDWl3WMIrxzzeATYA7xp5uNI\nkprS7jGEtwNExKnASGa2ak0lSeq4dncZLQduAxYCXRHxM+BtmfmdOsNJkjqn3YPKnwQuycwXZeYA\n8Bbg+vpiSZI6rd1CGM7MH4w+ycx/Y8wUFpKkE1+7B5WPRMQa4Gvl+WuA4XoiSZKa0G4hvAf4FHAz\n1d3S/h14V12hJEmd1+4uowuBQ5nZn5kvpLpJzuvqiyVJ6rR2C+FtwKVjnl8IvHXm40iSmtJuIXRn\n5thjBiM8cytNSdJJoN1jCF+OiG8D36QqkfOBf6gtlSSp49q9UvmaiPg6cC7V6OB9mfkvdQabKcPD\nwwwO7mg6xpxwxhln0t3d3XQMScep3RECmfkA1S00TyiDgzv4kxu+xCkLTm06yknt4L7HueYDl7Fs\n2UubjiLpOLVdCCeyUxacSu+igaZjSNKsNu37IUiSTk61jxAi4lrgFUA31ZxID1FNlPc84FFgXWYe\njoi1wAaqK6A3Z+aWurNJkp5R6wghIlYBZ2XmcuC1wA3A1cCnM3Ml8DCwPiJ6gauA1cB5wBURsbjO\nbJKk8ereZXQ/8Dvl8RNAH7AS+HJZto3qVpznAtszc19mHqQ6eL2i5mySpDFq3WWUmSPAgfL0HcCd\nwEWZebgs2w2cDiwBhsa8dagslyR1SEfOMoqIS4D1VFNe/PeYlya72tmroCWpwzpxUPki4CNUI4O9\nEbE3IuZl5iFgKbAT2MX4EcFS4MGpttvf30tPz7Evgtqzp++4s2t6+vv7GBhY2HQMzRL+7HXOTP3s\n1VoIEbEIuBY4PzN/XhbfC6wBPl/+vxvYDtxc1j8CLKc642hSrdaTbWVotfYfV3ZNX6u1n6GhvU3H\n0Czhz17nTOdnb6riqHuE8GbghcAXIqKLatqLy4HPRsS7gUeAWzNzOCKuBO6hKoRNmelvFknqoLoP\nKm8GNk/w0oUTrLsV2FpnHknS5LxSWZIEWAiSpGJOTG6nE5fTl3eGU5cLLATNcoODO9i09RPM7/cU\nxrocaO1n06UfdepyWQia/eb399F3mtc3SHXzGIIkCbAQJEmFhSBJAiwESVJhIUiSAAtBklRYCJIk\nwEKQJBUWgiQJsBAkSYWFIEkCLARJUmEhSJIAC0GSVFgIkiTAQpAkFRaCJAmwECRJhYUgSQIsBElS\nYSFIkgALQZJUWAiSJMBCkCQVFoIkCbAQJEmFhSBJAiwESVJhIUiSAAtBklRYCJIkAHrq/gARcTZw\nB3B9Zn4m
Is4AbqMqo0eBdZl5OCLWAhuAYWBzZm6pO5sk6Rm1jhAiohe4Ebh3zOKrgU9l5krgYWB9\nWe8qYDVwHnBFRCyuM5s
"text/plain": [
"<matplotlib.figure.Figure at 0x7f2f9417f2b0>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"# Distribution (pass the column as x=; recent seaborn versions reject it as a positional argument)\n",
"sns.countplot(x='Pclass', data=df)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Most passengers are in 3rd class."
]
},
{
"cell_type": "code",
"execution_count": 36,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"<matplotlib.axes._subplots.AxesSubplot at 0x7f2f9409a0f0>"
]
},
"execution_count": 36,
"metadata": {},
"output_type": "execute_result"
},
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAYEAAAESCAYAAAAbq2nJAAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\nAAALEgAACxIB0t1+/AAAF7VJREFUeJzt3X+U3XV95/HnZCZAMgnNAKMYY1IVfGOt6ImWE1GjEGtp\naxeq1NIqRS3uWcWtbO0vsVZslVo9REx/IdbIqlStUkGWH4WKPxCxx2TVZau8bXcrIQmriVxMMklg\nMpn9494hd4Yk853JfO6dyff5OGdO7vf3e+Y7mdf31+fz7RkdHUWSVE/zul2AJKl7DAFJqjFDQJJq\nzBCQpBozBCSpxgwBSaqxvtIbiIi1wCpgP3BpZm5ojV8KXAeMAj3A04A/zMxPl65JktRUNAQiYjVw\nSmaeGRGnAeuBMwEycytwVmu+XuBLwBdK1iNJGq/05aA1wA0AmXkfsCQiFh1kvtcB12fm7sL1SJLa\nlA6Bk4FtbcPbW+Mmuhj4aOFaJEkTdPrGcM/EERGxCvheZu7qcC2SVHulbwxvZfyR/1LgwQnzvAL4\n5yor27dvZLSvr3eGSpOk2njcAfiY0iFwO3A58JGIWAlsycyhCfP8HPCpKitrNLxlIElTNTi4+JDT\nil4Oysx7gI0RcTdwFXBJRFwUEee2zXYy8KOSdcwF69dfwwUXnMf69dd0uxRJNdIzl7qS3rZt59wp\ndgr27t3D61//m4yOjtLTM4+Pfew6jjtuQbfLknSUGBxcfMjLQbYYngWGh4cZC+PR0f0MDw93uSJJ\ndWEISFKNGQKSVGOGgCTVmCEgSTVmCEhSjRkCklRjhoAk1ZghIEk1ZghIUo0ZApJUY4aAJNWYISBJ\nNWYISFKNGQKSVGOGgCTVmCEgSTVW+h3DXTEyMsLmzZu6XUZlQ0PjX7v8wAOb6O/v71I1U7Ns2XJ6\ne3u7XYakaToqQ2Dz5k388VWf47hFJ3S7lEpGRx4dN7z2k3fR03tMl6qpbu+uh3jPpeezYsVTu12K\npGk6KkMA4LhFJ7Dw+MFul1HJ/n172dU2vGDxiczrO65r9UiqD+8JSFKNGQKSVGOGgCTVWPF7AhGx\nFlgF7AcuzcwNbdOWAZ8C5gP/MzPfXLoeSdIBRc8EImI1cEpmnglcDKybMMuVwAcycxUw0goFSVKH\nlL4ctAa4ASAz7wOWRMQigIjoAV4E3NSa/l8zc3PheiRJbUqHwMnAtrbh7a1xAIPALuCqiLgrIq4o\nXIskaYJOtxPomfD5ycAHgU3AzRHxi5l566EWHhhYSF/f5K1Td+yYG61tjwYDA/0MDi7udhmSpql0\nCGzlwJE/wFLgwdbn7cAPMvMHABHxReBZwCFDoNHYXWmjjcbQ5DNpRjQaQ2zbtrPbZUg6jMMdqJW+\nHHQ7cD5ARKwEtmTmEEBmjgD/NyKe3pr3eUAWrkeS1KbomUBm3hMRGyPibmAEuCQiLgIezswbgf8G\nXNu6SXxvZt5Ush5J0njF7wlk5mUTRt3bNu3/AC8uXYMk6eBsMSxJNWYISFKNGQKSVGOGgCTVmCEg\nSTVmCEhSjRkCklRjhoAk1ZghMBv0tHeK1zNhWJLKMQRmgXm981kw+EwAFgyexrze+V2uSFJddLor\naR3C8ctfwPHLX9DtMiTVjGcC0gxYv/4aLrjgPNavv6bbpUhTYghIR2jv3j3ccUfzNRh33HEbe/fu\n6XJFUnWGgHSEhoeHGR0dBWB0dD/Dw8NdrkiqzhCQpBozBCSpxgwBSaoxQ0CSaswQkKQaMwQkqcYM\nAUmqMUNAkmrMEJCkGivegVxErAVWAfuBSzNzQ9u0/wA2taaNAq/JzAdL1yRJaioaAhGxGjglM8+M\niNOA9cCZbbOMAudkpp2tSFIXlL4ctAa4ASAz7wOWRMSituk9rS9JUheUDoGTgW1tw9tb49pdHRF3\nRcQVhWuRJE3Q6ZfKTDzqfydwG/AQcGNEvDIz
//FQCw8MLKSvb/JXL+7Y0X9ERaq6gYF+BgcXd7uM\nrjrmmP3jhk88cRE/9VP1/plo7igdAlsZf+S/FHjsxm9mfnLsc0TcAjwbOGQINBq7K2200Riaap2a\npkZjiG3bdna7jK7auXPXuOEf/3gXjz7qg3eaPQ53oFb6N/V24HyAiFgJbMnModbw8RFxW0SMvVD3\nJcD/LlyPJKlN0TOBzLwnIjZGxN3ACHBJRFwEPJyZN0bEzcA3ImI38K3MvL5kPZKk8YrfE8jMyyaM\nurdt2l8Cf1m6BknSwXnhUpJqzBCQpBozBCSpxjrdTkCa1MjICJs3b+p2GZUNDY1/JPmBBzbR3z/7\n26osW7ac3t7J293o6GYIaNbZvHkTl//je1kwMPv/kALsf3Rk3PC6r13NvGNm9x/XPY0hLn/lO1ix\n4qndLkVdZghoVlow0E//SXOj1e3II/t4uG144YmL6D3W/1qaG7wnIEk1VvlwJSKeCKxoDd6fmT8s\nU5IkqVMmDYGIeDXwduBJwAOt0csjYgvw55n52YL1SZIKOmwIRMS1rXlel5nfmTDtOcDvR8QvZ+br\nilUoSSpmsjOBz2fmjQeb0AqF10bEuTNfliSpEyYLgee2jvgPKjP/9FAhIUma/SYLgbHpp7a+vgr0\n0uz2+VsF65IkdcBhQyAz3wkQEV8AzsjMkdbwfOAz5cuTJJVUtZ3Acsa/GnKUA4+LSpLmqKrtBG4G\nvh8RG4H9wErghmJVSZI6olIIZOY7Wo+LPpvmGcG7M/O7JQuTJJVX6XJQRBwLvJzmfYHrgcURcVzR\nyiRJxVW9J/A3wNOBs1rDK4FrSxQkSeqcqiFwWmb+LrAbIDP/FlharCpJUkdUDYF9rX9HASKiH1hQ\npCJJUsdUDYHPRsQXgadFxDrg28B15cqSJHVC1aeD/ioi/gV4KfAIcEFmbixZmCSpvEohEBHfAD4O\nfDQzH5rKBiJiLbCKZvuCSzNzw0Hm+XNgVWaeNXGaJKmcqpeD3gacBnwrIm6MiPMj4pjJFoqI1cAp\nmXkmcDGw7iDzPBN4Ma37DZKkzqkUApl5d2b+DvDTwAeBc4AtFRZdQ6tlcWbeByyJiEUT5rkSuKxq\nwZKkmTOV10suAc4Dfg14GvDhCoudDLRf/tneGvfvrXVeBHwJuL9qHZKkmVP1nsA/Ac+ieVT/3sz8\n+jS391gndBExALye5tnCUxjfQd1BDQwspK+vd9KN7NjRP83yNFUDA/0MDi6e0XXOtf3XM6/tV7dn\nwvAsVmLfae6peibwIeC2zNw/xfVvpXnkP2Yp8GDr89nAScBdwHE0Hz+9MjPfdqiVNRq7K2200Ria\nYpmarkZjiG3bds74OueSefN7WfSME9j1/YdYdOoJzJs/+YHKbFBi32l2OlzYT/aO4Q9l5ltpvmj+\njyJi3PTMXD3Jtm8HLgc+EhErgS2ZOdRa9nrg+tZ2VgAfO1wASLPZwBlLGTjDRvSaeyY7E1jf+veP\np7PyzLwnIjZGxN3ACHBJ6z7Aw76WUpK6b7I3i32n9fEvaLYT+PRU2wlk5sQnf+49yDz307w8JEnq\noKr3BN4G/DrNdgLfBj4BfCEzHy1WmSSpuNLtBCRJs1jpdgKSpFlsqu0EPs+RtROQJM0iVc8EvgL8\nUmaOlCxGktRZVTuQe5kBIElHn6pnApsi4svAN4DHngjKzD8pUZQkqTOqhsB/tL4kSUeRqiHwZ0Wr\nkCR1RdUQ2Mf4l76MAj8BTpzxiiRJHVP1HcOP3UBuvVFsDfCcUkVJkjqj6tNBj8nMRzPzVuDnC9Qj\nSeqgqo3F3jBh1FOAJ898OZKkTqp6T+DFbZ9HgR3Aq2e+HElSJ1W9J/D6sc+tPoR+kpmjh1lEkjQH\nHPaeQEScHhGfbRu+juYrI7dGxBmli5MklTXZjeF1NF8mQ0SsBl4APJHm00FXlC1NklTaZCEwLzNv\nan3+FZpv
FtuZmd8FesqWJkkqbbIQGG77fBbw5SksK0ma5Sa7MbwnIs4FjgeWA18CiIgAegvXJkkq\nbLIQeCvwt8AA8JuZORw
"text/plain": [
"<matplotlib.figure.Figure at 0x7f2f9405ffd0>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"# Survivors per class\n",
"sns.barplot(x='Pclass', y='Survived', data=df)"
]
},
{
"cell_type": "markdown",
"metadata": {
"collapsed": true
},
"source": [
"As expected, passenger class is very significant: the survival rate is highest in first class and lowest in third class.\n",
"\n",
"We can also look at the number of passengers in each class, split by sex."
]
},
{
"cell_type": "code",
"execution_count": 37,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"<seaborn.axisgrid.FacetGrid at 0x7f2f94db5400>"
]
},
"execution_count": 37,
"metadata": {},
"output_type": "execute_result"
},
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAWgAAAEZCAYAAAC6m7+xAAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\nAAALEgAACxIB0t1+/AAAF7lJREFUeJzt3X2QXXWd5/F3k9aEJEQ6GJAJkmV06rtSrlOLjMwmSMKD\nMM4guIBSY2BYHhwdRxef1o01MmQcnGVxpRAfFieAIMNUiWyWkeVhgBFBhCK4oyJT7ldFF0hwoZGL\nSRoSQtL7xzkNl7bTaZJ7+vyafr+quvrc83S/fUN97o/fOb/f6RseHkaSVJ7d2i5AkjQ2A1qSCmVA\nS1KhDGhJKpQBLUmFMqAlqVD9TZ48InYHrgD2AWYC5wEnAW8Gnqh3+2xm3hQRy4Gzga3Aqsy8vMna\nJKl0fU3eBx0R7wb2z8z/FhH7A7cC3wWuzcwbu/abDfwzcDDwHHAf8NbMfKqx4iSpcI22oDPzmq6X\n+wOP1Mt9o3Y9BFiTmRsBIuIuYAlwQ5P1SVLJGg3oERHxXWAhcCzwMeDPI+KjwGPAh4DXAINdhwwC\n+05GbZJUqkm5SJiZS4DjgKuBrwErMvNI4AfAyjEOGd3ClqRpp9GAjoiDImI/gMy8n6rF/qN6GeB6\n4I3AOl7cYl4IPDreuZ97busw4I8//kztH42j6S6Ow4BFwEciYh9gLvCViPh4Zv4CWAY8AKwBLo2I\necA2YDHVHR3b1ek83WTd0i7bunUra9c+3GoN++23PzNmzGi1hvEsWLBH2yUUremAvgS4LCLuBGYB\nHwA2Al+PiKF6+fTM3BQRK4BbqAJ6ZWZuaLg2qVFr1z7Mpy66lllz57fy/ps2Psl5Hz6JRYsOaOX9\nteuavotjE7B8jE1vGWPf1cDqJuuRJtusufOZPW9B22VoinIkoSQVyoCWpEIZ0JJUKANakgplQEtS\noQxoSSqUAS1JhTKgJalQBrQkFcqAlqRCGdCSVCgDWpIKZUBLUqEMaEkqlAEtSYUyoCWpUAa0JBXK\ngJakQhnQklQoA1qSCmVAS1KhDGhJKpQBLUmFMqAlqVD9TZ48InYHrgD2AWYC5wE/BK6i+nL4JXBq\nZm6JiOXA2cBWYFVmXt5kbZJUuqZb0O8A7svMZcDJwIXAp4EvZuZS4EHgjIiYDZwDHAEcDnwkIvZs\nuDZJKlqjLejMvKbr5f7AI8BS4H31uuuBjwM/AdZk5kaAiLgLWALc0GR9klSyRgN6RER8F1hI1aK+\nNTO31JseB/al6gIZ7DpksF4vSdPWpFwkzMwlwHHA1UBf16a+sY/Y7npJmjaavkh4EPB4Zq7NzPsj\nYgawISJmZuZmqlb1OuBRXtxiXgjcM965BwZm098/o6nSpV22fv2ctktgYGAOCxbs0XYZ2klNd3Ec\nBiyiuui3DzAXuAk4iao1fSJwM7AGuDQi5gHbgMVUd3RsV6fzdINlS7uu0xlquwQ6nSEGBze0XcZ2\n+eUxvqa7OC4B9o6IO6kuCP4ZcC5wWkTcAQwAV2bmJmAFcEv9szIzy/2vSpImQdN3cWwClo+x6egx\n9l0NrG6yHkmaShxJKEmFMqAlqVAGtCQVyoCWpEIZ0JJUKANakgplQEtSoQxoSSqUAS1JhTKgJalQ\nBrQkFcqAlqRCGdCSVCgDWpIKZUBLUqEMaEkqlAEtSYUyoCWpUAa0JBXKgJakQhnQklQoA1qSCmVA\nS1KhDGhJKlR/028QERcAhwIzgPOB44A3A0/Uu3w2M2+KiOXA2cBWYFVmXt50bZJUskYDOiKWAQdm\n5uKImA98H/gnYEVm3ti132zgHOBg4DngvohYnZlPNVmfJJWs6S6OO4B31ctPAXOoWtJ9o/Y7BFiT\nmRszcxNwF7Ck4dokqWiNtqAzcxh4pn55FnADVRfGByPio8BjwIeA1wCDXYcOAvs2WZskla7xPmiA\niDgeOB04mqob41eZeX9EfAJYCdw96pDRLezf
MDAwm/7+Gb0uVeqZ9evntF0CAwNzWLBgj7bL0E6a\njIuExwCfBI7JzA3A7V2brwe+DHwDeEfX+oXAPeOdt9N5useVSr3V6Qy1XQKdzhCDgxvaLmO7/PIY\nX6N90BExD7gAODYzf12vuzYiDqh3WQY8AKwBDo6IeRExF1gMfKfJ2iSpdE23oE8G9gKuiYg+YBj4\nKvD1iBgCNgKnZ+amiFgB3AJsA1bWrW1Jmraavki4Clg1xqarxth3NbC6yXokaSpxJKEkFcqAlqRC\nGdCSVCgDWpIKZUBLUqEMaEkqlAEtSYUyoCWpUAa0JBXKgJakQhnQklQoA1qSCmVAS1KhDGhJKpQB\nLUmFMqAlqVAGtCQVyoCWpEIZ0JJUKANakgplQEtSoQxoSSqUAS1JhTKgJalQ/U2/QURcABwKzADO\nB+4DrqL6cvglcGpmbomI5cDZwFZgVWZe3nRtklSyRlvQEbEMODAzFwNvBy4CPg18MTOXAg8CZ0TE\nbOAc4AjgcOAjEbFnk7VJUuma7uK4A3hXvfwUMAdYCnyzXnc98DbgEGBNZm7MzE3AXcCShmuTpKI1\n2sWRmcPAM/XLM4EbgGMyc0u97nFgX2AfYLDr0MF6vSRNW433QQNExPHAGcDRwM+6NvVt55DtrX/e\nwMBs+vtn9KA6qRnr189puwQGBuawYMEebZehnTQZFwmPAT5J1XLeEBEbImJmZm4GFgLrgEd5cYt5\nIXDPeOftdJ5uqmSpJzqdobZLoNMZYnBwQ9tlbJdfHuNr+iLhPOAC4NjM/HW9+jbgxHr5ROBmYA1w\ncETMi4i5wGLgO03WJkmla7oFfTKwF3BNRPQBw8BpwGUR8T7gIeDKzNwaESuAW4BtwMrMLPdrX5Im\nQdMXCVcBq8bYdPQY+64GVjdZjyRNJY4klKRCTSigI+KKMdb9Y8+rkSQ9b9wujnr49fuBN0bEnV2b\nXkl177IkqSHjBnRmXh0R3wauBs7t2rQN+JcG65KkaW+HFwkzcx2wLCJeBcznhUEkewJPNlibJE1r\nE7qLIyI+TzUScJAXAnoY+O2G6pKkaW+it9kdASyoJzKSJE2CiQb0Tw1nSVNVROwGfIHq5oYtwADw\nscws+lraRAN6bX0Xx13AcyMrM/MvG6lKknrrTcBrM/M4gIh4PfD6iDiFau6fVwL/Hbgf+AeqwXT/\nDviTzDy9nZInHtC/Av6pyUIkqUH/AmyKiMuAO6nm+hkEfjszT46I3YHbM/P3I+KvqZ7+9AZemDeo\nFRMN6L9utApJalA9B/27I2I+1QNC/gp4M1VoX05188Nz9b63RsTfANdm5sa2aoaJB/RzVHdtjBgG\nfk01EZIkFS0ilgJ71XP+3BQR9wO/AK7KzDPrff51/Xs5cB1wZET8XX2rcSsmFNCZ+fyQ8Ih4JXAk\n8LtNFSVJPfYD4EsRcRqwierxe28D3l5PZfEq4OaI2ACcTtUH/U3gb4E/aqVidmI2u8x8luob6ONU\n/TSSVLR6PvpTxth0xxjrjqp//4gWwxkmPlDljFGrXkt15VOS1JCJtqDf2rU8DKwH3t37ciRJIyba\nB306QH0FdDgzO41WJUmacBfHYuAqYA+gLyJ+BZySmd9rsjhJms4m+kSV84HjM3PvzFwA/DFwYXNl\nSZImGtBbM/OBkReZ+X26hnxLknpvohcJt0XEicCt9es/ALY2U5Kk6WrhwoUzgNf1+LQPrlu3rud5\nFRFfBb6RmTf2+twjJhrQ76eaCepSqqep/AB4b1NFSZq2XvdvDj8zZ82d35OTbdr4JD+6/bIAftKT\nE06yiQb00cDmzBwAiIhvAX8IfLGpwiRNT7Pmzmf2vAWT+p71CMOlwKuBA4FPUV1rewPVAJeTgd8D\nZgGXZOblXcfuRjXi8ADgFcC5mXl7L+qaaB/0KcAJXa+PBt7TiwIkqRCvr6cjPR9YAbyzXj4d+EVm\nHgYcxm9O
Hvce4NHMPBL498BFvSpooi3oGZnZ3YczzAuPvhpXRLyRauKRCzPzy3W/zZuBJ+pdPpuZ\nN9UTlJxN1be9qvsbSpI
"text/plain": [
"<matplotlib.figure.Figure at 0x7f2f8ffd6198>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"# Note: factorplot was renamed to catplot in seaborn 0.9\n",
"sns.factorplot('Pclass', data=df, hue='Sex', kind='count')"
]
},
{
"cell_type": "code",
"execution_count": 38,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"Pclass Sex \n",
"1 female 0.968085\n",
" male 0.368852\n",
"2 female 0.921053\n",
" male 0.157407\n",
"3 female 0.500000\n",
" male 0.135447\n",
"Name: Survived, dtype: float64"
]
},
"execution_count": 38,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df.groupby(['Pclass', 'Sex']).Survived.mean()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We see that most women in first and second class survived: about 97% and 92%, respectively."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Feature Fare"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We are going to analyse the feature *Fare* and take the opportunity to introduce how to manage outliers.\n",
"\n",
"As we saw in the PairGrid chart, Fare is directly related to the passenger class."
]
},
{
"cell_type": "code",
"execution_count": 39,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"<matplotlib.axes._subplots.AxesSubplot at 0x7f2f8ff4c1d0>"
]
},
"execution_count": 39,
"metadata": {},
"output_type": "execute_result"
},
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAX0AAAEDCAYAAADZUdTgAAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\nAAALEgAACxIB0t1+/AAAFHZJREFUeJzt3X+QXWd93/H3SpGFtVqNZM0ixLoYOs18PZ5MZko1YypR\n60djuxSICRKBiUZ1ImiVjpNRjXHH+UO2UdNCSa1SG/KjCgLhkCnQUR00HhzjcXBlcCOnzNgwSb4Y\nG+R4JdC1R45XxpIvu9s/7jFdCUl7dffes7v3eb9mPHPuc87V83yt1ed59vy4d2BychJJUhkWzPYA\nJEn1MfQlqSCGviQVxNCXpIIY+pJUEENfkgryc9MdEBGDwOeBFcAlwG7gh8AfABPAk5l5U3XsrcCW\nqn13Zn61R+OWJHWgnZX+rwN/m5mbaAX6fwP+K/DbmfnPgOURcX1EvBn4VWAt8G5gT0QM9GTUkqSO\ntBP6zwMrq+2VwAvAWzLzW1XbQeBaYCPw1cwcz8zngR8AV3V3uJKkmZg29DPzi8AVEfEU8HXgVuDE\nlEOOA6uBVUBjSnujapckzRHThn5EbAWOZObPA5uAPznrkPOdwvHUjiTNMdNeyAXWAX8OkJnfjohL\nz3rfCDAKHAWuPKv96IX+4MnJycmBAecGSbpIHQdnO6H/PeBtwP+KiCuAMeD7EbEuM78BvBe4G3gK\n+HBE3A68HnhjZv71BUc9MECjMdbp2Oe84eEh65un+rk2sL75bnh4qOP3thP6fwTsi4ivAwuBHbRu\n2fzv1d05f5mZDwNExF7gEK1bNn+z41FJknpi2tDPzJeB959j1zXnOPbTwKe7MC5JUg/4RK4kFcTQ\nl6SCGPqSVBBDX5IKYuhLUkEMfUkqiKEvSQUx9CWpIIa+JBXE0Jekghj6klQQQ1+SCmLoS1JBDH1J\nKkg7n6ffM3d8/A8Zn1xUW39LL13IB7durq0/SZprZjX0/+qZV1kwVN93pw8PPFtbX5I0F3l6R5IK\nYuhLUkEMfUkqyLTn9CNiO7ANmAQGgH8CvB34A1pfgP5kZt5UHXsrsKVq352ZX+3RuCVJHWjni9H3\nAfsAIuIa4H3AJ4HfzsxvRcQXIuJ6IIFfBd4GrAAORcQDmTnZs9FLki7KxZ7euR34z8CbM/NbVdtB\n4FpgI/DVzBzPzOeBHwBXdWugkqSZazv0I2IN8CwwDpyYsus4sBpYBTSmtDeqdknSHHExK/0PAZ+r\ntgemtA/87KEXbJckzZKLeThrA/Bb1fbKKe0jwChwFLjyrPajMxlcty1atIDh4aFa+6y7v7r1c339\nXBtYX6naCv2IWA2MZeZPqtd/ExFrM/ObwHuBu4GngA9HxO3A64E3ZuZf92jcHWk2J2g0xmrrb3h4\nqNb+6tbP9fVzbWB9891MJrR2V/qraZ27f83NwB9FxADwl5n5MEBE7AUO0bpl8zc7HpUkqSfaCv3q\nTp13Tnn9N8A15zju08CnuzY6SVJX+USuJBXE0Jekghj6klQQQ1+SCmLoS1JBDH1JKoihL0kFMfQl\nqSCGviQVxNCXpIIY+pJUEENfkgpi6EtSQQx9SSqIoS9JBTH0Jakghr4kFcTQl6SCGPqSVJC2viM3\nIrYCtwJN4Hbg28C9tCaNY8C2zGxWx+0ExoG9mbmvJ6OWJHVk2pV+RFxGK+jXAu8C3gPsBu7JzPXA\n08D2iFgC7AI2ARuBmyNiea8GLkm6eO2s9H8J+Fpm/hj4MbAjIp4BdlT7DwIfAb4LHM7MkwAR8Siw\nDri/66OWJHWkndB/MzAYEX8GLAc+CizJzGa1/ziwGlgFNKa8r1G1S5LmiHZCfwC4DPgVWhPAX1Rt\nU/ef731zyqJFCxgeHqq1z7r7q1s/19fPtYH1laqd0P8R8M3MnACeiYgxoBkRizPzNDACjAJHOXNl\nPwI81u0Bz0SzOUGjMVZbf8PDQ7X2V7d+rq+f
awPrm+9mMqG1c8vmg8CmiBiIiJXAUuAhYEu1fzPw\nAHAYWBMRyyJiKa0Lv4c6HpkkqeumDf3MPAr8T+D/0LooexNwB3BjRDwCrAD2Z+Yp4DZak8SDwJ2Z\n2b9TrSTNQ23dp5+Ze4G9ZzVfd47jDgAHujAuSVIP+ESuJBXE0Jekghj6klQQQ1+SCmLoS1JBDH1J\nKoihL0kFMfQlqSCGviQVxNCXpIIY+pJUEENfkgpi6EtSQQx9SSqIoS9JBTH0Jakghr4kFcTQl6SC\nGPqSVJBpvyM3ItYDXwa+AwwATwK/B9xLa9I4BmzLzGZEbAV2AuPA3szc16uBS5IuXrsr/a9n5qbM\n3JiZO4HdwD2ZuR54GtgeEUuAXcAmYCNwc0Qs78moJUkdaTf0B856vQE4WG0fBK4FrgYOZ+bJzDwF\nPAqs68YgJUndMe3pncpVEXEfcBmtVf6SzGxW+44Dq4FVQGPKexpVuyRpjmgn9J8C7szML0fEPwT+\n4qz3nf1bwHTts2bRogUMDw/V2mfd/dWtn+vr59rA+ko1behn5lFaF3LJzGci4ofAmohYnJmngRFg\nFDjKmSv7EeCx7g+5c83mBI3GWG39DQ8P1dpf3fq5vn6uDaxvvpvJhDbtOf2I+LWIuKXafgOt0zif\nBbZUh2wGHgAO05oMlkXEUmAtcKjjkUmSuq6d0ztfAf40Im4AFgE7gCeAz0fEvwGOAPszczwibgMe\nBCZonRLq36lWkuahdk7vnAR++Ry7rjvHsQeAA10YlySpB3wiV5IKYuhLUkEMfUkqiKEvSQUx9CWp\nIIa+JBXE0Jekghj6klQQQ1+SCmLoS1JBDH1JKoihL0kFMfQlqSCGviQVxNCXpIIY+pJUEENfkgpi\n6EtSQQx9SSpIO1+MTkS8DvgOsBt4GLiX1oRxDNiWmc2I2ArsBMaBvZm5rzdDliR1qt2V/i7ghWp7\nN3BPZq4Hnga2R8SS6phNwEbg5ohY3u3BSpJmZtrQj4gArgTuBwaA9cDBavdB4FrgauBwZp7MzFPA\no8C6noxYktSxdlb6dwEfphX4AIOZ2ay2jwOrgVVAY8p7GlW7JGkOueA5/YjYBnwzM4+0Fvw/Y+Bc\njRdon1WLFi1geHio1j7r7q9u/VxfP9cG1leq6S7kvhN4S0S8GxgBXgVORsTizDxdtY0CRzlzZT8C\nPNaD8c5IszlBozFWW3/Dw0O19le3fq6vn2sD65vvZjKhXTD0M/MDr21HxO3AD4C1wBbgC8Bm4AHg\nMPDHEbEMmKiO2dnxqCRJPXEx9+m/dsrmDuDGiHgEWAHsry7e3gY8WP13Z2b27zQrSfNUW/fpA2Tm\nR6e8vO4c+w8AB7oxKElSb/hEriQVxNCXpIIY+pJUEENfkgpi6EtSQQx9SSqIoS9JBTH0Jakghr4k\nFcTQl6SCGPqSVBBDX5IKYuhLUkEMfUkqiKEvSQUx9CWpIIa+JBXE0Jekghj6klSQab8jNyIuBT4H\nrAIWA78LPAHcS2vSOAZsy8xmRGwFdgLjwN7M3NejcUuSOtDOSv/dwOOZuQF4P7AH2A18KjPXA08D\n2yNiCbAL2ARsBG6OiOU9GbUkqSPTrvQz80tTXr4J+DtgPbCjajsIfAT4LnA4M08CRMSjwDrg/m4O\nWJLUuWlD/zUR8Q1ghNbK/2uZ2ax2HQdW0zr905jylkbVLkmaI9oO/cxcFxG/CHwBGJiya+A8bzlf\n+6xZtGgBw8NDtfZZd3916+f6+rk2sL5StXMh963A8cx8LjOfjIiFwFhELM7M07RW/6PAUc5c2Y8A\nj/Vi0J1qNidoNMZq6294eKjW/urWz/X1c21gffPdTCa0di7kXgPcAhARq4ClwEPAlmr/ZuAB4DCw\nJiKWRcRSYC1wqOORSZK6rp3Q/0Pg9RHxv2ldtP23wB3AjRHxCLAC2J+Zp4DbgAer/+7MzP6daiVp\nHmrn7p1T
wNZz7LruHMceAA50YVySpB7wiVxJKoihL0kFMfQlqSCGviQVxNCXpIIY+pJUEENfkgpi\n6EtSQQx9SSqIoS9JBTH
"text/plain": [
"<matplotlib.figure.Figure at 0x7f2f8ffa4ba8>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"df['Fare'].hist()"
]
},
{
"cell_type": "code",
"execution_count": 40,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"array([[<matplotlib.axes._subplots.AxesSubplot object at 0x7f2f8feb4160>,\n",
" <matplotlib.axes._subplots.AxesSubplot object at 0x7f2f8fe02e48>]], dtype=object)"
]
},
"execution_count": 40,
"metadata": {},
"output_type": "execute_result"
},
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAXwAAAEMCAYAAADHxQ0LAAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\nAAALEgAACxIB0t1+/AAAHP9JREFUeJzt3XuQVeWZ7/FvC9JIAwViBwlGYmoyj0PNZOo4VJkBR5CJ\nOJbXBBytUIwJyYTJMRnG25SZDF445iTHHJgcNRlzUKJxtCY6RYyEaNAyGm8nGJ0RPck8KgqES6Ql\njVyE7qa7zx/v28fdm+7ea2/W2pdev09VV++9Lu96evXaz3r3u9613qbe3l5ERGT4O6bWAYiISHUo\n4YuI5IQSvohITijhi4jkhBK+iEhOKOGLiOTEyFoHkDdm1gO8ARyOk5qAze5+bu2iEklH0fE9AtgD\nfMXdnxhindnAne7+0epEmV9K+NXXC8x29521DkQkA/2ObzObCaw1s993990l1pOMKeFXX1P8OYKZ\nGXAnMInwv7ne3f81zusB/gG4HJgO/AHwHWAKcAhY7O4vZh69yND6Hd/u/pyZvQH8KfBjM/sr4KuE\nBP8L4K8LVzaz44C7gT8GjgXWuPu1cd4lwPWEbw6dwN+6+88Hm57lH9mo1IZfX74JPOzu04HPAXeZ\n2YjCBdz9D+LLHwJ3u7sBfwP8yMz0/5R6dCzQYWbTCMf4me5+KtACfLlo2S8CLXH+acBn4rcEgG8D\n58bPx38FLiwxXYooQdTGk2b2q4Kf7wK4+4XAirjMs8BoQg2+z4/j71OBVne/O673PNAGzESkjpjZ\nucBkwvE8D3jW3d+Osz8N/FPh8u6+Erg4vn4X+L/AR+Lst4EvmtnJ7v6cu19TYroUUZNObQzYhh8/\nHF81sxN4v02z8KT8u/h7AtBiZr+K75uAcYSmIJFae9LMDhOO3c3AX7j7e/G43tO3kLt3AoSWzMDM\nPgqsiM2bPcBJwOo4+0JgGfCimW0FroxNN4NNlyJK+LVxRBu+mY0EHgAWuPtPzWwUcJCBL2btAN6N\nX2FF6s1gnRLeIbTlA2Bm44Djipa5Hfhl/LaLmT3TN8Pd3wIWx+mXA/cDJw02PbW/ZhhRk079aAHG\nAH0XXv8O6CDU3Ptx9y3ANjObD2BmJ5jZ/fGCl0itDdgpAfgJMMvMTjazJuAOYqIu8AHg3wHM7Gzg\n94Cx8RhfH08SEC749pjZpIGmp/nHDCdK+NU3YPez2F55C/AfZvYi8DrwEKFnw5gB1rsM+JKZ/Rp4\nEnjM3Q9mFrVIMoN2r3T37cAXgJ8B/wl0AyuLFrsZWGlmG4E/A26KPx8FHgFeMLNXCbX4xbGr56PF\n01P9i4aRplLPwzezFuD7wERgFLAc+C3wz4Qz6UZ3vyIuey2wIE5f7u6PZBe6SDbijUAPAq8Saqsb\nCb1L7iVUknYCi9y9y8wWAksJyWuVu68euFSR2kuS8K8APujuXzWzEwln5x3Ate7+kpndRzghOOFD\n8nHCyeFpYLq764YKaSgx4V/h7n9ZMG018GN3X2NmXwO2Ek4ALwEzCHeWvgD8mbvvGaBYkZpL0qTz\nDu/3/pgE7AZOcfeX4rS1wNnAWcAj7t7t7u8Qrs7roqI0quJ26DmEYx3eP+ZPBza4+353PwQ8A8yq\nWoQiZSqZ8N39B8A0M3ud0FZ8LdBesMguQl/xyYS+4H3a6N+HXKSRTDezh8zs52b2CWCMu3fFeTrm\npSGVTPixjXJLfLDRXOBfihYZ7Ir8YNNF6t3rwI3ufjHwGeAu+ndh1jEvDSlJk84s4KcA7v4Kod/s\nCQXzpwLbCe36U4qm7xiq4N5wAUE/+jnan1S5+w53fzC+fpPQSWGimTXHRXTM66fWPxVJcuPVG4QL\nsT+Mz8LYB7xlZrPc/VngU8CthFrRVWZ2PaEv7Qfd/VeDFQrQ1NREW9u+SmMvqbV1XKblV2MbKj/Z\nNtJkZp8Gprj7ithRYTLwPUIPtPuA+YSugBuA
O81sPKFn2kxCj51BZX3MV6Ia/6NyKJ7SKj3mkyT8\n7wKrzexJwtPolhBqPP873jzxi75nXZvZKkLvnB7CA71EGtHDwP1mdhHhwV9LgJeB75vZF4AtwD3u\n3m1m1wHrCcf8je5eX5lBpEDJbpkZ6x0OtctG/hsavfy4jUZqO8/0mK9EvdVg8xBPd3c327ZtrXj9\nGTM+VtExr2fpiIhU2bZtW7l65TpGtZT/vMPOA7t56v6PVbRdJXwRkRoY1TKJ0eMnV3WbepaOiEhO\nKOGLiOSEEr6ISE4o4YuI5IQSvohITijhi4jkhBK+iEhOKOGLiOSEEr6ISE4o4YuI5IQSvohITijh\ni4jkRE0fnnbDN+6gu/fYxMt/6MQJfOr8eRlGJCIyfNU04f/yzU6OGZd8zOdDnW9nGI2IyPCmJh0R\nkZxQwhcRyQklfBGRnCjZhm9mi4FFQC/QBPwJcAbwz4SBmze6+xVx2WuBBXH6cnd/JKO4RUSkTCUT\nvruvBlYDmNmZwCXAt4Avu/tLZnafmZ0DOPCXwMeBicDTZvaou9d0lHQREQnKbdK5HvgfwIfd/aU4\nbS1wNnAW8Ii7d7v7O8BmYHpagYqIyNFJnPDNbAawFegG2gtm7QKmAJOBtoLpbXG6iIjUgXJq+J8H\n7o6vmwqmNx256JDTRUSkBsq58WoO8KX4elLB9KnAdmAHcGrR9B1HE1yx5uaRtLaOK2udcpevRNbb\nUPkikoZECd/MpgD73P1wfP9rM5vp7s8BnwJuBV4HrjKz64EPAB9091+lGWxHx2Ha2vYlXr61dVxZ\ny1ci622o/GTbEJHSktbwpxDa6vtcCXzXzJqAX7j7EwBmtgp4mtAt82/SDFRERI5OooQfe+ScV/D+\n18CZAyz3beDbqUUnIiKp0Z22IiI5oYQvIpITSvgiIjmhhC8ikhNK+CIiOaGELyKSE0r4IiI5oYQv\nIpITNR3EXKSemdlo4FVgOfAEcC+hkrQTWOTuXWa2EFhKeIrsqjh+hEhdUg1fZHDLgN3x9XLgNnef\nDWwCFpvZmLjMXMJ4EFea2YSaRCqSgBK+yADMzAhPf11HeNT3bMJgP/D+oD+nAxvcfb+7HwKeAWbV\nIFyRRJTwRQa2AriK98d1aHH3rvhag/5IQ1IbvkgRM1sEPOfuW0JF/whHNehPPT7Oud5iGu7x7N3b\nkmp5SSnhixzpPOAUM7uAMJBPJ7DfzJrdvYP+g/4U1uinAs+XKjzr8QHKVY0xC8qRh3ja2w+kWl5S\nSvgiRdz9sr7XcUCfzcBMYAFwHzAfeBTYANxpZuMJY0DMJPTYEalLasMXGVpfM80NwOVm9hQwEbgn\nXqi9Dlgff2509/qpmooUUQ1fZAjuflPB23kDzF8DrKleRCKVUw1fRCQnlPBFRHIiUZNOvH38WqAL\nuB54Bd1mLiLSUErW8M3seEKSnwmcD1yMbjMXEWk4SWr4nwAec/f3gPeAJWb2JrAkzl8LXAO8RrzN\nHMDM+m4zX5d61CIiUrYkCf/DQIuZ/QiYANwEjNFt5iIijSVJwm8Cjgc+SUj+P6P/LeRHdZt5OZqb\nR5Z9i3M1btHOehsqX0TSkCThv014rkgP8KaZ7QO60rrNvBwdHYfLusW5GrdoZ70NlZ9sGyJSWpJu\nmeuBuWbWZGaTgLHA44TbzKH/beYzzGy8mY0lXOR9OoOYRUSkAiUTvrvvAP4N+D+EC7BXoNvMRUQa\nTqJ++O6+ClhVNFm3mYuINBDdaSsikhNK+CIiOaGELyKSE0r4IiI5oYQvIpITSvgiIjmhhC8ikhNK\n+CIiOaGELyKSE0r4IiI5oYQvIpITSvgiIjmhhC8ikhNK+CIiOaGELyKSE0r4IiI5oYQvIpITSvgi\nIjmhhC8ikhMlx7Q1s9nAg8CrQBOwEfgmcC/hhLETWOTuXWa2EFgKdAOr3H11VoGLiEh5ktbwn3T3\nue5+lrsv
BZYDt7n7bGATsNjMxgDLgLnAWcCVZjYhk6hFRKRsSRN+U9H7OcDa+HotcDZwOrDB3fe7\n+yHgGWBWGkGKiMjRK9m
"text/plain": [
"<matplotlib.figure.Figure at 0x7f2f8feba390>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"df.hist(['Fare','Pclass'])"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We see that the distribution is right-skewed. We are going to detect outliers using a box plot."
]
},
{
"cell_type": "code",
"execution_count": 41,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"<matplotlib.axes._subplots.AxesSubplot at 0x7f2f8fd941d0>"
]
},
"execution_count": 41,
"metadata": {},
"output_type": "execute_result"
},
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAXUAAAEDCAYAAADKhpQUAAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\nAAALEgAACxIB0t1+/AAAD9JJREFUeJzt3XuMXGd5x/Hv2psary/yhq6MMaUXIT0IVVUaJUpltzhx\nGyPSUqtxGqK4FsKlChJS3LQxoiihxrSo4qbKoP6TxDS1EqEgRRg3KDJR0og0ETaNwuWPPjRBTVvb\n4CUs7Tq+sOxO/5gxOl5mdmc3s549734/kpV3zjkz5/nD+c3j95zzzkCj0UCSVIZl/S5AktQ7hrok\nFcRQl6SCGOqSVBBDXZIKYqhLUkEGuzkoInYCe4EJ4CPAt4FDNL8UTgG7MnOiddweYBK4LzMPLkjV\nkqS2Bma7Tz0irgSeA34TWAPsB64A/jkzH42IvwX+i2bIPw9cA/wUOA78Tmb+eOHKlyRVddOp/x7w\n1cw8C5wF7oiI7wF3tPYfAe4Gvgscy8wzABHxDLAZeKznVUuS2uom1H8FWBURh4F1wEeBocycaO0/\nDWwA1gOjlfeNtrZLki6TbkJ9ALgS+COaAf9Ua1t1f6f3SZIuo25C/QfAs5k5BXwvIsaBiYhYkZkX\ngI3ACeAkl3bmG2nOxXf0059ONgYHl8+vcklaujo2zd2E+lHg8xHxCZod+2rgceAW4CFgR+v1MeD+\niFgLTAGbaN4J09HY2NluipckVYyMrOm4b9a7XwAi4s+A9wEN4GPAN2je7bICeBl4b2ZORsTNwAdp\nhvqBzPzCTJ87OjruEpGSNEcjI2s6dupdhfpCMdQlae5mCnWfKJWkghjqklQQQ12SCmKoS1JBDHVJ\nKoihLkkFMdQlqSCGuiQVxFCXpIIY6pJUEENdkgpiqEtSQQx1SSqIoS5JBTHUJakghrokFcRQl6SC\nGOqSVBBDXZIKYqhLUkEMdUkqiKEuSQUx1CWpIIa6JBXEUJemOXToIIcOHex3GdK8GOrSNE8//SRP\nP/1kv8uQ5sVQlyoOHTrI1NQUU1NTduuqJUNdqqh26HbrqqPB2Q6IiC3AF4HvAAPAt4BPAodofimc\nAnZl5kRE7AT2AJPAfZlpqyNJl1G3nfq/ZObWzLwhM/cA+4HPZuYW4CVgd0QMAfcCW4EbgLsiYt2C\nVC0tkC1btrYdS3XRbagPTHt9PXCkNT4C3AhcBxzLzDOZeR54BtjciyKly2XXrt0sW7aMZcuWsWvX\n7n6XI83ZrNMvLW+LiC8BV9Ls0ocyc6K17zSwAVgPjFbeM9raLtWKHbrqrJtQ/w9gX2Z+MSJ+DXhq\n2vumd/GzbZcWNTt01dmsoZ6ZJ2leKCUzvxcR3weuiYgVmXkB2AicAE5yaWe+EXhups8eHh5icHD5\nfGuXJE3Tzd0vtwMbMvPTEfEGmtMsnwduAR4CdgCPA8eA+yNiLTAFbKJ5J0xHY2NnX1v1krQEjYys\n6bhvoNFozPjmiFgNPAysA64A9gHfBP4JWAG8DLw3Mycj4mbggzRD/UBmfmGmzx4dHZ/55JKknzMy\nsqbj9Pasob6QDHVJmruZQt0nSiWpIIa6JBXEUJekghjqklQQQ12SCmKoS1JBDHVJKoihLkkFMdQl\nqSCGuiQVxFCXpIIY6pJUEENdkgpiqEtSQQx1SSqIoS5JBTHUJakghrokFcRQl6SCGOqSVBBDXZIK\nYqhLUkEMdUkqiKEuSQUx1CWpIIa6JBXEUJekghjqklQQQ12SCjLYzUER8TrgO8B+4EngEM0vhFPA\nrsyciIidwB5gErgvMw8uTMmSpE667dTvBV5pjfcDn83MLcBLwO6IGGodsxW4AbgrItb1ulhJ0sxm\nDfWICOCtwGPAALAFONLafQS4EbgOOJaZZzLzPPAMsHlBKpYkddRNp/5p4C9oBjrAqsycaI1PAxuA\n9cBo5T2jre2SpMtoxjn1iNgFPJuZLzcb9p8z
0G7jDNsvMTw8xODg8m4OlSR1YbYLpb8P/GpEvAvY\nCPwEOBMRKzLzQmvbCeAkl3bmG4HnZjv52NjZeRUtSUvZyMiajvtmDPXMvO3iOCI+AvwnsAm4BXgI\n2AE8DhwD7o+ItcBU65g9r7FuSdIczeU+9YtTKn8NvCcingaGgQdbF0c/BBxt/dmXmeM9rVSSNKuB\nRqPRt5OPjo737+SSVFMjI2s6Xrf0iVJJKoihLkkFMdQlqSCGuiQVxFCXpIIY6pJUEENdkgpiqEtS\nQQx1SSqIoS5JBTHUJakghrokFcRQl6SCGOqSVBBDXZIKYqhLUkEMdUkqiKEuSQUx1CWpIIa6JBXE\nUJekghjqklQQQ12SCmKoS1JBDHVJKoihLkkFMdQlqSCGuiQVZHC2AyJiJfCPwHpgBfA3wDeBQzS/\nFE4BuzJzIiJ2AnuASeC+zDy4QHVLktroplN/F3A8M68H3g18BtgPfC4ztwAvAbsjYgi4F9gK3ADc\nFRHrFqRqSVJbs3bqmflI5eWbgf8GtgB3tLYdAe4Gvgscy8wzABHxDLAZeKyXBUuSOps11C+KiH8F\nNtLs3L+amROtXaeBDTSnZ0YrbxltbZckXSZdh3pmbo6I3wAeAgYquwY6vKXT9p8ZHh5icHB5tyVI\nkmbRzYXSq4HTmfk/mfmtiFgOjEfEisy8QLN7PwGc5NLOfCPw3EyfPTZ2dv6VS9ISNTKypuO+bi6U\nvh34S4CIWA+sBp4Abmnt3wE8DhwDromItRGxGtgEfG3+ZUuS5mqg0WjMeEBEvA54APgl4HXAPuDf\naN7SuAJ4GXhvZk5GxM3AB4Ep4EBmfmGmzx4dHZ/55JKknzMysqbj9Pasob6QDHVJmruZQt0nSiWp\nIIa6JBXEUJekghjqklQQQ12SCmKoS9Pcc89e7rlnb7/LkOal62UCpKXi5MkT/S5Bmjc7dami2qHb\nrauODHWpotql27Grjgx1SSqIoS5VvPGNG9uOpbpw7Rdpmt27bwfg4MGH+1yJ1J5rv0jSEmGoSxVH\nj36l7ViqC0Ndqjh8+NG2Y6kuDHVJKoihLlVs335z27FUF4a6VLFt202sXDnEypVDbNt2U7/LkebM\ntV+kaYaHh/tdgjRvhro0jcsDqM6cfpEqDhz4VNuxVBeGulTxwgvPtx1LdWGoS1JBDHWpYtWqVW3H\nUl0Y6lLFm9705rZjqS4Mdali+/YdbcdSXRjqUoULeqnuDHWpwrtfVHddPXwUEZ8AfhtYDvwdcBw4\nRPNL4RSwKzMnImInsAeYBO7LzIMLUrUkqa1ZO/WIuB54W2ZuAt4J/D2wH/hcZm4BXgJ2R8QQcC+w\nFbgBuCsi1i1U4dJCuOqqq9uOpbroZvrlaeCPW+MfA6uALcCXW9uOADcC1wHHMvNMZp4HngE297Zc\naWHdeefdbcdSXcw6/ZKZDeBc6+WfAo8B78jMida208AGYD0wWnnraGu7VCt26Kqzrhf0iojtwG5g\nG/BiZVenH0Dt+MOoFw0PDzE4uLzbEqTL4mMf+2i/S5DmrdsLpe8A/opmhz4eEeMRsSIzLwAbgRPA\nSS7tzDcCz830uWNjZ+dXtSQtYSMjazru6+ZC6VrgE8AfZOb/tjY/AVx8MmMH8DhwDLgmItZGxGpg\nE/C111C3JGmOuunU3w28HngkIgaABvAe4IGIuAN4GXgwMycj4kPAUWAK2JeZ4wtUtySpjYFGo9G3\nk4+Ojvfv5JJUUyMjazpes/SJUkkqiKEuSQUx1CWpIIa6JBXEUJemOXr0Ky67q9rq+olSaak4fPhR\nALZtu6nPlUhzZ6cuVRw9+hXOnTvLuXNn7dZVS4a6VHGxS58+lurCUJekghjqUsX27Te3HUt1YahL\nFdWLo14oVR0Z6lLF3r13th1LdWGoSxWvvPLDtmOpLgx1SSqIoS5JBTHUpYqVK4fajqW6MNSlCm9p\nVN0Z6lLF
D37w/bZjqS4MdaniqaeeaDuW6sJQl6SCGOqSVBBDXaq46qqr246lujDUpYo777y77Viq\nC0Ndqjhw4FNtx1JdGOp
"text/plain": [
"<matplotlib.figure.Figure at 0x7f2f8fda1e48>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"sns.boxplot(data=df['Fare'])"
]
},
{
"cell_type": "code",
"execution_count": 42,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"<matplotlib.axes._subplots.AxesSubplot at 0x7f2f8fd53f28>"
]
},
"execution_count": 42,
"metadata": {},
"output_type": "execute_result"
},
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAXUAAAEDCAYAAADKhpQUAAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\nAAALEgAACxIB0t1+/AAAEAlJREFUeJzt3X9s3Hd9x/GnE6fpnDiKC7eQeSKbYHujapqgqsqUUPJj\nA4QYMJZKTIuyrilStCEayrItm1SSRttUMVpBi6axtulCVTaxgdiybiWtSn+NDndCg/YP3oWwhZFk\nyW0yk42X4iTeH3cB+/CPs3vn8338fEjW9/y97933/Yf9urc/38/3456JiQkkSWVY0ekCJEmtY6hL\nUkEMdUkqiKEuSQUx1CWpIIa6JBWkt5mDImIX8LvAOPBh4HngQWofCmeA3Zk5Xj9uH3ARuDczj7Sl\naknStHrmmqceEVcBzwJvAPqBw8Aq4B8y83MR8cfAt6mF/FeAa4ELwHPA9Zn53faVL0marJlO/ZeA\nRzNzDBgD9kbEt4C99eePAfuBF4GhzBwFiIhngC3Awy2vWpI0rWZC/aeANRHxd8B64HagLzPH68+f\nAzYCG4DqpNdV6/slSYukmVDvAa4C3kMt4L9Y3zf5+ZleJ0laRM2E+lngS5l5CfhWRIwA4xGxOjNf\nAgaBU8Bppnbmg9TG4md04cLFid7elQurXJKWrxmb5mZC/TjwQER8hFrHvhZ4BLgBeAjYWf9+CLgv\nItYBl4DN1GbCzGh4eKyZ4qVFV6n0U62OdLoMaVqVSv+Mz805Tz0zTwN/C/wLtYue7wcOAjdGxJPA\nAHA0M88DB6h9CBwHDmWmvxWStIjmnNLYTtXqiOv+akmyU9dSVqn0zzj84h2lklQQQ12SCmKoS1JB\nDHVJKoihLkkFMdQlqSCGuiQVxFCXpIIY6pJUEENdkgpiqEtSQQx1SSqIoS5JBTHUJakghrokFcRQ\nl6SCGOqSVBBDXZIKYqhLUkEMdUkqiKEuSQUx1CWpIIa6JBXEUJekghjqklQQQ12SCmKoS1JBejtd\ngLTU7Lv7aUbGxunvW8XHb7m+0+VI8zJnqEfEVuBvgBeAHuBrwJ8CD1Lr9M8AuzNzPCJ2AfuAi8C9\nmXmkXYVL7TIyNj5lK3WTZodfnsjMHZm5PTP3AYeBezJzK3AC2BMRfcBtwA5gO3BrRKxvS9VSG/X3\nrZqylbpJs8MvPQ3fbwP21h8fA/YDLwJDmTkKEBHPAFuAh19+mdLi+fgt11Op9FOtjnS6FGnemg31\nqyPi88BV1Lr0vsy8/LfpOWAjsAGoTnpNtb5fkrRImhl++QZwKDN/BfhN4H6mfhg0dvFz7Zcktcmc\nnXpmnqZ2oZTM/FZE/BdwbUSszsyXgEHgFHCaqZ35IPDsbO89MNBHb+/KhdYutVWl0t/pEqR5a2b2\ny68DGzPzzoh4FbVhlgeAG4CHgJ3AI8AQcF9ErAMuAZupzYSZ0fDw2MurXmoTx9S1lM3WcPRMTEzM\n+uKIWAt8GlgPrAIOAV8FPgWsBk4CN2XmxYj4VeD3qIX63Zn517O9d7U6MvvJpQ4x1LWUVSr9Mw5v\nzxnq7WSoa6ky1LWUzRbqLhMgSQUx1CWpIIa6JBXEUJekghjqklQQQ12SCmKoS1JBDHVJKoihLkkF\nMdQlqSCGuiQVxFCXpIIY6pJUEENdkgpiqEtSQQx1SSqIoS5JBTHUJakghrokFcRQl6SCGOqSVBBD\nXZIKYqhLUkEMdUkqiKEuSQUx1CWpIIa6JBXEUJekghjqklSQ3mYOiogrgReAw8DjwIPUPhDOALsz\nczwidgH7gIvAvZl5pD0lS5Jm0mynfhvwP/XHh4F7MnMrcALYExF99WN2ANuBWyNifauLlSTNbs5Q\nj4gAXgc8DPQAW4Fj9aePAW8B3ggMZeZoZp4HngG2tKViSdKMmunU7wQ+RC3QAdZk5nj98TlgI7AB\nqE56TbW+X5K0iGYdU4+I3cCXMvNkrWH/ET3T
7Zxl/xQDA3309q5s5lBp0VUq/Z0uQZq3uS6UvgP4\n6Yh4JzAIfB8YjYjVmflSfd8p4DRTO/NB4Nm5Tj48PLagoqV2q1T6qVZHOl2GNK3ZGo5ZQz0zf+3y\n44j4MPAfwGbgBuAhYCfwCDAE3BcR64BL9WP2vcy6JUnzNJ956peHVA4CN0bEk8AAcLR+cfQAcLz+\ndSgzbXMkaZH1TExMdOzk1epI504uzcLhFy1llUr/jNctvaNUkgpiqEtSQQx1SSqIoS5JBTHUJakg\nhrokFcRQl6SCGOqSVBBDXZIKYqhLUkEMdUkqiKEuSQUx1CWpIIa6JBXEUJekghjqklQQQ12SCmKo\nS1JBDHVJKoihLkkFMdQlqSCGuiQVxFCXpIIY6pJUEENdkgpiqEtSQQx1SSqIoS5JBemd64CI+DHg\nL4ENwGrgj4CvAg9S+1A4A+zOzPGI2AXsAy4C92bmkTbVLUmaRjOd+juB5zJzG/Be4C7gMPCJzNwK\nnAD2REQfcBuwA9gO3BoR69tStSRpWnN26pn5mUnfvhr4T2ArsLe+7xiwH3gRGMrMUYCIeAbYAjzc\nyoIlSTObM9Qvi4h/Bgapde6PZuZ4/alzwEZqwzPVSS+p1vdLkhZJ06GemVsi4ueBh4CeSU/1zPCS\nmfb/wMBAH729K5stQVpUlUp/p0uQ5q2ZC6XXAOcy8zuZ+bWIWAmMRMTqzHyJWvd+CjjN1M58EHh2\ntvceHh5beOVSG1Uq/VSrI50uQ5rWbA1HMxdK3wz8DkBEbADWAo8BN9Sf3wk8AgwB10bEuohYC2wG\nnl542ZKk+Wom1P8c+PGIeIraRdHfAg4CN0bEk8AAcDQzzwMHgOP1r0OZaasjSYuoZ2JiomMnr1ZH\nOndyaRYOv2gpq1T6Z7xm6R2lklQQQ12SCmKoS1JBDHVJKoihLkkFMdQlqSCGuiQVxFCXpII0vaCX\ntFzsuePxHzw+cmBHByuR5s9OXZIKYqhLUkFc+0Wahmu/aClz7RdJWiYMdUkqiLNfpAa3PzDEybOj\nbNqwloM3XdfpcqR5sVOXGpw8OzplK3UTQ11qsGnD2ilbqZs4+0WahrNftJQ5+0WSlglDXZIKYqhL\nUkGc0ig12PvRJxi/cIlVvSv45P5tnS5Hmhc7danB+IVLU7ZSNzHUpQYreqZupW5iqEsNVqyo/Vqs\nXOmvh7qPP7VSgx3XDHJF7wq2v2Gw06VI8+bNR9I0vPlIS9lsNx85+0Vq4OwXdbOmQj0iPgK8CVgJ\n3AE8BzxIbfjmDLA7M8cjYhewD7gI3JuZR9pStdRGzn5RN5tzTD0itgFXZ+Zm4O3Ax4DDwCcycytw\nAtgTEX3AbcAOYDtwa0Ssb1fhUrus6l0xZSt1k2Y69SeBL9cffxdYA2wF9tb3HQP2Ay8CQ5k5ChAR\nzwBbgIdbWbDUbp/cv80xdXWtOUM9MyeA/6t/ezO1kH5bZo7X950DNgIbgOqkl1br+yVJi6TpC6UR\n8W5gD/BW4JuTnprpKuyct24MDPTR27uy2RKkRVWp9He6BGnemr1Q+jbgD6h16CMRMRIRqzPzJWAQ\nOAWcZmpnPgg8O9v7Dg+PLaxqqc0cftFSNlvD0cyF0nXAR4Bfzsz/re9+DNhZf7wTeAQYAq6NiHUR\nsRbYDDz9MuqWJM1TM536e4FXAJ+JiB5gArgRuD8i9gIngaOZeTEiDgDHgUvAocy01ZGkReQdpdI0\nHH7RUua/s5OkZcJQl6SCGOqSVBBDXZIKYqhLUkEMdUkqiKEuSQXxn2RIDW5/YIiTZ0fZtGEtB2+6\nrtPlSPNipy41OHl2dMpW6iaGutRg04a1U7ZSN3GZAGkaLhOgpcxlAiRpmfBCqdRgzx2P/+DxkQM7\nOliJNH926pJUEENdkgpiqEtSQQx1qYFTGtXNnNIoTcMpjVrKZpvS6OwXqcG+u59mZGyc/r5VfPyW\n6ztdjjQv
Dr9IDUbGxqdspW5iqEtSQQx1qcGq3hVTtlI38UKpNA0vlGopc+0XSVomnP0iNdj70ScY\nv3CJVb0r+OT+bZ0uR5o
"text/plain": [
"<matplotlib.figure.Figure at 0x7f2f8fd679b0>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"# We can draw the same plot with matplotlib (through pandas).\n",
"# There is a bug: if seaborn has been imported, you need to add sym='k.' to show the outliers\n",
"df.boxplot(column='Fare', return_type='axes', sym='k.')"
]
},
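{
"cell_type": "markdown",
"metadata": {},
"source": [
"With default settings, both box plots above draw as individual points the values that lie more than 1.5 times the interquartile range (IQR) beyond the quartiles.\n",
"\n",
"As a minimal sketch of that rule, the next cell applies it to a small made-up list of fares. The helper `iqr_outliers` and the sample values are illustrative assumptions: they are not part of the dataset or of the pandas API, and the plotting libraries may compute the quartiles slightly differently."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"import pandas as pd\n",
"\n",
"# Hypothetical helper (not a pandas function): keep the values outside the 1.5 * IQR fences\n",
"def iqr_outliers(values, k=1.5):\n",
"    q1 = values.quantile(0.25)\n",
"    q3 = values.quantile(0.75)\n",
"    iqr = q3 - q1\n",
"    lower, upper = q1 - k * iqr, q3 + k * iqr\n",
"    return values[(values < lower) | (values > upper)]\n",
"\n",
"# Made-up fares, only to illustrate the rule\n",
"sample_fares = pd.Series([7.25, 8.05, 13.0, 26.0, 30.0, 71.28, 512.33])\n",
"outliers = iqr_outliers(sample_fares)\n",
"print(outliers)\n",
"\n",
"# In this notebook the helper could be applied as iqr_outliers(df['Fare'])"
]
},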
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Since Fare depends on Pclass, we are going to show outliers per passenger class."
]
},
{
"cell_type": "code",
"execution_count": 43,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"OrderedDict([('Fare',\n",
" <matplotlib.axes._subplots.AxesSubplot at 0x7f2f8fdb73c8>)])"
]
},
"execution_count": 43,
"metadata": {},
"output_type": "execute_result"
},
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAX8AAAEdCAYAAADkeGc2AAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\nAAALEgAACxIB0t1+/AAAGbVJREFUeJzt3X2UXXV97/H3JAPBwNBEHSNGpC3aL9Jer6KCN6h5QEGu\nRazhYpFGarBifQJatKG3CLKs5WJl8VSqBSIPym1BqBJBjTTyaK4RXSq0+gXjNcrDhaFGTYjBSTL3\nj72HnIkzmZPMmTlz5vd+rZV1zuyzz97fczZ8zm//9m/v3TUwMIAkqSzT2l2AJGniGf6SVCDDX5IK\nZPhLUoEMf0kqkOEvSQXqbncB6nwRcQBwH3BvPWkG8MHM/HoLlv1p4IbMvHWE118DfD8znxjrulot\nIvYG7s/M32mYdgDwucx85S4u69PAy4EngOnAo8DJmfnkMPPu1jpUFlv+apUfZOaizFwELAM+PEHr\nXQrMmaB17aouYLgTaXb35Jpl9Xc8H/gP4NSdzOsJPNopW/5qla6G588FHgKIiLnAcmBPYCtwMvAC\nqj2DN0XEq4EzgfcANwAPAL8HrMnM9w0uMCK6gX8Cfrde1tlUAfdm4OCIWJyZDzXM+5l6PauB4zNz\n/4j4GnB//b6/Aa4CZlH9f/CBzPxORPRlZm+9nBuAS4CFwPPr5T23rn1lRPwR8JdAP3BvZn4wInqA\nG6n2fu4Z4bvaMyKuAQL4NtWP5ZrMjHq9bwMOycwzdvJ9fwP443r+DwGL6+/3TODHDd/b24D3A1uA\nf8/Md0fE/vX3s6X+7H9Szz5kWmb+dCfrV4ez5a9WiYhYFRGrgb+v/wGcC1yRmQuBfwQ+kpl3AU9E\nxOuAvwXeW8/7EuBDmXko8MqIeEnD8k8AfpWZC6iC7tLMvA34DvCng8FfewMwIzPnAauA/Rpeuy8z\nP0DVal5d76mcDlxYvz5Si/l5mXkUcCJwXt2l8zfAwvqzvSAi5lEF6X116/w7IyzrYKpW/GHAIcD+\nwHcj4lX168cC143w3kFvBNZExAuBt9TLWlLX12gmcFRmvgZ4cUT8PnAcsDIzj6i/h/1GmKYpzPBX\nqwx2+/w34Ejg+oiYDrwCuL2e52vAS+vnH6Jqya/MzB/X0x7IzEfq59+gahkPhvHTy8nMR4HNETG7\nfq1xrwPgxWxvdd9K1ZodtGaY5X0LOHCEZQ36t3re+4HnAb9PtSfwlXqP4oX13wcDg8c6bv/NxQDw\nYMPnvJdqT+ca4ISI2AP47cz89jDv+7v6B/Z2YDNwOfAyqu+KzFybme/a4T3rgZvr9xwEPAtYCZwU\nER8H9srMNfW0t+8wTVOY3T5quczMiNhE1aLdxvZA3bP+G+C3gF8BcxveOr3h+bSGeaH6EWgM5sFu\npOF07fBaY2v+1yMsr3Hdg/bYoZ5GTwHfysw3NE6su7G2jfCekQwAXwY+CiwCvjjCfGfueOA7IraO\ntJ76h+QfgP+SmX0RsQIgM/+93qs6EvhYRCzPzM9ExH/dcVqT9asD2fJXqzwdpBHxTKpug4eAb1IF\nGsACto8Iuoiqz3puRBxWT/vdiJgTEdOAw6gOag4ud83gcuo+622Z+UuqoG0MaYC1VC17qMJsuEbO\n03XV3S3319O3RcReETGTqlU96NX1vC8B1lEdmzgoIp5dTz8nIvYDEhgcZbOI4R3Y8DlfSTVaaQtw\nJ1U32WdHeN9wvgUcHhHT6mXe1PBaD9BfB//+VKOFZkTEW6l+EG4GzgJeERHH7zhtF2pQBzL81Sq/\nV3dJfI2q5freOtDOpupO+Dfg7cDZEXEc8NPMvA/4IHAxVcv7AeBjVN0md2fm99neav8XYHpErKLq\nDx/s3rgDuCEiXtxQyxeB34qIO4HDgf+spzfuAVwEvLyu62NsHznzj1TdKFey/YcK4JcR8QXgWuCv\nMvNXwGnAlyLiLuCZdXfUNcCrIuKrwIsY/hjC
d6iOdXwduCczf9DwGbdl5o+Gec+wxyIyc11d013A\nTWw/dkFm/gy4LSK+QRXo5wMXAD8CLq0/+4frz/xgPe22hmmawrq8pLMmg1aOTa+PBSzMzJvq0UZf\nzcyDx7C8s4G+zLxsrLWNsp5zgP+bmVeP53oksM9fk0urWiIbgOMj4oNU3UanjTRjRGwDfsj2g8Jd\nwI8z8+gW1dKUiPgisAn4yESuV+Wy5a+i1QdMn1932UjFsOWv0nUxwvDOiAjgCqrhkd3AhzPzn+vX\ntgF/DZxENbzzxcBlVAe6NwNL6yGk0qTkAV9pZB8Hbq6PF5wMXFmfu/C0zBw80PyvwFX1WbrvBr5Q\nj+aRJiVb/hLcHhGNJ4LdlZmn1JefGAzwe4C92D6EFbaPxz8I6M3MqwAyc3VE9AHzgLvHvXppNxj+\nEswfrs8/Io4G/mc9ln/w4Fhja/5n9eMsYO+I+I/67y6qMfbPGqd6pTEz/KVh+vzri8NdDxyXmV+J\niD2pzkgeboTEI8AvxjKcVJpo9klKw9ub6qJogwdtT6O6pEPPjjPWJ1o9FBGLASLi2RFxXUQ8Y6KK\nlXaV4a/SjXTm7C+ozoj9TkR8i+oM2M8DX6wv/bDj+/4YeF9EfJ/qgm5frc8Clialpsb5R8SJVKfh\n91Od+n0f1Snl06juKLQkM/vr+U6luqjW5Zm5fLwKlyTtvlHDv75I12qqi1z1UF14ag/gi/Xp838L\n/ITqx+DbVBeE2kJ14azXZObPx698SdLuaOaA7+uodmE3UZ1+fkpE/Ag4pX59BXAG1UW51mTmRoCI\nuJvqolq3tLxqSdKYNBP+v001jO0LVEPaPgLMzMz++vXHqcY+zwH6Gt7Xh3cDkqRJqZnw7wKeCfwR\n1Q/B1xg6NG6kOx+NNF2S1GbNhP9jwNczcxvwo4jYAPRHxIzMfIrqTkwPU411bmzpz6U6VjCiLVu2\nDnR3D3cDJUlSiwzbEG8m/FcCn46I86n2APahuuXccVR3HFpc/70GuCIi9qW6u9I8tt8gY1jr129q\ntviO1tvbQ1/fhnaXoRZwW04dpWzL3t7fODUFaGKcf32j6c8B/4fq4O17qe7OdFJE3AHMBq7OzM3A\nMqofi5XAOZk59b9ZSepAbb2ef1/fhiJuJlBKC6MEbsupo5Rt2dvbM2y3j2f4SlKBDH9JKpDhL0kF\nMvwlqUCGvyQVyPCXpAIZ/pJUIMNfkgpk+EtSgQx/SSqQ4S9JBTL8JalAhr8kFcjwl6QCGf6SVCDD\nX5IKZPhLUoEMf0kqkOEvSQUy/CWpQIa/JBXI8JekAhn+klQgw1+SCmT4S1KBDH9JKlB3uwuY6k69\n+C42bOqnZ+YeXPSB17S7HEkCmgj/iJgP3ADcD3QB3wM+DlxLtefwKLAkM/sj4kTgVGArcHlmLh+v\nwjvFhk39Qx4laTJottvn9sxclJkLM/NU4FzgksycD6wFlkbETOAsYBGwEDg9ImaNS9UdpGfmHkMe\nJWkyaLbbp2uHvxcAp9TPVwBnAA8AazJzI0BE3A0cDtwy9jI710UfeA29vT309W1odymS9LRmw//g\niPg88EyqVv/MzBzsx3gc2A+YA/Q1vKevni5JmmSa6fZ5EDgnM98M/ClwJUN/NHbcKxhtuiSpzUZt\n+WfmI1QHfMnMH0XE/wNeEREzMvMpYC7wMPAIQ1v6c4HVO1v27Nkz6e6evru1d5Te3p52l6AWcVtO\nHSVvy2ZG+7wN2C8zPxERz6Xq3vk0cBzwWWAx8GVgDXBFROwLbAPmUY38GdH69ZvGVn2HsM9/6nBb\nTh2lbMuRfuCa6fO/GbguIo4F9qA60Ptd4JqIeBewDrg6M7dGxDJgJVX4n5OZU/+blaQO1DUwMNC2\nlff1bWjfyidQKS2MErgtp45StmVvb8+wx1+9vIMkFcjwl6QCGf6SVCDDX5IKZPhLUoEMf0kqkOEv\nSQUy/CWp
QIa/JBXI8JekAhn+klQgw1+SCmT4S1KBDH9JKpDhL0kFMvwlqUCGvyQVyPCXpAIZ/pJU\nIMNfkgpk+EtSgQx/SSq
"text/plain": [
"<matplotlib.figure.Figure at 0x7f2f8fcdb0f0>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"df.boxplot(column='Fare', by = 'Pclass', return_type='axes', sym='k.')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We see that most outliers are in class 1. In particular, we see some values higher than 500, which are probably errors."
]
},
{
"cell_type": "code",
"execution_count": 44,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>PassengerId</th>\n",
" <th>Survived</th>\n",
" <th>Pclass</th>\n",
" <th>Name</th>\n",
" <th>Sex</th>\n",
" <th>Age</th>\n",
" <th>SibSp</th>\n",
" <th>Parch</th>\n",
" <th>Ticket</th>\n",
" <th>Fare</th>\n",
" <th>Cabin</th>\n",
" <th>Embarked</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>258</th>\n",
" <td>259</td>\n",
" <td>1</td>\n",
" <td>1</td>\n",
" <td>Ward, Miss. Anna</td>\n",
" <td>female</td>\n",
" <td>35.0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>PC 17755</td>\n",
" <td>512.3292</td>\n",
" <td>NaN</td>\n",
" <td>C</td>\n",
" </tr>\n",
" <tr>\n",
" <th>679</th>\n",
" <td>680</td>\n",
" <td>1</td>\n",
" <td>1</td>\n",
" <td>Cardeza, Mr. Thomas Drake Martinez</td>\n",
" <td>male</td>\n",
" <td>36.0</td>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" <td>PC 17755</td>\n",
" <td>512.3292</td>\n",
" <td>B51 B53 B55</td>\n",
" <td>C</td>\n",
" </tr>\n",
" <tr>\n",
" <th>737</th>\n",
" <td>738</td>\n",
" <td>1</td>\n",
" <td>1</td>\n",
" <td>Lesurer, Mr. Gustave J</td>\n",
" <td>male</td>\n",
" <td>35.0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>PC 17755</td>\n",
" <td>512.3292</td>\n",
" <td>B101</td>\n",
" <td>C</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" PassengerId Survived Pclass Name \\\n",
"258 259 1 1 Ward, Miss. Anna \n",
"679 680 1 1 Cardeza, Mr. Thomas Drake Martinez \n",
"737 738 1 1 Lesurer, Mr. Gustave J \n",
"\n",
" Sex Age SibSp Parch Ticket Fare Cabin Embarked \n",
"258 female 35.0 0 0 PC 17755 512.3292 NaN C \n",
"679 male 36.0 0 1 PC 17755 512.3292 B51 B53 B55 C \n",
"737 male 35.0 0 0 PC 17755 512.3292 B101 C "
]
},
"execution_count": 44,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df[df.Fare > 400]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We can replace this value by the median(), the mean(), or the second highest value."
]
},
{
"cell_type": "code",
"execution_count": 45,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>PassengerId</th>\n",
" <th>Survived</th>\n",
" <th>Pclass</th>\n",
" <th>Name</th>\n",
" <th>Sex</th>\n",
" <th>Age</th>\n",
" <th>SibSp</th>\n",
" <th>Parch</th>\n",
" <th>Ticket</th>\n",
" <th>Fare</th>\n",
" <th>Cabin</th>\n",
" <th>Embarked</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>258</th>\n",
" <td>259</td>\n",
" <td>1</td>\n",
" <td>1</td>\n",
" <td>Ward, Miss. Anna</td>\n",
" <td>female</td>\n",
" <td>35.0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>PC 17755</td>\n",
" <td>512.3292</td>\n",
" <td>NaN</td>\n",
" <td>C</td>\n",
" </tr>\n",
" <tr>\n",
" <th>737</th>\n",
" <td>738</td>\n",
" <td>1</td>\n",
" <td>1</td>\n",
" <td>Lesurer, Mr. Gustave J</td>\n",
" <td>male</td>\n",
" <td>35.0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>PC 17755</td>\n",
" <td>512.3292</td>\n",
" <td>B101</td>\n",
" <td>C</td>\n",
" </tr>\n",
" <tr>\n",
" <th>679</th>\n",
" <td>680</td>\n",
" <td>1</td>\n",
" <td>1</td>\n",
" <td>Cardeza, Mr. Thomas Drake Martinez</td>\n",
" <td>male</td>\n",
" <td>36.0</td>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" <td>PC 17755</td>\n",
" <td>512.3292</td>\n",
" <td>B51 B53 B55</td>\n",
" <td>C</td>\n",
" </tr>\n",
" <tr>\n",
" <th>88</th>\n",
" <td>89</td>\n",
" <td>1</td>\n",
" <td>1</td>\n",
" <td>Fortune, Miss. Mabel Helen</td>\n",
" <td>female</td>\n",
" <td>23.0</td>\n",
" <td>3</td>\n",
" <td>2</td>\n",
" <td>19950</td>\n",
" <td>263.0000</td>\n",
" <td>C23 C25 C27</td>\n",
" <td>S</td>\n",
" </tr>\n",
" <tr>\n",
" <th>27</th>\n",
" <td>28</td>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" <td>Fortune, Mr. Charles Alexander</td>\n",
" <td>male</td>\n",
" <td>19.0</td>\n",
" <td>3</td>\n",
" <td>2</td>\n",
" <td>19950</td>\n",
" <td>263.0000</td>\n",
" <td>C23 C25 C27</td>\n",
" <td>S</td>\n",
" </tr>\n",
" <tr>\n",
" <th>341</th>\n",
" <td>342</td>\n",
" <td>1</td>\n",
" <td>1</td>\n",
" <td>Fortune, Miss. Alice Elizabeth</td>\n",
" <td>female</td>\n",
" <td>24.0</td>\n",
" <td>3</td>\n",
" <td>2</td>\n",
" <td>19950</td>\n",
" <td>263.0000</td>\n",
" <td>C23 C25 C27</td>\n",
" <td>S</td>\n",
" </tr>\n",
" <tr>\n",
" <th>438</th>\n",
" <td>439</td>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" <td>Fortune, Mr. Mark</td>\n",
" <td>male</td>\n",
" <td>64.0</td>\n",
" <td>1</td>\n",
" <td>4</td>\n",
" <td>19950</td>\n",
" <td>263.0000</td>\n",
" <td>C23 C25 C27</td>\n",
" <td>S</td>\n",
" </tr>\n",
" <tr>\n",
" <th>311</th>\n",
" <td>312</td>\n",
" <td>1</td>\n",
" <td>1</td>\n",
" <td>Ryerson, Miss. Emily Borie</td>\n",
" <td>female</td>\n",
" <td>18.0</td>\n",
" <td>2</td>\n",
" <td>2</td>\n",
" <td>PC 17608</td>\n",
" <td>262.3750</td>\n",
" <td>B57 B59 B63 B66</td>\n",
" <td>C</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" PassengerId Survived Pclass Name \\\n",
"258 259 1 1 Ward, Miss. Anna \n",
"737 738 1 1 Lesurer, Mr. Gustave J \n",
"679 680 1 1 Cardeza, Mr. Thomas Drake Martinez \n",
"88 89 1 1 Fortune, Miss. Mabel Helen \n",
"27 28 0 1 Fortune, Mr. Charles Alexander \n",
"341 342 1 1 Fortune, Miss. Alice Elizabeth \n",
"438 439 0 1 Fortune, Mr. Mark \n",
"311 312 1 1 Ryerson, Miss. Emily Borie \n",
"\n",
" Sex Age SibSp Parch Ticket Fare Cabin Embarked \n",
"258 female 35.0 0 0 PC 17755 512.3292 NaN C \n",
"737 male 35.0 0 0 PC 17755 512.3292 B101 C \n",
"679 male 36.0 0 1 PC 17755 512.3292 B51 B53 B55 C \n",
"88 female 23.0 3 2 19950 263.0000 C23 C25 C27 S \n",
"27 male 19.0 3 2 19950 263.0000 C23 C25 C27 S \n",
"341 female 24.0 3 2 19950 263.0000 C23 C25 C27 S \n",
"438 male 64.0 1 4 19950 263.0000 C23 C25 C27 S \n",
"311 female 18.0 2 2 PC 17608 262.3750 B57 B59 B63 B66 C "
]
},
"execution_count": 45,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"#Calculate hight values\n",
"df.sort_values('Fare', ascending=False).head(8)"
]
},
{
"cell_type": "code",
"execution_count": 46,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>PassengerId</th>\n",
" <th>Survived</th>\n",
" <th>Pclass</th>\n",
" <th>Name</th>\n",
" <th>Sex</th>\n",
" <th>Age</th>\n",
" <th>SibSp</th>\n",
" <th>Parch</th>\n",
" <th>Ticket</th>\n",
" <th>Fare</th>\n",
" <th>Cabin</th>\n",
" <th>Embarked</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>258</th>\n",
" <td>259</td>\n",
" <td>1</td>\n",
" <td>1</td>\n",
" <td>Ward, Miss. Anna</td>\n",
" <td>female</td>\n",
" <td>35.0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>PC 17755</td>\n",
" <td>263.000</td>\n",
" <td>NaN</td>\n",
" <td>C</td>\n",
" </tr>\n",
" <tr>\n",
" <th>88</th>\n",
" <td>89</td>\n",
" <td>1</td>\n",
" <td>1</td>\n",
" <td>Fortune, Miss. Mabel Helen</td>\n",
" <td>female</td>\n",
" <td>23.0</td>\n",
" <td>3</td>\n",
" <td>2</td>\n",
" <td>19950</td>\n",
" <td>263.000</td>\n",
" <td>C23 C25 C27</td>\n",
" <td>S</td>\n",
" </tr>\n",
" <tr>\n",
" <th>27</th>\n",
" <td>28</td>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" <td>Fortune, Mr. Charles Alexander</td>\n",
" <td>male</td>\n",
" <td>19.0</td>\n",
" <td>3</td>\n",
" <td>2</td>\n",
" <td>19950</td>\n",
" <td>263.000</td>\n",
" <td>C23 C25 C27</td>\n",
" <td>S</td>\n",
" </tr>\n",
" <tr>\n",
" <th>341</th>\n",
" <td>342</td>\n",
" <td>1</td>\n",
" <td>1</td>\n",
" <td>Fortune, Miss. Alice Elizabeth</td>\n",
" <td>female</td>\n",
" <td>24.0</td>\n",
" <td>3</td>\n",
" <td>2</td>\n",
" <td>19950</td>\n",
" <td>263.000</td>\n",
" <td>C23 C25 C27</td>\n",
" <td>S</td>\n",
" </tr>\n",
" <tr>\n",
" <th>737</th>\n",
" <td>738</td>\n",
" <td>1</td>\n",
" <td>1</td>\n",
" <td>Lesurer, Mr. Gustave J</td>\n",
" <td>male</td>\n",
" <td>35.0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>PC 17755</td>\n",
" <td>263.000</td>\n",
" <td>B101</td>\n",
" <td>C</td>\n",
" </tr>\n",
" <tr>\n",
" <th>438</th>\n",
" <td>439</td>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" <td>Fortune, Mr. Mark</td>\n",
" <td>male</td>\n",
" <td>64.0</td>\n",
" <td>1</td>\n",
" <td>4</td>\n",
" <td>19950</td>\n",
" <td>263.000</td>\n",
" <td>C23 C25 C27</td>\n",
" <td>S</td>\n",
" </tr>\n",
" <tr>\n",
" <th>679</th>\n",
" <td>680</td>\n",
" <td>1</td>\n",
" <td>1</td>\n",
" <td>Cardeza, Mr. Thomas Drake Martinez</td>\n",
" <td>male</td>\n",
" <td>36.0</td>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" <td>PC 17755</td>\n",
" <td>263.000</td>\n",
" <td>B51 B53 B55</td>\n",
" <td>C</td>\n",
" </tr>\n",
" <tr>\n",
" <th>311</th>\n",
" <td>312</td>\n",
" <td>1</td>\n",
" <td>1</td>\n",
" <td>Ryerson, Miss. Emily Borie</td>\n",
" <td>female</td>\n",
" <td>18.0</td>\n",
" <td>2</td>\n",
" <td>2</td>\n",
" <td>PC 17608</td>\n",
" <td>262.375</td>\n",
" <td>B57 B59 B63 B66</td>\n",
" <td>C</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" PassengerId Survived Pclass Name \\\n",
"258 259 1 1 Ward, Miss. Anna \n",
"88 89 1 1 Fortune, Miss. Mabel Helen \n",
"27 28 0 1 Fortune, Mr. Charles Alexander \n",
"341 342 1 1 Fortune, Miss. Alice Elizabeth \n",
"737 738 1 1 Lesurer, Mr. Gustave J \n",
"438 439 0 1 Fortune, Mr. Mark \n",
"679 680 1 1 Cardeza, Mr. Thomas Drake Martinez \n",
"311 312 1 1 Ryerson, Miss. Emily Borie \n",
"\n",
" Sex Age SibSp Parch Ticket Fare Cabin Embarked \n",
"258 female 35.0 0 0 PC 17755 263.000 NaN C \n",
"88 female 23.0 3 2 19950 263.000 C23 C25 C27 S \n",
"27 male 19.0 3 2 19950 263.000 C23 C25 C27 S \n",
"341 female 24.0 3 2 19950 263.000 C23 C25 C27 S \n",
"737 male 35.0 0 0 PC 17755 263.000 B101 C \n",
"438 male 64.0 1 4 19950 263.000 C23 C25 C27 S \n",
"679 male 36.0 0 1 PC 17755 263.000 B51 B53 B55 C \n",
"311 female 18.0 2 2 PC 17608 262.375 B57 B59 B63 B66 C "
]
},
"execution_count": 46,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Replace\n",
"df.loc[df.Fare > 400, 'Fare'] = 263.0\n",
"\n",
"# Check we have removed outliers\n",
"df.sort_values('Fare', ascending=False).head(8)"
]
},
{
"cell_type": "code",
"execution_count": 47,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"OrderedDict([('Fare',\n",
" <matplotlib.axes._subplots.AxesSubplot at 0x7f2f8fcb91d0>)])"
]
},
"execution_count": 47,
"metadata": {},
"output_type": "execute_result"
},
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAX8AAAEdCAYAAADkeGc2AAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\nAAALEgAACxIB0t1+/AAAGZ9JREFUeJzt3X+cXHV97/HXJgvBQChRtwEj5t6i/Sj1ehUVvMGaHyjI\nbQGvoVqlSAlUrGLBChp6iwQfvd5cVB4ULJUC4ZeigkZKFCViQEV5GNRipZUPFBXlx4XlGjQBgkk2\n949zNpnE3exkZ2Znd7+v5+ORx8ycOXPms3Meec93vud7vqdny5YtSJLKMqXbBUiSxp7hL0kFMvwl\nqUCGvyQVyPCXpAIZ/pJUoN5uF6CJLyLmAD8CvlcvmgacmZnfacO2rwCuz8ybhnn+D4EfZ+bjrb5X\nu0XEnsDdmfmfG5bNAT6fma/exW1dAbwSeByYCjwCnJSZTw6x7qjeQ2Wx5a92uSczF2bmQmAJ8KEx\net/FwKwxeq9d1QMMdSLNaE+uWVJ/xvOAfwdO28m6nsCjnbLlr3bpabi/L/AgQETMBpYDuwObgZOA\nF1D9Mjg6Il4LnAW8G7geuBf4fWBNZp46uMGI6AX+Cfi9elvnUAXcm4ADI2JRZj7YsO6n6ve5A3hL\nZu4fEbcCd9ev+1vgSmAfqv8Hf5WZd0VEf2b21du5HrgIWAA8v97evnXtqyLifwDvBzYC38vMMyNi\nBvAFql8/3x7ms9o9Iq4GAvgB1ZflmsyM+n3fDhyUmWfs5PP+LvCn9fofABbVn+9ZwM8aPre3A+8F\nNgH/lpnvioj9689nU/23/1m9+nbLMvMXO3l/TXC2/NUuERGrI+IO4GP1P4APA5dl5gLgH4FzM/Nb\nwOMR8XrgfwHvqdd9GfCBzDwYeHVEvKxh+28Dns7M+VRB94nMvAW4C/jzweCvvRGYlplzgdXAfg3P\n/Sgz/4qq1XxH/UvlfcAF9fPDtZifl5lHAMcBy+ounb8FFtR/2wsiYi5VkP6obp3fNcy2DqRqxR8C\nHATsD/wwIl5TP38McO0wrx30R8CaiHgh8OZ6W8fX9TWaDhyRmX8IvCQi/gA4FliVmYfVn8N+wyzT\nJGb4q10Gu33+G3A4cF1ETAVeBdxWr3Mr8PL6/geoWvKrMvNn9bJ7M/Ph+v53qVrGg2G8dTuZ+Qiw\nISJm1s81/uoAeAnbWt03UbVmB60ZYnvfBw4YZluDvl6vezfwPOAPqH4J3Fz/onhh/fhAYPBYx22/\nvRkA7mv4O79H9UvnauBtEbEb8J8y8wdDvO5/11+wtwEbgEuBV1B9VmTm/Zn5zh1esxa4sX7Ni4Hn\nAKuAEyLio8AembmmXvaOHZZpErPbR22XmRkRT1G1aAfYFqi7148Bfgd4Gpjd8NKpDfenNKwL1ZdA\nYzAPdiMNpWeH5xpb878ZZnuN7z1otx3qafQM8P3MfGPjwroba2CY1wxnC/BV4O+AhcCXhlnvrB0P\nfEfE5uHep/4i+Qfgv2Rmf0SsBMjMf6t/VR0OfCQilmfmpyLiv+64rMn6NQHZ8le7bA3SiHg2VbfB\ng8CdVIEGMJ9tI4L+nqrPenZEHFIv+72ImBURU4BDqA5qDm53zeB26j7rgcz8NVXQNoY0wP1ULXuo\nwmyoRs7Wuurulrvr5QMRsUdETKdqVQ96bb3uy4AHqI5NvDginlsvXxoR+wEJDI6yWcjQDmj4O19N\nNVppE/BNqm6yTw/zuqF8Hzg0IqbU21zR8NwMYGMd/PtTjRaaFhFvpfpCuBE4G3hVRLxlx2W7UIMm\nIMNf7fL7dZfErVQt1/fUgXYOVXfC14F3AOdExLHALzLzR8CZwIVULe97gY9QdZvcnpk/Zlur/XPA\n1IhYTdUfPti98Q3g+oh4SUMtXwJ+JyK+CRwK/L96eeMvgL8HXlnX9RG2jZz5R6pulMvZ9kUF8OuI\n+GfgGuCDmfk0cDrwlYj4FvDsujvqauA1EfE1
4EUMfQzhLqpjHd8Bvp2Z9zT8jQOZ+ZMhXjPksYjM\nfKCu6VvACrYduyAzfwncEhHfpQr084DzgZ8An6j/9g/Vf/N99bJbGpZpEutxSmeNB+0cm14fC1iQ\nmSvq0UZfy8wDW9jeOUB/Zl7cam0jvM9S4KeZeVUn30cC+/w1vrSrJbIOeEtEnEnVbXT6cCtGxADw\nH2w7KNwD/Cwzj2xTLU2JiC8BTwHnjuX7qly2/FW0+oDp8+suG6kYtvxVuh6GGd4ZEQFcRjU8shf4\nUGZ+tn5uAPgb4ASq4Z0vAS6mOtC9AVhcDyGVxiUP+ErD+yhwY3284CTg8vrcha0yc/BA8xeBK+uz\ndN8F/HM9mkcal2z5S3BbRDSeCPatzDylnn5iMMC/DezBtiGssG08/ouBvsy8EiAz74iIfmAucHvH\nq5dGwfCXYN5Qff4RcSTwP+ux/IMHxxpb87+sb/cB9oyIf68f91CNsX9Oh+qVWmb4S0P0+deTw10H\nHJuZN0fE7lRnJA81QuJh4FetDCeVxpp9ktLQ9qSaFG3woO3pVFM6zNhxxfpEqwcjYhFARDw3Iq6N\niGeNVbHSrjL8Vbrhzpz9FdUZsXdFxPepzoC9AfhSPfXDjq/7U+DUiPgx1YRuX6vPApbGpRHH+det\nlyupLpgxjWryqR9SnVI+heqKQsdn5saIOI7qNPnNwKWZubxzpUuSRquZlv9RwJ31POpvpZob5MNU\n86nPo5pEa3HdGjqbajKrBcD7ImKfjlQtSWrJiAd8M/O6hocvAH4BzANOqZetBM6gmpRrTWauB4iI\n26km1fpyOwuWJLWu6dE+EfFtqrnXj6Lqz9xYP/UY1djnWUB/w0v68WpAkjQuNX3ANzMPBY6mmmu8\ncWjccFc+Gm65JKnLRmz5R8RBwGOZ+WBm/mt9evu6iJiWmc9Q/Rp4iGqsc2NLfzbVxbOHtWnT5i29\nvUNdQEmS1CZDNsSb6fZ5HTCH6gDuLGAv4CtUF3z+NNXFtL9KdaWlyyJib6qrK81l2wUyhrR27VPN\nFj+h9fXNoL9/XbfLUBu4LyePUvZlX99vnZoCNNft80ngd+urIq0E/pLq6kwnRMQ3gJnAVZm5AVhC\ndSHoVcDSzJz8n6wkTUBdnc+/v39dERcTKKWFUQL35eRRyr7s65sxZLePZ/hKUoEMf0kqkOEvSQUy\n/CWpQIa/JBXI8JekAhn+klQgw1+SCmT4S1KBDH9JKpDhL0kFMvwlqUCGvyQVyPCXpAIZ/pJUIMNf\nkgpk+EtSgQx/SSqQ4S9JBTL8JalAhr8kFai32wVMdouXrd56f/mShV2sRJK2seUvSQUy/CWpQIZ/\nh82Ztdd2t5I0Htjn32HnnHgwfX0z6O9f1+1SJGkrW/6SVKCmWv4RcR7wWmAqsAw4Gngl8Hi9ykcz\n8ysRcRxwGrAZuDQzl7e/ZElSq0YM/4iYDxyYmXMj4tnAvwBfB5Zk5k0N600HzgZeBWwC7oyIFZn5\nREcqlySNWjPdPt8A/qS+/wSwJ9UvgJ4d1jsEWJOZ6zNzA3A7cGi7CpUktc+ILf/M3AI8XT88Gfgy\nVbfOqRHx18CjwHuBfYH+hpf2A/u1tVpJUls0fcA3Io4BTgROBa4BPpiZhwF3AUuHeMmOvwwkSeNE\nswd8jwDOAo7IzHXArQ1PrwQuBq4HjmpYPhu4Y2fbnTlzOr29U3ep4Imqr29Gt0tQm7gvJ4+S92Uz\nB3z3Bs4DDsvMX9XLPg+cmZk/BeYDdwNrgMvq9QeAuVQjf4a1du1TLRU/EZx7xRoeeHQ9c2btxTkn\nHtztctQiz9mYPErZl8N9wTXT8n8r8BzguojoAbYAVwCfi4gngfXAiZm5ISKWAKuown9p/SuhaA88\nun67W0kaD5o54HspcOkQT10zxLorgBVtqGvSmDNrr60tf0kaL5zeocOc3kHSeOT0DpJUIMNfkgpk\n+EtSgQx/
SSqQ4S9JBTL8JalAhr8kFcjwl6QCGf6SVCDDX5IK5PQOHXbKx25j46YBduudwiVnzO92\nOZIE2PLvuI2bBra7laT
"text/plain": [
"<matplotlib.figure.Figure at 0x7f2f8fcb4d30>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"df.boxplot(column='Fare', by='Pclass', return_type='axes', sym='k.')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Feature Embarked"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We can analyze the distribution based on the port of embarkation (C = Cherbourg; Q = Queenstown; S = Southampton). "
]
},
{
"cell_type": "code",
"execution_count": 48,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"Embarked\n",
"C 168\n",
"Q 77\n",
"S 644\n",
"dtype: int64"
]
},
"execution_count": 48,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df.groupby('Embarked').size()"
]
},
{
"cell_type": "code",
"execution_count": 49,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"<matplotlib.axes._subplots.AxesSubplot at 0x7f2f8fbdad30>"
]
},
"execution_count": 49,
"metadata": {},
"output_type": "execute_result"
},
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAYQAAAESCAYAAAD9gqKNAAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\nAAALEgAACxIB0t1+/AAAFI1JREFUeJzt3XuQnXV9x/H3uiuRbBKzwBpjEMTbl0FrZywWm1gCoUSt\nd4NajRnGqMVbG/FKq0BEtF4Ko+LYaiyKGTpTLxlrhpEGRJQAGrzfv1pUaBLaLHo0NxI3m+0fzy9w\nNm52T5Z9zlmy79dMhuf8nuc857t7hv08v9/vuXQNDw8jSdKDOl2AJGlqMBAkSYCBIEkqDARJEmAg\nSJIKA0GSBEBPnTuPiJXACmAY6AL+DHga8C/AfuAHmfn6su1bgXNK+yWZ+eU6a5MkjdTVrusQIuJ0\n4EXAE4C3ZOZ3IuJq4DNAAp8Dngr0ATcBp2SmF0lIUpu0c8joIuD9wKMy8zulbT1wNnAm8OXMHMrM\nu4FfA6e0sTZJmvbaEggRcSpwJzAENJpWbQPmA/OAgab2gdIuSWqTdvUQXgV8uix3NbV3/fGmY7ZL\nkmpS66RykzOAN5TlY5vaFwBbgK3AyQe1bx1rh/v2DQ339HRPYomSNC0c8oC79kCIiPnAjszcV17/\nNCIWZuYtwAuBjwC/AN4UERcBDwMekZk/GWu/jcbumiuXpCNPf//sQ65rRw9hPtVcwQHnAx+PiC7g\nm5l5A0BErKE6u2g/8Jo21CVJatK2004n28DAjgdm4ZLUQf39sw85ZOSVypIkwECQJBUGgiQJMBAk\nSYWBIEkCDARJUmEgSJIAA0GSVBgIkiTAQJAkFQaCJAkwECRJRbueh9AxQ0NDbN58Z6fLmBaOP/4E\nurt9RoX0QHXEB8LmzXfyzg99nofMOqbTpRzR9uz8LZe+8RxOPPGkTpciaYKO+EAAeMisY5g5p7/T\nZUjSlOYcgiQJMBAkSYWBIEkCDARJUmEgSJIAA0GSVBgIkiTAQJAkFbVfmBYRy4G3AoPARcAPgbVU\nYXQXsCIzB8t2q4AhYE1mXll3bZKk+9TaQ4iIY6hCYCHwbOD5wCXAFZm5GLgdWBkRM4ELgSXAmcD5\nETG3ztokSSPV3UP4K+C6zNwN7AbOi4hfAueV9euBtwA/BzZl5k6AiNgILAKuqbk+SVJRdyA8CuiN\niP8E5gLvAmZm5mBZvw2YD8wDBpreN1DaJUltUncgdAHHAC+gCoevlrbm9Yd635j6+mbS0zP+rZa3\nb+8ddxtNjr6+Xvr7Z3e6DEkTVHcg/B9wS2buB34ZETuAwYiYkZl7gQXAFmArI3sEC4Bbx9pxo7G7\npQIajV0TqVsT0GjsYmBgR6fLkDSGsQ7a6j7tdAOwJCK6IuJYYBZwPXBOWb8MuBbYBJwaEXMiYhbV\nJPRNNdcmSWpSayBk5lbg88A3qCaIXw9cDJwbEV8D+oCrMnMPcAFVgGwAVmemh5qS1Ea1X4eQmWuA\nNQc1Lx1lu3XAurrrkSSNziuVJUmAgSBJKgwESRJgIEiSCgNBkgQYCJKkwkCQJAEGgiSpMBAkSYCB\nIEkqDARJEmAgSJIKA0GSBBgIkqTCQJAkAQaCJKkwECRJgIEgSSoMBEkSYCBIkgoDQZIEGAiSpMJA\nkCQB0FPnziNiMfA54EdAF/AD4IPAWqowugtYkZmDEbEcWAUMAWsy88o6a5MkjdSOHsKNmbkkM8/M\nzFXAJcAVmbkYuB1YGREzgQuBJcCZwPkRMbcNtUmSinYEQtdBr88A1pfl9cDZwGnApszcmZl7gI3A\nojbUJkkqah0yKk6JiC8Cx1D1DmZm5mBZtw2YD8wDBpreM1DaJUltUncg/AJYnZmfi4hHA1896DMP\n7j2M136vvr6Z9PR0j1vA9u29rdSpSdDX10t//+xOlyFpgmoNhMzcSjWpTGb+MiL+Fzg1ImZk5l5g\nAbAF2MrIHsEC4Nax9t1o7G6phkZj1wQq10Q0
GrsYGNjR6TIkjWGsg7Za5xAi4mUR8eay/HCqoaFP\nAeeUTZYB1wKbqIJiTkTMAhYCN9VZmyRppLqHjL4E/HtEPA94MHAe8H3gMxHxt8AdwFWZORQRFwAb\ngP1Uw0weakpSG9U9ZLQTeO4oq5aOsu06YF2d9UiSDs0rlSVJgIEgSSoMBEkSYCBIkgoDQZIEGAiS\npMJAkCQBBoIkqTAQJEmAgSBJKgwESRJgIEiSCgNBkgQYCJKkwkCQJAEGgiSpMBAkSYCBIEkqDARJ\nEmAgSJIKA0GSBBgIkqTCQJAkAdBT9wdExEOAHwGXADcAa6mC6C5gRWYORsRyYBUwBKzJzCvrrkuS\nNFI7eggXAr8py5cAV2TmYuB2YGVEzCzbLAHOBM6PiLltqEuS1KTWQIiIAE4GrgG6gMXA+rJ6PXA2\ncBqwKTN3ZuYeYCOwqM66JEl/rO4ewmXAm6jCAKA3MwfL8jZgPjAPGGh6z0BplyS1UW1zCBGxArgl\nM++oOgp/pGu0xjHaR+jrm0lPT/e4223f3tvK7jQJ+vp66e+f3ekyJE1QnZPKzwJOiojnAAuAPwA7\nI2JGZu4tbVuArYzsESwAbh1v543G7paKaDR2HWbZmqhGYxcDAzs6XYakMYx10FZbIGTm3xxYjoiL\ngF8DC4FzgKuBZcC1wCbgkxExB9hftllVV12SpNG16zqEA8NAFwPnRsTXgD7gqjKRfAGwofxbnZke\nZkpSm9V+HQJAZr6r6eXSUdavA9a1oxZJ0uha6iFExKdHafuvSa9GktQxY/YQyhXErwGeGBFfb1p1\nFNXpopKkI8SYgZCZV0fEjVSTwBc3rdoP/LjGuiRJbTbuHEJmbgHOiIiHAsdw3wTxXOC3NdYmSWqj\nliaVI+LDwEqqq4gPBMIw8Oia6pIktVmrZxktAfrLKaKSpCNQq9ch/MIwkKQjW6s9hM3lLKONwL4D\njZl5US1VSZLartVA+A3wlToLkSR1VquB8O5aq5AkdVyrgbCP6qyiA4aB3wPHTnpFkqSOaCkQMvPe\nyeeIOAo4C/jTuoqSJLXfYd/tNDP/kJlfpnr8pSTpCNHqhWkrD2p6JNWDbCRJR4hW5xD+sml5GNgO\nvHjyy5EkdUqrcwivAIiIY4DhzGzUWpUkqe1aHTJaCKwFZgNdEfEb4OWZ+a06i5MktU+rk8rvA56X\nmQ/LzH7gpcDl9ZUlSWq3VgNhKDN/dOBFZn6XpltYSJIe+FqdVN4fEcuA68rrZwBD9ZQkSeqEVgPh\nNcAVwCepnpb2PeDVdRUlSWq/VoeMlgJ7M7MvM4+lekjOX9dXliSp3VoNhJcDL2x6vRR42eSXI0nq\nlFaHjLozs3nOYJj7HqV5SBFxNPBpYB4wA7gU+D7VKawPAu4CVmTmYEQsB1ZRzU2sycwrW/0hJEn3\nX6uB8KWIuAW4ieoP+VnAF1p433OA2zLznyPiBKpJ6ZuBj2bmFyLiPcDKiFgLXAicSnX20m0RsS4z\nf3eYP48kaYJavVL50oi4ETiNqnfwusz8Rgvv+2zTyxOA/wEWA+eVtvXAW4CfA5sycydARGwEFgHX\ntPZjSJLur1Z7CGTmRqpHaB62iLiZ6mZ4zwGuy8zBsmobMJ9qSGmg6S0DpV2S1CYtB8L9kZmLIuJJ\nwNWMnHs41DzEuPMTfX0z6enpHvezt2/vbalG3X99fb3098/udBmSJqjWQIiIJwPbMnNzZv4gIrqB\nHRExIzP3UvUatgBbGdkjWADcOta+G43dLdXQaOyaUO06fI3GLgYGdnS6DEljGOug7bAfkHOYTgfe\nDBAR84BZwPXAOWX9MuBaYBNwakTMiYhZwEKqCWxJUpvUHQj/CjwsIr5ONYH8WuBi4NyI+BrQB1yV\nmXuAC4AN5d/qzPRQU5LaqNYho/KHfvkoq5aOsu06YF2d9UiSDq3uHoIk6QHCQJAkAQaCJKkwECRJ\ngIEgSSoM
BEkSYCBIkgoDQZIEGAiSpMJAkCQBBoIkqTAQJEmAgSBJKgwESRJgIEiSCgNBkgQYCJKk\nwkCQJAEGgiSpMBAkSYC
"text/plain": [
"<matplotlib.figure.Figure at 0x7f2f8fb5d198>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"# Distribution\n",
"sns.countplot('Embarked', data=df)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Since there are missing values, we will replace them by the most popular value ('S'), and we will also encode it since it is a categorical variable.\n",
"\n",
"We can see if this has impact on its survival."
]
},
{
"cell_type": "code",
"execution_count": 50,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"Embarked\n",
"C 0.553571\n",
"Q 0.389610\n",
"S 0.336957\n",
"Name: Survived, dtype: float64"
]
},
"execution_count": 50,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df.groupby(['Embarked']).Survived.mean()"
]
},
{
"cell_type": "code",
"execution_count": 51,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"<matplotlib.axes._subplots.AxesSubplot at 0x7f2f8fb017b8>"
]
},
"execution_count": 51,
"metadata": {},
"output_type": "execute_result"
},
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAYEAAAESCAYAAAAbq2nJAAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\nAAALEgAACxIB0t1+/AAAGIxJREFUeJzt3X+UX3V95/HnJJNAfkEGGI0RgsqPd1xW9ETBGCAKEUqr\nFVG0dLWLsrjnKFbpuuqKukUFrOUQIe22CttI/bHqViw/FpMmFagYQiupWlTydtsKIQlbEhhIMklg\nMpn94/ud5DvjJHNnMvf7neE+H+fk5Hs/937vfU8uzOv+/Hza+vr6kCRV06RWFyBJah1DQJIqzBCQ\npAozBCSpwgwBSaowQ0CSKqy97A1ExFJgIbAXuCIzH6y3zwW+AfQBbcDLgI9n5rfKrkmSVFNqCETE\nYuDEzFwUEfOB5cAigMzcDJxdX24ycA9wR5n1SJIGKvty0BLgNoDMXA/MjoiZQyz3HuDWzNxZcj2S\npAZlh8AcYEvD9NZ622CXAX9Rci2SpEGafWO4bXBDRCwEHs7MHU2uRZIqr+wbw5sZeOQ/F3h80DJv\nBv62yMr27Onta2+fPEalSVJl/NoBeL+yQ2AVcBVwc0QsADZlZvegZU4DvllkZV1d3jKQpJHq7Jx1\nwHmlXg7KzLXAuohYA9wAXB4Rl0TEBQ2LzQGeKLMOSdLQ2iZSV9JbtmyfOMVK0jjR2TnrgJeDfGNY\nkirMEJCkCjMEJKnCDAFJqjBDQJIqzBCQpAozBCSpwgwBSaowQ0CSKswQkKQKMwQkqcIMAUmqMENA\nkirMEJCkCjMEJKnCDAFJqjBDQJIqzBCQxsDy5Tdx8cVvZfnym1pdijQihoB0iHbv3sXq1SsAWL16\nJbt372pxRVJxhoB0iHp6eugfq7uvby89PT0trkgqzhCQpAozBCSpwgwBSaqw9rI3EBFLgYXAXuCK\nzHywYd6xwDeBKcA/ZuYHyq5HkrRfqWcCEbEYODEzFwGXAcsGLXI9cF1mLgR666EgSWqSsi8HLQFu\nA8jM9cDsiJgJEBFtwJnAnfX5v5+ZG0uuR5LUoOwQmANsaZjeWm8D6AR2ADdExH0RcW3JtUiSBin9\nnsAgbYM+vxj4IrABuCsifjMzVxzoyx0d02lvn1xyidLITJ26d8D00UfP5MgjZ7WoGmlkyg6Bzew/\n8geYCzxe/7wVeCQzHwGIiO8DpwAHDIGurp3lVCkdgu3bdwyYfvLJHTz3nA/eafzo7DzwQUnZ/6Wu\nAi4CiIgFwKbM7AbIzF7gXyPihPqyrway5HokSQ1KPRPIzLURsS4i1gC9wOURcQnwdGbeDvwBcEv9\nJvFDmXlnmfVIkgYq/Z5AZl45qOmhhnn/ApxVdg2SpKF54VKSKswQkKQKMwQkqcIMAUmqMENAkirM\nEJCkCjMEJKnCDAFJqjBDQJIqrNm9iErD6u3tZePGDa0uo7Du7u4B0489toEZM2a0qJrijj12HpMn\n2ytv1RkCGnc2btzAVd+9hmkd4/8XKcDe53oHTC/74ZeYNHV8/3Ld1dXNVW/7JMcf/9JWl6IWMwQ0\nLk3rmMGMYyZGn/y9z+7h6Ybp6UfPZPJh/q+licF7ApJUYYaAJFWYISBJFWYISFKFGQKSVGGGgCRV\nmCEgSRVmCEhShRkCklRhhoAkVZghIEkVVnoHJxGxFFgI7AWuyMwHG+b9CthQn9cHvCszHy+7JklS\nTakhEBGLgRMzc1FEzAeWA4saFukDzs/MXWXWIUkaWtmXg5YAtwFk5npgdkTMbJjfVv8jSWqBskNg\nDrClYXprva3RlyLivoi4tuRaJEmDNLvT88FH/Z8GVgJPAbdHxNsy87sH+nJHx3Ta28f3YB06dNu2\nTYzBZCa6jo4ZdHZOjDEbVJ6yQ2AzA4/85wL7bvxm5tf7P0fE94BXAAcMga6unSWUqPGmq6t7+IV0\nyLq6utmyZXury2i55ctvYtWq73Heeb/FpZf+
51aXU4qDhX3Zl4NWARcBRMQCYFNmdtenj4iIlREx\npb7s64GflVyPJO2ze/cuVq9eAcDq1SvZvbt6z6iUGgKZuRZYFxFrgBuAyyPikoi4IDO3AXcBD0TE\nfcATmXlrmfVIUqOenh76+voA6OvbS09PT4srar7S7wlk5pWDmh5qmPcnwJ+UXYMkaWi+MSxJFWYI\nSIeobVLDQ29tg6alcc4QkA7RpCmTmXnyUQDMPOkoJk3xMWZNHM1+T0B6Xuo4fS4dp89tdRnSiHkm\nIEkVZghIUoUZApJUYYXvCUTEC4Hj65OPZua/lVOSJKlZhg2BiHgn8AngRcBj9eZ5EbEJ+Hxm/lWJ\n9UmSSnTQEIiIW+rLvCczfzpo3iuBj0bEmzLzPaVVKEkqzXBnAn+dmbcPNaMeCu+OiAvGvixJUjMM\nFwKvqh/xDykzP3ugkJAkjX/DhUD//JPqf34ATKbW7fOPS6xLktQEBw2BzPw0QETcAZyemb316SnA\nt8svT5JUpqLvCcxj4NCQfex/XFSSNEEVfU/gLuCXEbEO2AssAG4rrSpJUlMUCoHM/GT9cdFXUDsj\n+Exm/qLMwiRJ5St0OSgiDgPOo3Zf4FZgVkQcXmplkqTSFb0n8GfACcDZ9ekFwC1lFCRJap6iITA/\nM/8LsBMgM/8csPN0SZrgiobAnvrffQARMQOYVkpFkqSmKRoCfxUR3wdeFhHLgJ8A3yivLElSMxR9\nOuhPI+LvgTcAzwIXZ+a6MguTJJWvUAhExAPAV4G/yMynRrKBiFgKLKT2fsEVmfngEMt8HliYmWcP\nnidJKk/Ry0EfAeYDP46I2yPiooiYOtyXImIxcGJmLgIuA5YNsczLgbOo32+QJDVPoRDIzDWZ+SHg\nJcAXgfOBTQW+uoT6m8WZuR6YHREzBy1zPXBl0YIlSWNnJMNLzgbeCrwDeBnw5QJfmwM0Xv7ZWm/7\n5/o6LwHuAR4tWockaewUvSfwN8Ap1I7qr8nM+0e5vX2d0EVEB/BeamcLxzGwg7ohdXRMp7198ig3\nrYli27YZrS6hEjo6ZtDZOavVZbTU1Kl7B0wfffRMjjyyWv8mRc8EbgRWZubeYZccaDO1I/9+c4HH\n65/PAY4B7gMOp/b46fWZ+ZEDrayra+cIN6+JqKuru9UlVEJXVzdbtmxvdRkttX37jgHTTz65g+ee\nK3qrdOI4WNgPN8bwjZn5YWoDzf+3iBgwPzMXD7PtVcBVwM0RsQDYlJnd9e/eCtxa387xwFcOFgCS\npLE33JnA8vrfnxrNyjNzbUSsi4g1QC9wef0+wNMOSyk9P/X29rJx44ZWl1FId/fAs87HHtvAjBkT\n43LkscfOY/LkQ788PtzIYj+tf/wCtfcEvjXS9wQyc/CTPw8Nscyj1C4PSZrgNm7cwN1/+EmOmTb+\ne5Z5du/AK9y/uPF6Dps0/i8Hbd21i3M+cw3HH//SQ15X0XsCHwF+h9p7Aj8BvgbckZnPHXIFkp53\njpk2jTnTx/8R9a7eXnima9/0C6ZNZ9oYHF1PJGW/JyBJGscKn/fU3xO4BPgocAbF3hNQQcuX38TF\nF7+V5ctvanUpkiqk6MhifwP8DHg1tfcEXp6Zo7pZrF+3e/cuVq9eAcDq1SvZvXtXiyuSVBVF7wn8\nHfBbmdlbZjFV1dPTQ19freukvr699PT0cPjh4/+mmqSJr+jloDcaAJL0/FP0TGBDRNwLPADseyIo\nM/97GUVJkpqjaAj8qv5HkvQ8UjQEPldqFZKkligaAnsYOOhLH/AMcPSYVyRJapqiYwzvu4FcH1Fs\nCfDKsoqSJDXHiDvJyMznMnMFcG4J9UiSmqjooDKXDmo6Dnjx2JcjSWqmovcEzmr43AdsA9459uVI\nkpqp6D2B9/Z/rvch9Exm9h3kKy01kfozB/s0l9Q6w40sdirw6cx8R336G8CFwDMRcUFm/kMTahyx\njRs38Kkb
vsPhM49qdSmF9PUO7JF76dfvo23y1BZVU9zuHU9x9RUXjUmf5pJaY7gzgWXA9QARsRh4\nHfBCavcElgFvLLW6Q3D
"text/plain": [
"<matplotlib.figure.Figure at 0x7f2f8fac9358>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"sns.barplot(x='Embarked', y='Survived', data=df)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"It seems passengers embarked in C (Cherbourg) have a higher chance of survival.\n",
"We can analyse this by sex."
]
},
{
"cell_type": "code",
"execution_count": 52,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"<matplotlib.axes._subplots.AxesSubplot at 0x7f2f8faf1550>"
]
},
"execution_count": 52,
"metadata": {},
"output_type": "execute_result"
},
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAYEAAAESCAYAAAAbq2nJAAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\nAAALEgAACxIB0t1+/AAAHjFJREFUeJzt3XucVXW9//HXXAUGkBmYMEBULn4w0/zhkYDwSuIlSzua\nmmUaaf28hifTRI95g7z8HC5HLcGDaZadR1amRyHG0lTUc5TMOBaf0hQYoB8DjDAzMMxtnz/2Htwz\nzWUBa+09e9b7+Xj4YK+19vruD6xx3nut71rfb14ikUBEROIpP9sFiIhI9igERERiTCEgIhJjCgER\nkRhTCIiIxJhCQEQkxgqj/gAz+zjwJFDh7g902PZpYA7QDCx19zuirkdERD4U6ZmAmQ0AFgLPdfGW\nBcDngWnADDObEGU9IiLSXtSXgxqA04CNHTeY2SHAFnff4O4J4FlgesT1iIhImkhDwN1b3X1XF5sP\nAKrTljcBH42yHhERaa83dQznZbsAEZG4ibxjuBsbaP/Nf2RqXZeam1sShYUFkRYlItIHdfklO5Mh\n0K4Id19jZoPMbDTJX/5nABd010BNzY4IyxMR6ZvKywd1uS3SEDCzicC9wEFAk5mdDTwFvOfuvwIu\nA34KJIDH3f2dKOsREZH28nJpKOnq6trcKVZEpJcoLx/U5eWg3tQxLCIiGaYQEBGJMYWAiEiMKQRE\nRGJMISAiEmMKARGRGMvmE8M5pbW1lXnz7qGmZgsFBYXU1tZy5ZWzGDNmbLZLExHZawqBgN59969s\n2vR37rprHgBVVeuoqlrH8uVLqa7eRHNzE2eddQ5jx47nhhu+RUXFfbz99iqWLXuG2bO/m+XqJVct\nWbKI5cufZcaM05k58+vZLkf6ID0sFlBzczO33noTAwYM4KijJnLkkUdRW1vLT37yKLfd9j127Wrg\nqqv+L4sW/ZDXX3+NV15ZwZo173HHHXczYMCAbJUtOayhYSdf/eoFJBIJ8vLyefjhH9OvX/9slyU5\nqLuHxXQmEFBhYSG3334n27dv4+23/4clSx7EfTXFxcXMnXsrAAUFycHtjjlmMg8++AAnnHCSAkD2\nWlNTE21f0hKJVpqamhQCEjqFQEBvvrmS7du3cfzxJzFlyqcYN248X/jC5zjllNO54YabAViz5n0A\nli9fyrHHHs/Kla9zyimnU17+kSxWLiLSNYVAQOPHGxUVd7F06X9SXLwfDQ07mTfvfl577RXmzLmF\nuro6PvnJKQwYMIBnnnmaefPuY9q047n77jncc8+CbJcvItIp9QmI9FK1tdu59NKv7F5evPhRBg0a\nnMWKJFdpADkREemUQkBEJMYUAiIiMaYQEBGJMYWAiEiMKQRERGKsTz0n0NLSQlXV2lDbHDVq9O4n\ngcM0d+6tnHjidKZMmRZ62yLSO+TC2E99KgSqqtZy0/wn6DewLJT2Guq2csesczjooENCaU9E4qOh\nYSeVlUsBqKxcxgUXXNgrh/3oUyEA0G9gGQMGl2f0M5cu/U/efHMl27Z9wPvvv8ell17Gc8/9mvff\nf5+bb76N3/ymktWr/0Rj4y7OPPNszjjjzN37tra2cvfdc9i4cQPNzc187WvfYOLEf8po/SISvlwZ\n+6nPhUC2rF9fxf33L+bpp5/kscce4eGHf8wzzzzFs88+zSGHjOGqq65h165dnHfeWe1CoLJyGcOG\nlfOd7/wr27Z9wNVXX8Yjjzyexb+J7I0oLkXW19e3W163bi0lJSWhtR/VpU7JLQqBkEyYcBgAQ4cO\nY+zYceTl5VFWNpTGxka2bdvGZZfNpLCwiG3bPmi336pVf2TVqj/wxz/+gUQiQVNTI83NzRQW6tDk\nkqqqtdzyizn0Lw3vl3RrY0u75YUv/4D84nB+ae+sqeeWf75RlzpFIRCW9G9U6a///veNbNiwnvvv\nf4j8/HxmzDi+3X5FRUV85SszmT59RsZqlWj0
Ly2hZNig0Npr2dVM+leGAUMHUrCf/peVcOkW0Yit\nXv1nhg8fTn5+Pi+//DtaW1tobm7evf1jHzucF198AYCamq08+OD9WapUROKoz32taKjb2qvaOuaY\nSaxbt46rrvoGxx57AlOnHsu99965e/tJJ53M73//BpddNpPW1kSvvY1MRPqmPhUCo0aN5o5Z54Te\nZk9OO+2M3a+nTp3G1KnT/uF1m3PP/eI/7H/99TftY5UiInunT4VAQUGBOrpERPaA+gRERGJMISAi\nEmMKARGRGFMIiIjEmEJARCTG+tTdQdkYSrq5uZnLL7+Egw8+hNmzvxvKZ/797xu56abreeihR0Np\nT0SkK30qBMIevyXI+CqbN2+mubkptABok5cXanMiIp2KPATMrAKYDLQCs9z9jbRtVwBfApqBN9z9\nX/b188Iev6Un991Xwfr1Vcydeys7duygrq6WlpYWrrnm24wZM47zzjuLz372LF544beMHDkKs8N4\n/vnnOPDA0dx88+28885fqai4i6KiIvLy8rj99rvatf/WW2+yaNEDFBYWMXz4cK677kYNLicioYm0\nT8DMjgPGuftU4BJgYdq2QcC1wKfc/TjgcDObFGU9Ubjyyms48MCDGDlyFJMnT2X+/Af41re+w7/9\n2zwgOV/AhAkf46GHHmXVqrcYOXIkixc/wltvvUl9fR01NVu55prrWLDg+xxxxCdYvnxpu/YXLPh/\n3HlnBQsWPMCQIaU8//xz2fhrikgfFfVXyunAkwDuvtrMhpjZQHevAxqBXcBgM6sH+gPhDfyTYatW\nvcW2bR/w618/C0BjY+PubYcd9jEAysqGMm7coanXZdTV1VFWNpTvf38hDQ0NbNmymRkzTtu9X03N\nVtatW8eNN36bRCJBQ0MDQ4aUZvBvJSJ9XdQhcADwRtry5tS6d9x9l5ndBvwN2AH81N3fibieyBQV\nFTNr1nUcfvjH/2FbQUFhp68TieQ3/QsvvJhjjpnM448/RkPDzt3bCwuLKC8vZ+HCH0RbvIjEVqYv\nLu/u7kxdDpoNjANqgefN7Ah3X9XVzqWlAygs7PpOne3bw5vQ48PPLKG8vOs+hsbG7RQW5jNp0tG8\n8cYKTjhhCu+88w4vv/wyF198Mfn5eQwbNpD+/ftTWJjP0KHJ9goK8ikrG0B9fS0f/7ix//77sXLl\naxx11FGUlZVQWFjAmDEjKCwsYPv2TYwdO5bHHnuMSZMmceihh4b+95R9E8XPXtR6+tmWfVNc3Npu\neejQgey/f+/79446BDaQ/ObfZgSwMfX6MOBdd68BMLOXgKOBLkOgpmZHtx9WU1PPzpr6bt+zJ3bW\n1FNTU091dW2X79m6tZ6WllZOPfUs5sz5Lueeez6tra3MmvVtqqtraW2FzZvr6NevmZaWVrZsqaeo\nqJaWlla2bt3BmWeew9e//g1GjTqQM888h3nz7mHKlBNobm6hurqWa6+dzbXXXkdxcTFDhw5j+vTP\ndFuPZEdNiD93bfLy024Ry+uwHIKefrZl39TW1rVb3rKljsbG7Dya1V3Y57VNhBwFM5sC3OLup5jZ\nRGB+qhMYM/sI8DJwROrS0HLgVndf0VV71dW13RabjecERADWrHmPu56fH/qdaTX/vYG6v2xl4KFl\nlE4aEVq79Ztruf7EWRp1N0K1tdu59NKv7F5evPhRBg0anJVayssHdfkNItIzAXd/1cxWmtkKoAW4\nwswuAj5w91+Z2T3AC2bWBLzSXQAEoaGkpa8pnTQi1F/+Ih1F3ifg7rM7rFqVtm0xsDjqGkREpHMa\nO0hEJMYUAiIiMaYQEBGJMYWAiEiMKQRERGJMw1GKSKxF8XwRQH19+wcI161bS0lJeE+Wh/UMk0Kg\nj1uyZBHLlz/LjBmnM3Pm17NdjkivE/Y8JG1aG1vaLS98+QfkF4fz4GmQuU6CUgj0YQ0NO6msTA5N\nXVm5jAsu
uJB+/fpnuSqR3ieKeUhadjXzQdrygKEDKdiv9/3KVZ9AH9bU1ETbsCCJRCtNTU1ZrkhE\nehuFgIhIjCkERERiTCE
"text/plain": [
"<matplotlib.figure.Figure at 0x7f2f8fabd320>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"sns.barplot(x=\"Embarked\", y='Survived', hue='Sex', data=df)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"There is also an improvement by gender for passengers embarking in Cherbourg."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We have to fill null values (2 null values) and encode this variable, since it is categorical. We will do it after reviewing the rest of features."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Features SibSp"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We analyse the distribution."
]
},
{
"cell_type": "code",
"execution_count": 53,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"SibSp\n",
"0 608\n",
"1 209\n",
"2 28\n",
"3 16\n",
"4 18\n",
"5 5\n",
"8 7\n",
"dtype: int64"
]
},
"execution_count": 53,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df.groupby('SibSp').size()"
]
},
{
"cell_type": "code",
"execution_count": 54,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"<matplotlib.axes._subplots.AxesSubplot at 0x7f2f8fa57588>"
]
},
"execution_count": 54,
"metadata": {},
"output_type": "execute_result"
},
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAYQAAAESCAYAAAD9gqKNAAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\nAAALEgAACxIB0t1+/AAAFbdJREFUeJzt3X+UX3V95/HnNCPBTBIziUMMyZJat/u21O3uupzFDSoQ\nJZSqy9ZgdcWsNbobtnoOpdJz6J7lR1ntuqzQrdrWntRU5OBZYY3WSIWIFgQJDa6t6LZ9i2BJJ8HN\nECZMSEpIJrN/3E/Cd4bM5Dth7vfOJM/HOTncn995DSf5vu7n3u+9366RkREkSfqppgNIkqYHC0GS\nBFgIkqTCQpAkARaCJKmwECRJAHTX+eIRsRZYA4wAXcC/BF4P/CFwCHg4Mz9Ytv1N4JKy/PrM/Fqd\n2SRJo3V16j6EiHgj8A7g54ErM/O7EXEr8DkggduB1wG9wH3AmZnpTRKS1CGdPGV0DfDfgZ/OzO+W\nZZuAC4Dzga9l5nBmPgn8HXBmB7NJ0kmvI4UQEWcB24BhYLBl1U5gCbAYGGhZPlCWS5I6pFMjhA8A\nny3TXS3Lu1646YTLJUk1qfWicovzgA+V6UUty5cC24EdwKvHLN8x0QsePDg80t09awojStJJYdwD\n7toLISKWAHsy82CZ/5uIWJGZDwBvBz4BPAL8RkRcA5wGnJ6Zfz3R6w4O7qs5uSSdePr65o27rhMj\nhCVU1woOuwL4o4joAv4iM78JEBHrqT5ddAi4rAO5JEktOvax06k2MLBnZgaXpAb19c0b95SRdypL\nkgALQZJUWAiSJMBCkCQVFoIkCbAQJEmFhSBJAiwESVJhIUiSAAtBklRYCJIkwEKQJBUWgiQJsBAk\nSYWFIEkCLARJUmEhSJIAC0GSVFgIkiTAQpAkFRaCJAmwECRJhYUgSQKgu+4fEBGXAr8JHACuAb4P\n3EJVRk8AazLzQNnucmAYWJ+ZG+rOJkl6XtfIyEhtLx4RC4EtwL8A5gHXAy8BvpqZGyPio8A2qoL4\nLnAWcBB4CHhDZu4e77UHBvaMCj48PEx//7Zafo8Xa9myM5g1a1bTMSSJvr55XeOtq3uE8Gbg65m5\nD9gHrIuIx4B1Zf0m4Ergh8DWzHwGICLuB84B7mj3B/X3b2P7/7qdpQsWTmX+F2377qfgXe9g+fJX\nNh1FkiZUdyH8NNATEX8KLAB+G5iTmQfK+p3AEmAxMNCy30BZPilLFyxk+aK+FxVYkk5WdRdCF7AQ\n+GWqcvjzsqx1/Xj7Tai3dw7d3c+fhhka6mHwuGPWq7e3h76+eU3HkKQJ1V0I/w94IDMPAY9FxB7g\nQETMzsz9wFJgO7CD0SOCpVTXHsY1OLhvzPzeqcw9pQYH9zIwsKfpGJI04cFp3R873QysjIiuiFgE\nzAXuBi4p61cDdwJbgbMiYn5EzAVWAPfVnE2S1KLWQsjMHcD/Bh6kukD8QeBa4L0RcS/QC9ycmc8C\nV1EVyGbgusz0kFqSOqj2+xAycz2wfsziVUfZbiOwse48kqSj805lSRJgIUiSCgtBkgRYCJKkwkKQ\nJAEWgiSpsBAkSYCFIEkqLARJEmAhSJIKC0GSBFgIkqTCQpAkARaCJKmwECRJgIUgSSosBEkSYCFI\nkgoLQZIEWAiSpMJCkCQBFoIkqbAQJEkAdNf54hFxLnA78AOgC3gY+B/ALVRl9ASwJjMPRMSlwOXA\nMLA+MzfUmU2SNFonRgj3ZObKzDw/My8Hrgc+mZnnAo8CayNiDnA1sBI4H7giIhZ0IJskqehEIXSN\nmT8P2FSmNwEXAGcDWzPzmcx8FrgfOKcD2SRJRa2njIozI+LLwEKq0cGczDxQ1u0ElgCLgYGWfQbK\ncklSh9RdCI8A12Xm7RHxM8Cfj/mZY0cPx1p+RG/vHLq7Zx2ZHxrqYfDFJK1Rb28PfX3zmo4hSROq\ntRAycwfVRWUy87GI+AlwVkTMzsz9wFJgO7CD
0SOCpcCWiV57cHDfmPm9U5h8ag0O7mVgYE/TMSRp\nwoPTWq8hRMS7I+LDZfoVVKeG/gS4pGyyGrgT2EpVFPMjYi6wArivzmySpNHqPmX0FeDzEXEx8BJg\nHfA94HMR8R+Bx4GbM3M4Iq4CNgOHqE4zeUgtSR1U9ymjZ4B/c5RVq46y7UZgY515JEnj805lSRJg\nIUiSCgtBkgRYCJKkwkKQJAEWgiSpsBAkSYCFIEkqLARJEmAhSJIKC0GSBFgIkqTCQpAkARaCJKmw\nECRJgIUgSSosBEkSYCFIkgoLQZIEWAiSpMJCkCQBFoIkqbAQJEkAdNf9AyLiVOAHwPXAN4FbqIro\nCWBNZh6IiEuBy4FhYH1mbqg7lyRptE6MEK4GdpXp64FPZua5wKPA2oiYU7ZZCZwPXBERCzqQS5LU\notZCiIgAXg3cAXQB5wKbyupNwAXA2cDWzHwmM58F7gfOqTOXJOmF6h4h3Aj8BlUZAPRk5oEyvRNY\nAiwGBlr2GSjLJUkdVNs1hIhYAzyQmY9XA4UX6DrawgmWj9LbO4fu7llH5oeGehicdMrO6O3toa9v\nXtMxJGlCdV5Ufgvwyoh4G7AUeA54JiJmZ+b+smw7sIPRI4KlwJZjvfjg4L4x83unKPbUGxzcy8DA\nnqZjSNKEB6e1FUJmvuvwdERcA/wdsAK4BLgVWA3cCWwF/jgi5gOHyjaX15VLknR0nboP4fBpoGuB\n90bEvUAvcHO5kHwVsLn8uS4zPZyWpA6r/T4EgMz87ZbZVUdZvxHY2IkskqSja2uEEBGfPcqyu6Y8\njSSpMROOEModxJcBr4mIb7WsOoXq46KSpBPEhIWQmbdGxD1UF4GvbVl1CPi/NeaSJHXYMa8hZOZ2\n4LyIeBmwkOcvEC8AnqoxmySpg9q6qBwRvwespbqL+HAhjAA/U1MuSVKHtfspo5VAX/mIqCTpBNTu\nfQiPWAaSdGJrd4TQXz5ldD9w8PDCzLymllSSpI5rtxB2Ad+oM4gkqVntFsJ/rTWFJKlx7RbCQapP\nFR02AjwNLJryRJKkRrRVCJl55OJzRJwCvAn4Z3WFkiR13qSfdpqZz2Xm16i+/lKSdIJo98a0tWMW\n/SOqL7KRJJ0g2r2G8IaW6RFgCPiVqY8jSWpKu9cQ3gcQEQuBkcycrl9fLEk6Tu2eMloB3ALMA7oi\nYhfwnsz8Tp3hJEmd0+5F5Y8BF2fmaZnZB/w74Kb6YkmSOq3dQhjOzB8cnsnMv6TlERaSpJmv3YvK\nhyJiNfD1Mv+LwHA9kSRJTWi3EC4DPgn8MdW3pf0V8B/qCiVJ6rx2TxmtAvZnZm9mLqL6kpxfqi+W\nJKnT2i2E9wBvb5lfBbx76uNIkprS7imjWZnZes1ghOe/SnNcEfFS4LPAYmA28BHge1QfYf0p4Alg\nTWYeiIhLgcuprk2sz8wN7f4SkqQXr91C+EpEPADcR/VG/ibgi23s9zbgocz8eEScQXVR+tvApzLz\nixHxUWBtRNwCXA2cRfXppYciYmNm7p7k7yNJOk7t3qn8kYi4BzibanTwa5n5YBv73dYyewbw98C5\nwLqybBNwJfBDYGtmPgMQEfcD5wB3tPdrSJJerHZHCGTm/VRfoTlpEfFtqofhvQ34emYeKKt2Akuo\nTikNtOwyUJZLkjqk7UJ4MTLznIj4BeBWRl97GO86xDGvT/T2zqG7e9aR+aGhHqbrA5Z6e3vo65vX\ndAxJmlCthRARrwV2ZmZ/Zj4cEbOAPRExOzP3U40atgM7GD0iWApsmei1Bwf3jZnfO6XZp9Lg4F4G\nBvY0HUOSJjw4nfQX5EzSG4EPA0TEYmAucDdwSVm/GrgT2AqcFRHzI2IusILqArYkqUPqLoRPA6dF\nxLeoLiD/J+Ba4L0RcS/QC9ycmc8CVwGby5/rMtNDaknqoFpPGZU3+kuPsmrVUbbdCGysM48kaXx1\njxAkSTOE
hSBJAiwESVJhIUiSAAtBklRYCJIkwEKQJBUWgiQJsBAkSYWFIEkCLARJUmEhSJIAC0GS\nVFgIkiTAQpAkFRaCJAm
"text/plain": [
"<matplotlib.figure.Figure at 0x7f2f8fa64278>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"# Distribution\n",
"sns.countplot(x='SibSp', data=df)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We can see that most passengers traveled without siblings or spouses. \n",
"\n",
"We now analyse whether this had an impact on survival."
]
},
{
"cell_type": "code",
"execution_count": 55,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"SibSp\n",
"0 0.345395\n",
"1 0.535885\n",
"2 0.464286\n",
"3 0.250000\n",
"4 0.166667\n",
"5 0.000000\n",
"8 0.000000\n",
"Name: Survived, dtype: float64"
]
},
"execution_count": 55,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df.groupby('SibSp').Survived.mean()"
]
},
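{
"cell_type": "markdown",
"metadata": {},
"source": [
"A survival rate is easier to judge when the size of each group is shown next to it, since a rate computed from a handful of passengers is not very reliable. The following is a minimal sketch of that idea. The helper name `survival_summary` and the `toy` data frame are illustrative assumptions, not part of the dataset; in this notebook the helper could be called with `df` instead."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"import pandas as pd\n",
"\n",
"def survival_summary(frame, by):\n",
"    # The mean of a 0/1 column is the survival rate; count is the group size\n",
"    return frame.groupby(by)['Survived'].agg(['mean', 'count'])\n",
"\n",
"# Made-up rows, used only to keep the example self-contained\n",
"toy = pd.DataFrame({'SibSp': [0, 0, 0, 1, 1, 5],\n",
"                    'Survived': [0, 1, 0, 1, 1, 0]})\n",
"survival_summary(toy, 'SibSp')"
]
},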
{
"cell_type": "code",
"execution_count": 56,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"array([<matplotlib.axes._subplots.AxesSubplot object at 0x7f2f8f9e30f0>,\n",
" <matplotlib.axes._subplots.AxesSubplot object at 0x7f2f8f91a160>], dtype=object)"
]
},
"execution_count": 56,
"metadata": {},
"output_type": "execute_result"
},
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAYQAAAEJCAYAAACUk1DVAAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\nAAALEgAACxIB0t1+/AAAGLlJREFUeJzt3X+Q3PV93/HnIRCyhM5gchayNMhMbb89LkknmCmthBE/\nbAjDr7RAnYkGx+BMiG08CuCkcmNAwW6aGEMdHBMnwsKYQgdIFWJZgGUSFxebVqSOIXTG7xqZCuuH\n0YEFJ8mcJE7XP77fE3fnk25377u339M9HzM32u+v9753tbev+/747HYNDg4iSdIRnW5AklQPBoIk\nCTAQJEklA0GSBBgIkqSSgSBJAuDITjegQ4uIs4FbgDnAJuDKzNza2a6kzoiII4E/Ba4FFvq7UC33\nEGosImYD/xW4KjPfDXwD+MvOdiV11N8CfYADqNrAQKi3s4GNmfl0Ob0aODci5nSwJ6mTbs7MPwK6\nOt3I4chAqLd3ARuHJjJzN/Ay8I6OdSR1UGb+r073cDgzEOptNtA/at5rFOcTJKlSBkK97QZmjZo3\nG9jVgV4kHeYMhHr7IfDOoYmIeDNwLPCjjnUk6bBlINTbt4ETI2JxOX0t8I3MfK2DPUk6THX58df1\nFhFnALdTHCp6DvhwZm7vbFfS5IuItwKPl5NDF1y8DpyTmds61thhpKFAiIhZwLPAzcDfA/dQ7F1s\nA67IzH0RsQxYDgwAqzJzddu6liRVrtFDRjdQXO4IRSh8MTOXUiT0VeUAqhsorps/C7g2Io6tullJ\nUvuMGwgREcC7gXUUg0GWAmvLxWuBDwCnARsyc1dm9gNPAEva0rEkqS0a2UO4FbiON0YGzsnMfeXt\n7cB8YB7QO2yb3nK+JGmKOGQgRMQVwPcyc9NBVjnY8HGHlUvSFDPep51eAJwUERcBC4C9wK6IODoz\n95TztgBbGblHsAB4crw7//q69YOtfkTV+04/jeOOfXNrG+twN+X+IBkcHBzs6ppybWtqaPiFdchA\nyMzfGLodETcC/w9YDFwG3AtcCjwKbADujIhuYH+5zvLx7vwv/uZZjjjmxEZ7PWBgXz/9e77HWWec\nfmBeT89cent3Nl1rLFXVqmNP06FWT8/cCrqZXF1dXbV7HqdDrTr21I5ajWrm+xCGUuYm4J6I+B2K\nz+e/OzMHImIFsJ4iEFZm5riP5ogjjuSIGUc10UJhcP9A09tIkg6t4UAoP3J2yLljLF8DrKmiKUnS\n5POjKyRJgIEgSSoZCJIkwECQJJUMBEkSYCBIkkoGgiQJMBAkSSUDQZIEGAiSpJKBIEkCDARJUslA\nkCQBBoIkqWQgSJIAA0GSVDIQJEmAgSBJKo37FZoR8Sbgq8A84Gjgs8BlwHuBl8rVbsnMRyJiGbAc\nGABWZebqdjQtSapeI9+pfBHwVGZ+PiJOBL4FfBdYkZkPD60UEbOBG4BTgdeBpyJiTWa+0oa+JUkV\nGzcQMvOBYZMnAj8pb3eNWvU0YENm7gKIiCeAJcC6CvqUJLVZI3sIAETEd4EFwIXA9cDHI+I64EXg\nE8AJQO+wTXqB+dW1Kklqp4ZPKmfmEuBi4F7gaxSHjM4BfgCsHGOT0XsQkqQaa+Sk8inA9szcnJnP\nRMSRwD9l5tAJ5bXAHcCDFOcbhiwAnqy64SHd3bPo6Zk7Yt7o6YmoqlYde5outaaauj6Ph3utOvZU\nda1GNXLI6AxgEXBtRMwDjgH+MiI+mZnPA2cCzwIbgDsjohvYDyymuOKoLfr6+unt3Xlguqdn7ojp\niaiqVh17mg61pmqo1O15nA616thTO2o1qpFA+DLwlYj4DjAL+BiwC7g/InaXt6/MzP6IWAGspwiE\nlZlZzSOSJLVdI1cZ9QPLxlj0L8dYdw2wpoK+JEmTzJHKkiTAQJAklQwESRJgIEiSSgaCJAkwECRJ\nJQNBkgQYCJKkkoEgSQIMBElSyUCQJAEGgiSp
ZCBIkgADQZJUMhAkSYCBIEkqGQiSJMBAkCSVxv0K\nzYh4E/BVYB5wNPBZ4GngHopA2QZckZn7ImIZsBwYAFZl5uo29S1JqlgjewgXAU9l5pnAB4HbgJuB\nP8/MpcBG4KqImA3cAJwNnAVcGxHHtqVrSVLlxt1DyMwHhk2eCPwEWApcXc5bC3wS+L/AhszcBRAR\nTwBLgHVVNixJao9xA2FIRHwXWECxx/CtzNxXLtoOzKc4pNQ7bJPecr4kaQpo+KRyZi4BLgbuBbqG\nLeoae4uDzpck1VAjJ5VPAbZn5ubMfCYiZgA7I+LozNxDsdewBdjKyD2CBcCT7WgaoLt7Fj09c0fM\nGz09EVXVqmNP06XWVFPX5/Fwr1XHnqqu1ahGDhmdASyiOEk8DzgGeAS4jGJv4VLgUWADcGdEdAP7\ngcUUVxy1RV9fP729Ow9M9/TMHTE9EVXVqmNP06HWVA2Vuj2P06FWHXtqR61GNXLI6MvAWyPiOxQn\nkD8K3AT8VkQ8DhwH3J2Z/cAKYH35szIzq3lEkqS2a+Qqo35g2RiLzh1j3TXAmgr6kiRNMkcqS5IA\nA0GSVDIQJEmAgSBJKhkIkiTAQJAklQwESRJgIEiSSgaCJAkwECRJJQNBkgQYCJKkkoEgSQIMBElS\nyUCQJAEGgiSpZCBIkgADQZJUGvcrNAEi4nPA6cAM4E+Ai4H3Ai+Vq9ySmY9ExDJgOTAArMrM1dW3\nLElqh3EDISLOBN6TmYsj4i3APwJ/B6zIzIeHrTcbuAE4FXgdeCoi1mTmK23pXJJUqUYOGT0OXF7e\nfgWYQ7Gn0DVqvdOADZm5KzP7gSeAJVU1Kklqr3H3EDJzEHitnPxtYB3FIaFrIuI64EXgE8AJQO+w\nTXuB+ZV2K0lqm4ZPKkfEJcCVwDXAPcC/z8xzgB8AK8fYZPQehCSpxho9qXwe8CngvMzcCXx72OK1\nwB3Ag8BFw+YvAJ6sqM9f0N09i56euSPmjZ6eiKpq1bGn6VJrqqnr83i416pjT1XXalQjJ5W7gc8B\n52Tmq+W8vwZ+PzOfB84EngU2AHeW6+8HFlNccdQWfX399PbuPDDd0zN3xPREVFWrjj1Nh1pTNVTq\n9jxOh1p17KkdtRrVyB7CB4HjgQciogsYBO4C7o+I3cAu4MrM7I+IFcB6ikBYWe5NSJKmgEZOKq8C\nVo2x6J4x1l0DrKmgL0nSJHOksiQJMBAkSSUDQZIEGAiSpJKBIEkCDARJUslAkCQBBoIkqWQgSJIA\nA0GSVDIQJEmAgSBJKhkIkiTAQJAklQwESRJgIEiSSgaCJAlo7Cs0JbXZf/rCal548bWWtn3nwjdz\nxb+7pOKONB01FAgR8TngdGAG8CfAUxRfoXkEsA24IjP3RcQyYDkwAKzKzNVt6Vo6zGx/9XU2713Y\n0rZvebW34m40XY17yCgizgTek5mLgfOBLwA3A3+emUuBjcBVETEbuAE4GzgLuDYijm1X45KkajVy\nDuFx4PLy9ivAHGAp8PVy3lrgA8BpwIbM3JWZ/cATwJJq25Uktcu4h4wycxAYOrj5EWAdcF5m7ivn\nbQfmA/OA4fuuveV8SdIU0PBJ5Yi4BLgKOBd4btiiroNscrD5kqQaavSk8nnApyj2DHZGxM6IODoz\n9wALgC3AVkbuESwAnqy64SHd3bPo6Zk7Yt7o6YmoqlYde5outaaLWbNmTonfhbrWqmNPVddq1LiB\nEBHdwOeAczLz1XL2Y8ClwH3lv48CG4A7y/X3A4sprjhqi76+fnp7dx6Y7umZO2J6IqqqVceepkOt\n6RYq/f17a/+7UNdadeypHbUa1cgewgeB44EHIqILGAR+C/hKRFwNbALuzsyBiFgBrKcIhJWZWc0j\nkiS1XSMnlVcBq8ZYdO4Y664B1lTQlyRpkvnRFZIkwECQJJUMBEkSYCBIkkoGgiQJMBAkSSUDQZIE\nGAiSpJKB
IEkCDARJUslAkCQBBoIkqWQgSJIAA0GSVDIQJEmAgSBJKhkIkiTAQJAklRr5TmUi4mTg\nIeC2zLwjIu4C3gu8VK5
"text/plain": [
"<matplotlib.figure.Figure at 0x7f2f8f943f60>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"df.hist(column='SibSp', by='Survived', sharey=True)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The histograms alone do not provide much information. While the overall survival rate is about 38%, passengers with no siblings or spouses aboard (SibSp = 0) survived at a rate of about 35%. Surprisingly, passengers with one sibling or spouse had a higher rate, about 54%. Next, we look at the distribution by gender."
]
},
{
"cell_type": "code",
"execution_count": 57,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"SibSp Sex \n",
"0 female 174\n",
" male 434\n",
"1 female 106\n",
" male 103\n",
"2 female 13\n",
" male 15\n",
"3 female 11\n",
" male 5\n",
"4 female 6\n",
" male 12\n",
"5 female 1\n",
" male 4\n",
"8 female 3\n",
" male 4\n",
"dtype: int64"
]
},
"execution_count": 57,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df.groupby(['SibSp', 'Sex']).size()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
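"We see that for SibSp >= 1 the numbers of men and women are roughly similar, whereas passengers travelling without siblings or spouses (SibSp = 0) were mostly men (434 men versus 174 women). Now we calculate the survival probability."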
]
},
{
"cell_type": "code",
"execution_count": 58,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"SibSp Sex \n",
"0 female 0.787356\n",
" male 0.168203\n",
"1 female 0.754717\n",
" male 0.310680\n",
"2 female 0.769231\n",
" male 0.200000\n",
"3 female 0.363636\n",
" male 0.000000\n",
"4 female 0.333333\n",
" male 0.083333\n",
"5 female 0.000000\n",
" male 0.000000\n",
"8 female 0.000000\n",
" male 0.000000\n",
"Name: Survived, dtype: float64"
]
},
"execution_count": 58,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df.groupby(['SibSp', 'Sex']).Survived.mean()"
]
},
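{
"cell_type": "markdown",
"metadata": {},
"source": [
"A two-level grouped result like the one above can be hard to scan. As a minimal sketch, it can be pivoted so that one grouping key becomes the columns. The helper name `survival_table` and the `toy` data frame are illustrative assumptions; in this notebook the helper could be called with `df`, 'SibSp' and 'Sex'."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"import pandas as pd\n",
"\n",
"def survival_table(frame, rows, cols):\n",
"    # Group by both keys, then move the second key into the columns\n",
"    rates = frame.groupby([rows, cols])['Survived'].mean()\n",
"    return rates.unstack(cols)\n",
"\n",
"# Made-up rows, used only to keep the example self-contained\n",
"toy = pd.DataFrame({\n",
"    'SibSp': [0, 0, 1, 1, 3, 3],\n",
"    'Sex': ['female', 'male', 'female', 'male', 'female', 'male'],\n",
"    'Survived': [1, 0, 1, 0, 0, 0],\n",
"})\n",
"survival_table(toy, 'SibSp', 'Sex')"
]
},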
{
"cell_type": "code",
"execution_count": 59,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"<matplotlib.axes._subplots.AxesSubplot at 0x7f2f8f84b710>"
]
},
"execution_count": 59,
"metadata": {},
"output_type": "execute_result"
},
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAYEAAAESCAYAAAAbq2nJAAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\nAAALEgAACxIB0t1+/AAAHyVJREFUeJzt3XucVWXd9/HPHGWYAZkZJhQRkoM/D1k+eEtgnjE8lA/c\nacljWUZ28Bjelk+iGaJ4TBBuj2BTmWVl3ZmmKFhaeXpeQabcFr/EFBjARw4DzAwzzGHv+4+9wT3j\nHNbAXnvPnvV9v16+2Ou4fwzj/u51rWtdV148HkdERKIpP9sFiIhI9igEREQiTCEgIhJhCgERkQhT\nCIiIRJhCQEQkwgrDfgMz+wjwGDDP3e/tsO00YC7QCixx95vCrkdERN4X6pWAmQ0EFgLPdrHLAuDf\ngeOBKWZ2WJj1iIhIe2E3BzUBZwIbO24ws0OALe6+wd3jwFPA5JDrERGRFKGGgLvH3H1XF5sPADal\nLL8HHBhmPSIi0l5fujGcl+0CRESiJvQbw93YQPtv/gcl13WptbUtXlhYEGpR0jc1NjZy9tlnE4/H\nycvL44knnqCkpCTbZYnkii6/ZGcyBNoV4e5rzGyQmY0k8eH/aeD87k5QW7szxPKkL6ur28HuwQ7j\n8TgbN25l0KDBWa5KJDdUVQ3qcluoIWBm44E7gVFAi5mdAzwOvO3uvwUuBn4OxIFH3H11mPWIiEh7\noYaAu/8VOKWb7S8Ax4VZg4iIdK0v3RgWEZEMUwiIiESYQkBEJMIUAiIiEaYQEBGJMIWAiEiEZfOJ\nYRGRnBKLxZg//w5qa7dQUFBIXV0dl102k9Gjx2S7tL2mEBARCeitt97kvffe5bbb5gNQU7OOmpp1\nLF26hE2b3qO1tYVp085lzJhxXHPNVcybdzdvvLGSp59+klmzvpfl6junEBARCeiQQ8ZQXLwft9wy\nh6OPHs9HP3o0Q4aUs2HDeubMuYVdu5q4/PJvsGjRj7jwwq9w//13s2bN29x00+3ZLr1LCgERkYAK\nCwu58cZb2bFjO2+88d9UVz+A+yqKi4u5+eYbACgoSAxyeeyxE3nggXs5+eRTGThwYDbL7pZCQEQk\noFdfXcGOHds56aRTmTTpE4wdO47PfvZ/c/rpZ3HNNdcDsGbNOwAsXbqEE044iRUr/sLpp59FVdWH\nslh51xQCIiIBjRtnzJt3G0uW/I7i4v1oampk/vx7eOWVl5g7dzb19fV8/OOTGDhwIE8++QTz59/N\n8cefxO23z+WOOxZku/xO5e0enjcXbNpUlzvFSlrV1e3gq1/94p7lxYsf0lDSIgFVVQ3qcj4BPScg\nIhJhCgERkQhTCIiIRJhCQEQkwhQCIiIRphAQEYkwPScgIv1OW1sbNTVr03rOESNG7nkaOJ1uvvkG\nTjllMpMmHZ/2cwehEBCRfqemZi3X3fUrBpRVpOV8TfVbuWnmuYwadUhazteXKAREpF8aUFbBwMFV\nGX3PJUt+x6uvrmD79m28887bfPWrF/Pss8/wzjvvcP31c/j975exatXfaW7exdSp5/DpT0/dc2ws\nFuP22+eyceMGWltb+cpXvs748f8Wes0KARGRNFq/voZ77lnME088xsMP/5gf/vCnPPnk4zz11BMc\ncshoLr/8Snbt2sV5501rFwLLlj3N0KFVfOc732X79m1cccXF/PjHj4Rer0JARCSNDjvscAAqK4cy\nZsxY8vLyqKiopLm5me3bt3PxxTMoLCxi+/Zt7Y5bufJ1Vq78G6+//jfi8TgtLc20trZSWBjux7RC\nQEQkjVJvHqe+fvfdjWzYsJ577nmQ/Px8pkw5qd1xRUVFfPGLM5g8eUrGagV1ERURyYhVq/7BsGHD\nyM/P54UX/kgs1kZra+ue7UcccSR/+tPzANTWbuWBB+7JSF26EhCRfqmpfmufOtexx05g3bp1XH75\n1znhhJM57rgTuPPOW/dsP/XUT/LXvy7n4otn
EIvFmTHja/v8nkFoKGnJCRpKWnojl54TyITuhpLW\nlUAvVVcvYunSp5gy5ayMJbWI9E5BQUG/7NMfBt0T6IWmpkaWLVsCJLpzNTU1ZrkiEZF9oxDohZaW\nFnY3n8XjMVpaWrJcUe9VVy9i+vRpVFcvynYpItIHKAQiRFcyItKRQiBC+sOVjIikl0JARCTC1DtI\nRPqdbHQRbW1t5ZJLLuLDHz6EWbO+l5b3fPfdjVx33f/lwQcfSsv5OqMQEJF+p6ZmLbP/ay4l5aVp\nOV9jbQOzP3Ntt91ON2/eTGtrS9oCYLe8Lnv4p0foIWBm84CJQAyY6e7LU7ZdCnweaAWWu/t/hF2P\niERDSXkppUMHZez97r57HuvX13DzzTewc+dO6uvraGtr48orv83o0WM577xpnH32NJ5//g8cdNAI\nzA7nueee5eCDR3L99TeyevWbzJt3G0VFReTl5XHjjbe1O/9rr73KokX3UlhYxLBhw7j66mvTMrhc\nqPcEzOxEYKy7HwdcBCxM2TYI+BbwCXc/ETjSzCaEWY+ISFguu+xKDj54FAcdNIKJE4/jrrvu5aqr\nvsN//ud8IDFfwGGHHcGDDz7EypWvcdBBB7F48Y957bVXaWiop7Z2K1deeTULFtzHUUd9jKVLl7Q7\n/4IF3+fWW+exYMG9DBlSznPPPZuWusO+EpgMPAbg7qvMbIiZlbl7PdAM7AIGm1kDUAKkb7APEZEs\nWLnyNbZv38YzzzwFQHNz855thx9+BAAVFZWMHXto8nUF9fX1VFRUct99C2lqamLLls1MmXLmnuNq\na7eybt06rr3228TjcZqamhgypDwt9YYdAgcAy1OWNyfXrXb3XWY2B/gXsBP4ubuvDrkeEZFQFRUV\nM3Pm1Rx55Ec+sK2goLDT1/F44pv+BRdcyLHHTuSRRx5u9xxPYWERVVVVLFx4f9rrzfSN4T23OJLN\nQbOAsUAd8JyZHeXuK7s6uLx8IIWF2RvAqbg41m65srKM/ffPXJvjvsrl+nO5dsm8HTvSc0M4VXl5\nKVVVXf/ONTfvoLAwnwkTjmH58hc5+eRJrF69mhdeeIELL7yQ/Pw8hg4to6SkhMLCfCorE+crKMin\nomIgDQ11fOQjxv7778eKFa9w9NFHU1FRSmFhAaNHD6ewsIAdO95jzJgxPPzww0yYMIFDDz10n/9e\nYYfABhLf/HcbDmxMvj4ceMvdawHM7M/AMUCXIVBbuzOkMoOpq6tvt7xlSz3NzbnzqEUu15/LtUvm\n1dY20FjbkLbzNdY2UFvbwKZNdV3us3VrA21tMc44Yxpz536Pz31uOrFYjJkzv82mTXXEYrB5cz0D\nBrTS1hZjy5YGiorqaGuLsXXrTqZOPZevfe3rjBhxMFOnnsv8+XcwadLJtLa2sWlTHd/61iy+9a2r\nKS4uprJyKJMnf6rbelJ1F16hDiVtZpOA2e5+upmNB+5K3gTGzD4EvAAclWwaWgrc4O4vdnW+MIeS\nDtKvuKGhgTlzrtuzfP31N1Fa2vk3jr447GwuD8ecy7VL5mko6fayNpS0u79sZivM7EWgDbjUzL4E\nbHP335rZHcDzZtYCvNRdAIQtSL/iWHNbu+WFL9xPfvEHfymC9CkWkfBoKOngQr8n4O6zOqxambJt\nMbA47BqC6qlfcduuVlKnhh5YWUbBfnreTkRylxpVRUQiTF9jpU/oqQ23oaH9Tb5169Z2eT8Gcrv9\nViSTFALSJ/R0Tybo/RjQPRmR3lAISJ/R3T0Z3Y8RCYf+L+ongnZxTdVdk4qaU0SiQSHQT6iLq4js\nDYVAP6IuriLSW+oiKiISYQoBEZEIUwiIiESYQkBEJMIUAr2Ql58yEF9eh2URkRykEOiF/KICyg6t\nAKBsXAX5RepHLyK5Tf0De6l8wnDKJwzPdhkiImmhKwERkQhTCIiIRJhCQEQkwhQCIiIRphAQEYkw\nhYCISIQp
BEREIkwhICISYQoBEZEIUwiIiESYQiBCNACe7Ivq6kVMnz6N6upF2S5F0kghECEaAE/2\nVlNTI8uWLQFg2bKnaWp
"text/plain": [
"<matplotlib.figure.Figure at 0x7f2f8f890358>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"sns.barplot(x=\"SibSp\", y='Survived', hue='Sex', data=df)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We observe that when SibSp > 2, the survival probability drops sharply: for women it falls from about 77% to about 36% or less. Next, we check whether age differs across these groups."
]
},
{
"cell_type": "code",
"execution_count": 60,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"SibSp Sex \n",
"0 female 28.631944\n",
" male 32.615443\n",
"1 female 30.738889\n",
" male 29.461505\n",
"2 female 16.541667\n",
" male 28.230769\n",
"3 female 16.500000\n",
" male 8.750000\n",
"4 female 8.333333\n",
" male 6.416667\n",
"5 female 16.000000\n",
" male 8.750000\n",
"8 female NaN\n",
" male NaN\n",
"Name: Age, dtype: float64"
]
},
"execution_count": 60,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df.groupby(['SibSp', 'Sex']).Age.mean()"
]
},
{
"cell_type": "code",
"execution_count": 61,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"<matplotlib.axes._subplots.AxesSubplot at 0x7f2f8f7dab38>"
]
},
"execution_count": 61,
"metadata": {},
"output_type": "execute_result"
},
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAX4AAAESCAYAAAD67L7dAAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\nAAALEgAACxIB0t1+/AAAH61JREFUeJzt3Xt8VPWd//FXrpALaIgpCggq4kfRuq5drVgteFmsVsUV\nurpa3Upbra12tf5atboWr1VUUKtWoeK17U/txdUKCrRetlb7sNYK2vqpF1QCWhBCSAIh1/3jTCCJ\nSWYS5szJzHk/Hw8ezJlzzswn55G85zvf8z3fk9fe3o6IiMRHftQFiIhIZin4RURiRsEvIhIzCn4R\nkZhR8IuIxIyCX0QkZgrDfHEzKwHuA0YCQ4BrgBnAZ4CPE5vd6O6LwqxDRES2CTX4gROAl939JjMb\nCywBXgAucfeFIb+3iIj0INTgd/dHOi2OBVYmHueF+b4iItK7vExcuWtmLwCjgeOBi9jW9fMP4Dx3\nXx96ESIiAmTo5K67fw44Efgp8ABBV89RwGvAlZmoQUREAmGf3D0QWOPu1e6+zMwKgeXu3nFi93Hg\nzr5eo6Wltb2wsCDMMkVEclGvXephn9z9PDAOuNDMRgLlwN1m9v/cfQUwBXi9rxeoqdkUcokiIrmn\nqmpYr+vCDv67gHvM7HlgKPBNoB542MwaEo/PCrkGybAFC+axePFCpk49jpkzz466HBHpJiMnd7fH\n2rV1g7tA6aKxcTNnnXUa7e3t5OXlc++9P2Xo0JKoyxKJnaqqYb129eTslbsLFszj1FNPYsGCeVGX\nEivNzc10NCba29tobm6OuCIR6S4ng7+xcTNLlgQXAy9Z8hSNjZsjrkhEZPDIyeBXq1NEpHc5GfzS\nlbq9RKQzBX+OU7eXiHSn4M9x6vYSke7CHscvktXa2tqYO/dGamrWUVBQSF1dHeeddwF77DE+6tJE\nBkzBL9KHd955izVrPuKGG+YCUF29kurqlSxevIi1a9fQ0tLMSSfNYPz4CVx66UXMmXM7b7yxnKee\nepLvf/8HEVcv0jMFv0gfdt99PMXFQ/jhD6/igAMOZP/9D2DHHStYvXoVV131Q7ZsaeT887/BvHn3\n8ZWvfJW77rqd999fwTXXzI66dJFeKfgHKU17MDgUFhZy9dXXs3FjLW+88ToLFtyN+5sUFxdz3XXB\nxLIFBcEkggcddAh3330nU6YcSWlpaZRli/Qp64K/tbWV6uoP+tymoaGhy/LKlR9QVlbW47Zjxozd\n+oc7WHQfiXPaaWdo2oOIvPrqK2zcWMvkyUcyadLn2HPPCXzpSydyzDHHcemlVwDw/vvvAbB48SIO\nP3wyr7zyMscccxxVVZ+KsHKR3mVd8FdXf8Dlt/yCoeUjet2mvbWpy/Kch/6XvILiT2zXWL+eay6Y\nwbhxu6e9zu3R00gcBX80Jkww5sy5gUWLfkNx8RAaGzczd+4dvPTSH7j22lnU19fz2c9OorS0lCef\nfIK5c2/nsMMmM3v2tdx4461Rly/So6wLfoCh5SMoHV7V6/q2lkbqOy2XDKskv3Bo+IVJzikvL+eK\nK67+xPP//M+f+cRzt94a3Fpi/Pg9FfoyqGkcv4hIzCj4RURiRsEvIhIzCn4RkZhR8IuIxIyCX0Qk\nZrJyOKdId6lc2NdfYVzcd911V3LEEUcxadJhaX1dkf7IzeDP6/zHmtdtWXJRKhf29cdgvbhPJB1y\nMvjzC4ooqdqHzWv/RknV3uQXFEVdkmRAsgv70m3Rot/w6quvUFu7gffeW8HXv34uS5c+zXvvvccV\nV1zFb3+7hDff/CtNTVuYNm06xx8/beu+bW1tzJ59LR9+uJqWlha++tVzOPDAf8lY7RJvORn8AMPH\nTmL42Elpea3BOmFaHOYtGuxWrarmjjvm88QTj/HQQ/dz770/5cknH2fhwifYffc9OP/8C9myZQun\nnHJSl+BfsuQpdtqpiksu+W9qazfw7W+fy/33
/zzCn0TiJNTgN7MS4D5gJDAEuAZ4DXiQ4MTyh8AZ\n7j5obws1mCdMq67+gFm/upaSip6DHKCtqbXL8m2/v4v84k+G++aaBmadfJm6Nvpp7733AaCycifG\nj9+TvLw8RoyopKmpidraWs49dyaFhUXU1m7ost/y5ctYvvwvLFv2F9rb22lubqKlpYXCwpxti8kg\nEvZv2QnAy+5+k5mNBZYALwC3u/svzexaYCZwd8h1DNhgnzCtpKKMsp2G9bq+dUsLnSOntLKcgiEK\nl3Tp/A2p8+OPPvqQ1atXcccdPyE/P5+pUyd32a+oqIgzz5zJUUdNzVitIh1CHc7p7o+4+02JxbHA\nSmAy8HjiuSeAo8OsQSQKb775N0aOHEl+fj6///1ztLW10tLSsnX9xIn78vzzzwJQU7Oeu+++I6JK\nJY4y0vQzsxeA0QTfAJZ06tpZA+ySiRok9zXWrx80r3XQQQezcuVKzj//HA4/fAqHHno4N998/db1\nRx75r/z5z3/i3HNn0tbWPqjOHUnuy0jwu/vnzGx/4KdAXqdVeb3sslVFRSmFhdu+Qm/c2Ht/9kBU\nVJRRVdV7V0lxcVuX5crKcnbYofft0yHV98z0sUhFFMcLYMSIfbnjyvQej912263Pk91nnvkfWx9P\nm3Ys06Yd+4nHfbnpphu2v0iRAQj75O6BwBp3r3b3ZWZWANSZ2RB330LwLWB1X69RU7Op23JDL1sO\nTE1NA2vX1vW6vq6uvsvyunX1NDWFe8Fzqu+Z6WORiiiOV4fhw9N7x6v16zcl30hkkOqrERf2X+Tn\ngYsAzGwkUA4sBWYk1k8Hngq5BhER6STsrp67gHvM7HlgKHAu8ArwoJmdDbwP3B9yDYNKOsfer1pV\nndbaRCQeQg1+d28ETu9hVWzHsKXznsG1a1awU3quURORGNGA7gik657BwciT7euTF5H40bTMIiIx\noxa/5IQopmVuaWnhm9/8Grvttjvf//4P0vKeH330IZdffjE/+ckDaXk9kZ7EOvjb29uSniDVJGfZ\nIZV5i/ojlbmLPv74Y1pamtMW+h3ykl7dIrJ9Yh38Wxo2MP+P92uSsxyRbN6idLv99jmsWlXNdddd\nyaZNm6ivr6O1tZULL/wue+yxJ6ecchInnHASzz77O0aPHoPZPjzzzFJ23XUsV1xxNW+//RZz5txA\nUVEReXl5XH111wu6XnvtVebNu5PCwiJGjhzJ9753mSZxk7SIfR9/R1j09q+0srzL9qWV5T1ul66W\npmSP8867kF13Hcfo0WM45JBDueWWO7nookv40Y/mAsGc+3vvPZGf/OQBli9/jdGjRzN//v289tqr\nNDTUU1Ozngsv/B633vpjPv3pf2Lx4kVdXv/WW2/i+uvncOutd7LjjhU888zSKH5MyUFqPki/JOtL\n70/XGORG99jy5a9RW7uBp59eCEBT07bhuPvsMxGAESMq2XPPvRKPR1BfX8+IEZX8+Me30djYyLp1\nHzN16rZpHmpq1rNy5Uouu+y7tLe309jYyI47VmTwp5JcpuCXfkl2HUKq1yBA7tzesKiomAsu+B77\n7rvfJ9YVFBT2+Li9PWjRn3HGVzjooEP4+c8forFx89b1hYVFVFVVcdttd4VbvMRS7Lt6pP86rkPo\n6V/JsMou25YMq+x123TdHzdqEyfux/PPPwPAihXv8sgjP+tz++D2Du3U1tYyatQYmpqaeOmlF2hu\n3nY/omHDhpGXl8d7760A4Je/fJh33307rB9BYkYt/hyXl99piEhet+UcszmNk9al+lp5eTBjxilc\ne+0P+Na3vk5bWxsXXPDdjrVdtuv6OI/p0/+dSy75DmPG7MqMGacwd+6NXW7McvHFl3PddVdSXFxM\nZeVOTJs2fft/MBEU/Dkvv6iA8r1GUP/39ZRPGEF+UXb3p/dmzJixzDr5srS/Zl923nkX5s8Pxttf\nc83sT6x/
9NH/2fq4Y7vOj0888d848cR/2/r84YdP6bJ+//0PYN68+wZUu0hfFPwxUHHwKCoOHhV1\nGaEqKCjI+nMFIpmiPn4
"text/plain": [
"<matplotlib.figure.Figure at 0x7f2f8f890a90>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"sns.barplot(x=\"SibSp\", y='Age', hue='Sex', data=df)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Indeed, when SibSp > 2, the mean age is much lower (under 17 for both sexes); no age is recorded for the passengers with SibSp = 8. Next, we check the relationship with Pclass."
]
},
{
"cell_type": "code",
"execution_count": 62,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"SibSp Pclass\n",
"0 1 137\n",
" 2 120\n",
" 3 351\n",
"1 1 71\n",
" 2 55\n",
" 3 83\n",
"2 1 5\n",
" 2 8\n",
" 3 15\n",
"3 1 3\n",
" 2 1\n",
" 3 12\n",
"4 3 18\n",
"5 3 5\n",
"8 3 7\n",
"dtype: int64"
]
},
"execution_count": 62,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df.groupby(['SibSp', 'Pclass']).size()"
]
},
{
"cell_type": "code",
"execution_count": 63,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"SibSp Pclass\n",
"0 1 0.562044\n",
" 2 0.416667\n",
" 3 0.236467\n",
"1 1 0.746479\n",
" 2 0.581818\n",
" 3 0.325301\n",
"2 1 0.800000\n",
" 2 0.500000\n",
" 3 0.333333\n",
"3 1 0.666667\n",
" 2 1.000000\n",
" 3 0.083333\n",
"4 3 0.166667\n",
"5 3 0.000000\n",
"8 3 0.000000\n",
"Name: Survived, dtype: float64"
]
},
"execution_count": 63,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df.groupby(['SibSp', 'Pclass']).Survived.mean()"
]
},
{
"cell_type": "code",
"execution_count": 64,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"<matplotlib.axes._subplots.AxesSubplot at 0x7f2f8f8a3a58>"
]
},
"execution_count": 64,
"metadata": {},
"output_type": "execute_result"
},
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAYEAAAESCAYAAAAbq2nJAAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\nAAALEgAACxIB0t1+/AAAHIdJREFUeJzt3Xt0VeWd//F3LiSEJECAIFoG1Eq/eKljvbSISpfgtauj\nrdWfVq2Mjk6X4gVLf9MRb0VFnbYgolVHZrBV66V1LOpvxELVKl7GKTpeWsevF64BLyEcCIlJCMn5\n/bEPcBIJOZC9z8nJ/rzWcnH2JXt/A8fzOfvZ+3megmQyiYiIxFNhrgsQEZHcUQiIiMSYQkBEJMYU\nAiIiMaYQEBGJMYWAiEiMFUd9AjM7CFgAzHb3uzptOxa4GdgCuLtfGHU9IiKyXaRXAmY2AJgL/LGL\nXe4BTnP3Y4CBZnZSlPWIiEhHUTcHNQMnAx93sf0wd9+6rRYYGnE9IiKSJtIQcPd2d2/ZyfYGADPb\nEzgeeDrKekREpKOc3xg2s+HAk8DF7p7IdT0iInES+Y3hnTGzSoJv/1e5+7Pd7b9lS1uyuLgo+sJE\nRPqWgq42ZDMEdlTEbIKnhhZncoBE4vNwKxIRiYHq6soutxVEOYqomR0KzAJGA63AGoKmn+XAImA9\n8CpBQCSBh9z937o6Xm3tJg15KiKyi6qrK7u8Eog0BMKmEBAR2XU7C4Gc3xgWEZHcUQiIiMSYQkBE\npIfmz7+Xs876DvPn35vrUnaZQkBEpAeam5tYvHghAIsXP0Nzc1OOK9o1CgERkR5obW1l6wM2yWQ7\nra2tOa5o1ygEUubPv5dFi57JdRkiIlmV0x7D2fA///M6N988gzFjjGSynZaWFi6/fBp7771PrksT\nEcm5Ph8CAIcf/g1+8pOrAXjrrTe57755VFUNoa5uHY2NDVx22Y+27VtfX88tt8ygsnIg69bVcu21\nN9DW1s6sWbcwePAQNmxYz7RpV7Fy5XIef/y3lJdXADB9+vU5+d1ERHoiFiEQdEYObNiwnvr6jZSW\nlnLjjbeycuUK6urWbdu+ceMGzjzzHA455FAefvhBlix5gSFDhrLHHntyxRXTqKuro7i4mFdeeYlJ\nk05k4sTjWLVqBVu2bKG4OCZ/nSLSZ8TiU2vp0v9m+vT/SzKZZNCgQZx44rdYu3YNAKNH783o0Xvz\n5ptvAFBaWsof/rCQF198nuXLlzF+/DEcddQxrF69iiuvnMLQoUO5/PJpnHfe+dx//338/ve/46CD\nDuaHP5ySy19RRGS3xCIE0puDAN599y+89tqrACxb9hGrV69MbUny6KO/4YgjvsHEiccxb97dtLe3\nUVOzmuOPP5Hvf/9cnnpqAX/4w9OMHXsAF198GcXFxdxww7V88IEzZozl4LcTEdl9sQiBzg444CCq\nqqq47rqrqK/fyBVX/Jhlyz4CCvja1w7n4Ycf4K233mCPPUawaNEzjB17APfffx/Dhg2joWETl1xy\nBe+++xceeuh+Bg0aTEFBAXvvvW+ufy0RkV2mAeRERHpg06Z6LrrovG3L8+bdT2XlwBxW9EUaQE5E\nRHZIISAiEmMKARGRGFMIiIjEmEJARCTGFAIiIjHWp/oJtLW1UVOzKtRjjhw5iqKiom73W7bsQ666\n6seceeY5nHbaGaHWICISlT4VAjU1q7hmzmP0rxgSyvGaG9Zz09TTGT165yOONjc3M2fOLzj88K+H\ncl4RkWzpUyEA0L9iCAMGVmf1nCUlJfziF3N58MFfZfW8IiI9pXsCISgsLKSkpCTXZYiI7DKFgIhI\njCkERERiTCEQsnwakE9EpM/dGG5uWJ/1Y7m/x5133sYnn3xCcXERL7zwHDNn/pzKysrQahERiUKf\nCoGRI0dx09TTQz9md8zGcscd/xrqeUVEsiHyEDCzg4AFwGx3v6vTtuOAmcAWYKG739STcxUVFXX7\nTL+IiGwX6T0BMxsAzAX+2MUutwPfBY4GTjCz
sVHWIyIiHUV9Y7gZOBn4uPMGM9sHqHP3te6eBJ4G\nJkVcj4iIpIk0BNy93d1butg8AqhNW/4M2DPKekREpKPe9Ihol3NgiohINHL5dNBaOn7z/1JqXZeq\nqgZQXNz9iJ4iItlSUtLeYXno0AoGDcqfx8OzGQIdvum7+0ozqzSzUQQf/t8Gzt7ZARKJz3d6glwO\nJX3XXbfz9ttv0dbWxrnn/j3f/OaxodYhIr3Tpk0NHZbr6hrYvLk3NbJAdXXXoRRpCJjZocAsYDTQ\nambfA54Elrv7E8DFwCNAEnjY3T/syflqalbx08dnUlZV3sPKA02JRn562tXdPnb6xhtLWbFiOffc\nM5/6+o2cf/45CgERyQuRhoC7vwF0+Wno7i8B48M8Z1lVOeXDsnsp9rWvHcaBBx4EQEVFJS0tzSST\nSQoKdJtDRHq33nXNkqcKCgooLe0PwFNPLWDcuKMUACKSF/rUsBG5tmTJn3j66ae47bY7c12KiEhG\nFAIhee21V3nggV8xe/adDBgQzj0JEZGoKQRC0NjYwF13zeX22++moqIi1+WIiGSsz4VAU6Ix68d6\n9tnF1Ndv5Lrr/nnbDeFrrpnB8OF7hFaLiEgUCvJpEpTa2k07LTaX/QREJJ42barnoovO27Y8b979\nVFYOzGFFX1RdXdnlkyp96kpAQ0mLiOwaPSIqIhJjCgERkRhTCIiIxJhCQEQkxhQCIiIx1qeeDsrV\nI6ItLc3MnDmD9evraG1tZfLkf2D8+KNDrUNEJAp9KgRqalbx3PVXM6ysLJTjrWtqYuKMmd0+dvrS\nS0sYO/YAzj77B3zyySdceeUlCgERyQt9KgQAhpWVMSLLY/dMmnT8tteffvoJw4ePyOr5RSQzUbQW\nNDZ2HFlg9epVlJeH9xkUdYfVPhcCuXTxxRdQW1vLz352W65LEZEdCLu1AKClveP0ku/ePovSwnBu\nt2baGtETCoEQ3X33fD744H1mzLiWX//64VyXIyI7EHZrQVNbG2xMbFseXjaAsjwaakZPB4XA/T0+\n++xTAMaM+QptbW1s2LAhx1WJiHRPIRCCt956g0ceeRCA9evraG5uYvDgwTmuSkSke32uOWhdU1PW\nj3Xqqd/j1ltvZMqUi9i8uYVp034SWg0iIlHqUyEwcuQoJs6YGfoxu1NaWsr1198U6nlFRLKhT4WA\nhpIWEdk1uicgIhJjCgERkRhTCIiIxJhCQEQkxhQCIiIxphAQEYkxhYCISIxF3k/AzGYD44B2YKq7\nL03bNgU4B9gCLHX3H0Vdj4iIbBfplYCZTQD2c/fxwIXA3LRtlcCPgaPcfQJwoJl9Pcp6RESko6ib\ngyYBCwDc/T1gsJlVpLZtBlqAgWZWDJQB6yOuR0RE0kQdAiOA2rTldal1uHsLcAOwDFgOvObuH0Zc\nj4iIpMn22EEFW1+kmoOmA/sBm4Dnzeyr7v5OVz9cVTWA4uL8maxBRHqX+vrsTj0bhqqqcqqrKyM7\nftQhsJbUN/+UvYCPU6/3Bz5y9wSAmS0BDgO6DIFE4vOIyhSROEgkGrvfqZdJJBqprd3Uo2PsLESi\nbg5aBJwOYGaHAmvcfeu/wgpgfzMrTS0fDnwQcT0iIpIm0isBd3/VzF43s5eBNmCKmU0GNrj7E2b2\nc+BPZtYKvOLuL0dZj4iIdBT5PQF3n95p1Ttp2+YB86KuQUREdkw9hkVEYkwhICISYwoBEZEYy+ie\ngJmdApwE7J1atQJ4xt2fjKYsERHJhp2GgJkdBDxI0Kv3j8B/pjaNBv7ezGYA57r7XyOtUkREItHd\nlcDtwFmpcX86u8vMxgJ3AseFXpmIiESuuxA4yd1bAcysChgDJAF393p3f8/MTo66SBERicZObwyn\nBcCVwIfAHOAO4CMzuzh9HxERyT+ZdhabDOzr7hth21XB88DdURUmIiLRy/QR0U+2BgBAatC35dGU\nJCIi2ZLp
lcAyM1tAMCBcIXAsUGdmFwC4+/yI6hMRkQhlGgJlQAI4IrVcDxQBxxDcKFYIiIjkoYxC\nwN3P3/razAYDG909GVl
"text/plain": [
"<matplotlib.figure.Figure at 0x7f2f8f8cd240>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"sns.barplot(x=\"Sex\", y='SibSp', hue='Pclass', data=df)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We see that in 3rd class, women had a higher mean SibSp than men."
]
},
{
"cell_type": "code",
"execution_count": 65,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"<matplotlib.axes._subplots.AxesSubplot at 0x7f2f8f6b6e80>"
]
},
"execution_count": 65,
"metadata": {},
"output_type": "execute_result"
},
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAYEAAAESCAYAAAAbq2nJAAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\nAAALEgAACxIB0t1+/AAAHsRJREFUeJzt3XmcVOWV//FPLyxNN0uzKCIDGsSDSxJFxyguiaBGs5kY\nMxo1cYkziWHiEjKTETWKEfVnAlEniQn80ho1LjGJC7+4QBaNC/E3aFwyjicKCjRohKZZuumGprvm\nj6qG6rK76nZTt25X3e/79eJl3bVOtd117n2e556nLJFIICIi8VQedQAiIhIdJQERkRhTEhARiTEl\nARGRGFMSEBGJMSUBEZEYqwz7DczsYOAhYL67/zhj2wnAXGAH8Ji7Xxd2PCIiskuodwJmNgS4Ffhd\nD7vcAnwOOAY4ycymhBmPiIh0FXZzUCtwCvBO5gYz2xdocPe17p4AHgVmhByPiIikCTUJuHuHu2/r\nYfNYYF3a8nvAXmHGIyIiXfWnjuGyqAMQEYmb0DuGs1hL1yv/vVPrerRjR3uisrIi1KBk97S0tPDp\nT3+aRCJBWVkZixYtoqqqKuqwAssW//Lly7nkZ7Opqq0Ofr7GZm75yvVMmjQprJBFgujxIruQSaBL\nEO6+0syGmtkEkl/+nwLOynaCxsatIYYn+bBly2Y6ixImEgneeWcDQ4cOiziq4LLF39jYTFVtNdWj\nh/bqnI2NzaxbtyXvsYoENWZMz7+zoSYBM5sKzAMmAm1m9nngEeAtd38YuAi4D0gA97r7m2HGIyIi\nXYWaBNz9ReD4LNufAaaFGYOIiPSsP3UMi4hIgSkJiIjEmJKAiEiMKQmIiMSYkoCISIwpCaTU1S1g\n8eLHow5DRKSgonxiuCD+8pcXuP76OUyebCQSHWzbto2LL57FPvvsG3VoIiKRK/kkAHD44R/h29++\nAoCXX36J229fSG3tSBoa1tPc3MQ3vvHNnftu3ryZG26Yw9Chw1i/fh1XXXUt7e0dzJt3AyNGjGTj\nxg3MmnU5K1e+xW9+80uqq2sAmD376kg+m4jI7ohFEkg+kJy0ceMGNm/exKBBg/jud29k5cq3aWhY\nv3P7pk0bOeOMsznkkKnce+/dPP30U4wcOYo999yLSy6ZRUNDA5WVlTz33DPMmPFxpk8/gVWr3mbH\njh1UVsbkxykiJSMW31rLlv1/Zs/+NxKJBMOHD+fjH/8Ea9euAWDixH2YOHEfXnrpRQAGDRrEE088\nxp/+9EfeemsF06Ydy9FHH8vq1au47LKZjBo1iosvnsWXv3w+d955Ow8++AAHH/whvvrVmVF+RBGR\nPolFEkhvDgJ47bW/8vzzSwFYsWI5q1evTG1JcP/9v+Af//EjTJ9+AgsX3kZHRzv19as58cSP88Uv\nnsOiRQ/xxBOPMmXKgVx00TeorKzk2muv4o03nMmTLYJPJyLSd7FIApkOPPBgamtr+c53Lmfz5k1c\ncsm3WLFiOVDGoYcezr333sXLL7/InnuOZfHix5ky5UDuvPN2Ro8eTVPTFr7+9Ut47bW/cs89dzJ8\n+AjKysrYZ58PRP2xRER6reSTwKGHHsahhx72vvUXXzyry/K+++76Ej/mmON2vj7rrC8DMG/eoV32\n33vv8Zx44sn5DFVEpOD0nICISIwpCYiIxJiSgIhIjCkJiIjEmJKAiEiMKQmIiMRYSQ0RbW9vp75+\nVV7POX78BCoqKnLut2LFm1x++bc444yzOe20L+Q1BhGRsJRUEqivX8WVN/+KwTUj83K+1qYNXHfp\n6UycmL3iaGtrKzff/H0OP/yIvLyviEihlFQSABhcM5Ihw8YU9D0HDhzI979/K3fffUdB31dEZHep\nTyAPysvLGThwYNRhiIj0mpKA9At1dQs488zPUle3IOpQRGJFSUAi19rawpIljwGwZMnjtLa2RByR\nSHwoCeRZIpHIvZN00dbWtvPnlkh00NbWFnFE
IvFRch3DrU0bCn4u99f54Q9/wLvvvktlZQVPPfUH\n5s79HkOHDs1bLCIiYSipJDB+/ASuu/T0vJ8zF7Mp/Od//jSv7ysiUggllQQqKipyjukXEZFd1CfQ\nD2mkjIgUipJAP6ORMiJSSEoC/YxGyohIISkJiIjEmJKAiEiMldTooChLSf/4x7fwyisv097ezjnn\nnMdHP3p8XuMQEQlDSSWB+vpVXPObuVTVVuflfC2NzVxz2hU5h52++OIy3n77LX7ykzo2b97E+eef\nrSQgIkUh9CRgZvOBI4EO4FJ3X5a2bSZwNrADWObu39zd96uqraZ6dGGf1D300MM46KCDAaipGcq2\nba0kEgnKysoKGoeISG+F2idgZscB+7n7NOBC4Na0bUOBbwFHu/txwEFmVpSzspSVlTFo0GAAFi16\niCOPPFoJQESKQtgdwzOAhwDc/XVghJnVpLZtB7YBw8ysEqgC8lf4JwJPP/0kjz66iG9+89+jDkVE\nJJCwm4PGAsvSlten1r3p7tvM7FpgBbAVuM/d3ww5ntA8//xS7rrrDubP/yFDhuSnT6K/664jvrm5\nucvy6tWrqK7e9fMI2tEuIoVR6I7hnW0kqeag2cB+wBbgj2b2QXd/taeDa2uHUFnZ8xfI5s35//Kt\nra1mzJjsfQxNTU0sWPBD7rjjDkaO3L35jQcO7OiyPGpUDcOH989qpMuXL3/fnM6J9u1d9pl/99OU\nVSRnXWtt2sCP5pzLpEmTuuzTnz5ztlj6+vsV5HdIJCphJ4G1JK/8O40D3km9PgBY7u6NAGb2NHAY\n0GMSaGzcmvXNGhubaWlszrpPb7Q0NtPY2My6dVuy7vfIIw+yYUMjM2d+Y2eH8JVXzmGPPfbs9Xtu\n2dLUZbmhoYnt2/vn4xyNjc3vm9O5Y0cr6Z+gaugoyisHdzkm8+fZnz5ztlga+/i7FeR3SCRM2S5C\nwk4Ci4FrgIVmNhVY4+6df0lvAweY2SB33wYcDvx2d95s/PgJXHPaFbtzim7PmctnPvM5PvOZz+X1\nfUVECiHUJODuS83sBTN7FmgHZprZucBGd3/YzL4HPGlmbcBz7v7s7ryfSkmLiPRO6H0C7j47Y9Wr\nadsWAgvDjkFERLrXPxubRUSkIJQERERiTElARCTGlARERGKspKqIRlVKetu2VubOncOGDQ20tbVx\n7rlfYdq0Y/Iah4hIGEoqCdTXr+IPV1/B6KqqvJxvfUsL0+fMzTns9JlnnmbKlAM566wv8e6773LZ\nZV9XEhCRolBSSQBgdFUVYwtcu2fGjBN3vv77399ljz3GZtl7l77U3gHV3xGR/Cm5JBCliy66gHXr\n1nHTTT8ItH99/ape1d6BZP2d6y49XQ/FiUheKAnk0W231fHGG39jzpyr+PnP7w10TG9r74iI5JNG\nB+WB++u8997fAZg8eX/a29vZuHFjxFGJiOSmJJAHL7/8IvfddzcAGzY00NrawogRIyKOSkQkt5Jr\nDlrf0lLwc5166ue58cbvMnPmP7N9+zZmzfp23mIQEQlTSSWB8eMnMH3O3LyfM5dBgwZx9dXX5fV9\nRUQKoaSSgEpJi4j0jvoESkRd3QLOPPOz1NUtiDoUESkiSgIloLW1hSVLHgNgyZLHaW3NX7+IiJQ2\nJYES0NbWRiKRACCR6KCtrS3iiESkWCgJiIjEmJKAiEiMKQmIiMSYkoCISIwpCYiIxJiSgIhIjCkJ\niIjEWEmVjRAJqqf5qLPN7LZmTX1BYhMpJCUBiaXuZnWD7DO7bXrvLUYfVbAQRQoicBIwsz2BianF\nle7+93BCEimMzFndIPvMbq1NG4AthQtQpAByJgEz+yfgcmAvYHVq9QQzWwPc4O4PhBifiIiEKGsS\nMLM7Uvuc5+4vZ2z7MPBvZvZJdz8vtAj7oK5uAYsXP8pJJ32CCy74l6jDERHpt3KNDnrQ3c/JTAAA\n7v6yu58D
PBhOaH2jipoiIsHlag46JHXF3y13v9bdH85zTLulu4qagwdXRRyViEj/lCsJdG6fnPr3\nJ6AC+CjwlxDjEhGRAsi
"text/plain": [
"<matplotlib.figure.Figure at 0x7f2f8f61f588>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"sns.barplot(x=\"SibSp\", y='Survived', hue='Pclass', data=df)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"It seems that SibSp is relevant for determining the survival rate."
]
},
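{
"cell_type": "markdown",
"metadata": {},
"source": [
"Since survival changes mainly between SibSp = 0, SibSp of 1 or 2, and SibSp > 2, one possible way to summarise the feature is to bin it into three groups. This is only a sketch under our own assumptions: the helper name `sibsp_group`, the bin edges, the labels and the `toy` data frame are illustrative and are not taken from the analysis above."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"import numpy as np\n",
"import pandas as pd\n",
"\n",
"def sibsp_group(sibsp):\n",
"    # Right-closed bins: (-1, 0] -> '0', (0, 2] -> '1-2', (2, inf] -> '3+'\n",
"    return pd.cut(sibsp, bins=[-1, 0, 2, np.inf], labels=['0', '1-2', '3+'])\n",
"\n",
"# Made-up rows, used only to keep the example self-contained\n",
"toy = pd.DataFrame({'SibSp': [0, 1, 2, 3, 8],\n",
"                    'Survived': [0, 1, 1, 0, 0]})\n",
"toy['SibSpGroup'] = sibsp_group(toy['SibSp'])\n",
"toy.groupby('SibSpGroup', observed=True)['Survived'].mean()"
]
},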
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Feature ParCh"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The feature Parch (Parents-Children Aboard) is somewhat related to the previous one, since it reflects family ties. It is well known that in emergencies, family groups often all die or evacuate together, so it is expected that it will also have an impact on our model."
]
},
{
"cell_type": "code",
"execution_count": 66,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"Parch\n",
"0 678\n",
"1 118\n",
"2 80\n",
"3 5\n",
"4 4\n",
"5 5\n",
"6 1\n",
"dtype: int64"
]
},
"execution_count": 66,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df.groupby('Parch').size()"
]
},
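{
"cell_type": "markdown",
"metadata": {},
"source": [
"Raw counts can also be expressed as shares of all passengers, which makes the imbalance between categories easier to read. The following is a minimal sketch; the helper name `value_shares` and the `toy` data frame are illustrative assumptions, and in this notebook the helper could be called with `df` and 'Parch'."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"import pandas as pd\n",
"\n",
"def value_shares(frame, column):\n",
"    # Fraction of rows per distinct value, ordered by the value itself\n",
"    return frame[column].value_counts(normalize=True).sort_index()\n",
"\n",
"# Made-up rows, used only to keep the example self-contained\n",
"toy = pd.DataFrame({'Parch': [0, 0, 0, 1, 2]})\n",
"value_shares(toy, 'Parch')"
]
},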
{
"cell_type": "code",
"execution_count": 67,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"<matplotlib.axes._subplots.AxesSubplot at 0x7f2f8f575320>"
]
},
"execution_count": 67,
"metadata": {},
"output_type": "execute_result"
},
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAYQAAAESCAYAAAD9gqKNAAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\nAAALEgAACxIB0t1+/AAAFUNJREFUeJzt3X+QXWd93/H3sotFtJbQSlkLWY4FSWa+jMMkKfGMGZki\nS47l0OCYIEMIiuJG0NjFtKoT6JiZWDYu7VASkwYzKR0Fg/GYtDhRCIqLkV2wsbBApCQQ2uYLsYOV\nlZxosVde/cDyerX94zyS765Xq7vSnnt2pfdrRqN7zj336rM7q/3c5zznR9fY2BiSJL2k6QCSpNnB\nQpAkARaCJKmwECRJgIUgSSosBEkSAD11vnlEbAQ2AGNAF/BzwOuB/wocBb6dmTeUbd8HXFPW35aZ\nX6gzmyRpvK5OnYcQEW8A3gr8FPDezPxmRNwDfBpI4F7gdUAf8AhwUWZ6koQkdUgndxltBv4z8MrM\n/GZZtw24AlgNfCEzRzPzB8D3gYs6mE2SznodKYSIuBjYDYwCQy1P7QOWAUuBwZb1g2W9JKlDOjVC\neBfwqfK4q2V914s3nXK9JKkmtU4qt7gMeE95vKRl/XJgD7AXePWE9XunesPnnx8d6+npnsGIknRW\nOOEH7toLISKWAQcy8/my/P8iYmVmPgq8Bfgo8D3gtyJiM3AecH5m/t+p3ndo6HDNySXpzNPfv+CE\nz3VihLCMaq7gmBuB/xYRXcDXM/NLABGxherooqPA9R3IJUlq0bHDTmfa4OCBuRlckhrU37/ghLuM\nPFNZkgRYCJKkolNHGdVudHSUgYHdTceY1AUXXEh3t0dESZrdzphCGBjYzZ7/fi/LFy1uOso4e/Y/\nDW9/KytWvKrpKJI0pTOmEACWL1rMiiX9TceQpDnJOQRJEmAhSJIKC0GSBFgIkqTCQpAkARaCJKmw\nECRJgIUgSSosBEkSYCFIkgoLQZIEWAiSpMJCkCQBFoIkqbAQJEmAhSBJKiwESRJgIUiSitpvoRkR\n64H3ASPAZuBvgLupyuhJYENmjpTtNgGjwJbMvLPubJKkF9Q6QoiIxVQlsBJ4E/Bm4DbgjsxcBTwG\nbIyI+cDNwBpgNXBjRCyqM5skaby6Rwg/DzyQmYeBw8B1EfE4cF15fhvwXuC7wK7MPAgQETuAS4H7\nas4nSSrqLoRXAr0R8efAIuADwPzMHCnP7wOWAUuBwZbXDZb1kqQOqbsQuoDFwC9TlcOXy7rW50/0\nuin19c2np6f7+PLwcC9DpxyzXn19vfT3L2g6hiRNqe5C+Cfg0cw8CjweEQeAkYiYl5lHgOXAHmAv\n40cEy4GdU73x0NDhCcuHZjL3jBoaOsTg4IGmY0jSlB9O6z7sdDuwJiK6ImIJcC7wIHBNeX4dcD+w\nC7g4IhZGxLlUk9CP1JxNktSi1kLIzL3AnwBfo5ogvgG4Bbg2Ih4G+oC7MvNZ4CaqAtkO3JqZfqSW\npA6q/TyEzNwCbJmweu0k220FttadR5I0Oc9UliQBFoIkqbAQJEmAhSBJKiwESRJgIUiSCgtBkgRY\nCJKkwkKQJAEWgiSpsBAkSYCFIEkqLARJEmAhSJIKC0GSBFgIkqTCQpAkARaCJKmwECRJgIUgSSos\nBEkSYCFIkgoLQZIEQE+dbx4Rq4B7ge8AXcC3gd8F7qYqoyeBDZk5EhHrgU3AKLAlM++sM5skabxO\njBAeysw1mbk6MzcBtwF3ZOYq4DFgY0TMB24G1gCrgRsjYlEHskmSik4UQteE5cuAbeXxNuAK4BJg\nV2YezMxngR3ApR3IJkkqat1lVFwUEZ8DFlONDuZn5kh5bh+wDFgKDLa8ZrCslyR1SN2F8D3g1sy8\nNyJ+HPjyhH9z4ujhZOuP6+ubT09P9/Hl4eFehk4naY36+nrp71/QdAxJmlKthZCZe6kmlcnMxyPi\nH4GLI2JeZh4BlgN7gL2MHxEsB3ZO9d5DQ4cn
LB+aweQza2joEIODB5qOIUlTfjitdQ4hIt4REb9d\nHr+CatfQJ4FryibrgPuBXVRFsTAizgVWAo/UmU2SNF7du4w+D3wmIq4GXgpcB3wL+HRE/CbwBHBX\nZo5GxE3AduAo1W4mP1JLUgfVvcvoIPBLkzy1dpJttwJb68wjSToxz1SWJAEWgiSpsBAkSYCFIEkq\nLARJEmAhSJIKC0GSBFgIkqTCQpAkARaCJKmwECRJgIUgSSosBEkSYCFIkgoLQZIEWAiSpMJCkCQB\nFoIkqbAQJEmAhSBJKiwESRJgIUiSCgtBkgRAT93/QES8DPgOcBvwJeBuqiJ6EtiQmSMRsR7YBIwC\nWzLzzrpzSZLG68QI4WbgqfL4NuCOzFwFPAZsjIj5ZZs1wGrgxohY1IFckqQWtRZCRATwauA+oAtY\nBWwrT28DrgAuAXZl5sHMfBbYAVxaZy5J0ovVPUK4HfgtqjIA6M3MkfJ4H7AMWAoMtrxmsKyXJHVQ\nbXMIEbEBeDQzn6gGCi/SNdnKKdaP09c3n56e7uPLw8O9DE07ZWf09fXS37+g6RiSNKU6J5V/EXhV\nRFwFLAeeAw5GxLzMPFLW7QH2Mn5EsBzYebI3Hxo6PGH50AzFnnlDQ4cYHDzQdAxJmvLDaW2FkJlv\nP/Y4IjYD3wdWAtcA9wDrgPuBXcAfRcRC4GjZZlNduSRJk+vUeQjHdgPdAlwbEQ8DfcBdZSL5JmB7\n+XNrZvpxWpI6rPbzEAAy8wMti2sneX4rsLUTWSRJk2trhBARn5pk3RdnPI0kqTFTjhDKGcTXA6+J\niK+0PHUO1eGikqQzxJSFkJn3RMRDVJPAt7Q8dRT4PzXmkiR12EnnEDJzD3BZRLwcWMwLE8SLgKdr\nzCZJ6qC2JpUj4g+AjVRnER8rhDHgx2vKJUnqsHaPMloD9JdDRCVJZ6B2z0P4nmUgSWe2dkcIA+Uo\nox3A88dWZubmWlJJkjqu3UJ4CvhfdQaRJDWr3UL4D7WmkCQ1rt1CeJ7qqKJjxoBngCUznkiS1Ii2\nCiEzj08+R8Q5wOXAz9QVSpLUedO+2mlmPpeZX6C6/aUk6QzR7olpGyes+jGqG9lIks4Q7c4h/POW\nx2PAMPC2mY8jSWpKu3MIvwEQEYuBscycrbcvliSdonZ3Ga0E7gYWAF0R8RTwa5n5l3WGkyR1TruT\nyh8Crs7M8zKzH/hV4CP1xZIkdVq7hTCamd85tpCZf0XLJSwkSXNfu5PKRyNiHfBAWf4FYLSeSJKk\nJrRbCNcDdwB/RHW3tL8G/lVdoSRJndfuLqO1wJHM7MvMJVQ3yfkX9cWSJHVau4Xwa8BbWpbXAu+Y\n+TiSpKa0u8uoOzNb5wzGeOFWmicUET8CfApYCswDPgh8i+oQ1pcATwIbMnMkItYDm6jmJrZk5p3t\nfhGSpNPXbiF8PiIeBR6h+kV+OfCnbbzuKuAbmfl7EXEh1aT0V4GPZeafRsR/BDZGxN3AzcDFVEcv\nfSMitmbm/ml+PZKkU9TumcofjIiHgEuoRgfvzsyvtfG6z7YsXgj8A7AKuK6s2wa8F/gusCszDwJE\nxA7gUuC+9r4MSdLpaneEQGbuoLqF5rRFxFepLoZ3FfBAZo6Up/YBy6h2KQ22vGSwrJckdUjbhXA6\nMvPSiPhp4B7Gzz2caB7ipPMTfX3z6enpPr48PNzLbL3AUl9fL/39C5qOIUlTqrUQIuK1wL7MHMjM\nb0dEN3AgIuZl5hGqUcMeYC/jRwTLgZ1TvffQ0OEJy4dmNPtMGho6xODggaZjSNKUH06nfYOcaXoD\n8NsAEbEUOBd4ELimPL8OuB/YBVwcEQsj4lxgJdUEtiSpQ+ouhI8D50XEV6gmkP81cAtwbUQ8DPQB\nd2Xms8BNwPby59bM9CO1JHVQrbuMyi/69ZM8tXaSbbcCW+vMI0k6sbpHCJKkOcJCkCQBFoIkqbAQ\nJEmAhSBJ
KiwESRJgIUiSCgtBkgRYCJKkwkKQJAEWgiSpsBAkSYCFIEkqLARJEmAhSJIKC0GSBFgI\nkqTCQpAkARaCJKmwECR
"text/plain": [
"<matplotlib.figure.Figure at 0x7f2f8f57c550>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"# Distribution\n",
"sns.countplot(x='Parch', data=df)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We see that most passengers travelled without any parents or children.\n",
"\n",
"We now analyse the relationship with Survived."
]
},
{
"cell_type": "code",
"execution_count": 68,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"Parch\n",
"0 0.343658\n",
"1 0.550847\n",
"2 0.500000\n",
"3 0.600000\n",
"4 0.000000\n",
"5 0.200000\n",
"6 0.000000\n",
"Name: Survived, dtype: float64"
]
},
"execution_count": 68,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df.groupby('Parch').Survived.mean()"
]
},
{
"cell_type": "code",
"execution_count": 69,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"<matplotlib.axes._subplots.AxesSubplot at 0x7f2f8f53e6d8>"
]
},
"execution_count": 69,
"metadata": {},
"output_type": "execute_result"
},
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAXUAAAESCAYAAAACDEUqAAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\nAAALEgAACxIB0t1+/AAAIABJREFUeJzt3X901PWd7/FnQkIIJJAAEwLE8CvyTgAVQSgFxR+0rlQR\n67qut71W27La1u2pPffcvbeebre9e09v7/ZKrXtOt9btat1qq9ZW8VfFir9AVH4oAiYfQH6HQBIg\n/P6duX/MYKeRSSbJzHx/zOtxjsdkvl+Y9/eEvOYzr/nOd/Ki0SgiIhIO+V4PICIi6aNQFxEJEYW6\niEiIKNRFREJEoS4iEiIKdRGREClIZSczWwjMANqBu51zKxO2VQG/AQqB1c65b2RiUBER6VqXK3Uz\nmw3UOOdmAguA+zvsci/wY+fcDOBMPORFRMQDqdQvc4CnAZxzDUCZmZUAmFkecCnwbHz7N51zOzM0\nq4iIdCGVUK8EWhK+b43fBhABDgP3mdmbZvbDNM8nIiLd0JMXSvM6fD0S+AlwOXCxmc1Nx2AiItJ9\nqbxQuos/r8wBRgBN8a9bga3Oua0AZvYKMBF4MdlfFo1Go3l5eck2i2RNNBpl8Tvb+eWidRw7cRqA\nyeMj/POdMz2eTOScUgrOVEJ9MfB94EEzmwI0OueOADjnzpjZZjMb55z7CJgKPNbpVHl5tLQcSmW2\nQIpESnV8AbD3wHEe/mMD67fso7ioD1+eW8tLK3ZQv3UfTbsPUNAnfGf7huVnl0wuHF8qugx159xy\nM1tlZsuAM8BdZnYb0Oacewb4NvBw/EXTtc65Z3sxt0hGRaNR3vygid++spHjJ88waexgbr+mlsED\n+7FtzyGWrG5ka9MhaqoGeT2qSI+kdJ66c+6eDjetTdj2EXBZOocSyYRzrc4vvXA4Z+vA2upylqxu\npGH7foW6BFZKoS4SZJ2tzhONP68MALd9P9fNHO3BpCK9p1CXUOtqdZ5o4IC+VFeWsrHxAKfPtIey\nV5fwU6hLKKW6Ou/ognFD2b77kHp1CSyFuoROd1bnHV0wbijPL9uiXl0CS6EuodHT1XmiiWOHAOrV\nJbgU6hIKvVmdJyorLWLk0AHq1SWwFOoSaOlYnXdk1WU0rj6iXl0CSaEugZWu1XlHOl9dgkyhLoGT\nidV5Ip2vLkGmUJdAydTqPNHAAX3Vq0tgKdQlEDK9Ou9IvboElUJdfC8bq/OO1KtLUCnUxbeyvTpP\npF5dgkqhnianz7Tz0yfXsO/QSSbXDGF63TCqh5VkdDUZZl6szhOpV5egUqinydNvbmH91v3k5cGL\n7xzhxXe2U1FezLTaCqbXDaMqMkABnwIvV+cdqVeXIFKop8GHW/fx4tvbiJT1Y+HdV/D2mkZWNOzh\n/U2tPL98G88v38bwIf2ZVlvBtLphjBw6wOuRfcnr1XlH6tUliBTqvXTw6EkefO5D8vPzuPP6SZSV\nFjHVIky1CCdOnmHNR62saGjmg4/2smjZVhYt28rIyACmx1fwwwb39/oQPOen1Xki9eoSRAr1XohG\nozz8QgMHDp/kpivGMXbEwL/YXtS3D9PrhjG9bhjHTpxmzaZW3q1vZt2WvfzhzS384c0tVA8r+bii\niZQVe3Qk3vHb6jyRenUJIoV6LyxZ3cj7m1qpG1XONZ+q7nTf4qICZkysZMbESo4eP8V7G2Mr+PVb\n9rF9z2aeen0zY4aXMq12GNPrKjxfpWaaX1fnHalXl6BRqPfQjubDPL5kEyXFhSy4bgL53VhZ9u9X\nyKwLhjPrguEcPnaK1RtaWNHQTP3W/WxpOsQTr25i3MiBTK8dxiW1FZSXFmXwSLLPz6vzjtSrS9Ao\n1HvgxKkz/PyZdZw+085Xrp3Uq9AtKS5k9kUjmH3RCA4ePclq18K79XtwO9r4qPEgv31lI+efV8b0\nugqmWgWDBvRN45FkV1BW54nUq0vQKNR74PFX
NtK09yhzplYxuWZo2v7egf37csXFI7ni4pEcOHyC\nla6FFfV72LijjQ072nj05Q3UVpczra6CqeMjlPYPTsAHaXWeSL26BI1CvZtWuWZee38XVZESbr5y\nXMbuZ1BJEXOmVjFnahX7D51gRUMzK+r3UL9tP/Xb9vPrlzZQN7qc6bUVTLEIA/oVZmyW3gji6rwj\n9eoSJAr1bth74DgPvdBA34J8vjZ/IoUFfbJyv+WlRVw97TyunnYerQeOsbIhVtGs37KP9Vv28chL\njoljBjO9roLJNRH69/PHjzWoq/OO1KtLkPjjtz8A2tujPPjseo6eOM2XrjFGePQGoqGDirnmU9Vc\n86lqmtuOsaJ+DyvqY+fBf/DRXgr6OC4YO5hpdRVMrhlKv77Z/xGHYXWeSL26BIlCPUXPvbWVDTsP\nMNUiXH7RCK/HAaCirJhrPz2aaz89mt37jvJu/R5WNDTz3sZW3tvYSt+CfC4cN4RpdcO4cNwQigoz\n/8wiLKvzROrVJUgU6inYuLONZ5ZtYfDAIm6fW+vLgKoc3J/rZ43h+lljaGw5zIqGZt6tb2ala2Gl\na6GosA8XxS80dsHYwWmvjsK2Ou9IvboEhUK9C0eOn+IXi9YDcMe8ib59QTLRyEgJIyMlzL90DDua\nD8dfZI2F/Lv1zfTr24eLzx/KtLphTBozuNcrzzCuzjtSry5BkVKom9lCYAbQDtztnFuZsG0LsD2+\nLQp80TnXlIFZsy4ajfKrPzr2HjzB9bNGf9ytBkVeXh7Vw0qpHlbKjbPHsm3PId6tjwX88vV7WL5+\nD/2LCrh4/FCm1w2jblR5twI+7KvzROrVJSi6DHUzmw3UOOdmmlkt8B/AzIRdosA1zrljGZrRM29+\n0MTKhmZqqgYxb9Zor8fplby8PEZXDmR05UD+5opxbG46yIr6ZlY0NLNs7W6Wrd1NSXEhU8ZHmF5X\ngVWX0Sc/ecDnwuo8kXp1CYpUVupzgKcBnHMNZlZmZiXOucPx7Xnx/0Klae8RHvvTBoqLCrhj3oRO\nAy5o8vLyGDdiEONGDOLmq2rYtPMAKxqaWdnQzBtrdvHGml0M7F/IVKtgel0F51eVkZ8f+xFHo1He\nWLMrJ1bnHalXlyBIJdQrgZUJ37fGb9uUcNvPzWwM8KZz7p40zueJU6fbeeCZ9Zw81c7Xb5jA0EHh\nvXpifl4e488rY/x5ZfyXOeezYUcb7zY0s8o18+p7jbz6XiODSvoyzSqYNHYIr/9+Le9taMmJ1XlH\n6tUlCHryQmnH3+B/BP4I7AOeMbMbnXO/7/VkHvrdax+xvfkwsy8azrTaCq/HyZr8/DxqR5VTO6qc\nL372fBq2t7Gifg+rXAt/WrWTP63aCZBTq/NE6tUlCFIJ9V3EVuZnjQA+fiHUOffrs1+b2QvABUCn\noR6JlHZvyixaWb+Hl1fuoKqihG/+7RT6FXX/cc/Px9cdlcMGccW0UZw+086ajS2samimpqqMK6dW\nhXp1nuznF4lAdWUpm3YdpKx8AIUFwavkwvJvM5mwH18qUkmsxcD3gQfNbArQ6Jw7AmBmA4EngHnO\nuVPA5cCTXf2FLS2HejxwJrUdPsHCx1ZR0CePBdfWcejgMbo7aSRS6tvj643qIf2pnjU6tMd3VlfH\nVzNiINt3H2Ll2l2Bq2By/WcXdKk+YHW51HDOLQdWmdky4D7gLjO7zczmO+cOAs8Db5vZm0Czc+6p\nXsztmfZolF8+9yGHjp7ib66ooXqYHvHlk2qrywFo2L7f40lEzi2lbuEcL36uTdj2r8C/pnMoL7z0\n7nbWb93PheOG8JlLqrweR3xKvbr4XfBKwQzY0nSQ37++mUED+vKVz9WFui+W3ul4vrqI3+R8qB87\ncZoHFq3nTHuUBddNYGCAP1lIssOqyzh5qp2tTeHtbyW4cj7UH315A837jzH3U9VMHDPY63EkANSr\ni5/ldKi/
vX43b63bzejKUj4/e6zX40hAJPbqIn6Ts6He3HaMR15yFPXtw53zJ+paHpIy9eriZzmZ\nZKfPxC4DcPzkGW69ejz
"text/plain": [
"<matplotlib.figure.Figure at 0x7f2f8f4a3278>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"# Survival probability\n",
"df.groupby('Parch').Survived.mean().plot()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We see that the probability of surviving is higher for Parch values 1 to 3 than for 0. Since there are very few rows with Parch >= 3, the estimates for those values are not reliable."
]
},
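{
"cell_type": "markdown",
"metadata": {},
"source": [
"A mean computed over a handful of rows can be misleading, so it helps to show the group size next to the survival rate. The cell below is a minimal sketch on a small made-up DataFrame (the name *toy* and its values are assumptions for illustration, not the Titanic data); the same aggregation could be applied to our DataFrame."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"import pandas as pd\n",
"\n",
"# Hypothetical toy data, only to illustrate the idea\n",
"toy = pd.DataFrame({'Parch': [0, 0, 0, 0, 1, 1, 3],\n",
"                    'Survived': [0, 0, 1, 0, 1, 0, 1]})\n",
"\n",
"# Survival rate and number of rows for each Parch value\n",
"toy.groupby('Parch')['Survived'].agg(['mean', 'count'])"
]
},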
{
"cell_type": "code",
"execution_count": 70,
"metadata": {
"collapsed": false,
"scrolled": true
},
"outputs": [
{
"data": {
"text/plain": [
"array([<matplotlib.axes._subplots.AxesSubplot object at 0x7f2f8f4fbe10>,\n",
" <matplotlib.axes._subplots.AxesSubplot object at 0x7f2f8f3c1240>], dtype=object)"
]
},
"execution_count": 70,
"metadata": {},
"output_type": "execute_result"
},
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAYQAAAEBCAYAAAB4wNK4AAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\nAAALEgAACxIB0t1+/AAAFnFJREFUeJzt3X+QXWWd5/F3g0KbkF4c5hqyTcE4pfu1LGtmFtlhN5ki\nIQiupchuAeXWZFklYy0zisUAOhtrJxKzU7uODJSrDuVsMAoUbgFTPWgMYHR0mYlmNliOMPzhdzUy\nwSRILkjoJMUlsdP7xzkZ2+aGPt333B9Nv19VXbn33Ofc79P33qc/Off8eIYmJyeRJOmkfndAkjQY\nDARJEmAgSJJKBoIkCTAQJEklA0GSBMCr+t0BvbyIWA3cDCwGdgNXZ+a+/vZK6o+IeBXwp8D1wFmO\nhXq5hTDAImIR8L+BtZn5JuCrwF/0t1dSX30ZGAc8gaoLDITBthrYlZmPlvc3A5dExOI+9knqp42Z\n+XFgqN8deSUyEAbbvwB2Hb+TmYeBZ4E39K1HUh9l5v/tdx9eyQyEwbYIaE1b9gLF/gRJqpWBMNgO\nA8PTli0CDvWhL5Je4QyEwfYD4I3H70TEPwNOB37Ytx5JesUyEAbbt4CzI2J5ef964KuZ+UIf+yTp\nFWrIy18Ptoi4APg0xVdFPwLel5n7+9srqfci4nXAw+Xd4wdc/By4KDOf6lvHXkEqBUJEDAOPAxuB\nC4G3As+UD9+cmQ9GxBrgOmAC2JSZm7vTZUlSN1Q9U3k9xeGOUJwQsi4zHzj+YHkC1XrgPIrEfiQi\nxjLzQJ2dlSR1z4z7ECIigDcBWylOBjn+M9X5wM7MPJSZLWA7sKLmvkqSuqjKFsItwAeB9/GL08U/\nGBE3AE8DHwLOBJpT1mkCy+rrpiSp2152CyEirgK+k5m7y0VDwJ0UXxldBHwf2NBmVU8rl6R5ZqYt\nhHcCr4+IS4GzKM6avSYzHysf3wLcBtwHXDplvVFgx0zFJycnJ4eGzA7Vbt59qBwL6qLKH6zKh51G\nxMeAf6T4w/9HmflERPwB8GbgI8BjFDuVjwHfBf5VZh6c4Wknm82ZmnSm0ViCNRZcjfn4l7XrY6Gd\nXrwfg1Z7AdatPB7mMh/CZ4F7IuIwxSUUrs7MVkSsA7ZRBMKGCmHATZ/4HBOTr551B44dm+Cyi/8N\nv/76X5v1upKk9ioHQmZunHL3t9s8PgaMzab4d398hJOWzH7f88TRFr/1kz0GgiTVyEtXSJIAA0GS\nVDIQJEmAgSBJKhkIkiTAQJAklQwESRJgIEiSSgaCJAkwECRJJQNBkgQYCJKkkoEgSQIMBElSyUCQ\nJAEV50OIiGHgcWAj8E3gLooweQq4KjOPRsQa4DpgAtiUmZu702VJUjdU3UJYDzxb3t4IfCYzVwK7\ngLURsahssxq4ELg+Ik6vu7OSpO6ZMRAiIoA3AVspJmteCWwpH94CXAycD+zMzEOZ2QK2Ayu60mNJ\nUldU2UK4BbiBIgwAFmfm0fL2fmAZsBRoTlmnWS6XJM0TLxsIEXEV8J3M3H2CJkOzXC5JGlAz7VR+\nJ/D6iLgUGAWOAIci4tTMfLFcthfYxy9vEYwCO7rQ338yMjJMo7GkUtuq7TphjcGqMR/163Xp5/ux\n0H7nQf/sv2wgZOZ/OH47Ij4G/COwHLgCuBu4HHgI2AncHhEjwLGyzXXd6XJhfLxFs3lwxnaNxpJK\n7TphjcGrMR91+3Vppxfvx6DVXoh1q5rNeQjHvwa6CXhvRDwMvBa4o9yRvA7YVv5syMz+fMokSXNS\n6TwEgMz8+JS7l7R5fAwYq6NTkqTe80xlSRJgIEiSSgaCJAkwECRJJQNBkgQYCJKkkoEgSQIMBElS\nyUCQJAEGgiSpZCBIkgADQZJUMhAkSYCBIEkqGQiSJKDCfAgR8Rrgi8BS4FTgTyhmTHsr8EzZ7ObM\nfDAi1lDMlDYBbMrMzd3otCSpflUmyLkUeCQz
/ywizga+DnwbWJeZDxxvFBGLgPXAecDPgUciYiwz\nD3Sh35Kkms0YCJl575S7ZwM/KW8PTWt6PrAzMw8BRMR2YAWwtYZ+SpK6rPIUmhHxbWAUeBdwI/DB\niLgBeBr4EHAm0JyyShNYVl9XJUndVHmncmauAN4N3A3cSfGV0UXA94ENbVaZvgUhSRpgVXYqnwvs\nz8w9mflYRLwK+IfMPL5DeQtwG3Afxf6G40aBHXV3+LiRkWEajSWV2lZt1wlrDFaN+ahfr0s/34+F\n9jsP+me/yldGFwDnANdHxFLgNOAvIuLDmfkEsAp4HNgJ3B4RI8AxYDnFEUddMT7eotk8OGO7RmNJ\npXadsMbg1ZiPuv26tNOL92PQai/EulVVCYTPAZ+PiL8BhoEPAIeAeyLicHn76sxsRcQ6YBtFIGzI\nzP580iRJs1blKKMWsKbNQ7/dpu0YMFZDvyRJPeaZypIkwECQJJUMBEkSYCBIkkoGgiQJMBAkSSUD\nQZIEGAiSpJKBIEkCDARJUslAkCQBBoIkqWQgSJIAA0GSVDIQJElAtSk0XwN8EVgKnAr8CfAocBdF\noDwFXJWZRyNiDcUsaRPApszc3KV+S5JqVmUL4VLgkcxcBbwHuBXYCHw2M1cCu4C1EbEIWA+sBi6k\nmHLz9K70WpJUuyozpt075e7ZwE+AlcA15bItwIeB/wfszMxDABGxHVgBbK2zw5Kk7qgypzIAEfFt\nYJRii+HrmXm0fGg/sIziK6XmlFWa5XJJ0jxQeadyZq4A3g3cDQxNeWio/RonXC5JGkBVdiqfC+zP\nzD2Z+VhEnAwcjIhTM/NFiq2GvcA+fnmLYBTY0Y1OA4yMDNNoLKnUtmq7TlhjsGrMR/16Xfr5fiy0\n33nQP/tVvjK6ADiHYifxUuA04EHgCoqthcuBh4CdwO0RMQIcA5ZTHHHUFePjLZrNgzO2azSWVGrX\nCWsMXo35qNuvSzu9eD8GrfZCrFtVla+MPge8LiL+hmIH8h8ANwHvjYiHgdcCd2RmC1gHbCt/NmRm\nfz5pkqRZq3KUUQtY0+ahS9q0HQPGauiXJKnHPFNZkgQYCJKkkoEgSQIMBElSyUCQJAEGgiSpZCBI\nkgADQZJUMhAkSYCBIEkqGQiSJMBAkCSVDARJEmAgSJJKBoIkCag2YxoR8Ungd4CTgU9QzK38VuCZ\nssnNmflgRKyhmCVtAtiUmZvr77IkqRuqzKm8CnhzZi6PiF8B/h74a2BdZj4wpd0iYD1wHvBz4JGI\nGMvMA13puSSpVlW+MnoYuLK8fQBYTLGlMDSt3fnAzsw8VM6yth1YUVdHJUndVWUKzUnghfLu+4Gt\nFF8JXRsRNwBPAx8CzgSaU1ZtAstq7a0kqWsq71SOiMuAq4FrgbuA/5KZFwHfBza0WWX6FoQkaYBV\n3an8duCjwNsz8yDwrSkPbwFuA+4DLp2yfBTYUVM/X2JkZJhGY0mltlXbdcIag1VjPurX69LP92Oh\n/c6D/tmvslN5BPgkcFFmPl8u+0vgI5n5BLAKeBzYCdxetj8GLKc44qgrxsdbNJsHZ2zXaCyp1K4T\n1hi8GvNRt1+Xdnrxfgxa7YVYt6oqWwjvAc4A7o2IIWAS+AJwT0QcBg4BV2dmKyLWAdsoAmFDuTUh\nSZoHquxU3gRsavPQXW3ajgFjNfRLktRjnqksSQIMBElSyUCQJAEGgiSpVOk8BEndteFPP8eTzaNz\nWveNo6fx/v94Rc090kJkIEgD4PnWSTQnz57TusteaM7cSKrAr4wkSYCBIEkqGQiSJMBAkCSVDARJ\nEmAgSJJKBoIkCTAQJEklA0GSBFSfQvOTwO8AJwOfAB6hmA/hJOAp4KrMPBoRayhmSZsANmXm5q70\nWpJUuxm3ECJiFfDmzFwOvAP4FLAR+GxmrgR2AWsjYhGwHlgNXAhcHxGnd6vjkqR6VfnK6GHgyvL2\nAWAxsBL4
SrlsC3AxcD6wMzMPZWYL2A6sqLe7kqRuqTKF5iTwQnn394CtwNsz8/ilGfcDy4ClwNSr\nbDXL5ZKkeaDy1U4j4jJ
"text/plain": [
"<matplotlib.figure.Figure at 0x7f2f8f47c390>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"df.hist(column='Parch', by='Survived', sharey=True)"
]
},
{
"cell_type": "code",
"execution_count": 71,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th>Survived</th>\n",
" <th>SibSp</th>\n",
" <th>Parch</th>\n",
" </tr>\n",
" <tr>\n",
" <th>Pclass</th>\n",
" <th>Sex</th>\n",
" <th>Parch</th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th rowspan=\"7\" valign=\"top\">1</th>\n",
" <th rowspan=\"3\" valign=\"top\">female</th>\n",
" <th>0</th>\n",
" <td>0.984375</td>\n",
" <td>0.484375</td>\n",
" <td>64</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>1.000000</td>\n",
" <td>0.411765</td>\n",
" <td>17</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>0.846154</td>\n",
" <td>1.076923</td>\n",
" <td>13</td>\n",
" </tr>\n",
" <tr>\n",
" <th rowspan=\"4\" valign=\"top\">male</th>\n",
" <th>0</th>\n",
" <td>0.363636</td>\n",
" <td>0.262626</td>\n",
" <td>99</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>0.285714</td>\n",
" <td>0.357143</td>\n",
" <td>14</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>0.625000</td>\n",
" <td>0.750000</td>\n",
" <td>8</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>0.000000</td>\n",
" <td>1.000000</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th rowspan=\"7\" valign=\"top\">2</th>\n",
" <th rowspan=\"4\" valign=\"top\">female</th>\n",
" <th>0</th>\n",
" <td>0.888889</td>\n",
" <td>0.333333</td>\n",
" <td>45</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>0.944444</td>\n",
" <td>0.722222</td>\n",
" <td>18</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>1.000000</td>\n",
" <td>0.545455</td>\n",
" <td>11</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>1.000000</td>\n",
" <td>1.500000</td>\n",
" <td>2</td>\n",
" </tr>\n",
" <tr>\n",
" <th rowspan=\"3\" valign=\"top\">male</th>\n",
" <th>0</th>\n",
" <td>0.089888</td>\n",
" <td>0.224719</td>\n",
" <td>89</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>0.500000</td>\n",
" <td>1.071429</td>\n",
" <td>14</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>0.400000</td>\n",
" <td>0.400000</td>\n",
" <td>5</td>\n",
" </tr>\n",
" <tr>\n",
" <th rowspan=\"13\" valign=\"top\">3</th>\n",
" <th rowspan=\"7\" valign=\"top\">female</th>\n",
" <th>0</th>\n",
" <td>0.588235</td>\n",
" <td>0.341176</td>\n",
" <td>85</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>0.480000</td>\n",
" <td>1.240000</td>\n",
" <td>25</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>0.320000</td>\n",
" <td>2.560000</td>\n",
" <td>25</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>0.500000</td>\n",
" <td>0.500000</td>\n",
" <td>2</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>0.000000</td>\n",
" <td>0.500000</td>\n",
" <td>2</td>\n",
" </tr>\n",
" <tr>\n",
" <th>5</th>\n",
" <td>0.250000</td>\n",
" <td>0.500000</td>\n",
" <td>4</td>\n",
" </tr>\n",
" <tr>\n",
" <th>6</th>\n",
" <td>0.000000</td>\n",
" <td>1.000000</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th rowspan=\"6\" valign=\"top\">male</th>\n",
" <th>0</th>\n",
" <td>0.121622</td>\n",
" <td>0.135135</td>\n",
" <td>296</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>0.266667</td>\n",
" <td>1.900000</td>\n",
" <td>30</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>0.166667</td>\n",
" <td>4.055556</td>\n",
" <td>18</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>0.000000</td>\n",
" <td>1.000000</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>0.000000</td>\n",
" <td>1.000000</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>5</th>\n",
" <td>0.000000</td>\n",
" <td>1.000000</td>\n",
" <td>1</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" Survived SibSp Parch\n",
"Pclass Sex Parch \n",
"1 female 0 0.984375 0.484375 64\n",
" 1 1.000000 0.411765 17\n",
" 2 0.846154 1.076923 13\n",
" male 0 0.363636 0.262626 99\n",
" 1 0.285714 0.357143 14\n",
" 2 0.625000 0.750000 8\n",
" 4 0.000000 1.000000 1\n",
"2 female 0 0.888889 0.333333 45\n",
" 1 0.944444 0.722222 18\n",
" 2 1.000000 0.545455 11\n",
" 3 1.000000 1.500000 2\n",
" male 0 0.089888 0.224719 89\n",
" 1 0.500000 1.071429 14\n",
" 2 0.400000 0.400000 5\n",
"3 female 0 0.588235 0.341176 85\n",
" 1 0.480000 1.240000 25\n",
" 2 0.320000 2.560000 25\n",
" 3 0.500000 0.500000 2\n",
" 4 0.000000 0.500000 2\n",
" 5 0.250000 0.500000 4\n",
" 6 0.000000 1.000000 1\n",
" male 0 0.121622 0.135135 296\n",
" 1 0.266667 1.900000 30\n",
" 2 0.166667 4.055556 18\n",
" 3 0.000000 1.000000 1\n",
" 4 0.000000 1.000000 1\n",
" 5 0.000000 1.000000 1"
]
},
"execution_count": 71,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df.groupby(['Pclass', 'Sex', 'Parch'])[['Parch', 'SibSp', 'Survived']].agg({'Parch': np.size, 'SibSp': np.mean, 'Survived': np.mean})"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We observe that Parch has a noticeable effect on survival for men in first and second class. Let us check the age of these passengers."
]
},
{
"cell_type": "code",
"execution_count": 72,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"Survived 0.439024\n",
"Age 27.871951\n",
"dtype: float64"
]
},
"execution_count": 72,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df.query('(Sex == \"male\") and (Pclass == [1, 2]) and (Parch == [1, 2])')[['Survived', 'Age']].mean()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We see that in those cases the mean age is about 28 (27.87). We can compare this with all men in first and second class."
]
},
{
"cell_type": "code",
"execution_count": 73,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"Survived 0.269565\n",
"Age 36.063750\n",
"dtype: float64"
]
},
"execution_count": 73,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df.query('(Sex == \"male\") and (Pclass == [1, 2])')[['Survived', 'Age']].mean()"
]
},
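{
"cell_type": "markdown",
"metadata": {},
"source": [
"The difference is noticeable: men travelling with one or two parents or children survive more often (0.44 versus 0.27) and are younger on average (about 28 versus 36 years). We therefore suspect that this feature has an impact on the survival of men in first and second class."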
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Recap: Filling null values"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Feature Age: null values"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We fill null values of Age with its median."
]
},
{
"cell_type": "code",
"execution_count": 74,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"count 891.000000\n",
"mean 29.361582\n",
"std 13.019697\n",
"min 0.420000\n",
"25% 22.000000\n",
"50% 28.000000\n",
"75% 35.000000\n",
"max 80.000000\n",
"Name: AgeFilled, dtype: float64"
]
},
"execution_count": 74,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# We create a new feature so that the original Age column is kept intact\n",
"df['AgeFilled'] = df['Age'].fillna(df['Age'].median())\n",
"df['AgeFilled'].describe()"
]
},
{
"cell_type": "code",
"execution_count": 75,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"<matplotlib.axes._subplots.AxesSubplot at 0x7f2f8f360ba8>"
]
},
"execution_count": 75,
"metadata": {},
"output_type": "execute_result"
},
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAW8AAAEDCAYAAAD6CoU1AAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\nAAALEgAACxIB0t1+/AAAEXpJREFUeJzt3X+QXWV9x/H3koVgku2wMLcxbmu0aL+UdjpDtGiBkpCG\npE5FWxIGgUnR6Mi06NRWLfqHoqnTiTr4Axx/DkhIBfxBS92xQxACOiNWaaGibfM1RhuURLLa1S5G\n0g3Z/nFvdDfJ3h+bPbn7cN+vGebcnHPuOd87A588PM95ntM3MTGBJKksJ3S7AElS5wxvSSqQ4S1J\nBTK8JalAhrckFcjwlqQC9bc6ISIWArcAg8BJwEbgh8BHgIPAI5l5dZVFSpKmaqfl/Upge2auBNYB\nHwTeD7w+M/8AOCUi1lRXoiTpcO2E94+A0xqfTwN+DDw3Mx9q7BsGVlVQmyRpGi3DOzM/DSyNiB3A\n/cCbgdFJp+wFllRSnSTpqFqGd0RcAezKzOcDK4G/P+yUvioKkyRNr+WAJXAusBUgM78ZEc847HtD\nwO5mFzhw4KmJ/v55My5SqsrF1wwzfuAgJ/WfwB3vvqjb5UiHm7Zx3E54fwd4MfCPEbEUGAO+FxHn\nZuZXgIuB65tdYHR0Xwe1SsfPBWcNcf/Dj7HirCFGRsa6XY40Ra02MO2xvlarCjYeFbwJWAzMA95G\n/VHBj1P/W+FrmfmmZtcYGRlz6ULNWbXagMGtOalWG5i25d0yvGeD4a25zPDWXNUsvJ1hKUkFMrwl\nqUCGtyQVyPCWpAIZ3pJUIMNbkgpkeEtSgQxvSSqQ4S1JBTK8JalAhrckFcjwVk+7/d4dXHzNMLff\nu6PbpUgdMbzV07Y99BjjBw5y38OPdbsUqSOGt3raymVDnNR/AhecNdTtUqSOuCSsep5LwmqucklY\nSXqaMbwlqUCGtyQVyPCWpAK1fHt8RGwA1gMT1F84/ALgPOAjwEHgkcy8usoiJUlTdfS0SUScD1wC\n/Dbwpsx8KCI+BdySmVun+55Pm2iueucnv86ux59g6eJFXPuqs7tdjjTFbD5t8nbg3cBzMvOhxr5h\nYNUMa5O6atfjT0zZSqVoO7wj4oXAo8BTwOikQ3uBJbNclySpiU5a3q8Bbm58ntyUn7ZZL811AwtO\nnLKVStFywHKSFcDrGp9Pm7R/CNjd7IuDgwvo75/XWWXScTC2b/wX21ptoMvVSO1rK7wjYgkwlpkH\nGn/+r4g4JzMfAC4Grm/2/dHRfcdcqFSFpYsX/WLA0inymmuaNSjabXkvod63fchfAR+LiD7ga5m5\nbeblSd1z7avOdm0TFcmFqdTzDG/NVS5MJUlPM4a3JBXI8JakAhneklQgw1uSCmR4S1KBDG/1tNvv\n3cHF1wxz+707ul2K1BHDWz1t20OPMX7gIPc9/Fi3S5E6Ynirp/W5rJoKZXirp40fODhlK5XC8FZP\nc0lYlcrwVk87dWD+lK1UCsNbPc3XoKlUhrd62tLFi6ZspVIY3pJUIMNbPc1uE5XK8FZPs9tEpTK8\n1dN2/3jflK1UCsNbPc1JOipVu2+PvwJ4MzAOvB34JrCFevjvAdZn5nhVRUpVObH/BMYPHOTEftsx\nKkvLf2Mj4lTqgX0O8FLgT4CNwA2ZuRzYCWyoskipKiefNG/KVipFO82NVcAXM3NfZj6emVcBK4Dh\nxvHhxjlSccb2jU/ZSqVop9vkOcDCiPgn4BTgncCCSd0ke4El1ZQnSTqadsK7DzgV+FPqQX5fY9/k\n400NDi6gv9//LdXcVqsNdLsEqW3thPfjwAOZeRD4bkSMAeMRMT8z9wNDwO5mFxgd9TEszU2TByxH\nRsa6XY40RbMGRTt93ncDKyOiLyJOAxYB9wDrGsfXAncda5FSNzzrtAVTtlIpWoZ3Zu4GPgf8C/AF\n4GrgWuDKiPgSMAhsrrJIqSpOj1ep2nrOOzM/
AXzisN2rZ78c6fhaungRux5/wunxKo4zE9TTvr/3\niSlbqRSGt3rawYmpW6kUhrd62gl9U7dSKQxv9bRf/9VFU7ZSKQxv9TSfNlGpDG/1NF/GoFIZ3upp\nvoxBpTK81dN8GYNKZXhLUoEMb0kqkOGtnuaApUrVNzFR/dSykZEx569pzqrVBlwOVnNSrTYw7fSx\nthamkkqxdu1F7Ny5o9J7nH7687njjuHWJ0oVsuWtnrdh0zZuesvKbpchHaFZy9s+b0kqkOEtSQUy\nvCWpQIa3et5lq6PbJUgdM7zV8y5fc0a3S5A61vJRwYhYDnwW+BbQBzwCvBfYQj389wDrM3O8wjol\nSZO02/K+PzNXZuYFmfmXwEbghsxcDuwENlRWoSTpCO2G9+HPGq4ADs1SGAZWzVZBkqTW2p1heWZE\n3AmcSr3VvWBSN8leYEkVxUmSjq6d8N4BvCMzPxsRvwHcd9j3Wr66dXBwAf3982ZYolStW7dud9BS\nxWkZ3pm5m/qAJZn53Yj4IfDCiJifmfuBIWB3s2uMjvqWEs1dt92dXLhsqNtlSEeo1QamPdayzzsi\nLo+INzY+PxNYDHwSWNc4ZS1w17GXKUlqVzvdJp8Hbo2IlwMnAlcB3wBuiYjXAruAzdWVKEk6XDvd\nJk8ALzvKodWzX44kqR3OsJSkAhne6nmubaISGd7qeT4mqBIZ3pJUIMNbkgpkeEtSgQxvSSqQ4a2e\nd+vW7d0uQeqY4a2ed9vd2e0SpI4Z3pJUIMNbkgpkeEtSgQxvSSqQ4a2e59omKpHhrZ7n2iYqkeEt\nSQUyvCWpQIa3JBXI8JakArXzAmIi4mTgW8BGYBuwhXrw7wHWZ+Z4ZRVKFbt163YuXDbU7TKkjrTb\n8n4b8OPG543ADZm5HNgJbKiiMOl4cW0TlahleEdEAGcAXwD6gOXAcOPwMLCqsuokSUfVTsv7OuCv\nqQc3wMJJ3SR7gSVVFCZJml7TPu+IWA88kJm76g3wI/QdbefhBgcX0N8/bwblScdHrTbQ7RKkjrQa\nsPxj4LkRcREwBPwf8EREzM/M/Y19u1vdZHR03zEXKlVpZGSs2yVIR2jWqGga3pn5ikOfI+LtwH8D\n5wDrgE8Ba4G7ZqNIqVtc20Ql6uQ570NdJNcCV0bEl4BBYPOsVyUdR65tohL1TUxMVH6TkZGx6m8i\nzVCtNmC3ieakWm1g2nFFZ1hKUoEMb0kqkOEtSQUyvNXzbt26vdslSB0zvNXzXNtEJTK8JalAhrck\nFcjwlqQCGd6SVCBnWGpOe/0HvszPnjzQ7TKO2cKT+7nhDed3uwwVptkMy7ZegyZ1y8+ePMBNb1lZ\n6T2Ox/T4DZu2VXp99R67TSSpQIa3JBXI8JakAhneklQgw1uSCmR4S1KBDG9JKpDhLUkFajlJJyKe\nAdwMLAbmA+8CvgFsoR7+e4D1mTleXZmSpMnaaXlfBDyYmSuAS4H3ARuBD2XmcmAnsKGyCiVJR2jZ\n8s7Mz0z647OB7wPLgasa+4aBNwIfm/XqJElH1fbaJhHxFWCIekv8i5O6SfYCSyqoTZI0jbbDOzPP\njYjfBT4FTF7patpVrw4ZHFxAf/+8GZQn1ReO8h7SVO0MWC4D9mbmDzLzkYiYB4xFxPzM3E+9Nb67\n2TVGR/fNTrXqSVWv+Hc8VhWE6n+Hnn6a/YXfzoDl+dT7tImIxcAi4B5gXeP4WuCuYytRktSJdrpN\nPgrcGBFfBk4G/hz4N2BLRLwW2AVsrq5ESdLh2nna5EngiqMcWj375UiS2uEMS0kqkK9B05z26kc/\nz7dfc0ul9/h2pVeve/VJpwDVvs5NvcXw1px247Nf9rR4h+WmTds4t9I7qNfYbSJJBTK8JalAhrck\nFcjwlqQCGd6SVCDDW5IKZHhLUoEMb0kqkOEtSQUyvCWpQIa3JBXI8JakAhneklQgw1uSCuSSsJrz\nNmza1u0S
jtnCk/1PTbOrb2JiovKbjIyMVX8TaYY2bNpW+Zrh0kzUagN90x1rqzkQEe8BzgPmAZuA\nB4Et1Ltd9gDrM3P82Eu
"text/plain": [
"<matplotlib.figure.Figure at 0x7f2f8f9461d0>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"# Bug: if you include Seaborn, add sym='k.' to show the outliers\n",
"df.boxplot(column='AgeFilled', return_type='axes', sym='k.')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"An alternative is to use the method interpolate(), which by default fills each missing value by linear interpolation between its neighbouring rows."
]
},
{
"cell_type": "code",
"execution_count": 76,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"count 891.000000\n",
"mean 29.726061\n",
"std 13.902353\n",
"min 0.420000\n",
"25% 21.000000\n",
"50% 28.500000\n",
"75% 38.000000\n",
"max 80.000000\n",
"Name: AgeFilled, dtype: float64"
]
},
"execution_count": 76,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df['AgeFilled'] = df['Age'].interpolate()\n",
"df['AgeFilled'].describe()"
]
},
{
"cell_type": "code",
"execution_count": 77,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"<matplotlib.axes._subplots.AxesSubplot at 0x7f2f8f2c04a8>"
]
},
"execution_count": 77,
"metadata": {},
"output_type": "execute_result"
},
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAW8AAAEDCAYAAAD6CoU1AAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\nAAALEgAACxIB0t1+/AAAEQ9JREFUeJzt3X+QXWV9x/H3sivBhO2wMHdi3I7Rov1S2ukM0aIDlIQ0\nkHYq2pJYJEyqpFamFae0atE/lBptB3W0Fhx/dVCQiqhNS7tjh/AjIDPiIBYq2jbfxmhjTSJZ7WJX\nozHJbv+4N7rZZO+PzZ69++S+XzPMvXvuj/PZmcxnH55zznP6JicnkSSV5ZRuB5Akdc7ylqQCWd6S\nVCDLW5IKZHlLUoEsb0kq0ECrN0TEEuATwBBwKrAZ+A7wIWACeDIzX1dlSEnS0doZeb8a2J6Zq4H1\nwN8Afw28PjN/HTgjItZWF1GSNF075f1d4KzG87OA7wHPy8zHG9tGgDUVZJMkzaBleWfmp4HlEbED\neAh4EzA25S37gGWVpJMkHVfL8o6Iq4FdmfkCYDXwd9Pe0ldFMEnSzFoesAQuBLYCZOZXI+KZ0z43\nDOxp9gWHDh2eHBjon3VIqSpX3DDCwUMTnDpwClvedXm340jTzTg4bqe8vw68BPjHiFgOjAPfjIgL\nM/MLwBXAzc2+YGxsfwdZpflzyXnDPPTEbladN8zo6Hi340hHqdUGZ3ytr9Wqgo1TBT8GLAX6gbdS\nP1Xwo9T/KjyamW9s9h2jo+MuXagFq1YbtLi1INVqgzOOvFuW91ywvLWQWd5aqJqVt1dYSlKBLG9J\nKpDlLUkFsrwlqUCWtyQVyPKWpAJZ3pJUIMtbkgpkeUtSgSxvSSqQ5S1JBbK81dPuemAHV9wwwl0P\n7Oh2FKkjlrd62rbHd3Pw0AQPPrG721Gkjlje6mmrVwxz6sApXHLecLejSB1xSVj1PJeE1ULlkrCS\ndJKxvCWpQJa3JBXI8pakArW8e3xEbAI2ApPUbzj8QuAi4EPABPBkZr6uypCSpKN1dLZJRFwMvAL4\nZeCNmfl4RHwS+ERmbp3pc55tooXq7R//Erue+gHLl57Ojdec3+040lHm8myTtwHvAp6bmY83to0A\na2aZTeqqXU/94KhHqRRtl3dEvAj4FnAYGJvy0j5g2RznkiQ10cnI+zXAbY3nU4fyMw7rpYVucPEz\njnqUStHygOUUq4DrGs/PmrJ9GNjT7INDQ4sZGOjvLJk0D8b3H/zpY6022OU0UvvaKu+IWAaMZ+ah\nxs//GREXZOYjwBXAzc0+Pza2/4SDSlVYvvT0nx6w9BJ5LTTNBhTtjryXUZ/bPuJPgY9ERB/waGZu\nm308qXtuvOZ81zZRkVyYSj3P8tZC5cJUknSSsbwlqUCWtyQVyPKWpAJZ3pJUIMtbkgpkeaun3fXA\nDq64YYS7HtjR7ShSRyxv9bRtj+/m4KEJHnxid7ejSB2xvNXTJiYmADh8eKLLSaTOWN7qaROTRz9K\npbC81dOeMXDKUY9SKfwXq5727LMWH/UolcLyVk/zNmgqleWtnrZ86elHPUqlcElY9TyXhNVC1WxJ\n2E5ugyYteOvWXc7OndVecHP22S9gy5aRSvchteLIWz3PkbcWKm/GIEknGctbkgrU7t3jrwbeBBwE\n3gZ8FbiDevnvBTZm5sGqQkpVunPrdi5dMdztGFJHWo68I+JM6oV9AfBS4HeAzcAtmbkS2AlsqjKk\nVKVP3ZvdjiB1rJ1pkzXAfZm5PzOfysxrgVXAkcPtI433SJLmSTvTJs8FlkTEPwFnAG8HFk+ZJtkH\nLKsmniTpeNop7z7gTOB3qRf5g41tU19vamhoMQMD/bPJJ82LWm2w2xGkjrRT3k8Bj2TmBPCNiBgH\nDkbEosw8AAwDe5p9wdjY/hNPKlXI87y1EDUbVLQz530vsDoi+iLiLOB04H5gfeP1dcA9JxpS6par\nLotuR5A61tYVlhHxh8BrgEngHcCXqZ8quAjY
BVyTmYdn+rxXWGoh8wpLLVTNrrD08nj1PMtbC5WX\nx0vSScbylqQCWd6SVCDLWz3vzq3bux1B6pjlrZ7n2iYqkeUtSQWyvCWpQJa3JBXI8pakAlne6nmu\nbaISWd7qeRvWntPtCFLHLG9JKpDlLUkFsrwlqUCWtyQVyPJWz3NtE5XI8lbPc20TlcjylqQCWd6S\nVKCBVm+IiJXAZ4GvAX3Ak8B7qN+A+BRgL7AxMw9WmFOSNEW7I++HMnN1Zl6SmX8CbAZuycyVwE5g\nU2UJJUnHaLe8p9/BeBUw0ng+AqyZq0DSfHNtE5Wo5bRJw7kRcTdwJvVR9+Ip0yT7gGVVhJPmw4a1\n5zA6Ot7tGFJH2invHcBfZOZnI+IXgAenfW76qPwYQ0OLGRjon2VEqXq12mC3I0gd6ZucnOzoAxHx\nKPAi6qPvAxFxMXBdZv7eTJ8ZHR3vbCfSPKrVBh15a0Gq1QZnHBy3nPOOiA0R8YbG82cBS4GPA+sb\nb1kH3DMHOSVJbWpn2uSfgTsj4uXAM4Brga8An4iI1wK7gNuriyhJmq7jaZPZcNpEC9l9j+/m0hXD\n3Y4hHeOEpk2kk51rm6hElrckFcjylqQCWd6SVCDLW5IKZHmr57m2iUpkeavnbVh7TrcjSB2zvCWp\nQJa3JBXI8pakAlneklQgy1s9786t27sdQeqY5a2e59omKpHlLUkFsrwlqUCWtyQVyPKWpAJZ3up5\nrm2iElne6nmubaIStXMDYiLiNOBrwGZgG3AH9eLfC2zMzIOVJZQkHaPdkfdbge81nm8GbsnMlcBO\nYFMVwSRJM2tZ3hERwDnA54A+YCUw0nh5BFhTWTpJ0nG1M/J+L/Bn1IsbYMmUaZJ9wLIqgkmSZtZ0\nzjsiNgKPZOau+gD8GH3H2zjd0NBiBgb6ZxFPqt6dW7d70FLFaXXA8reB50XE5cAw8BPgBxGxKDMP\nNLbtabWTsbH9JxxUqsqn7k0uXTHc7RjSMWq1wRlfa1remfnKI88j4m3AfwMXAOuBTwLrgHvmIqQk\nqX2dnOd9ZIrkRuBVEfF5YAi4fc5TSZKaaus8b4DMfPuUHy+rIIskqU1eYSlJBeqbnJysfCejo+PV\n70Qnpde//2F++OND3Y5xwpacNsAt11/c7RgqTK02OOMZfW1Pm0jd8MMfH+Jjb15d6T5qtUFGR8cr\n3cemm7ZV+v3qPU6bSFKBLG9JKpDlLUkFsrwlqUCWtyQVyPKWpAJZ3pJUIMtbkgpkeUtSgSxvSSqQ\n5S1JBbK8JalAlrckFcjylqQCWd6SVCDLW5IK1PJmDBHxTOA2YCmwCHgn8BXgDurlvxfYmJkHq4sp\nSZqqnZH35cBjmbkKuBJ4H7AZ+EBmrgR2ApsqSyhJOkbLkXdmfmbKj88B/gdYCVzb2DYCvAH4yJyn\nkyQdV9v3sIyILwDD1Efi902ZJtkHLKsgmyRpBm2Xd2ZeGBG/CnwSmHpH4xnvbnzE0NBiBgb6ZxFP\nqt8g2H1IR2vngOUKYF9mfjszn4yIfmA8IhZl5gHqo/E9zb5jbGz/3KRVT6r6zu7zcfd4qP730Mmn\n2R/8dg5YXkx9TpuIWAqcDtwPrG+8vg6458QiSpI60c60yYeBWyPiYeA04I+AfwXuiIjXAruA26uL\nKEmarm9ycrLynYyOjle/E52UvvDH11P7ydPdjnHCRk89gws/+P5ux1BharXBGY8ptn3AUuqGW5/z\nMj725tWV7mM+5rxvumkbF1a6B/UaL4+XpAJZ3pJUIMtbkgpkeUtSgSxvSSqQ5S1JBbK8JalAlrck\nFcjylqQCWd6SVCDLW5IKZHlLUoEsb0kqkOUtSQWyvCWpQJa3JBXI8pakAlneklSgtm6DFhHvBi4C\n+oGbgMeAO6iX/15gY2YerCqkJOloLcs7IlYB52bmBRFxJvAE8ADwgczcEhF/CWwCPlJpUvWsTTdt\n63aEE7bk
NG8Xq7nVzr+ozwOPNp4/DSwBVgLXNraNAG/A8lYFqr75MNT/OMzHfqS51LK8M3MS+FHj\nxz8APgesnTJNsg9YVk0
"text/plain": [
"<matplotlib.figure.Figure at 0x7f2f8f299dd8>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"# Bug: if you include Seaborn, add sym='k.' to show the outliers\n",
"df.boxplot(column='AgeFilled', return_type='axes', sym='k.')"
]
},
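{
"cell_type": "markdown",
"metadata": {},
"source": [
"To see how the two strategies differ, here is a minimal sketch on a small made-up Series (the name *ages* and its values are assumptions for illustration, not the Titanic data). Note that linear interpolation depends on the order of the rows, so whether it suits a data set whose row order carries no meaning is a judgment call."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"import numpy as np\n",
"import pandas as pd\n",
"\n",
"# Hypothetical toy ages with two missing values\n",
"ages = pd.Series([20.0, np.nan, 30.0, np.nan, 70.0])\n",
"\n",
"# Median fill: every gap receives the same value (the median of 20, 30 and 70)\n",
"median_filled = ages.fillna(ages.median())\n",
"\n",
"# Linear interpolation: each gap is filled from its neighbouring rows\n",
"interpolated = ages.interpolate()\n",
"\n",
"pd.DataFrame({'original': ages, 'median': median_filled, 'interpolated': interpolated})"
]
},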
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Feature Embarking: null values"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We saw that most passengers embarked at 'S' (Southampton). There are also some missing values."
]
},
{
"cell_type": "code",
"execution_count": 78,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"2"
]
},
"execution_count": 78,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df['Embarked'].isnull().sum()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"As we discussed previously, we will replace these missing values with the most frequent value (the mode): S."
]
},
{
"cell_type": "code",
"execution_count": 79,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"False"
]
},
"execution_count": 79,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Replace nulls with the most common value (assign back rather than filling in place)\n",
"df['Embarked'] = df['Embarked'].fillna('S')\n",
"df['Embarked'].isnull().any()"
]
},
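{
"cell_type": "markdown",
"metadata": {},
"source": [
"Instead of hard-coding 'S', the mode can also be computed from the data. The cell below is a minimal sketch on a small made-up Series (the name *ports* and its values are assumptions for illustration, not the Titanic data)."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"import numpy as np\n",
"import pandas as pd\n",
"\n",
"# Hypothetical toy column with one missing value\n",
"ports = pd.Series(['S', 'C', 'S', np.nan, 'Q', 'S'])\n",
"\n",
"# mode() returns a Series because there can be ties; take the first entry\n",
"most_common = ports.mode()[0]\n",
"\n",
"# Fill the gap with the computed mode and check the resulting counts\n",
"ports = ports.fillna(most_common)\n",
"ports.value_counts()"
]
},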
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Feature Cabin: null values"
]
},
{
"cell_type": "markdown",
"metadata": {
"collapsed": true
},
"source": [
"We are going to analyse Cabin in the exercise."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Encoding categorical features"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Recap: encoding categorical features"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"In the previous notebook we saw how to encode categorical features. We are going to explore an alternative way."
]
},
{
"cell_type": "code",
"execution_count": 80,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"#df = df_original.copy()\n",
"#df['SexEncoded'] = df.Sex\n",
"#\n",
"#df.loc[df[\"SexEncoded\"] == 'male', \"SexEncoded\"] = 0\n",
"#df.loc[df[\"SexEncoded\"] == \"female\", \"SexEncoded\"] = 1\n",
"#\n",
"#df['EmbarkedEncoded'] = df.Embarked\n",
"#df.loc[df[\"EmbarkedEncoded\"] == \"S\", \"EmbarkedEncoded\"] = 0\n",
"#df.loc[df[\"EmbarkedEncoded\"] == \"C\", \"EmbarkedEncoded\"] = 1\n",
"#df.loc[df[\"EmbarkedEncoded\"] == \"Q\", \"EmbarkedEncoded\"] = 2\n",
"#df.head()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Encoding Categorical Variables as Binary ones"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"As we saw previously, translating categorical variables into integers can introduce an artificial order. In our case, this is not a problem, since *Sex* is a binary variable, and we can consider that there is a natural order in *Pclass*.\n",
"\n",
"Nevertheless, we are going to introduce a general approach to encode categorical variables using some facilities provided by scikit-learn."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**LabelEncoder** transform categories into integers (0, 1, ...). We are going to use it for *Sex*."
]
},
{
"cell_type": "code",
"execution_count": 81,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>PassengerId</th>\n",
" <th>Survived</th>\n",
" <th>Pclass</th>\n",
" <th>Name</th>\n",
" <th>Sex</th>\n",
" <th>Age</th>\n",
" <th>SibSp</th>\n",
" <th>Parch</th>\n",
" <th>Ticket</th>\n",
" <th>Fare</th>\n",
" <th>Cabin</th>\n",
" <th>Embarked</th>\n",
" <th>SexCoded</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" <td>3</td>\n",
" <td>Braund, Mr. Owen Harris</td>\n",
" <td>male</td>\n",
" <td>22.0</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" <td>A/5 21171</td>\n",
" <td>7.2500</td>\n",
" <td>NaN</td>\n",
" <td>S</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>2</td>\n",
" <td>1</td>\n",
" <td>1</td>\n",
" <td>Cumings, Mrs. John Bradley (Florence Briggs Th...</td>\n",
" <td>female</td>\n",
" <td>38.0</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" <td>PC 17599</td>\n",
" <td>71.2833</td>\n",
" <td>C85</td>\n",
" <td>C</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>3</td>\n",
" <td>1</td>\n",
" <td>3</td>\n",
" <td>Heikkinen, Miss. Laina</td>\n",
" <td>female</td>\n",
" <td>26.0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>STON/O2. 3101282</td>\n",
" <td>7.9250</td>\n",
" <td>NaN</td>\n",
" <td>S</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>4</td>\n",
" <td>1</td>\n",
" <td>1</td>\n",
" <td>Futrelle, Mrs. Jacques Heath (Lily May Peel)</td>\n",
" <td>female</td>\n",
" <td>35.0</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" <td>113803</td>\n",
" <td>53.1000</td>\n",
" <td>C123</td>\n",
" <td>S</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>5</td>\n",
" <td>0</td>\n",
" <td>3</td>\n",
" <td>Allen, Mr. William Henry</td>\n",
" <td>male</td>\n",
" <td>35.0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>373450</td>\n",
" <td>8.0500</td>\n",
" <td>NaN</td>\n",
" <td>S</td>\n",
" <td>1</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" PassengerId Survived Pclass \\\n",
"0 1 0 3 \n",
"1 2 1 1 \n",
"2 3 1 3 \n",
"3 4 1 1 \n",
"4 5 0 3 \n",
"\n",
" Name Sex Age SibSp \\\n",
"0 Braund, Mr. Owen Harris male 22.0 1 \n",
"1 Cumings, Mrs. John Bradley (Florence Briggs Th... female 38.0 1 \n",
"2 Heikkinen, Miss. Laina female 26.0 0 \n",
"3 Futrelle, Mrs. Jacques Heath (Lily May Peel) female 35.0 1 \n",
"4 Allen, Mr. William Henry male 35.0 0 \n",
"\n",
" Parch Ticket Fare Cabin Embarked SexCoded \n",
"0 0 A/5 21171 7.2500 NaN S 1 \n",
"1 0 PC 17599 71.2833 C85 C 0 \n",
"2 0 STON/O2. 3101282 7.9250 NaN S 0 \n",
"3 0 113803 53.1000 C123 S 0 \n",
"4 0 373450 8.0500 NaN S 1 "
]
},
"execution_count": 81,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"from sklearn.preprocessing import LabelEncoder, OneHotEncoder\n",
"\n",
"df = df_original.copy() # take original df\n",
"\n",
"# We define here the categorical columns have non integer values, so we need to convert them\n",
"# into integers first with LabelEncoder. This can be omitted if the are already integers.\n",
"\n",
"label_enc = LabelEncoder()\n",
"label_sex = label_enc.fit_transform(df['Sex'])\n",
"df['SexCoded'] = label_sex\n",
"\n",
"df.head()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Ok, we see it has been easy and we have *Sex* as a binary variable.\n",
"\n",
"Now we are going to do the same with *Embarked* and *Pclass*. There are several alternatives in scikit-learn, such as *DictVectorizer* or *OneHotEncoder*.\n",
"\n",
"We are going to use *pd.get_dummies*, which provides a very easy-to-use way to encode categorical variables."
]
},
{
"cell_type": "code",
"execution_count": 82,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>PassengerId</th>\n",
" <th>Survived</th>\n",
" <th>Name</th>\n",
" <th>Sex</th>\n",
" <th>Age</th>\n",
" <th>SibSp</th>\n",
" <th>Parch</th>\n",
" <th>Ticket</th>\n",
" <th>Fare</th>\n",
" <th>Cabin</th>\n",
" <th>SexCoded</th>\n",
" <th>Embarked_C</th>\n",
" <th>Embarked_Q</th>\n",
" <th>Embarked_S</th>\n",
" <th>Pclass_1</th>\n",
" <th>Pclass_2</th>\n",
" <th>Pclass_3</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" <td>Braund, Mr. Owen Harris</td>\n",
" <td>male</td>\n",
" <td>22.0</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" <td>A/5 21171</td>\n",
" <td>7.2500</td>\n",
" <td>NaN</td>\n",
" <td>1</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>1.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>1.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>2</td>\n",
" <td>1</td>\n",
" <td>Cumings, Mrs. John Bradley (Florence Briggs Th...</td>\n",
" <td>female</td>\n",
" <td>38.0</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" <td>PC 17599</td>\n",
" <td>71.2833</td>\n",
" <td>C85</td>\n",
" <td>0</td>\n",
" <td>1.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>1.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>3</td>\n",
" <td>1</td>\n",
" <td>Heikkinen, Miss. Laina</td>\n",
" <td>female</td>\n",
" <td>26.0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>STON/O2. 3101282</td>\n",
" <td>7.9250</td>\n",
" <td>NaN</td>\n",
" <td>0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>1.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>1.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>4</td>\n",
" <td>1</td>\n",
" <td>Futrelle, Mrs. Jacques Heath (Lily May Peel)</td>\n",
" <td>female</td>\n",
" <td>35.0</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" <td>113803</td>\n",
" <td>53.1000</td>\n",
" <td>C123</td>\n",
" <td>0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>1.0</td>\n",
" <td>1.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>5</td>\n",
" <td>0</td>\n",
" <td>Allen, Mr. William Henry</td>\n",
" <td>male</td>\n",
" <td>35.0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>373450</td>\n",
" <td>8.0500</td>\n",
" <td>NaN</td>\n",
" <td>1</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>1.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>1.0</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" PassengerId Survived Name \\\n",
"0 1 0 Braund, Mr. Owen Harris \n",
"1 2 1 Cumings, Mrs. John Bradley (Florence Briggs Th... \n",
"2 3 1 Heikkinen, Miss. Laina \n",
"3 4 1 Futrelle, Mrs. Jacques Heath (Lily May Peel) \n",
"4 5 0 Allen, Mr. William Henry \n",
"\n",
" Sex Age SibSp Parch Ticket Fare Cabin SexCoded \\\n",
"0 male 22.0 1 0 A/5 21171 7.2500 NaN 1 \n",
"1 female 38.0 1 0 PC 17599 71.2833 C85 0 \n",
"2 female 26.0 0 0 STON/O2. 3101282 7.9250 NaN 0 \n",
"3 female 35.0 1 0 113803 53.1000 C123 0 \n",
"4 male 35.0 0 0 373450 8.0500 NaN 1 \n",
"\n",
" Embarked_C Embarked_Q Embarked_S Pclass_1 Pclass_2 Pclass_3 \n",
"0 0.0 0.0 1.0 0.0 0.0 1.0 \n",
"1 1.0 0.0 0.0 1.0 0.0 0.0 \n",
"2 0.0 0.0 1.0 0.0 0.0 1.0 \n",
"3 0.0 0.0 1.0 1.0 0.0 0.0 \n",
"4 0.0 0.0 1.0 0.0 0.0 1.0 "
]
},
"execution_count": 82,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"#Remove nulls\n",
"df['Embarked'].fillna('S', inplace=True)\n",
"df = pd.get_dummies(df, columns=['Embarked', 'Pclass'])\n",
"df.head()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Cleaning: dropping"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We should drop columns we will not use. In the exercise, you will need to use 'Cabin'."
]
},
{
"cell_type": "code",
"execution_count": 83,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>PassengerId</th>\n",
" <th>Survived</th>\n",
" <th>Name</th>\n",
" <th>Sex</th>\n",
" <th>Age</th>\n",
" <th>SibSp</th>\n",
" <th>Parch</th>\n",
" <th>Fare</th>\n",
" <th>SexCoded</th>\n",
" <th>Embarked_C</th>\n",
" <th>Embarked_Q</th>\n",
" <th>Embarked_S</th>\n",
" <th>Pclass_1</th>\n",
" <th>Pclass_2</th>\n",
" <th>Pclass_3</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" <td>Braund, Mr. Owen Harris</td>\n",
" <td>male</td>\n",
" <td>22.0</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" <td>7.2500</td>\n",
" <td>1</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>1.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>1.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>2</td>\n",
" <td>1</td>\n",
" <td>Cumings, Mrs. John Bradley (Florence Briggs Th...</td>\n",
" <td>female</td>\n",
" <td>38.0</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" <td>71.2833</td>\n",
" <td>0</td>\n",
" <td>1.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>1.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>3</td>\n",
" <td>1</td>\n",
" <td>Heikkinen, Miss. Laina</td>\n",
" <td>female</td>\n",
" <td>26.0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>7.9250</td>\n",
" <td>0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>1.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>1.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>4</td>\n",
" <td>1</td>\n",
" <td>Futrelle, Mrs. Jacques Heath (Lily May Peel)</td>\n",
" <td>female</td>\n",
" <td>35.0</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" <td>53.1000</td>\n",
" <td>0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>1.0</td>\n",
" <td>1.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>5</td>\n",
" <td>0</td>\n",
" <td>Allen, Mr. William Henry</td>\n",
" <td>male</td>\n",
" <td>35.0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>8.0500</td>\n",
" <td>1</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>1.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>1.0</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" PassengerId Survived Name \\\n",
"0 1 0 Braund, Mr. Owen Harris \n",
"1 2 1 Cumings, Mrs. John Bradley (Florence Briggs Th... \n",
"2 3 1 Heikkinen, Miss. Laina \n",
"3 4 1 Futrelle, Mrs. Jacques Heath (Lily May Peel) \n",
"4 5 0 Allen, Mr. William Henry \n",
"\n",
" Sex Age SibSp Parch Fare SexCoded Embarked_C Embarked_Q \\\n",
"0 male 22.0 1 0 7.2500 1 0.0 0.0 \n",
"1 female 38.0 1 0 71.2833 0 1.0 0.0 \n",
"2 female 26.0 0 0 7.9250 0 0.0 0.0 \n",
"3 female 35.0 1 0 53.1000 0 0.0 0.0 \n",
"4 male 35.0 0 0 8.0500 1 0.0 0.0 \n",
"\n",
" Embarked_S Pclass_1 Pclass_2 Pclass_3 \n",
"0 1.0 0.0 0.0 1.0 \n",
"1 0.0 1.0 0.0 0.0 \n",
"2 1.0 0.0 0.0 1.0 \n",
"3 1.0 1.0 0.0 0.0 \n",
"4 1.0 0.0 0.0 1.0 "
]
},
"execution_count": 83,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df.drop(['Cabin', 'Ticket'], axis=1, inplace=True)\n",
"df.head()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Feature Engineering"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Feature Engineering is the process of using domain/expert knowledge of the data to create features that make machine learning algorithms work better. We are going to define several [new ones](https://triangleinequality.wordpress.com/2013/09/08/basic-feature-engineering-with-the-titanic-data/) in the exercise."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# References"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"* [Basic Feature Engineering with the Titanic Data](https://triangleinequality.wordpress.com/2013/09/08/basic-feature-engineering-with-the-titanic-data/)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Licence"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The notebook is freely licensed under under the [Creative Commons Attribution Share-Alike license](https://creativecommons.org/licenses/by/2.0/). \n",
"\n",
"© 2016 Carlos A. Iglesias, Universidad Politécnica de Madrid."
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.5.1"
}
},
"nbformat": 4,
"nbformat_minor": 0
}