mirror of https://github.com/gsi-upm/sitc, synced 2025-10-30 23:18:18 +00:00
			
		
		
		
	Not done reviewing ml2 yet
		
							
								
								
									
										114
									
								
								ml2/3_0_0_Intro_ML_2.ipynb
									
									
									
									
									
										Normal file
									
								
							
							
						
						
									
									
								
							| @@ -0,0 +1,114 @@ | ||||
| { | ||||
|  "cells": [ | ||||
|   { | ||||
|    "cell_type": "markdown", | ||||
|    "metadata": {}, | ||||
|    "source": [ | ||||
|     "" | ||||
|    ] | ||||
|   }, | ||||
|   { | ||||
|    "cell_type": "markdown", | ||||
|    "metadata": {}, | ||||
|    "source": [ | ||||
|     "# Course Notes for Learning Intelligent Systems" | ||||
|    ] | ||||
|   }, | ||||
|   { | ||||
|    "cell_type": "markdown", | ||||
|    "metadata": {}, | ||||
|    "source": [ | ||||
|     "Department of Telematic Engineering Systems, Universidad Politécnica de Madrid, © 2016 Carlos A. Iglesias" | ||||
|    ] | ||||
|   }, | ||||
|   { | ||||
|    "cell_type": "markdown", | ||||
|    "metadata": {}, | ||||
|    "source": [ | ||||
|     "# Introduction to Machine Learning II\n", | ||||
|     " \n", | ||||
|     "In this lab session, we will go deeper into some aspects that were introduced in the previous session. This time we will look in more detail at reading datasets, analysing data and selecting features. In addition, we will explore two more machine learning algorithms, the perceptron and SVM, applied to the binary classification problem posed by the Titanic dataset.\n", | ||||
|     "\n", | ||||
|     "# Objectives\n", | ||||
|     "\n", | ||||
|     "In this lecture we go into more detail on several aspects of machine learning.\n", | ||||
|     "\n", | ||||
|     "The main objectives of this session are:\n", | ||||
|     "* Learn how to read data from a file or URL with pandas\n", | ||||
|     "* Learn how to use the pandas DataFrame data structure\n", | ||||
|     "* Learn how to select features\n", | ||||
|     "* Better understand the Perceptron and SVM machine learning algorithms" | ||||
|    ] | ||||
|   }, | ||||
|   { | ||||
|    "cell_type": "markdown", | ||||
|    "metadata": {}, | ||||
|    "source": [ | ||||
|     "# Table of Contents" | ||||
|    ] | ||||
|   }, | ||||
|   { | ||||
|    "cell_type": "markdown", | ||||
|    "metadata": {}, | ||||
|    "source": [ | ||||
|     "1. [Home](3_0_0_Intro_ML_2.ipynb)\n", | ||||
|     "1. [The Titanic Dataset. Reading Data](3_1_Read_Data.ipynb)\n", | ||||
|     "1. [Introduction to Pandas](3_2_Pandas.ipynb)\n", | ||||
|     "1. [Preprocessing: Data Munging with DataFrames](3_3_Data_Munging_with_Pandas.ipynb)\n", | ||||
|     "1. [Preprocessing: Visualisation for DataFrames](3_4_Visualisation_Pandas.ipynb)\n", | ||||
|     "1. [Exercise 1](3_5_Exercise_1.ipynb)\n", | ||||
|     "1. [Machine Learning](3_6_Machine_Learning.ipynb)\n", | ||||
|     "   1. [SVM](3_7_SVM.ipynb)\n", | ||||
|     "1. [Exercise 2](3_8_Exercise_2.ipynb)" | ||||
|    ] | ||||
|   }, | ||||
|   { | ||||
|    "cell_type": "markdown", | ||||
|    "metadata": {}, | ||||
|    "source": [ | ||||
|     "## References" | ||||
|    ] | ||||
|   }, | ||||
|   { | ||||
|    "cell_type": "markdown", | ||||
|    "metadata": {}, | ||||
|    "source": [ | ||||
|     "* [IPython Notebook Tutorial for Titanic: Machine Learning from Disaster](https://www.kaggle.com/c/titanic/forums/t/5105/ipython-notebook-tutorial-for-titanic-machine-learning-from-disaster)\n", | ||||
|     "* [Scikit-learn videos](http://blog.kaggle.com/author/kevin-markham/) and [notebooks](https://github.com/justmarkham/scikit-learn-videos) by Kevin Markham\n", | ||||
|     "* [Learning scikit-learn: Machine Learning in Python](http://proquest.safaribooksonline.com/book/programming/python/9781783281930/1dot-machine-learning-a-gentle-introduction/ch01s02_html), Raúl Garreta; Guillermo Moncecchi, Packt Publishing, 2013.\n", | ||||
|     "* [Python Machine Learning](http://proquest.safaribooksonline.com/book/programming/python/9781783555130), Sebastian Raschka, Packt Publishing, 2015." | ||||
|    ] | ||||
|   }, | ||||
|   { | ||||
|    "cell_type": "markdown", | ||||
|    "metadata": {}, | ||||
|    "source": [ | ||||
|     "## Licence\n", | ||||
|     "The notebook is freely licensed under the [Creative Commons Attribution Share-Alike license](https://creativecommons.org/licenses/by/2.0/).  \n", | ||||
|     "\n", | ||||
|     "© 2016 Carlos A. Iglesias, Universidad Politécnica de Madrid." | ||||
|    ] | ||||
|   } | ||||
|  ], | ||||
|  "metadata": { | ||||
|   "kernelspec": { | ||||
|    "display_name": "Python 3", | ||||
|    "language": "python", | ||||
|    "name": "python3" | ||||
|   }, | ||||
|   "language_info": { | ||||
|    "codemirror_mode": { | ||||
|     "name": "ipython", | ||||
|     "version": 3 | ||||
|    }, | ||||
|    "file_extension": ".py", | ||||
|    "mimetype": "text/x-python", | ||||
|    "name": "python", | ||||
|    "nbconvert_exporter": "python", | ||||
|    "pygments_lexer": "ipython3", | ||||
|    "version": "3.5.1+" | ||||
|   } | ||||
|  }, | ||||
|  "nbformat": 4, | ||||
|  "nbformat_minor": 0 | ||||
| } | ||||
							
								
								
									
										3846
									
								
								ml2/3_1_Read_Data.ipynb
									
									
									
									
									
										Normal file
									
								
							
							
						
						
									
									
								
							
										
											
												File diff suppressed because it is too large
											
										
									
								
							
							
								
								
									
										932
									
								
								ml2/3_2_Pandas.ipynb
									
									
									
									
									
										Normal file
									
								
							
							
						
						
									
									
								
							| @@ -0,0 +1,932 @@ | ||||
| { | ||||
|  "cells": [ | ||||
|   { | ||||
|    "cell_type": "markdown", | ||||
|    "metadata": {}, | ||||
|    "source": [ | ||||
|     "" | ||||
|    ] | ||||
|   }, | ||||
|   { | ||||
|    "cell_type": "markdown", | ||||
|    "metadata": {}, | ||||
|    "source": [ | ||||
|     "# Course Notes for Learning Intelligent Systems" | ||||
|    ] | ||||
|   }, | ||||
|   { | ||||
|    "cell_type": "markdown", | ||||
|    "metadata": {}, | ||||
|    "source": [ | ||||
|     "Department of Telematic Engineering Systems, Universidad Politécnica de Madrid, © 2016 Carlos A. Iglesias" | ||||
|    ] | ||||
|   }, | ||||
|   { | ||||
|    "cell_type": "markdown", | ||||
|    "metadata": {}, | ||||
|    "source": [ | ||||
|     "## [Introduction to Machine Learning II](3_0_0_Intro_ML_2.ipynb)" | ||||
|    ] | ||||
|   }, | ||||
|   { | ||||
|    "cell_type": "markdown", | ||||
|    "metadata": {}, | ||||
|    "source": [ | ||||
|     "# Table of Contents\n", | ||||
|     "\n", | ||||
|     "* [Introduction to Pandas](#Introduction-to-Pandas)\n", | ||||
|     "* [Series](#Series)\n", | ||||
|     "* [DataFrame](#DataFrame)" | ||||
|    ] | ||||
|   }, | ||||
|   { | ||||
|    "cell_type": "markdown", | ||||
|    "metadata": {}, | ||||
|    "source": [ | ||||
|     "# Introduction to Pandas\n" | ||||
|    ] | ||||
|   }, | ||||
|   { | ||||
|    "cell_type": "markdown", | ||||
|    "metadata": {}, | ||||
|    "source": [ | ||||
|     "This notebook provides an overview of the *pandas* library. " | ||||
|    ] | ||||
|   }, | ||||
|   { | ||||
|    "cell_type": "markdown", | ||||
|    "metadata": {}, | ||||
|    "source": [ | ||||
|     "[Pandas](http://pandas.pydata.org/) is a Python library that provides easy-to-use data structures and data analysis tools.\n", | ||||
|     "\n", | ||||
|     "The main advantage of *Pandas* is that it provides extensive facilities for grouping, merging and querying pandas data structures. It also includes tools for time series analysis, I/O and visualisation.\n", | ||||
|     "\n", | ||||
|     "Pandas is built on top of *NumPy*, so we will usually import both libraries.\n", | ||||
|     "\n", | ||||
|     "Pandas provides two main data structures:\n", | ||||
|     "* **Series** is a one-dimensional labelled object, capable of holding any data type (integers, strings, floating point numbers, Python objects, etc.). It is similar to an array, a list, a dictionary or a column in a table. Every value in a Series object has an index label.\n", | ||||
|     "* **DataFrame** is a two-dimensional labelled object with columns of potentially different types. It is similar to a database table or a spreadsheet. It can be seen as a dictionary of Series that share the same index.\n" | ||||
|    ] | ||||
|   }, | ||||
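A minimal sketch of the two structures described above, using made-up names and values (the `heights` and `ages` data are purely illustrative): a DataFrame is built from a dict of Series, and selecting one column gives back a Series that shares the DataFrame's index.

```python
import pandas as pd

# Illustrative data only: two Series with the same string index labels
labels = ['ana', 'luis', 'eva']
heights = pd.Series([1.75, 1.62, 1.80], index=labels)
ages = pd.Series([34, 28, 45], index=labels)

# A DataFrame seen as a dict of Series that share the same index
people = pd.DataFrame({'height': heights, 'age': ages})
print(people)

# Selecting a column returns a Series with the DataFrame's index
age_column = people['age']
print(type(age_column).__name__)              # Series
print(age_column.index.equals(people.index))  # True
```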
|   { | ||||
|    "cell_type": "markdown", | ||||
|    "metadata": {}, | ||||
|    "source": [ | ||||
|     "# Series" | ||||
|    ] | ||||
|   }, | ||||
|   { | ||||
|    "cell_type": "markdown", | ||||
|    "metadata": {}, | ||||
|    "source": [ | ||||
|     "We will use Series objects directly less often than DataFrames, so here we provide only a short introduction." | ||||
|    ] | ||||
|   }, | ||||
|   { | ||||
|    "cell_type": "code", | ||||
|    "execution_count": 1, | ||||
|    "metadata": { | ||||
|     "collapsed": false | ||||
|    }, | ||||
|    "outputs": [ | ||||
|     { | ||||
|      "data": { | ||||
|       "text/plain": [ | ||||
|        "0     5\n", | ||||
|        "1    10\n", | ||||
|        "2    15\n", | ||||
|        "dtype: int64" | ||||
|       ] | ||||
|      }, | ||||
|      "execution_count": 1, | ||||
|      "metadata": {}, | ||||
|      "output_type": "execute_result" | ||||
|     } | ||||
|    ], | ||||
|    "source": [ | ||||
|     "import numpy as np\n", | ||||
|     "import pandas as pd\n", | ||||
|     "from pandas import Series, DataFrame\n", | ||||
|     "\n", | ||||
|     "# create series object from an array\n", | ||||
|     "s = Series([5, 10, 15])\n", | ||||
|     "s" | ||||
|    ] | ||||
|   }, | ||||
|   { | ||||
|    "cell_type": "markdown", | ||||
|    "metadata": {}, | ||||
|    "source": [ | ||||
|     "Each value has an associated label, its index. If no index is specified when the Series object is created, the labels are integers starting at 0.\n", | ||||
|     "\n", | ||||
|     "A Series is similar to a dictionary. In fact, we can also create a Series object from a dictionary, as follows. In this case, the index labels are the keys of the dictionary." | ||||
|    ] | ||||
|   }, | ||||
|   { | ||||
|    "cell_type": "code", | ||||
|    "execution_count": 2, | ||||
|    "metadata": { | ||||
|     "collapsed": false | ||||
|    }, | ||||
|    "outputs": [ | ||||
|     { | ||||
|      "data": { | ||||
|       "text/plain": [ | ||||
|        "a     5\n", | ||||
|        "b    10\n", | ||||
|        "c    15\n", | ||||
|        "dtype: int64" | ||||
|       ] | ||||
|      }, | ||||
|      "execution_count": 2, | ||||
|      "metadata": {}, | ||||
|      "output_type": "execute_result" | ||||
|     } | ||||
|    ], | ||||
|    "source": [ | ||||
|     "d = {'a': 5, 'b': 10, 'c': 15}\n", | ||||
|     "s = Series(d)\n", | ||||
|     "s" | ||||
|    ] | ||||
|   }, | ||||
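Since a Series behaves much like a dictionary keyed by its index labels, several dict idioms carry over. A short sketch, rebuilding the same Series from the dictionary above:

```python
from pandas import Series

s = Series({'a': 5, 'b': 10, 'c': 15})

# `in` checks the index labels, like dict keys (not the values)
print('a' in s)       # True
print(5 in s)         # False

# .get() returns a default for a missing label instead of raising KeyError
print(s.get('z', 0))  # 0

# .items() yields (label, value) pairs, like dict.items()
for label, value in s.items():
    print(label, value)
```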
|   { | ||||
|    "cell_type": "code", | ||||
|    "execution_count": 3, | ||||
|    "metadata": { | ||||
|     "collapsed": false | ||||
|    }, | ||||
|    "outputs": [ | ||||
|     { | ||||
|      "data": { | ||||
|       "text/plain": [ | ||||
|        "Index(['a', 'b', 'c'], dtype='object')" | ||||
|       ] | ||||
|      }, | ||||
|      "execution_count": 3, | ||||
|      "metadata": {}, | ||||
|      "output_type": "execute_result" | ||||
|     } | ||||
|    ], | ||||
|    "source": [ | ||||
|     "# Get the index labels\n", | ||||
|     "s.index" | ||||
|    ] | ||||
|   }, | ||||
|   { | ||||
|    "cell_type": "code", | ||||
|    "execution_count": 4, | ||||
|    "metadata": { | ||||
|     "collapsed": false | ||||
|    }, | ||||
|    "outputs": [ | ||||
|     { | ||||
|      "data": { | ||||
|       "text/plain": [ | ||||
|        "array([ 5, 10, 15])" | ||||
|       ] | ||||
|      }, | ||||
|      "execution_count": 4, | ||||
|      "metadata": {}, | ||||
|      "output_type": "execute_result" | ||||
|     } | ||||
|    ], | ||||
|    "source": [ | ||||
|     "# and the values\n", | ||||
|     "s.values" | ||||
|    ] | ||||
|   }, | ||||
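Both the index and the values can be converted to plain Python or NumPy objects. The pandas documentation recommends `to_numpy()` over the `.values` attribute used above; a minimal sketch:

```python
from pandas import Series

s = Series({'a': 5, 'b': 10, 'c': 15})

labels = s.index.tolist()  # index labels as a Python list
values = s.to_numpy()      # values as a NumPy array, like s.values

print(labels)                 # ['a', 'b', 'c']
print(type(values).__name__)  # ndarray
print(values.sum())           # 30
```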
|   { | ||||
|    "cell_type": "markdown", | ||||
|    "metadata": {}, | ||||
|    "source": [ | ||||
|     "Another option is to create the Series object from two lists: one for the values and one for the index labels." | ||||
|    ] | ||||
|   }, | ||||
|   { | ||||
|    "cell_type": "code", | ||||
|    "execution_count": 5, | ||||
|    "metadata": { | ||||
|     "collapsed": false | ||||
|    }, | ||||
|    "outputs": [ | ||||
|     { | ||||
|      "data": { | ||||
|       "text/plain": [ | ||||
|        "Madrid       3141991\n", | ||||
|        "Barcelona    1604555\n", | ||||
|        "Valencia      786189\n", | ||||
|        "Sevilla       693878\n", | ||||
|        "Zaragoza      664953\n", | ||||
|        "Malaga        569130\n", | ||||
|        "dtype: int64" | ||||
|       ] | ||||
|      }, | ||||
|      "execution_count": 5, | ||||
|      "metadata": {}, | ||||
|      "output_type": "execute_result" | ||||
|     } | ||||
|    ], | ||||
|    "source": [ | ||||
|     "# Population in 2015 of the most populated cities in Spain\n", | ||||
|     "s = Series([3141991, 1604555, 786189, 693878, 664953, 569130], index=['Madrid', 'Barcelona', 'Valencia', 'Sevilla', \n", | ||||
|     "                                                                      'Zaragoza', 'Malaga'])\n", | ||||
|     "s" | ||||
|    ] | ||||
|   }, | ||||
|   { | ||||
|    "cell_type": "code", | ||||
|    "execution_count": 6, | ||||
|    "metadata": { | ||||
|     "collapsed": false | ||||
|    }, | ||||
|    "outputs": [ | ||||
|     { | ||||
|      "data": { | ||||
|       "text/plain": [ | ||||
|        "3141991" | ||||
|       ] | ||||
|      }, | ||||
|      "execution_count": 6, | ||||
|      "metadata": {}, | ||||
|      "output_type": "execute_result" | ||||
|     } | ||||
|    ], | ||||
|    "source": [ | ||||
|     "# Population of Madrid\n", | ||||
|     "s['Madrid']" | ||||
|    ] | ||||
|   }, | ||||
|   { | ||||
|    "cell_type": "markdown", | ||||
|    "metadata": {}, | ||||
|    "source": [ | ||||
|     "## Indexing and slicing" | ||||
|    ] | ||||
|   }, | ||||
|   { | ||||
|    "cell_type": "markdown", | ||||
|    "metadata": {}, | ||||
|    "source": [ | ||||
|     "So far, we have not seen any advantage of using pandas Series. We now show some examples of what they can do." | ||||
|    ] | ||||
|   }, | ||||
|   { | ||||
|    "cell_type": "code", | ||||
|    "execution_count": 7, | ||||
|    "metadata": { | ||||
|     "collapsed": false | ||||
|    }, | ||||
|    "outputs": [ | ||||
|     { | ||||
|      "data": { | ||||
|       "text/plain": [ | ||||
|        "Madrid        True\n", | ||||
|        "Barcelona     True\n", | ||||
|        "Valencia     False\n", | ||||
|        "Sevilla      False\n", | ||||
|        "Zaragoza     False\n", | ||||
|        "Malaga       False\n", | ||||
|        "dtype: bool" | ||||
|       ] | ||||
|      }, | ||||
|      "execution_count": 7, | ||||
|      "metadata": {}, | ||||
|      "output_type": "execute_result" | ||||
|     } | ||||
|    ], | ||||
|    "source": [ | ||||
|     "# Boolean condition\n", | ||||
|     "s > 1000000" | ||||
|    ] | ||||
|   }, | ||||
|   { | ||||
|    "cell_type": "code", | ||||
|    "execution_count": 8, | ||||
|    "metadata": { | ||||
|     "collapsed": false | ||||
|    }, | ||||
|    "outputs": [ | ||||
|     { | ||||
|      "data": { | ||||
|       "text/plain": [ | ||||
|        "Madrid       3141991\n", | ||||
|        "Barcelona    1604555\n", | ||||
|        "dtype: int64" | ||||
|       ] | ||||
|      }, | ||||
|      "execution_count": 8, | ||||
|      "metadata": {}, | ||||
|      "output_type": "execute_result" | ||||
|     } | ||||
|    ], | ||||
|    "source": [ | ||||
|     "# Cities with population greater than 1,000,000\n", | ||||
|     "s[s > 1000000]" | ||||
|    ] | ||||
|   }, | ||||
|   { | ||||
|    "cell_type": "markdown", | ||||
|    "metadata": {}, | ||||
|    "source": [ | ||||
|     "Observe that `s > 1000000` returns a Series of booleans with the same index as s. We can use this boolean Series as a filter to select only the elements where the filter is True. The original Series s is not modified. This kind of selection is called *boolean indexing*." | ||||
|    ] | ||||
|   }, | ||||
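The sketch below, reusing the city populations from above, checks that the filtered result is a new Series while `s` keeps all six cities. It also shows how conditions are combined with `&` (and); `|` (or) and `~` (not) work the same way. Each comparison needs its own parentheses because these operators bind more tightly than `>` and `<`.

```python
from pandas import Series

s = Series([3141991, 1604555, 786189, 693878, 664953, 569130],
           index=['Madrid', 'Barcelona', 'Valencia', 'Sevilla', 'Zaragoza', 'Malaga'])

mask = s > 1000000  # boolean Series aligned with s
big = s[mask]       # new Series holding only the rows where mask is True

print(list(big.index))  # ['Madrid', 'Barcelona']
print(len(s))           # 6: the original Series is unchanged

# Combine conditions with &; parentheses are required around each comparison
medium = s[(s > 650000) & (s < 1000000)]
print(list(medium.index))  # ['Valencia', 'Sevilla', 'Zaragoza']
```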
|   { | ||||
|    "cell_type": "code", | ||||
|    "execution_count": 9, | ||||
|    "metadata": { | ||||
|     "collapsed": false | ||||
|    }, | ||||
|    "outputs": [ | ||||
|     { | ||||
|      "data": { | ||||
|       "text/plain": [ | ||||
|        "Madrid       3141991\n", | ||||
|        "Barcelona    1604555\n", | ||||
|        "dtype: int64" | ||||
|       ] | ||||
|      }, | ||||
|      "execution_count": 9, | ||||
|      "metadata": {}, | ||||
|      "output_type": "execute_result" | ||||
|     } | ||||
|    ], | ||||
|    "source": [ | ||||
|     "# Cities with population greater than the mean\n", | ||||
|     "s[s > s.mean()]" | ||||
|    ] | ||||
|   }, | ||||
|   { | ||||
|    "cell_type": "code", | ||||
|    "execution_count": 10, | ||||
|    "metadata": { | ||||
|     "collapsed": false | ||||
|    }, | ||||
|    "outputs": [ | ||||
|     { | ||||
|      "data": { | ||||
|       "text/plain": [ | ||||
|        "Madrid       3141991\n", | ||||
|        "Barcelona    1604555\n", | ||||
|        "Valencia      786189\n", | ||||
|        "dtype: int64" | ||||
|       ] | ||||
|      }, | ||||
|      "execution_count": 10, | ||||
|      "metadata": {}, | ||||
|      "output_type": "execute_result" | ||||
|     } | ||||
|    ], | ||||
|    "source": [ | ||||
|     "# Cities with population greater than the median\n", | ||||
|     "s[s > s.median()]" | ||||
|    ] | ||||
|   }, | ||||
|   { | ||||
|    "cell_type": "code", | ||||
|    "execution_count": 11, | ||||
|    "metadata": { | ||||
|     "collapsed": false | ||||
|    }, | ||||
|    "outputs": [ | ||||
|     { | ||||
|      "data": { | ||||
|       "text/plain": [ | ||||
|        "Madrid        True\n", | ||||
|        "Barcelona     True\n", | ||||
|        "Valencia      True\n", | ||||
|        "Sevilla      False\n", | ||||
|        "Zaragoza     False\n", | ||||
|        "Malaga       False\n", | ||||
|        "dtype: bool" | ||||
|       ] | ||||
|      }, | ||||
|      "execution_count": 11, | ||||
|      "metadata": {}, | ||||
|      "output_type": "execute_result" | ||||
|     } | ||||
|    ], | ||||
|    "source": [ | ||||
|     "# Check cities with a population greater than 700,000\n", | ||||
|     "s > 700000" | ||||
|    ] | ||||
|   }, | ||||
|   { | ||||
|    "cell_type": "code", | ||||
|    "execution_count": 12, | ||||
|    "metadata": { | ||||
|     "collapsed": false | ||||
|    }, | ||||
|    "outputs": [ | ||||
|     { | ||||
|      "data": { | ||||
|       "text/plain": [ | ||||
|        "Madrid       3141991\n", | ||||
|        "Barcelona    1604555\n", | ||||
|        "Valencia      786189\n", | ||||
|        "dtype: int64" | ||||
|       ] | ||||
|      }, | ||||
|      "execution_count": 12, | ||||
|      "metadata": {}, | ||||
|      "output_type": "execute_result" | ||||
|     } | ||||
|    ], | ||||
|    "source": [ | ||||
|     "# List cities with a population greater than 700,000\n", | ||||
|     "s[s > 700000]" | ||||
|    ] | ||||
|   }, | ||||
|   { | ||||
|    "cell_type": "code", | ||||
|    "execution_count": 13, | ||||
|    "metadata": { | ||||
|     "collapsed": false | ||||
|    }, | ||||
|    "outputs": [ | ||||
|     { | ||||
|      "data": { | ||||
|       "text/plain": [ | ||||
|        "Madrid        True\n", | ||||
|        "Barcelona     True\n", | ||||
|        "Valencia      True\n", | ||||
|        "Sevilla      False\n", | ||||
|        "Zaragoza     False\n", | ||||
|        "Malaga       False\n", | ||||
|        "dtype: bool" | ||||
|       ] | ||||
|      }, | ||||
|      "execution_count": 13, | ||||
|      "metadata": {}, | ||||
|      "output_type": "execute_result" | ||||
|     } | ||||
|    ], | ||||
|    "source": [ | ||||
|     "# Another way: store the boolean mask in a variable first\n", | ||||
|     "bigger_than_700000 = s > 700000\n", | ||||
|     "bigger_than_700000" | ||||
|    ] | ||||
|   }, | ||||
|   { | ||||
|    "cell_type": "code", | ||||
|    "execution_count": 14, | ||||
|    "metadata": { | ||||
|     "collapsed": false | ||||
|    }, | ||||
|    "outputs": [ | ||||
|     { | ||||
|      "data": { | ||||
|       "text/plain": [ | ||||
|        "Madrid       3141991\n", | ||||
|        "Barcelona    1604555\n", | ||||
|        "Valencia      786189\n", | ||||
|        "dtype: int64" | ||||
|       ] | ||||
|      }, | ||||
|      "execution_count": 14, | ||||
|      "metadata": {}, | ||||
|      "output_type": "execute_result" | ||||
|     } | ||||
|    ], | ||||
|    "source": [ | ||||
|     "# Cities with population > 700,000\n", | ||||
|     "s[bigger_than_700000]" | ||||
|    ] | ||||
|   }, | ||||
|   { | ||||
|    "cell_type": "markdown", | ||||
|    "metadata": {}, | ||||
|    "source": [ | ||||
|     "## Operations on series" | ||||
|    ] | ||||
|   }, | ||||
|   { | ||||
|    "cell_type": "markdown", | ||||
|    "metadata": {}, | ||||
|    "source": [ | ||||
|     "Arithmetic operations on a Series are applied element-wise, and Series also provide statistical methods such as mean() and max()." | ||||
|    ] | ||||
|   }, | ||||
|   { | ||||
|    "cell_type": "code", | ||||
|    "execution_count": 15, | ||||
|    "metadata": { | ||||
|     "collapsed": false | ||||
|    }, | ||||
|    "outputs": [ | ||||
|     { | ||||
|      "data": { | ||||
|       "text/plain": [ | ||||
|        "Madrid       1570995.5\n", | ||||
|        "Barcelona     802277.5\n", | ||||
|        "Valencia      393094.5\n", | ||||
|        "Sevilla       346939.0\n", | ||||
|        "Zaragoza      332476.5\n", | ||||
|        "Malaga        284565.0\n", | ||||
|        "dtype: float64" | ||||
|       ] | ||||
|      }, | ||||
|      "execution_count": 15, | ||||
|      "metadata": {}, | ||||
|      "output_type": "execute_result" | ||||
|     } | ||||
|    ], | ||||
|    "source": [ | ||||
|     "# Divide population by 2\n", | ||||
|     "s / 2" | ||||
|    ] | ||||
|   }, | ||||
|   { | ||||
|    "cell_type": "code", | ||||
|    "execution_count": 16, | ||||
|    "metadata": { | ||||
|     "collapsed": false | ||||
|    }, | ||||
|    "outputs": [ | ||||
|     { | ||||
|      "data": { | ||||
|       "text/plain": [ | ||||
|        "1243449.3333333333" | ||||
|       ] | ||||
|      }, | ||||
|      "execution_count": 16, | ||||
|      "metadata": {}, | ||||
|      "output_type": "execute_result" | ||||
|     } | ||||
|    ], | ||||
|    "source": [ | ||||
|     "# Get the average population\n", | ||||
|     "s.mean()" | ||||
|    ] | ||||
|   }, | ||||
|   { | ||||
|    "cell_type": "code", | ||||
|    "execution_count": 17, | ||||
|    "metadata": { | ||||
|     "collapsed": false | ||||
|    }, | ||||
|    "outputs": [ | ||||
|     { | ||||
|      "data": { | ||||
|       "text/plain": [ | ||||
|        "3141991" | ||||
|       ] | ||||
|      }, | ||||
|      "execution_count": 17, | ||||
|      "metadata": {}, | ||||
|      "output_type": "execute_result" | ||||
|     } | ||||
|    ], | ||||
|    "source": [ | ||||
|     "# Get the highest population\n", | ||||
|     "s.max()" | ||||
|    ] | ||||
|   }, | ||||
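A small sketch of these operations with the same data: arithmetic returns a new Series without modifying `s`, and true division always yields floats. `idxmax()`, which the notebook does not use, returns the label of the largest value rather than the value itself.

```python
from pandas import Series

s = Series([3141991, 1604555, 786189, 693878, 664953, 569130],
           index=['Madrid', 'Barcelona', 'Valencia', 'Sevilla', 'Zaragoza', 'Malaga'])

half = s / 2        # element-wise division, returns a new Series
print(half.dtype)   # float64
print(s['Madrid'])  # 3141991: s itself is unchanged

print(round(s.mean(), 2))  # 1243449.33
print(s.max())             # 3141991: the largest value
print(s.idxmax())          # Madrid: the label of the largest value
```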
|   { | ||||
|    "cell_type": "markdown", | ||||
|    "metadata": {}, | ||||
|    "source": [ | ||||
|     "## Item assignment" | ||||
|    ] | ||||
|   }, | ||||
|   { | ||||
|    "cell_type": "markdown", | ||||
|    "metadata": {}, | ||||
|    "source": [ | ||||
|     "We can also change values, either directly by label or based on a condition. See the pandas documentation for additional features." | ||||
|    ] | ||||
|   }, | ||||
|   { | ||||
|    "cell_type": "code", | ||||
|    "execution_count": 18, | ||||
|    "metadata": { | ||||
|     "collapsed": false | ||||
|    }, | ||||
|    "outputs": [ | ||||
|     { | ||||
|      "data": { | ||||
|       "text/plain": [ | ||||
|        "Madrid       3320000\n", | ||||
|        "Barcelona    1604555\n", | ||||
|        "Valencia      786189\n", | ||||
|        "Sevilla       693878\n", | ||||
|        "Zaragoza      664953\n", | ||||
|        "Malaga        569130\n", | ||||
|        "dtype: int64" | ||||
|       ] | ||||
|      }, | ||||
|      "execution_count": 18, | ||||
|      "metadata": {}, | ||||
|      "output_type": "execute_result" | ||||
|     } | ||||
|    ], | ||||
|    "source": [ | ||||
|     "# Change population of one city\n", | ||||
|     "s['Madrid'] = 3320000\n", | ||||
|     "s" | ||||
|    ] | ||||
|   }, | ||||
|   { | ||||
|    "cell_type": "code", | ||||
|    "execution_count": 19, | ||||
|    "metadata": { | ||||
|     "collapsed": false | ||||
|    }, | ||||
|    "outputs": [ | ||||
|     { | ||||
|      "data": { | ||||
|       "text/plain": [ | ||||
|        "Madrid       3652000.0\n", | ||||
|        "Barcelona    1765010.5\n", | ||||
|        "Valencia      864807.9\n", | ||||
|        "Sevilla       693878.0\n", | ||||
|        "Zaragoza      664953.0\n", | ||||
|        "Malaga        569130.0\n", | ||||
|        "dtype: float64" | ||||
|       ] | ||||
|      }, | ||||
|      "execution_count": 19, | ||||
|      "metadata": {}, | ||||
|      "output_type": "execute_result" | ||||
|     } | ||||
|    ], | ||||
|    "source": [ | ||||
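|     "# Increase by 10% the population of cities with more than 700,000 inhabitants\n", | ||||
|     "# (convert to float first, since the result is not an integer)\n", | ||||
|     "s = s.astype(float)\n", | ||||
|     "s[s > 700000] = 1.1 * s[s > 700000]\n", | ||||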
|     "s" | ||||
|    ] | ||||
|   }, | ||||
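An alternative sketch of the conditional update, using `.loc` with a boolean mask on a float copy so that the original integer Series is left untouched. The variable name `grown` is illustrative.

```python
from pandas import Series

s = Series([3320000, 1604555, 786189, 693878, 664953, 569130],
           index=['Madrid', 'Barcelona', 'Valencia', 'Sevilla', 'Zaragoza', 'Malaga'])

# Work on a float copy, since a 10% increase does not produce integers
grown = s.astype(float)
grown.loc[grown > 700000] *= 1.1  # update only the rows selected by the mask

print(round(grown['Valencia'], 1))  # 864807.9
print(grown['Sevilla'])             # 693878.0: below the threshold, unchanged
print(s['Valencia'])                # 786189: the original Series is untouched
```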
|   { | ||||
|    "cell_type": "markdown", | ||||
|    "metadata": {}, | ||||
|    "source": [ | ||||
|     "# DataFrame" | ||||
|    ] | ||||
|   }, | ||||
|   { | ||||
|    "cell_type": "markdown", | ||||
|    "metadata": {}, | ||||
|    "source": [ | ||||
|     "As we said previously, **DataFrames** are two-dimensional data structures. You can think of a DataFrame as a dict of Series that share the same index." | ||||
|    ] | ||||
|   }, | ||||
|   { | ||||
|    "cell_type": "code", | ||||
|    "execution_count": 20, | ||||
|    "metadata": { | ||||
|     "collapsed": false | ||||
|    }, | ||||
|    "outputs": [ | ||||
|     { | ||||
|      "data": { | ||||
|       "text/html": [ | ||||
|        "<div>\n", | ||||
|        "<table border=\"1\" class=\"dataframe\">\n", | ||||
|        "  <thead>\n", | ||||
|        "    <tr style=\"text-align: right;\">\n", | ||||
|        "      <th></th>\n", | ||||
|        "      <th>one</th>\n", | ||||
|        "      <th>two</th>\n", | ||||
|        "    </tr>\n", | ||||
|        "  </thead>\n", | ||||
|        "  <tbody>\n", | ||||
|        "    <tr>\n", | ||||
|        "      <th>a</th>\n", | ||||
|        "      <td>1.0</td>\n", | ||||
|        "      <td>1.0</td>\n", | ||||
|        "    </tr>\n", | ||||
|        "    <tr>\n", | ||||
|        "      <th>b</th>\n", | ||||
|        "      <td>2.0</td>\n", | ||||
|        "      <td>2.0</td>\n", | ||||
|        "    </tr>\n", | ||||
|        "    <tr>\n", | ||||
|        "      <th>c</th>\n", | ||||
|        "      <td>3.0</td>\n", | ||||
|        "      <td>3.0</td>\n", | ||||
|        "    </tr>\n", | ||||
|        "    <tr>\n", | ||||
|        "      <th>d</th>\n", | ||||
|        "      <td>NaN</td>\n", | ||||
|        "      <td>4.0</td>\n", | ||||
|        "    </tr>\n", | ||||
|        "  </tbody>\n", | ||||
|        "</table>\n", | ||||
|        "</div>" | ||||
|       ], | ||||
|       "text/plain": [ | ||||
|        "   one  two\n", | ||||
|        "a  1.0  1.0\n", | ||||
|        "b  2.0  2.0\n", | ||||
|        "c  3.0  3.0\n", | ||||
|        "d  NaN  4.0" | ||||
|       ] | ||||
|      }, | ||||
|      "execution_count": 20, | ||||
|      "metadata": {}, | ||||
|      "output_type": "execute_result" | ||||
|     } | ||||
|    ], | ||||
|    "source": [ | ||||
|     "# We are going to create a DataFrame from a dict of Series\n", | ||||
|     "d = {'one' : pd.Series([1., 2., 3.], index=['a', 'b', 'c']),\n", | ||||
|     "    'two' : pd.Series([1., 2., 3., 4.], index=['a', 'b', 'c', 'd'])}\n", | ||||
|     "df = DataFrame(d)\n", | ||||
|     "df" | ||||
|    ] | ||||
|   }, | ||||
|   { | ||||
|    "cell_type": "markdown", | ||||
|    "metadata": {}, | ||||
|    "source": [ | ||||
|     "In this DataFrame, the *index* (row labels) contains *a*, *b*, *c* and *d*, and the *columns* (column labels) are *one* and *two*.\n", | ||||
|     "\n", | ||||
|     "The index of the resulting DataFrame is the union of the indexes of the Series, and missing values are filled with NaN (which we can write as *np.nan*).\n", | ||||
|     "\n", | ||||
|     "If we specify an index, only the rows with those labels are kept, in the given order." | ||||
|    ] | ||||
|   }, | ||||
|   { | ||||
|    "cell_type": "code", | ||||
|    "execution_count": 21, | ||||
|    "metadata": { | ||||
|     "collapsed": false | ||||
|    }, | ||||
|    "outputs": [ | ||||
|     { | ||||
|      "data": { | ||||
|       "text/html": [ | ||||
|        "<div>\n", | ||||
|        "<table border=\"1\" class=\"dataframe\">\n", | ||||
|        "  <thead>\n", | ||||
|        "    <tr style=\"text-align: right;\">\n", | ||||
|        "      <th></th>\n", | ||||
|        "      <th>one</th>\n", | ||||
|        "      <th>two</th>\n", | ||||
|        "    </tr>\n", | ||||
|        "  </thead>\n", | ||||
|        "  <tbody>\n", | ||||
|        "    <tr>\n", | ||||
|        "      <th>d</th>\n", | ||||
|        "      <td>NaN</td>\n", | ||||
|        "      <td>4.0</td>\n", | ||||
|        "    </tr>\n", | ||||
|        "    <tr>\n", | ||||
|        "      <th>b</th>\n", | ||||
|        "      <td>2.0</td>\n", | ||||
|        "      <td>2.0</td>\n", | ||||
|        "    </tr>\n", | ||||
|        "    <tr>\n", | ||||
|        "      <th>a</th>\n", | ||||
|        "      <td>1.0</td>\n", | ||||
|        "      <td>1.0</td>\n", | ||||
|        "    </tr>\n", | ||||
|        "  </tbody>\n", | ||||
|        "</table>\n", | ||||
|        "</div>" | ||||
|       ], | ||||
|       "text/plain": [ | ||||
|        "   one  two\n", | ||||
|        "d  NaN  4.0\n", | ||||
|        "b  2.0  2.0\n", | ||||
|        "a  1.0  1.0" | ||||
|       ] | ||||
|      }, | ||||
|      "execution_count": 21, | ||||
|      "metadata": {}, | ||||
|      "output_type": "execute_result" | ||||
|     } | ||||
|    ], | ||||
|    "source": [ | ||||
|     "# Keep only rows d, b and a, in that order\n", | ||||
|     "df = DataFrame(d, index=['d', 'b', 'a'])\n", | ||||
|     "df" | ||||
|    ] | ||||
|   }, | ||||
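The index-union and NaN behaviour described above can be checked in code. A minimal sketch that rebuilds the same `d` dictionary; `isna()` detects missing values, and `np.nan` can be assigned explicitly:

```python
import numpy as np
import pandas as pd

d = {'one': pd.Series([1., 2., 3.], index=['a', 'b', 'c']),
     'two': pd.Series([1., 2., 3., 4.], index=['a', 'b', 'c', 'd'])}
df = pd.DataFrame(d)

print(list(df.index))          # ['a', 'b', 'c', 'd']: union of both indexes
print(df['one'].isna().sum())  # 1: label 'd' is missing from 'one'

# Write a missing value explicitly with np.nan
df.loc['a', 'two'] = np.nan
print(df.isna().sum().sum())   # 2 missing values in total
```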
|   { | ||||
|    "cell_type": "markdown", | ||||
|    "metadata": {}, | ||||
|    "source": [ | ||||
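|     "Another option is to pass both *index* and *columns* to the constructor. Columns that are not keys of the dictionary (here, *three*) are filled with NaN, and keys that are not listed (here, *one*) are dropped." | ||||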
|    ] | ||||
|   }, | ||||
|   { | ||||
|    "cell_type": "code", | ||||
|    "execution_count": 22, | ||||
|    "metadata": { | ||||
|     "collapsed": false | ||||
|    }, | ||||
|    "outputs": [ | ||||
|     { | ||||
|      "data": { | ||||
|       "text/html": [ | ||||
|        "<div>\n", | ||||
|        "<table border=\"1\" class=\"dataframe\">\n", | ||||
|        "  <thead>\n", | ||||
|        "    <tr style=\"text-align: right;\">\n", | ||||
|        "      <th></th>\n", | ||||
|        "      <th>two</th>\n", | ||||
|        "      <th>three</th>\n", | ||||
|        "    </tr>\n", | ||||
|        "  </thead>\n", | ||||
|        "  <tbody>\n", | ||||
|        "    <tr>\n", | ||||
|        "      <th>d</th>\n", | ||||
|        "      <td>4.0</td>\n", | ||||
|        "      <td>NaN</td>\n", | ||||
|        "    </tr>\n", | ||||
|        "    <tr>\n", | ||||
|        "      <th>b</th>\n", | ||||
|        "      <td>2.0</td>\n", | ||||
|        "      <td>NaN</td>\n", | ||||
|        "    </tr>\n", | ||||
|        "    <tr>\n", | ||||
|        "      <th>a</th>\n", | ||||
|        "      <td>1.0</td>\n", | ||||
|        "      <td>NaN</td>\n", | ||||
|        "    </tr>\n", | ||||
|        "  </tbody>\n", | ||||
|        "</table>\n", | ||||
|        "</div>" | ||||
|       ], | ||||
|       "text/plain": [ | ||||
|        "   two three\n", | ||||
|        "d  4.0   NaN\n", | ||||
|        "b  2.0   NaN\n", | ||||
|        "a  1.0   NaN" | ||||
|       ] | ||||
|      }, | ||||
|      "execution_count": 22, | ||||
|      "metadata": {}, | ||||
|      "output_type": "execute_result" | ||||
|     } | ||||
|    ], | ||||
|    "source": [ | ||||
|     "df = DataFrame(d, index=['d', 'b', 'a'], columns=['two', 'three'])\n", | ||||
|     "df" | ||||
|    ] | ||||
|   }, | ||||
|   { | ||||
|    "cell_type": "markdown", | ||||
|    "metadata": {}, | ||||
|    "source": [ | ||||
|     "In the next notebook we are going to learn more about dataframes." | ||||
|    ] | ||||
|   }, | ||||
|   { | ||||
|    "cell_type": "markdown", | ||||
|    "metadata": {}, | ||||
|    "source": [ | ||||
|     "## References" | ||||
|    ] | ||||
|   }, | ||||
|   { | ||||
|    "cell_type": "markdown", | ||||
|    "metadata": {}, | ||||
|    "source": [ | ||||
|     "* [Pandas](http://pandas.pydata.org/)\n", | ||||
|     "* [Learning Pandas, Michael Heydt, Packt Publishing, 2015](http://proquest.safaribooksonline.com/book/programming/python/9781783985128)\n", | ||||
|     "* [Pandas. Introduction to Data Structures](http://pandas.pydata.org/pandas-docs/stable/dsintro.html#dsintro)\n", | ||||
|     "* [Introducing Pandas Objects](https://www.oreilly.com/learning/introducing-pandas-objects)\n", | ||||
|     "* [Boolean Operators in Pandas](http://pandas.pydata.org/pandas-docs/stable/indexing.html#boolean-operators)" | ||||
|    ] | ||||
|   }, | ||||
|   { | ||||
|    "cell_type": "markdown", | ||||
|    "metadata": {}, | ||||
|    "source": [ | ||||
|     "## Licence" | ||||
|    ] | ||||
|   }, | ||||
|   { | ||||
|    "cell_type": "markdown", | ||||
|    "metadata": {}, | ||||
|    "source": [ | ||||
|     "The notebook is freely licensed under the [Creative Commons Attribution Share-Alike license](https://creativecommons.org/licenses/by/2.0/).  \n", | ||||
|     "\n", | ||||
|     "© 2016 Carlos A. Iglesias, Universidad Politécnica de Madrid." | ||||
|    ] | ||||
|   } | ||||
|  ], | ||||
|  "metadata": { | ||||
|   "kernelspec": { | ||||
|    "display_name": "Python 3", | ||||
|    "language": "python", | ||||
|    "name": "python3" | ||||
|   }, | ||||
|   "language_info": { | ||||
|    "codemirror_mode": { | ||||
|     "name": "ipython", | ||||
|     "version": 3 | ||||
|    }, | ||||
|    "file_extension": ".py", | ||||
|    "mimetype": "text/x-python", | ||||
|    "name": "python", | ||||
|    "nbconvert_exporter": "python", | ||||
|    "pygments_lexer": "ipython3", | ||||
|    "version": "3.5.1+" | ||||
|   } | ||||
|  }, | ||||
|  "nbformat": 4, | ||||
|  "nbformat_minor": 0 | ||||
| } | ||||
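The two constructor calls at the end of the notebook above can be reproduced with the minimal sketch below. The definition of `d` is not part of this excerpt, so the dictionary of Series used here is an assumption, modelled on the example in the pandas data structures guide listed in the references and chosen so that it produces the outputs shown.

```python
from pandas import DataFrame, Series

# Assumed input (not shown in this excerpt): a dict of Series with different indexes
d = {'one': Series([1., 2., 3.], index=['a', 'b', 'c']),
     'two': Series([1., 2., 3., 4.], index=['a', 'b', 'c', 'd'])}

# Selecting and reordering rows: a label missing from a Series ('d' in 'one') becomes NaN
df = DataFrame(d, index=['d', 'b', 'a'])
print(df)

# Selecting rows and columns: a column absent from d ('three') is filled with NaN
df2 = DataFrame(d, index=['d', 'b', 'a'], columns=['two', 'three'])
print(df2)
```

Passing `columns` also decides which keys of `d` are kept, which is why 'one' does not appear in the second result.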
							
								
								
									
										5411
									
								
								ml2/3_3_Data_Munging_with_Pandas.ipynb
									
									
									
									
									
										Normal file
									
								
							
							
						
						
									
									
								
							
										
											
												File diff suppressed because it is too large
											
										
									
								
							
							
								
								
									
										4795
									
								
								ml2/3_4_Visualisation_Pandas.ipynb
									
									
									
									
									
										Normal file
									
								
							
							
						
						
									
									
								
							
										
											
												File diff suppressed because one or more lines are too long
											
										
									
								
							
							
								
								
									
										539
									
								
								ml2/3_5_Exercise_1.ipynb
									
									
									
									
									
										Normal file
									
								
							
							
						
						
									
									
								
							| @@ -0,0 +1,539 @@ | ||||
| { | ||||
|  "cells": [ | ||||
|   { | ||||
|    "cell_type": "markdown", | ||||
|    "metadata": {}, | ||||
|    "source": [ | ||||
|     "" | ||||
|    ] | ||||
|   }, | ||||
|   { | ||||
|    "cell_type": "markdown", | ||||
|    "metadata": {}, | ||||
|    "source": [ | ||||
|     "# Course Notes for Learning Intelligent Systems" | ||||
|    ] | ||||
|   }, | ||||
|   { | ||||
|    "cell_type": "markdown", | ||||
|    "metadata": {}, | ||||
|    "source": [ | ||||
|     "Department of Telematic Engineering Systems, Universidad Politécnica de Madrid, © 2016 Carlos A. Iglesias" | ||||
|    ] | ||||
|   }, | ||||
|   { | ||||
|    "cell_type": "markdown", | ||||
|    "metadata": {}, | ||||
|    "source": [ | ||||
|     "## [Introduction to Machine Learning II](3_0_0_Intro_ML_2.ipynb)" | ||||
|    ] | ||||
|   }, | ||||
|   { | ||||
|    "cell_type": "markdown", | ||||
|    "metadata": {}, | ||||
|    "source": [ | ||||
|     "# Exercise - The Titanic Dataset" | ||||
|    ] | ||||
|   }, | ||||
|   { | ||||
|    "cell_type": "markdown", | ||||
|    "metadata": {}, | ||||
|    "source": [ | ||||
|     "In this exercise we are going to put into practice what we have learnt in the notebooks of the session.\n", | ||||
|     "\n", | ||||
|     "Answer directly in your copy of the exercise and submit it as a Moodle task." | ||||
|    ] | ||||
|   }, | ||||
|   { | ||||
|    "cell_type": "code", | ||||
|    "execution_count": null, | ||||
|    "metadata": { | ||||
|     "collapsed": true | ||||
|    }, | ||||
|    "outputs": [], | ||||
|    "source": [ | ||||
|     "import pandas as pd\n", | ||||
|     "\n", | ||||
|     "import seaborn as sns\n", | ||||
|     "import matplotlib.pyplot as plt\n", | ||||
|     "import numpy as np\n", | ||||
|     "sns.set(color_codes=True)\n", | ||||
|     "\n", | ||||
|     "# if matplotlib is not set inline, you will not see plots\n", | ||||
|     "%matplotlib inline" | ||||
|    ] | ||||
|   }, | ||||
|   { | ||||
|    "cell_type": "markdown", | ||||
|    "metadata": {}, | ||||
|    "source": [ | ||||
|     "# Reading Data" | ||||
|    ] | ||||
|   }, | ||||
|   { | ||||
|    "cell_type": "markdown", | ||||
|    "metadata": {}, | ||||
|    "source": [ | ||||
|     "Assign to the variable *df* a DataFrame with the Titanic dataset from the URL https://raw.githubusercontent.com/cif2cif/sitc/master/ml2/data-titanic/train.csv\n", | ||||
|     "\n", | ||||
|     "Print *df*." | ||||
|    ] | ||||
|   }, | ||||
|   { | ||||
|    "cell_type": "code", | ||||
|    "execution_count": null, | ||||
|    "metadata": { | ||||
|     "collapsed": true | ||||
|    }, | ||||
|    "outputs": [], | ||||
|    "source": [] | ||||
|   }, | ||||
|   { | ||||
|    "cell_type": "markdown", | ||||
|    "metadata": {}, | ||||
|    "source": [ | ||||
|     "# Munging and Exploratory visualisation" | ||||
|    ] | ||||
|   }, | ||||
|   { | ||||
|    "cell_type": "markdown", | ||||
|    "metadata": {}, | ||||
|    "source": [ | ||||
|     "Obtain the number of passengers and features of the dataset." | ||||
|    ] | ||||
|   }, | ||||
|   { | ||||
|    "cell_type": "code", | ||||
|    "execution_count": null, | ||||
|    "metadata": { | ||||
|     "collapsed": true | ||||
|    }, | ||||
|    "outputs": [], | ||||
|    "source": [] | ||||
|   }, | ||||
|   { | ||||
|    "cell_type": "markdown", | ||||
|    "metadata": {}, | ||||
|    "source": [ | ||||
|     "Obtain general statistics (count, mean, std, min, max, 25%, 50%, 75%) of the column Age." | ||||
|    ] | ||||
|   }, | ||||
|   { | ||||
|    "cell_type": "code", | ||||
|    "execution_count": null, | ||||
|    "metadata": { | ||||
|     "collapsed": true | ||||
|    }, | ||||
|    "outputs": [], | ||||
|    "source": [] | ||||
|   }, | ||||
|   { | ||||
|    "cell_type": "markdown", | ||||
|    "metadata": {}, | ||||
|    "source": [ | ||||
|     "Obtain the median age of the passengers." | ||||
|    ] | ||||
|   }, | ||||
|   { | ||||
|    "cell_type": "code", | ||||
|    "execution_count": null, | ||||
|    "metadata": { | ||||
|     "collapsed": true | ||||
|    }, | ||||
|    "outputs": [], | ||||
|    "source": [] | ||||
|   }, | ||||
|   { | ||||
|    "cell_type": "markdown", | ||||
|    "metadata": {}, | ||||
|    "source": [ | ||||
|     "Obtain the number of missing values per feature." | ||||
|    ] | ||||
|   }, | ||||
|   { | ||||
|    "cell_type": "code", | ||||
|    "execution_count": null, | ||||
|    "metadata": { | ||||
|     "collapsed": true | ||||
|    }, | ||||
|    "outputs": [], | ||||
|    "source": [] | ||||
|   }, | ||||
|   { | ||||
|    "cell_type": "markdown", | ||||
|    "metadata": { | ||||
|     "collapsed": true | ||||
|    }, | ||||
|    "source": [ | ||||
|     "How many passengers have survived? List them grouped by Sex and Pclass.\n", | ||||
|     "\n", | ||||
|     "Assign the result to a variable df_1 and print it." | ||||
|    ] | ||||
|   }, | ||||
|   { | ||||
|    "cell_type": "code", | ||||
|    "execution_count": null, | ||||
|    "metadata": { | ||||
|     "collapsed": true | ||||
|    }, | ||||
|    "outputs": [], | ||||
|    "source": [] | ||||
|   }, | ||||
|   { | ||||
|    "cell_type": "markdown", | ||||
|    "metadata": { | ||||
|     "collapsed": false | ||||
|    }, | ||||
|    "source": [ | ||||
|     "Visualise df_1 as a histogram." | ||||
|    ] | ||||
|   }, | ||||
|   { | ||||
|    "cell_type": "code", | ||||
|    "execution_count": null, | ||||
|    "metadata": { | ||||
|     "collapsed": true | ||||
|    }, | ||||
|    "outputs": [], | ||||
|    "source": [] | ||||
|   }, | ||||
|   { | ||||
|    "cell_type": "markdown", | ||||
|    "metadata": { | ||||
|     "collapsed": true | ||||
|    }, | ||||
|    "source": [ | ||||
|     "# Feature Engineering" | ||||
|    ] | ||||
|   }, | ||||
|   { | ||||
|    "cell_type": "markdown", | ||||
|    "metadata": {}, | ||||
|    "source": [ | ||||
|     "Here you can find some features that have been proposed for this dataset. Your task is to analyse them and provide some insights. \n", | ||||
|     "\n", | ||||
|     "Use pandas and visualisation to justify your conclusions." | ||||
|    ] | ||||
|   }, | ||||
|   { | ||||
|    "cell_type": "markdown", | ||||
|    "metadata": {}, | ||||
|    "source": [ | ||||
|     "## Feature FamilySize " | ||||
|    ] | ||||
|   }, | ||||
|   { | ||||
|    "cell_type": "markdown", | ||||
|    "metadata": {}, | ||||
|    "source": [ | ||||
|     "Regarding SibSp and Parch, we can define a new feature, 'FamilySize', that is the sum of both." | ||||
|    ] | ||||
|   }, | ||||
|   { | ||||
|    "cell_type": "code", | ||||
|    "execution_count": null, | ||||
|    "metadata": { | ||||
|     "collapsed": false | ||||
|    }, | ||||
|    "outputs": [], | ||||
|    "source": [ | ||||
|     "df['FamilySize'] = df['SibSp'] + df['Parch']\n", | ||||
|     "df.head()" | ||||
|    ] | ||||
|   }, | ||||
|   { | ||||
|    "cell_type": "markdown", | ||||
|    "metadata": {}, | ||||
|    "source": [ | ||||
|     "## Feature Alone" | ||||
|    ] | ||||
|   }, | ||||
|   { | ||||
|    "cell_type": "markdown", | ||||
|    "metadata": {}, | ||||
|    "source": [ | ||||
|     "Travelling alone seems to be related to survival, so we can define a new boolean feature 'Alone' that is True when FamilySize is 0." | ||||
|    ] | ||||
|   }, | ||||
|   { | ||||
|    "cell_type": "code", | ||||
|    "execution_count": null, | ||||
|    "metadata": { | ||||
|     "collapsed": false | ||||
|    }, | ||||
|    "outputs": [], | ||||
|    "source": [ | ||||
|     "df['Alone'] = (df.FamilySize == 0)\n", | ||||
|     "df.head()" | ||||
|    ] | ||||
|   }, | ||||
|   { | ||||
|    "cell_type": "markdown", | ||||
|    "metadata": {}, | ||||
|    "source": [ | ||||
|     "## Feature Salutation" | ||||
|    ] | ||||
|   }, | ||||
|   { | ||||
|    "cell_type": "markdown", | ||||
|    "metadata": {}, | ||||
|    "source": [ | ||||
|     "If we look closely at the Name variable, each name contains a title (Mr., Miss., Mrs., ...). We can add a feature with this title." | ||||
|    ] | ||||
|   }, | ||||
|   { | ||||
|    "cell_type": "code", | ||||
|    "execution_count": null, | ||||
|    "metadata": { | ||||
|     "collapsed": false | ||||
|    }, | ||||
|    "outputs": [], | ||||
|    "source": [ | ||||
|     "#Taken from http://www.analyticsvidhya.com/blog/2014/09/data-munging-python-using-pandas-baby-steps-python/\n", | ||||
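|     "def name_extract(word):\n", | ||||
|     "    # Names look like 'Surname, Title. Given names': keep the text between ',' and the next '.'\n", | ||||
|     "    return word.split(',')[1].split('.')[0].strip()\n", | ||||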
|     "\n", | ||||
|     "df['Salutation'] = df['Name'].apply(name_extract)\n", | ||||
|     "df.head()" | ||||
|    ] | ||||
|   }, | ||||
|   { | ||||
|    "cell_type": "markdown", | ||||
|    "metadata": {}, | ||||
|    "source": [ | ||||
|     "We can list the different salutations." | ||||
|    ] | ||||
|   }, | ||||
|   { | ||||
|    "cell_type": "code", | ||||
|    "execution_count": null, | ||||
|    "metadata": { | ||||
|     "collapsed": false | ||||
|    }, | ||||
|    "outputs": [], | ||||
|    "source": [ | ||||
|     "df['Salutation'].unique()" | ||||
|    ] | ||||
|   }, | ||||
|   { | ||||
|    "cell_type": "code", | ||||
|    "execution_count": null, | ||||
|    "metadata": { | ||||
|     "collapsed": false | ||||
|    }, | ||||
|    "outputs": [], | ||||
|    "source": [ | ||||
|     "df.groupby(['Salutation']).size()" | ||||
|    ] | ||||
|   }, | ||||
|   { | ||||
|    "cell_type": "markdown", | ||||
|    "metadata": {}, | ||||
|    "source": [ | ||||
|     "There are only 4 main salutations, so we group the rest of them into 'Others'." | ||||
|    ] | ||||
|   }, | ||||
|   { | ||||
|    "cell_type": "code", | ||||
|    "execution_count": null, | ||||
|    "metadata": { | ||||
|     "collapsed": false | ||||
|    }, | ||||
|    "outputs": [], | ||||
|    "source": [ | ||||
|     "def group_salutation(old_salutation):\n", | ||||
|     "    # Keep the four most frequent salutations and group the rest as 'Others'\n", | ||||
|     "    if old_salutation in ('Mr', 'Mrs', 'Master', 'Miss'):\n", | ||||
|     "        return old_salutation\n", | ||||
|     "    return 'Others'\n", | ||||
|     "df['Salutation'] = df['Salutation'].apply(group_salutation)\n", | ||||
|     "df.groupby(['Salutation']).size()" | ||||
|    ] | ||||
|   }, | ||||
|   { | ||||
|    "cell_type": "code", | ||||
|    "execution_count": null, | ||||
|    "metadata": { | ||||
|     "collapsed": false | ||||
|    }, | ||||
|    "outputs": [], | ||||
|    "source": [ | ||||
|     "# Distribution\n", | ||||
|     "colors_sex = ['#ff69b4', 'b', 'r', 'y', 'm', 'c']\n", | ||||
|     "df.groupby('Salutation').size().plot(kind='bar', color=colors_sex)" | ||||
|    ] | ||||
|   }, | ||||
|   { | ||||
|    "cell_type": "code", | ||||
|    "execution_count": null, | ||||
|    "metadata": { | ||||
|     "collapsed": false | ||||
|    }, | ||||
|    "outputs": [], | ||||
|    "source": [ | ||||
|     "df.boxplot(column='Age', by = 'Salutation', sym='k.')" | ||||
|    ] | ||||
|   }, | ||||
|   { | ||||
|    "cell_type": "markdown", | ||||
|    "metadata": {}, | ||||
|    "source": [ | ||||
|     "## Features Children and Female" | ||||
|    ] | ||||
|   }, | ||||
|   { | ||||
|    "cell_type": "code", | ||||
|    "execution_count": null, | ||||
|    "metadata": { | ||||
|     "collapsed": false | ||||
|    }, | ||||
|    "outputs": [], | ||||
|    "source": [ | ||||
|     "# Specific features for Children and Female, since these groups have a higher survival rate\n", | ||||
|     "df['Children']   = df['Age'].map(lambda x: 1 if x < 6.0 else 0)\n", | ||||
|     "df['Female']     = df['Sex'].map(lambda x: 1 if x == 'female' else 0)" | ||||
|    ] | ||||
|   }, | ||||
|   { | ||||
|    "cell_type": "markdown", | ||||
|    "metadata": {}, | ||||
|    "source": [ | ||||
|     "## Feature AgeGroup" | ||||
|    ] | ||||
|   }, | ||||
|   { | ||||
|    "cell_type": "code", | ||||
|    "execution_count": null, | ||||
|    "metadata": { | ||||
|     "collapsed": true | ||||
|    }, | ||||
|    "outputs": [], | ||||
|    "source": [ | ||||
|     "# Group ages to simplify machine learning algorithms.  0: 0-5, 1: 6-10, 2: 11-15, 3: 16-59 and 4: 60-80\n", | ||||
|     "df['AgeGroup'] = 0\n", | ||||
|     "df.loc[(df.AgeFill<6),'AgeGroup'] = 0\n", | ||||
|     "df.loc[(df.AgeFill>=6) & (df.AgeFill < 11),'AgeGroup'] = 1\n", | ||||
|     "df.loc[(df.AgeFill>=11) & (df.AgeFill < 16),'AgeGroup'] = 2\n", | ||||
|     "df.loc[(df.AgeFill>=16) & (df.AgeFill < 60),'AgeGroup'] = 3\n", | ||||
|     "df.loc[(df.AgeFill>=60),'AgeGroup'] = 4" | ||||
|    ] | ||||
|   }, | ||||
|   { | ||||
|    "cell_type": "markdown", | ||||
|    "metadata": {}, | ||||
|    "source": [ | ||||
|     "## Feature Deck\n", | ||||
|     "Cabin numbers are recorded mostly for 1st class passengers; missing cabins are treated as 'Unknown'. A cabin number looks like 'C123', where the letter refers to the deck." | ||||
|    ] | ||||
|   }, | ||||
|   { | ||||
|    "cell_type": "code", | ||||
|    "execution_count": null, | ||||
|    "metadata": { | ||||
|     "collapsed": false | ||||
|    }, | ||||
|    "outputs": [], | ||||
|    "source": [ | ||||
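|     "#Turning cabin number into Deck\n", | ||||
|     "cabin_list = ['A', 'B', 'C', 'D', 'E', 'F', 'T', 'G', 'Unknown']\n", | ||||
|     "def substrings_in_string(big_string, substrings):\n", | ||||
|     "    # Return the first item of substrings found in big_string (e.g. 'C' for 'C123'), else NaN\n", | ||||
|     "    return next((s for s in substrings if s in big_string), np.nan)\n", | ||||
|     "df['Deck'] = df['Cabin'].fillna('Unknown').map(lambda x: substrings_in_string(x, cabin_list))" | ||||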
|    ] | ||||
|   }, | ||||
|   { | ||||
|    "cell_type": "markdown", | ||||
|    "metadata": {}, | ||||
|    "source": [ | ||||
|     "## Feature FarePerPerson" | ||||
|    ] | ||||
|   }, | ||||
|   { | ||||
|    "cell_type": "markdown", | ||||
|    "metadata": {}, | ||||
|    "source": [ | ||||
|     "This feature is created from two previous features: Fare and FamilySize." | ||||
|    ] | ||||
|   }, | ||||
|   { | ||||
|    "cell_type": "code", | ||||
|    "execution_count": null, | ||||
|    "metadata": { | ||||
|     "collapsed": false | ||||
|    }, | ||||
|    "outputs": [], | ||||
|    "source": [ | ||||
|     "df['FarePerPerson']= df['Fare'] / (df['FamilySize'] + 1)" | ||||
|    ] | ||||
|   }, | ||||
|   { | ||||
|    "cell_type": "markdown", | ||||
|    "metadata": {}, | ||||
|    "source": [ | ||||
|     "## Feature AgeClass" | ||||
|    ] | ||||
|   }, | ||||
|   { | ||||
|    "cell_type": "markdown", | ||||
|    "metadata": {}, | ||||
|    "source": [ | ||||
|     "Since age and class are both numbers we can just multiply them and get a new feature.\n" | ||||
|    ] | ||||
|   }, | ||||
|   { | ||||
|    "cell_type": "code", | ||||
|    "execution_count": null, | ||||
|    "metadata": { | ||||
|     "collapsed": true | ||||
|    }, | ||||
|    "outputs": [], | ||||
|    "source": [ | ||||
|     "df['AgeClass']=df['Age']*df['Pclass']" | ||||
|    ] | ||||
|   }, | ||||
|   { | ||||
|    "cell_type": "markdown", | ||||
|    "metadata": {}, | ||||
|    "source": [ | ||||
|     "## Licence" | ||||
|    ] | ||||
|   }, | ||||
|   { | ||||
|    "cell_type": "markdown", | ||||
|    "metadata": {}, | ||||
|    "source": [ | ||||
|     "The notebook is freely licensed under the [Creative Commons Attribution Share-Alike license](https://creativecommons.org/licenses/by/2.0/).  \n", | ||||
|     "\n", | ||||
|     "© 2016 Carlos A. Iglesias, Universidad Politécnica de Madrid." | ||||
|    ] | ||||
|   } | ||||
|  ], | ||||
|  "metadata": { | ||||
|   "kernelspec": { | ||||
|    "display_name": "Python 3", | ||||
|    "language": "python", | ||||
|    "name": "python3" | ||||
|   }, | ||||
|   "language_info": { | ||||
|    "codemirror_mode": { | ||||
|     "name": "ipython", | ||||
|     "version": 3 | ||||
|    }, | ||||
|    "file_extension": ".py", | ||||
|    "mimetype": "text/x-python", | ||||
|    "name": "python", | ||||
|    "nbconvert_exporter": "python", | ||||
|    "pygments_lexer": "ipython3", | ||||
|    "version": "3.5.1+" | ||||
|   } | ||||
|  }, | ||||
|  "nbformat": 4, | ||||
|  "nbformat_minor": 0 | ||||
| } | ||||
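As a quick check of the feature definitions proposed in Exercise 1, the sketch below applies them to four passenger rows copied from the CSV shown at the end of this diff, so it runs without downloading train.csv. These rows have no Survived column, so it only checks the transformations, not their relation to survival. Two parts are swapped-in simplifications rather than the notebook's code: AgeGroup uses `pd.cut` with the same bin edges instead of a chain of `df.loc` assignments (and the raw Age instead of AgeFill), and Deck takes the first letter of the cabin instead of searching `cabin_list`.

```python
import io

import numpy as np
import pandas as pd

# Four rows copied from the passenger CSV at the end of this diff (no Survived column)
csv = """PassengerId,Pclass,Name,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked
892,3,"Kelly, Mr. James",male,34.5,0,0,330911,7.8292,,Q
893,3,"Wilkes, Mrs. James (Ellen Needs)",female,47,1,0,363272,7,,S
913,3,"Olsen, Master. Artur Karl",male,9,0,1,C 17368,3.1708,,S
916,1,"Ryerson, Mrs. Arthur Larned (Emily Maria Borie)",female,48,1,3,PC 17608,262.375,B57 B59 B63 B66,C
"""
df = pd.read_csv(io.StringIO(csv))

df['FamilySize'] = df['SibSp'] + df['Parch']
df['Alone'] = df['FamilySize'] == 0
# 'Surname, Title. Given names' -> 'Title', using the vectorised .str accessor
df['Salutation'] = df['Name'].str.split(',').str[1].str.split('.').str[0].str.strip()
# +1 so that the passenger is counted together with their relatives
df['FarePerPerson'] = df['Fare'] / (df['FamilySize'] + 1)
# Same bins as AgeGroup: [0,6) -> 0, [6,11) -> 1, [11,16) -> 2, [16,60) -> 3, [60,inf) -> 4
df['AgeGroup'] = pd.cut(df['Age'], bins=[0, 6, 11, 16, 60, np.inf], right=False, labels=False)
# Simplified Deck: first letter of the cabin string, 'Unknown' when the cabin is missing
df['Deck'] = df['Cabin'].fillna('Unknown').map(lambda c: 'Unknown' if c == 'Unknown' else c[0])

print(df[['Salutation', 'FamilySize', 'Alone', 'FarePerPerson', 'AgeGroup', 'Deck']])
```

On the full train.csv the same lines apply unchanged; the analysis the exercise asks for would then compare each new column against Survived, for instance with `df.groupby('Alone')['Survived'].mean()`.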
							
								
								
									
										122
									
								
								ml2/3_6_Machine_Learning.ipynb
									
									
									
									
									
										Normal file
									
								
							
							
						
						
									
									
								
							| @@ -0,0 +1,122 @@ | ||||
| { | ||||
|  "cells": [ | ||||
|   { | ||||
|    "cell_type": "markdown", | ||||
|    "metadata": {}, | ||||
|    "source": [ | ||||
|     "" | ||||
|    ] | ||||
|   }, | ||||
|   { | ||||
|    "cell_type": "markdown", | ||||
|    "metadata": {}, | ||||
|    "source": [ | ||||
|     "# Course Notes for Learning Intelligent Systems" | ||||
|    ] | ||||
|   }, | ||||
|   { | ||||
|    "cell_type": "markdown", | ||||
|    "metadata": {}, | ||||
|    "source": [ | ||||
|     "Department of Telematic Engineering Systems, Universidad Politécnica de Madrid, © 2016 Carlos A. Iglesias" | ||||
|    ] | ||||
|   }, | ||||
|   { | ||||
|    "cell_type": "markdown", | ||||
|    "metadata": {}, | ||||
|    "source": [ | ||||
|     "## [Introduction to Machine Learning II](3_0_0_Intro_ML_2.ipynb)" | ||||
|    ] | ||||
|   }, | ||||
|   { | ||||
|    "cell_type": "markdown", | ||||
|    "metadata": {}, | ||||
|    "source": [ | ||||
|     "# Machine Learning" | ||||
|    ] | ||||
|   }, | ||||
|   { | ||||
|    "cell_type": "markdown", | ||||
|    "metadata": {}, | ||||
|    "source": [ | ||||
|     "In the previous session, we learnt how to apply machine learning algorithms to the Iris dataset.\n", | ||||
|     "\n", | ||||
|     "We are now going to review the full process. As you have probably noticed, data preparation, cleaning and transformation take more than 90 % of the data mining effort.\n", | ||||
|     "\n", | ||||
|     "The phases are:\n", | ||||
|     "\n", | ||||
|     "* **Data ingestion**: reading the data from the data lake\n", | ||||
|     "* **Preprocessing**: \n", | ||||
|     "    * **Data cleaning (munging)**: fill missing values, smooth noisy data (e.g. with binning methods), identify or remove outliers, and resolve inconsistencies\n", | ||||
|     "    * **Data integration**: integrate multiple datasets\n", | ||||
|     "    * **Data transformation**: normalization (rescale numeric values between 0 and 1), standardisation (rescale values to have mean of 0 and std of 1), transformation for smoothing a variable (e.g. square root, ...), aggregation of data from several datasets\n", | ||||
|     "    * **Data reduction**: dimensionality reduction, clustering and sampling\n", | ||||
|     "    * **Data discretization**: for numerical values and algorithms that do not accept continuous variables\n", | ||||
|     "    * **Feature engineering**: selection of the most relevant features, creation of new features and removal of non-relevant features\n", | ||||
|     "    * **Sampling**: divide the dataset into training and test datasets\n", | ||||
|     "* **Machine learning**: apply machine learning algorithms and obtain an estimator, tuning its parameters.\n", | ||||
|     "* **Evaluation** of the model\n", | ||||
|     "* **Prediction**: use the model for new data." | ||||
|    ] | ||||
|   }, | ||||
|   { | ||||
|    "cell_type": "markdown", | ||||
|    "metadata": {}, | ||||
|    "source": [ | ||||
|     "\n", | ||||
|     "" | ||||
|    ] | ||||
|   }, | ||||
|   { | ||||
|    "cell_type": "markdown", | ||||
|    "metadata": {}, | ||||
|    "source": [ | ||||
|     "## References" | ||||
|    ] | ||||
|   }, | ||||
|   { | ||||
|    "cell_type": "markdown", | ||||
|    "metadata": {}, | ||||
|    "source": [ | ||||
|     "* [Python Machine Learning](http://proquest.safaribooksonline.com/book/programming/python/9781783555130), Sebastian Raschka, Packt Publishing, 2015." | ||||
|    ] | ||||
|   }, | ||||
|   { | ||||
|    "cell_type": "markdown", | ||||
|    "metadata": {}, | ||||
|    "source": [ | ||||
|     "## Licence" | ||||
|    ] | ||||
|   }, | ||||
|   { | ||||
|    "cell_type": "markdown", | ||||
|    "metadata": {}, | ||||
|    "source": [ | ||||
|     "The notebook is freely licensed under the [Creative Commons Attribution Share-Alike license](https://creativecommons.org/licenses/by/2.0/).  \n", | ||||
|     "\n", | ||||
|     "© 2016 Carlos A. Iglesias, Universidad Politécnica de Madrid." | ||||
|    ] | ||||
|   } | ||||
|  ], | ||||
|  "metadata": { | ||||
|   "kernelspec": { | ||||
|    "display_name": "Python 3", | ||||
|    "language": "python", | ||||
|    "name": "python3" | ||||
|   }, | ||||
|   "language_info": { | ||||
|    "codemirror_mode": { | ||||
|     "name": "ipython", | ||||
|     "version": 3 | ||||
|    }, | ||||
|    "file_extension": ".py", | ||||
|    "mimetype": "text/x-python", | ||||
|    "name": "python", | ||||
|    "nbconvert_exporter": "python", | ||||
|    "pygments_lexer": "ipython3", | ||||
|    "version": "3.5.1+" | ||||
|   } | ||||
|  }, | ||||
|  "nbformat": 4, | ||||
|  "nbformat_minor": 0 | ||||
| } | ||||
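The data transformation and sampling steps listed in the Machine Learning notebook can be illustrated with plain NumPy on a few Fare values taken from the passenger CSV at the end of this diff. This is a minimal sketch of the formulas only; scikit-learn offers `MinMaxScaler`, `StandardScaler` and `train_test_split` for the same steps. The 75/25 split and the random seed are arbitrary choices made for the example.

```python
import numpy as np

# Fare values of the first eight passengers in the CSV shown in this diff
fares = np.array([7.8292, 7.0, 9.6875, 8.6625, 12.2875, 9.225, 7.6292, 29.0])

# Normalization: rescale to [0, 1] with (x - min) / (max - min)
normalized = (fares - fares.min()) / (fares.max() - fares.min())

# Standardisation: rescale to mean 0 and standard deviation 1 with (x - mean) / std
standardized = (fares - fares.mean()) / fares.std()

# Square root transformation to smooth a right-skewed variable such as Fare
smoothed = np.sqrt(fares)

# Sampling: shuffle the indices and keep 75 % for training, 25 % for testing
rng = np.random.default_rng(seed=0)
indices = rng.permutation(len(fares))
n_train = int(0.75 * len(fares))
train, test = fares[indices[:n_train]], fares[indices[n_train:]]

print(normalized.round(3), standardized.round(3), smoothed.round(3), train, test, sep='\n')
```

In a real pipeline the min, max, mean and std would be computed on the training part only and then applied to the test part, so that no information from the test data leaks into training.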
							
								
								
									
										1178
									
								
								ml2/3_7_SVM.ipynb
									
									
									
									
									
										Normal file
									
								
							
							
						
						
									
									
								
							
										
											
												File diff suppressed because one or more lines are too long
											
										
									
								
							
							
								
								
									
										89
									
								
								ml2/3_8_Exercise_2.ipynb
									
									
									
									
									
										Normal file
									
								
							
							
						
						
									
									
								
							| @@ -0,0 +1,89 @@ | ||||
| { | ||||
|  "cells": [ | ||||
|   { | ||||
|    "cell_type": "markdown", | ||||
|    "metadata": {}, | ||||
|    "source": [ | ||||
|     "" | ||||
|    ] | ||||
|   }, | ||||
|   { | ||||
|    "cell_type": "markdown", | ||||
|    "metadata": {}, | ||||
|    "source": [ | ||||
|     "# Course Notes for Learning Intelligent Systems" | ||||
|    ] | ||||
|   }, | ||||
|   { | ||||
|    "cell_type": "markdown", | ||||
|    "metadata": {}, | ||||
|    "source": [ | ||||
|     "Department of Telematic Engineering Systems, Universidad Politécnica de Madrid, © 2016 Carlos A. Iglesias" | ||||
|    ] | ||||
|   }, | ||||
|   { | ||||
|    "cell_type": "markdown", | ||||
|    "metadata": {}, | ||||
|    "source": [ | ||||
|     "## [Introduction to Machine Learning II](3_0_0_Intro_ML_2.ipynb)" | ||||
|    ] | ||||
|   }, | ||||
|   { | ||||
|    "cell_type": "markdown", | ||||
|    "metadata": {}, | ||||
|    "source": [ | ||||
|     "# Exercise 2 - The Titanic Dataset" | ||||
|    ] | ||||
|   }, | ||||
|   { | ||||
|    "cell_type": "markdown", | ||||
|    "metadata": {}, | ||||
|    "source": [ | ||||
|     "In this exercise we are going to put into practice what we have learnt in the notebooks of the session.\n", | ||||
|     "\n", | ||||
|     "In the previous notebook we applied the SVM machine learning algorithm.\n", | ||||
|     "\n", | ||||
|     "Your task is to apply other machine learning algorithms (at least 2) that you have seen in theory or others you are interested in.\n", | ||||
|     "\n", | ||||
|     "You should compare the algorithms and describe your experiments." | ||||
|    ] | ||||
|   }, | ||||
|   { | ||||
|    "cell_type": "markdown", | ||||
|    "metadata": {}, | ||||
|    "source": [ | ||||
|     "## Licence" | ||||
|    ] | ||||
|   }, | ||||
|   { | ||||
|    "cell_type": "markdown", | ||||
|    "metadata": {}, | ||||
|    "source": [ | ||||
|     "The notebook is freely licensed under the [Creative Commons Attribution Share-Alike license](https://creativecommons.org/licenses/by/2.0/).  \n", | ||||
|     "\n", | ||||
|     "© 2016 Carlos A. Iglesias, Universidad Politécnica de Madrid." | ||||
|    ] | ||||
|   } | ||||
|  ], | ||||
|  "metadata": { | ||||
|   "kernelspec": { | ||||
|    "display_name": "Python 3", | ||||
|    "language": "python", | ||||
|    "name": "python3" | ||||
|   }, | ||||
|   "language_info": { | ||||
|    "codemirror_mode": { | ||||
|     "name": "ipython", | ||||
|     "version": 3 | ||||
|    }, | ||||
|    "file_extension": ".py", | ||||
|    "mimetype": "text/x-python", | ||||
|    "name": "python", | ||||
|    "nbconvert_exporter": "python", | ||||
|    "pygments_lexer": "ipython3", | ||||
|    "version": "3.5.1+" | ||||
|   } | ||||
|  }, | ||||
|  "nbformat": 4, | ||||
|  "nbformat_minor": 0 | ||||
| } | ||||
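One possible way to structure the comparison asked for in Exercise 2 is k-fold cross-validation with scikit-learn, sketched below. To stay self-contained it uses a synthetic dataset from `make_classification` as a stand-in for the prepared Titanic features; the chosen algorithms (k-nearest neighbours and a decision tree, next to an SVM baseline) and their parameters are illustrative assumptions, not a prescribed answer.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic binary classification data standing in for the Titanic features
X, y = make_classification(n_samples=300, n_features=6, random_state=0)

models = {
    # Distance- and margin-based models are sensitive to feature scale, so scale first
    'SVM': make_pipeline(StandardScaler(), SVC()),
    'k-NN': make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5)),
    'Decision tree': DecisionTreeClassifier(max_depth=4, random_state=0),
}

results = {}
for name, model in models.items():
    # 10-fold cross-validated accuracy: mean and spread over the folds
    scores = cross_val_score(model, X, y, cv=10)
    results[name] = (scores.mean(), scores.std())
    print('{}: accuracy {:.3f} (+/- {:.3f})'.format(name, scores.mean(), scores.std()))
```

Reporting the mean and standard deviation over folds, rather than a single train/test score, makes it easier to judge whether the differences between algorithms are meaningful.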
| @@ -1,419 +0,0 @@ | ||||
| PassengerId,Pclass,Name,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked | ||||
| 892,3,"Kelly, Mr. James",male,34.5,0,0,330911,7.8292,,Q | ||||
| 893,3,"Wilkes, Mrs. James (Ellen Needs)",female,47,1,0,363272,7,,S | ||||
| 894,2,"Myles, Mr. Thomas Francis",male,62,0,0,240276,9.6875,,Q | ||||
| 895,3,"Wirz, Mr. Albert",male,27,0,0,315154,8.6625,,S | ||||
| 896,3,"Hirvonen, Mrs. Alexander (Helga E Lindqvist)",female,22,1,1,3101298,12.2875,,S | ||||
| 897,3,"Svensson, Mr. Johan Cervin",male,14,0,0,7538,9.225,,S | ||||
| 898,3,"Connolly, Miss. Kate",female,30,0,0,330972,7.6292,,Q | ||||
| 899,2,"Caldwell, Mr. Albert Francis",male,26,1,1,248738,29,,S | ||||
| 900,3,"Abrahim, Mrs. Joseph (Sophie Halaut Easu)",female,18,0,0,2657,7.2292,,C | ||||
| 901,3,"Davies, Mr. John Samuel",male,21,2,0,A/4 48871,24.15,,S | ||||
| 902,3,"Ilieff, Mr. Ylio",male,,0,0,349220,7.8958,,S | ||||
| 903,1,"Jones, Mr. Charles Cresson",male,46,0,0,694,26,,S | ||||
| 904,1,"Snyder, Mrs. John Pillsbury (Nelle Stevenson)",female,23,1,0,21228,82.2667,B45,S | ||||
| 905,2,"Howard, Mr. Benjamin",male,63,1,0,24065,26,,S | ||||
| 906,1,"Chaffee, Mrs. Herbert Fuller (Carrie Constance Toogood)",female,47,1,0,W.E.P. 5734,61.175,E31,S | ||||
| 907,2,"del Carlo, Mrs. Sebastiano (Argenia Genovesi)",female,24,1,0,SC/PARIS 2167,27.7208,,C | ||||
| 908,2,"Keane, Mr. Daniel",male,35,0,0,233734,12.35,,Q | ||||
| 909,3,"Assaf, Mr. Gerios",male,21,0,0,2692,7.225,,C | ||||
| 910,3,"Ilmakangas, Miss. Ida Livija",female,27,1,0,STON/O2. 3101270,7.925,,S | ||||
| 911,3,"Assaf Khalil, Mrs. Mariana (Miriam"")""",female,45,0,0,2696,7.225,,C | ||||
| 912,1,"Rothschild, Mr. Martin",male,55,1,0,PC 17603,59.4,,C | ||||
| 913,3,"Olsen, Master. Artur Karl",male,9,0,1,C 17368,3.1708,,S | ||||
| 914,1,"Flegenheim, Mrs. Alfred (Antoinette)",female,,0,0,PC 17598,31.6833,,S | ||||
| 915,1,"Williams, Mr. Richard Norris II",male,21,0,1,PC 17597,61.3792,,C | ||||
| 916,1,"Ryerson, Mrs. Arthur Larned (Emily Maria Borie)",female,48,1,3,PC 17608,262.375,B57 B59 B63 B66,C | ||||
| 917,3,"Robins, Mr. Alexander A",male,50,1,0,A/5. 3337,14.5,,S | ||||
| 918,1,"Ostby, Miss. Helene Ragnhild",female,22,0,1,113509,61.9792,B36,C | ||||
| 919,3,"Daher, Mr. Shedid",male,22.5,0,0,2698,7.225,,C | ||||
| 920,1,"Brady, Mr. John Bertram",male,41,0,0,113054,30.5,A21,S | ||||
| 921,3,"Samaan, Mr. Elias",male,,2,0,2662,21.6792,,C | ||||
| 922,2,"Louch, Mr. Charles Alexander",male,50,1,0,SC/AH 3085,26,,S | ||||
| 923,2,"Jefferys, Mr. Clifford Thomas",male,24,2,0,C.A. 31029,31.5,,S | ||||
| 924,3,"Dean, Mrs. Bertram (Eva Georgetta Light)",female,33,1,2,C.A. 2315,20.575,,S | ||||
| 925,3,"Johnston, Mrs. Andrew G (Elizabeth Lily"" Watson)""",female,,1,2,W./C. 6607,23.45,,S | ||||
| 926,1,"Mock, Mr. Philipp Edmund",male,30,1,0,13236,57.75,C78,C | ||||
| 927,3,"Katavelas, Mr. Vassilios (Catavelas Vassilios"")""",male,18.5,0,0,2682,7.2292,,C | ||||
| 928,3,"Roth, Miss. Sarah A",female,,0,0,342712,8.05,,S | ||||
| 929,3,"Cacic, Miss. Manda",female,21,0,0,315087,8.6625,,S | ||||
| 930,3,"Sap, Mr. Julius",male,25,0,0,345768,9.5,,S | ||||
| 931,3,"Hee, Mr. Ling",male,,0,0,1601,56.4958,,S | ||||
| 932,3,"Karun, Mr. Franz",male,39,0,1,349256,13.4167,,C | ||||
| 933,1,"Franklin, Mr. Thomas Parham",male,,0,0,113778,26.55,D34,S | ||||
| 934,3,"Goldsmith, Mr. Nathan",male,41,0,0,SOTON/O.Q. 3101263,7.85,,S | ||||
| 935,2,"Corbett, Mrs. Walter H (Irene Colvin)",female,30,0,0,237249,13,,S | ||||
| 936,1,"Kimball, Mrs. Edwin Nelson Jr (Gertrude Parsons)",female,45,1,0,11753,52.5542,D19,S | ||||
| 937,3,"Peltomaki, Mr. Nikolai Johannes",male,25,0,0,STON/O 2. 3101291,7.925,,S | ||||
| 938,1,"Chevre, Mr. Paul Romaine",male,45,0,0,PC 17594,29.7,A9,C | ||||
| 939,3,"Shaughnessy, Mr. Patrick",male,,0,0,370374,7.75,,Q | ||||
| 940,1,"Bucknell, Mrs. William Robert (Emma Eliza Ward)",female,60,0,0,11813,76.2917,D15,C | ||||
| 941,3,"Coutts, Mrs. William (Winnie Minnie"" Treanor)""",female,36,0,2,C.A. 37671,15.9,,S | ||||
| 942,1,"Smith, Mr. Lucien Philip",male,24,1,0,13695,60,C31,S | ||||
| 943,2,"Pulbaum, Mr. Franz",male,27,0,0,SC/PARIS 2168,15.0333,,C | ||||
| 944,2,"Hocking, Miss. Ellen Nellie""""",female,20,2,1,29105,23,,S | ||||
| 945,1,"Fortune, Miss. Ethel Flora",female,28,3,2,19950,263,C23 C25 C27,S | ||||
| 946,2,"Mangiavacchi, Mr. Serafino Emilio",male,,0,0,SC/A.3 2861,15.5792,,C | ||||
| 947,3,"Rice, Master. Albert",male,10,4,1,382652,29.125,,Q | ||||
| 948,3,"Cor, Mr. Bartol",male,35,0,0,349230,7.8958,,S | ||||
| 949,3,"Abelseth, Mr. Olaus Jorgensen",male,25,0,0,348122,7.65,F G63,S | ||||
| 950,3,"Davison, Mr. Thomas Henry",male,,1,0,386525,16.1,,S | ||||
| 951,1,"Chaudanson, Miss. Victorine",female,36,0,0,PC 17608,262.375,B61,C | ||||
| 952,3,"Dika, Mr. Mirko",male,17,0,0,349232,7.8958,,S | ||||
| 953,2,"McCrae, Mr. Arthur Gordon",male,32,0,0,237216,13.5,,S | ||||
| 954,3,"Bjorklund, Mr. Ernst Herbert",male,18,0,0,347090,7.75,,S | ||||
| 955,3,"Bradley, Miss. Bridget Delia",female,22,0,0,334914,7.725,,Q | ||||
| 956,1,"Ryerson, Master. John Borie",male,13,2,2,PC 17608,262.375,B57 B59 B63 B66,C | ||||
| 957,2,"Corey, Mrs. Percy C (Mary Phyllis Elizabeth Miller)",female,,0,0,F.C.C. 13534,21,,S | ||||
| 958,3,"Burns, Miss. Mary Delia",female,18,0,0,330963,7.8792,,Q | ||||
| 959,1,"Moore, Mr. Clarence Bloomfield",male,47,0,0,113796,42.4,,S | ||||
| 960,1,"Tucker, Mr. Gilbert Milligan Jr",male,31,0,0,2543,28.5375,C53,C | ||||
| 961,1,"Fortune, Mrs. Mark (Mary McDougald)",female,60,1,4,19950,263,C23 C25 C27,S | ||||
| 962,3,"Mulvihill, Miss. Bertha E",female,24,0,0,382653,7.75,,Q | ||||
| 963,3,"Minkoff, Mr. Lazar",male,21,0,0,349211,7.8958,,S | ||||
| 964,3,"Nieminen, Miss. Manta Josefina",female,29,0,0,3101297,7.925,,S | ||||
| 965,1,"Ovies y Rodriguez, Mr. Servando",male,28.5,0,0,PC 17562,27.7208,D43,C | ||||
| 966,1,"Geiger, Miss. Amalie",female,35,0,0,113503,211.5,C130,C | ||||
| 967,1,"Keeping, Mr. Edwin",male,32.5,0,0,113503,211.5,C132,C | ||||
| 968,3,"Miles, Mr. Frank",male,,0,0,359306,8.05,,S | ||||
| 969,1,"Cornell, Mrs. Robert Clifford (Malvina Helen Lamson)",female,55,2,0,11770,25.7,C101,S | ||||
| 970,2,"Aldworth, Mr. Charles Augustus",male,30,0,0,248744,13,,S | ||||
| 971,3,"Doyle, Miss. Elizabeth",female,24,0,0,368702,7.75,,Q | ||||
| 972,3,"Boulos, Master. Akar",male,6,1,1,2678,15.2458,,C | ||||
| 973,1,"Straus, Mr. Isidor",male,67,1,0,PC 17483,221.7792,C55 C57,S | ||||
| 974,1,"Case, Mr. Howard Brown",male,49,0,0,19924,26,,S | ||||
| 975,3,"Demetri, Mr. Marinko",male,,0,0,349238,7.8958,,S | ||||
| 976,2,"Lamb, Mr. John Joseph",male,,0,0,240261,10.7083,,Q | ||||
| 977,3,"Khalil, Mr. Betros",male,,1,0,2660,14.4542,,C | ||||
| 978,3,"Barry, Miss. Julia",female,27,0,0,330844,7.8792,,Q | ||||
| 979,3,"Badman, Miss. Emily Louisa",female,18,0,0,A/4 31416,8.05,,S | ||||
| 980,3,"O'Donoghue, Ms. Bridget",female,,0,0,364856,7.75,,Q | ||||
| 981,2,"Wells, Master. Ralph Lester",male,2,1,1,29103,23,,S | ||||
| 982,3,"Dyker, Mrs. Adolf Fredrik (Anna Elisabeth Judith Andersson)",female,22,1,0,347072,13.9,,S | ||||
| 983,3,"Pedersen, Mr. Olaf",male,,0,0,345498,7.775,,S | ||||
| 984,1,"Davidson, Mrs. Thornton (Orian Hays)",female,27,1,2,F.C. 12750,52,B71,S | ||||
| 985,3,"Guest, Mr. Robert",male,,0,0,376563,8.05,,S | ||||
| 986,1,"Birnbaum, Mr. Jakob",male,25,0,0,13905,26,,C | ||||
| 987,3,"Tenglin, Mr. Gunnar Isidor",male,25,0,0,350033,7.7958,,S | ||||
| 988,1,"Cavendish, Mrs. Tyrell William (Julia Florence Siegel)",female,76,1,0,19877,78.85,C46,S | ||||
| 989,3,"Makinen, Mr. Kalle Edvard",male,29,0,0,STON/O 2. 3101268,7.925,,S | ||||
| 990,3,"Braf, Miss. Elin Ester Maria",female,20,0,0,347471,7.8542,,S | ||||
| 991,3,"Nancarrow, Mr. William Henry",male,33,0,0,A./5. 3338,8.05,,S | ||||
| 992,1,"Stengel, Mrs. Charles Emil Henry (Annie May Morris)",female,43,1,0,11778,55.4417,C116,C | ||||
| 993,2,"Weisz, Mr. Leopold",male,27,1,0,228414,26,,S | ||||
| 994,3,"Foley, Mr. William",male,,0,0,365235,7.75,,Q | ||||
| 995,3,"Johansson Palmquist, Mr. Oskar Leander",male,26,0,0,347070,7.775,,S | ||||
| 996,3,"Thomas, Mrs. Alexander (Thamine Thelma"")""",female,16,1,1,2625,8.5167,,C | ||||
| 997,3,"Holthen, Mr. Johan Martin",male,28,0,0,C 4001,22.525,,S | ||||
| 998,3,"Buckley, Mr. Daniel",male,21,0,0,330920,7.8208,,Q | ||||
| 999,3,"Ryan, Mr. Edward",male,,0,0,383162,7.75,,Q | ||||
| 1000,3,"Willer, Mr. Aaron (Abi Weller"")""",male,,0,0,3410,8.7125,,S | ||||
| 1001,2,"Swane, Mr. George",male,18.5,0,0,248734,13,F,S | ||||
| 1002,2,"Stanton, Mr. Samuel Ward",male,41,0,0,237734,15.0458,,C | ||||
| 1003,3,"Shine, Miss. Ellen Natalia",female,,0,0,330968,7.7792,,Q | ||||
| 1004,1,"Evans, Miss. Edith Corse",female,36,0,0,PC 17531,31.6792,A29,C | ||||
| 1005,3,"Buckley, Miss. Katherine",female,18.5,0,0,329944,7.2833,,Q | ||||
| 1006,1,"Straus, Mrs. Isidor (Rosalie Ida Blun)",female,63,1,0,PC 17483,221.7792,C55 C57,S | ||||
| 1007,3,"Chronopoulos, Mr. Demetrios",male,18,1,0,2680,14.4542,,C | ||||
| 1008,3,"Thomas, Mr. John",male,,0,0,2681,6.4375,,C | ||||
| 1009,3,"Sandstrom, Miss. Beatrice Irene",female,1,1,1,PP 9549,16.7,G6,S | ||||
| 1010,1,"Beattie, Mr. Thomson",male,36,0,0,13050,75.2417,C6,C | ||||
| 1011,2,"Chapman, Mrs. John Henry (Sara Elizabeth Lawry)",female,29,1,0,SC/AH 29037,26,,S | ||||
| 1012,2,"Watt, Miss. Bertha J",female,12,0,0,C.A. 33595,15.75,,S | ||||
| 1013,3,"Kiernan, Mr. John",male,,1,0,367227,7.75,,Q | ||||
| 1014,1,"Schabert, Mrs. Paul (Emma Mock)",female,35,1,0,13236,57.75,C28,C | ||||
| 1015,3,"Carver, Mr. Alfred John",male,28,0,0,392095,7.25,,S | ||||
| 1016,3,"Kennedy, Mr. John",male,,0,0,368783,7.75,,Q | ||||
| 1017,3,"Cribb, Miss. Laura Alice",female,17,0,1,371362,16.1,,S | ||||
| 1018,3,"Brobeck, Mr. Karl Rudolf",male,22,0,0,350045,7.7958,,S | ||||
| 1019,3,"McCoy, Miss. Alicia",female,,2,0,367226,23.25,,Q | ||||
| 1020,2,"Bowenur, Mr. Solomon",male,42,0,0,211535,13,,S | ||||
| 1021,3,"Petersen, Mr. Marius",male,24,0,0,342441,8.05,,S | ||||
| 1022,3,"Spinner, Mr. Henry John",male,32,0,0,STON/OQ. 369943,8.05,,S | ||||
| 1023,1,"Gracie, Col. Archibald IV",male,53,0,0,113780,28.5,C51,C | ||||
| 1024,3,"Lefebre, Mrs. Frank (Frances)",female,,0,4,4133,25.4667,,S | ||||
| 1025,3,"Thomas, Mr. Charles P",male,,1,0,2621,6.4375,,C | ||||
| 1026,3,"Dintcheff, Mr. Valtcho",male,43,0,0,349226,7.8958,,S | ||||
| 1027,3,"Carlsson, Mr. Carl Robert",male,24,0,0,350409,7.8542,,S | ||||
| 1028,3,"Zakarian, Mr. Mapriededer",male,26.5,0,0,2656,7.225,,C | ||||
| 1029,2,"Schmidt, Mr. August",male,26,0,0,248659,13,,S | ||||
| 1030,3,"Drapkin, Miss. Jennie",female,23,0,0,SOTON/OQ 392083,8.05,,S | ||||
| 1031,3,"Goodwin, Mr. Charles Frederick",male,40,1,6,CA 2144,46.9,,S | ||||
| 1032,3,"Goodwin, Miss. Jessie Allis",female,10,5,2,CA 2144,46.9,,S | ||||
| 1033,1,"Daniels, Miss. Sarah",female,33,0,0,113781,151.55,,S | ||||
| 1034,1,"Ryerson, Mr. Arthur Larned",male,61,1,3,PC 17608,262.375,B57 B59 B63 B66,C | ||||
| 1035,2,"Beauchamp, Mr. Henry James",male,28,0,0,244358,26,,S | ||||
| 1036,1,"Lindeberg-Lind, Mr. Erik Gustaf (Mr Edward Lingrey"")""",male,42,0,0,17475,26.55,,S | ||||
| 1037,3,"Vander Planke, Mr. Julius",male,31,3,0,345763,18,,S | ||||
| 1038,1,"Hilliard, Mr. Herbert Henry",male,,0,0,17463,51.8625,E46,S | ||||
| 1039,3,"Davies, Mr. Evan",male,22,0,0,SC/A4 23568,8.05,,S | ||||
| 1040,1,"Crafton, Mr. John Bertram",male,,0,0,113791,26.55,,S | ||||
| 1041,2,"Lahtinen, Rev. William",male,30,1,1,250651,26,,S | ||||
| 1042,1,"Earnshaw, Mrs. Boulton (Olive Potter)",female,23,0,1,11767,83.1583,C54,C | ||||
| 1043,3,"Matinoff, Mr. Nicola",male,,0,0,349255,7.8958,,C | ||||
| 1044,3,"Storey, Mr. Thomas",male,60.5,0,0,3701,,,S | ||||
| 1045,3,"Klasen, Mrs. (Hulda Kristina Eugenia Lofqvist)",female,36,0,2,350405,12.1833,,S | ||||
| 1046,3,"Asplund, Master. Filip Oscar",male,13,4,2,347077,31.3875,,S | ||||
| 1047,3,"Duquemin, Mr. Joseph",male,24,0,0,S.O./P.P. 752,7.55,,S | ||||
| 1048,1,"Bird, Miss. Ellen",female,29,0,0,PC 17483,221.7792,C97,S | ||||
| 1049,3,"Lundin, Miss. Olga Elida",female,23,0,0,347469,7.8542,,S | ||||
| 1050,1,"Borebank, Mr. John James",male,42,0,0,110489,26.55,D22,S | ||||
| 1051,3,"Peacock, Mrs. Benjamin (Edith Nile)",female,26,0,2,SOTON/O.Q. 3101315,13.775,,S | ||||
| 1052,3,"Smyth, Miss. Julia",female,,0,0,335432,7.7333,,Q | ||||
| 1053,3,"Touma, Master. Georges Youssef",male,7,1,1,2650,15.2458,,C | ||||
| 1054,2,"Wright, Miss. Marion",female,26,0,0,220844,13.5,,S | ||||
| 1055,3,"Pearce, Mr. Ernest",male,,0,0,343271,7,,S | ||||
| 1056,2,"Peruschitz, Rev. Joseph Maria",male,41,0,0,237393,13,,S | ||||
| 1057,3,"Kink-Heilmann, Mrs. Anton (Luise Heilmann)",female,26,1,1,315153,22.025,,S | ||||
| 1058,1,"Brandeis, Mr. Emil",male,48,0,0,PC 17591,50.4958,B10,C | ||||
| 1059,3,"Ford, Mr. Edward Watson",male,18,2,2,W./C. 6608,34.375,,S | ||||
| 1060,1,"Cassebeer, Mrs. Henry Arthur Jr (Eleanor Genevieve Fosdick)",female,,0,0,17770,27.7208,,C | ||||
| 1061,3,"Hellstrom, Miss. Hilda Maria",female,22,0,0,7548,8.9625,,S | ||||
| 1062,3,"Lithman, Mr. Simon",male,,0,0,S.O./P.P. 251,7.55,,S | ||||
| 1063,3,"Zakarian, Mr. Ortin",male,27,0,0,2670,7.225,,C | ||||
| 1064,3,"Dyker, Mr. Adolf Fredrik",male,23,1,0,347072,13.9,,S | ||||
| 1065,3,"Torfa, Mr. Assad",male,,0,0,2673,7.2292,,C | ||||
| 1066,3,"Asplund, Mr. Carl Oscar Vilhelm Gustafsson",male,40,1,5,347077,31.3875,,S | ||||
| 1067,2,"Brown, Miss. Edith Eileen",female,15,0,2,29750,39,,S | ||||
| 1068,2,"Sincock, Miss. Maude",female,20,0,0,C.A. 33112,36.75,,S | ||||
| 1069,1,"Stengel, Mr. Charles Emil Henry",male,54,1,0,11778,55.4417,C116,C | ||||
| 1070,2,"Becker, Mrs. Allen Oliver (Nellie E Baumgardner)",female,36,0,3,230136,39,F4,S | ||||
| 1071,1,"Compton, Mrs. Alexander Taylor (Mary Eliza Ingersoll)",female,64,0,2,PC 17756,83.1583,E45,C | ||||
| 1072,2,"McCrie, Mr. James Matthew",male,30,0,0,233478,13,,S | ||||
| 1073,1,"Compton, Mr. Alexander Taylor Jr",male,37,1,1,PC 17756,83.1583,E52,C | ||||
| 1074,1,"Marvin, Mrs. Daniel Warner (Mary Graham Carmichael Farquarson)",female,18,1,0,113773,53.1,D30,S | ||||
| 1075,3,"Lane, Mr. Patrick",male,,0,0,7935,7.75,,Q | ||||
| 1076,1,"Douglas, Mrs. Frederick Charles (Mary Helene Baxter)",female,27,1,1,PC 17558,247.5208,B58 B60,C | ||||
| 1077,2,"Maybery, Mr. Frank Hubert",male,40,0,0,239059,16,,S | ||||
| 1078,2,"Phillips, Miss. Alice Frances Louisa",female,21,0,1,S.O./P.P. 2,21,,S | ||||
| 1079,3,"Davies, Mr. Joseph",male,17,2,0,A/4 48873,8.05,,S | ||||
| 1080,3,"Sage, Miss. Ada",female,,8,2,CA. 2343,69.55,,S | ||||
| 1081,2,"Veal, Mr. James",male,40,0,0,28221,13,,S | ||||
| 1082,2,"Angle, Mr. William A",male,34,1,0,226875,26,,S | ||||
| 1083,1,"Salomon, Mr. Abraham L",male,,0,0,111163,26,,S | ||||
| 1084,3,"van Billiard, Master. Walter John",male,11.5,1,1,A/5. 851,14.5,,S | ||||
| 1085,2,"Lingane, Mr. John",male,61,0,0,235509,12.35,,Q | ||||
| 1086,2,"Drew, Master. Marshall Brines",male,8,0,2,28220,32.5,,S | ||||
| 1087,3,"Karlsson, Mr. Julius Konrad Eugen",male,33,0,0,347465,7.8542,,S | ||||
| 1088,1,"Spedden, Master. Robert Douglas",male,6,0,2,16966,134.5,E34,C | ||||
| 1089,3,"Nilsson, Miss. Berta Olivia",female,18,0,0,347066,7.775,,S | ||||
| 1090,2,"Baimbrigge, Mr. Charles Robert",male,23,0,0,C.A. 31030,10.5,,S | ||||
| 1091,3,"Rasmussen, Mrs. (Lena Jacobsen Solvang)",female,,0,0,65305,8.1125,,S | ||||
| 1092,3,"Murphy, Miss. Nora",female,,0,0,36568,15.5,,Q | ||||
| 1093,3,"Danbom, Master. Gilbert Sigvard Emanuel",male,0.33,0,2,347080,14.4,,S | ||||
| 1094,1,"Astor, Col. John Jacob",male,47,1,0,PC 17757,227.525,C62 C64,C | ||||
| 1095,2,"Quick, Miss. Winifred Vera",female,8,1,1,26360,26,,S | ||||
| 1096,2,"Andrew, Mr. Frank Thomas",male,25,0,0,C.A. 34050,10.5,,S | ||||
| 1097,1,"Omont, Mr. Alfred Fernand",male,,0,0,F.C. 12998,25.7417,,C | ||||
| 1098,3,"McGowan, Miss. Katherine",female,35,0,0,9232,7.75,,Q | ||||
| 1099,2,"Collett, Mr. Sidney C Stuart",male,24,0,0,28034,10.5,,S | ||||
| 1100,1,"Rosenbaum, Miss. Edith Louise",female,33,0,0,PC 17613,27.7208,A11,C | ||||
| 1101,3,"Delalic, Mr. Redjo",male,25,0,0,349250,7.8958,,S | ||||
| 1102,3,"Andersen, Mr. Albert Karvin",male,32,0,0,C 4001,22.525,,S | ||||
| 1103,3,"Finoli, Mr. Luigi",male,,0,0,SOTON/O.Q. 3101308,7.05,,S | ||||
| 1104,2,"Deacon, Mr. Percy William",male,17,0,0,S.O.C. 14879,73.5,,S | ||||
| 1105,2,"Howard, Mrs. Benjamin (Ellen Truelove Arman)",female,60,1,0,24065,26,,S | ||||
| 1106,3,"Andersson, Miss. Ida Augusta Margareta",female,38,4,2,347091,7.775,,S | ||||
| 1107,1,"Head, Mr. Christopher",male,42,0,0,113038,42.5,B11,S | ||||
| 1108,3,"Mahon, Miss. Bridget Delia",female,,0,0,330924,7.8792,,Q | ||||
| 1109,1,"Wick, Mr. George Dennick",male,57,1,1,36928,164.8667,,S | ||||
| 1110,1,"Widener, Mrs. George Dunton (Eleanor Elkins)",female,50,1,1,113503,211.5,C80,C | ||||
| 1111,3,"Thomson, Mr. Alexander Morrison",male,,0,0,32302,8.05,,S | ||||
| 1112,2,"Duran y More, Miss. Florentina",female,30,1,0,SC/PARIS 2148,13.8583,,C | ||||
| 1113,3,"Reynolds, Mr. Harold J",male,21,0,0,342684,8.05,,S | ||||
| 1114,2,"Cook, Mrs. (Selena Rogers)",female,22,0,0,W./C. 14266,10.5,F33,S | ||||
| 1115,3,"Karlsson, Mr. Einar Gervasius",male,21,0,0,350053,7.7958,,S | ||||
| 1116,1,"Candee, Mrs. Edward (Helen Churchill Hungerford)",female,53,0,0,PC 17606,27.4458,,C | ||||
| 1117,3,"Moubarek, Mrs. George (Omine Amenia"" Alexander)""",female,,0,2,2661,15.2458,,C | ||||
| 1118,3,"Asplund, Mr. Johan Charles",male,23,0,0,350054,7.7958,,S | ||||
| 1119,3,"McNeill, Miss. Bridget",female,,0,0,370368,7.75,,Q | ||||
| 1120,3,"Everett, Mr. Thomas James",male,40.5,0,0,C.A. 6212,15.1,,S | ||||
| 1121,2,"Hocking, Mr. Samuel James Metcalfe",male,36,0,0,242963,13,,S | ||||
| 1122,2,"Sweet, Mr. George Frederick",male,14,0,0,220845,65,,S | ||||
| 1123,1,"Willard, Miss. Constance",female,21,0,0,113795,26.55,,S | ||||
| 1124,3,"Wiklund, Mr. Karl Johan",male,21,1,0,3101266,6.4958,,S | ||||
| 1125,3,"Linehan, Mr. Michael",male,,0,0,330971,7.8792,,Q | ||||
| 1126,1,"Cumings, Mr. John Bradley",male,39,1,0,PC 17599,71.2833,C85,C | ||||
| 1127,3,"Vendel, Mr. Olof Edvin",male,20,0,0,350416,7.8542,,S | ||||
| 1128,1,"Warren, Mr. Frank Manley",male,64,1,0,110813,75.25,D37,C | ||||
| 1129,3,"Baccos, Mr. Raffull",male,20,0,0,2679,7.225,,C | ||||
| 1130,2,"Hiltunen, Miss. Marta",female,18,1,1,250650,13,,S | ||||
| 1131,1,"Douglas, Mrs. Walter Donald (Mahala Dutton)",female,48,1,0,PC 17761,106.425,C86,C | ||||
| 1132,1,"Lindstrom, Mrs. Carl Johan (Sigrid Posse)",female,55,0,0,112377,27.7208,,C | ||||
| 1133,2,"Christy, Mrs. (Alice Frances)",female,45,0,2,237789,30,,S | ||||
| 1134,1,"Spedden, Mr. Frederic Oakley",male,45,1,1,16966,134.5,E34,C | ||||
| 1135,3,"Hyman, Mr. Abraham",male,,0,0,3470,7.8875,,S | ||||
| 1136,3,"Johnston, Master. William Arthur Willie""""",male,,1,2,W./C. 6607,23.45,,S | ||||
| 1137,1,"Kenyon, Mr. Frederick R",male,41,1,0,17464,51.8625,D21,S | ||||
| 1138,2,"Karnes, Mrs. J Frank (Claire Bennett)",female,22,0,0,F.C.C. 13534,21,,S | ||||
| 1139,2,"Drew, Mr. James Vivian",male,42,1,1,28220,32.5,,S | ||||
| 1140,2,"Hold, Mrs. Stephen (Annie Margaret Hill)",female,29,1,0,26707,26,,S | ||||
| 1141,3,"Khalil, Mrs. Betros (Zahie Maria"" Elias)""",female,,1,0,2660,14.4542,,C | ||||
| 1142,2,"West, Miss. Barbara J",female,0.92,1,2,C.A. 34651,27.75,,S | ||||
| 1143,3,"Abrahamsson, Mr. Abraham August Johannes",male,20,0,0,SOTON/O2 3101284,7.925,,S | ||||
| 1144,1,"Clark, Mr. Walter Miller",male,27,1,0,13508,136.7792,C89,C | ||||
| 1145,3,"Salander, Mr. Karl Johan",male,24,0,0,7266,9.325,,S | ||||
| 1146,3,"Wenzel, Mr. Linhart",male,32.5,0,0,345775,9.5,,S | ||||
| 1147,3,"MacKay, Mr. George William",male,,0,0,C.A. 42795,7.55,,S | ||||
| 1148,3,"Mahon, Mr. John",male,,0,0,AQ/4 3130,7.75,,Q | ||||
| 1149,3,"Niklasson, Mr. Samuel",male,28,0,0,363611,8.05,,S | ||||
| 1150,2,"Bentham, Miss. Lilian W",female,19,0,0,28404,13,,S | ||||
| 1151,3,"Midtsjo, Mr. Karl Albert",male,21,0,0,345501,7.775,,S | ||||
| 1152,3,"de Messemaeker, Mr. Guillaume Joseph",male,36.5,1,0,345572,17.4,,S | ||||
| 1153,3,"Nilsson, Mr. August Ferdinand",male,21,0,0,350410,7.8542,,S | ||||
| 1154,2,"Wells, Mrs. Arthur Henry (Addie"" Dart Trevaskis)""",female,29,0,2,29103,23,,S | ||||
| 1155,3,"Klasen, Miss. Gertrud Emilia",female,1,1,1,350405,12.1833,,S | ||||
| 1156,2,"Portaluppi, Mr. Emilio Ilario Giuseppe",male,30,0,0,C.A. 34644,12.7375,,C | ||||
| 1157,3,"Lyntakoff, Mr. Stanko",male,,0,0,349235,7.8958,,S | ||||
| 1158,1,"Chisholm, Mr. Roderick Robert Crispin",male,,0,0,112051,0,,S | ||||
| 1159,3,"Warren, Mr. Charles William",male,,0,0,C.A. 49867,7.55,,S | ||||
| 1160,3,"Howard, Miss. May Elizabeth",female,,0,0,A. 2. 39186,8.05,,S | ||||
| 1161,3,"Pokrnic, Mr. Mate",male,17,0,0,315095,8.6625,,S | ||||
| 1162,1,"McCaffry, Mr. Thomas Francis",male,46,0,0,13050,75.2417,C6,C | ||||
| 1163,3,"Fox, Mr. Patrick",male,,0,0,368573,7.75,,Q | ||||
| 1164,1,"Clark, Mrs. Walter Miller (Virginia McDowell)",female,26,1,0,13508,136.7792,C89,C | ||||
| 1165,3,"Lennon, Miss. Mary",female,,1,0,370371,15.5,,Q | ||||
| 1166,3,"Saade, Mr. Jean Nassr",male,,0,0,2676,7.225,,C | ||||
| 1167,2,"Bryhl, Miss. Dagmar Jenny Ingeborg ",female,20,1,0,236853,26,,S | ||||
| 1168,2,"Parker, Mr. Clifford Richard",male,28,0,0,SC 14888,10.5,,S | ||||
| 1169,2,"Faunthorpe, Mr. Harry",male,40,1,0,2926,26,,S | ||||
| 1170,2,"Ware, Mr. John James",male,30,1,0,CA 31352,21,,S | ||||
| 1171,2,"Oxenham, Mr. Percy Thomas",male,22,0,0,W./C. 14260,10.5,,S | ||||
| 1172,3,"Oreskovic, Miss. Jelka",female,23,0,0,315085,8.6625,,S | ||||
| 1173,3,"Peacock, Master. Alfred Edward",male,0.75,1,1,SOTON/O.Q. 3101315,13.775,,S | ||||
| 1174,3,"Fleming, Miss. Honora",female,,0,0,364859,7.75,,Q | ||||
| 1175,3,"Touma, Miss. Maria Youssef",female,9,1,1,2650,15.2458,,C | ||||
| 1176,3,"Rosblom, Miss. Salli Helena",female,2,1,1,370129,20.2125,,S | ||||
| 1177,3,"Dennis, Mr. William",male,36,0,0,A/5 21175,7.25,,S | ||||
| 1178,3,"Franklin, Mr. Charles (Charles Fardon)",male,,0,0,SOTON/O.Q. 3101314,7.25,,S | ||||
| 1179,1,"Snyder, Mr. John Pillsbury",male,24,1,0,21228,82.2667,B45,S | ||||
| 1180,3,"Mardirosian, Mr. Sarkis",male,,0,0,2655,7.2292,F E46,C | ||||
| 1181,3,"Ford, Mr. Arthur",male,,0,0,A/5 1478,8.05,,S | ||||
| 1182,1,"Rheims, Mr. George Alexander Lucien",male,,0,0,PC 17607,39.6,,S | ||||
| 1183,3,"Daly, Miss. Margaret Marcella Maggie""""",female,30,0,0,382650,6.95,,Q | ||||
| 1184,3,"Nasr, Mr. Mustafa",male,,0,0,2652,7.2292,,C | ||||
| 1185,1,"Dodge, Dr. Washington",male,53,1,1,33638,81.8583,A34,S | ||||
| 1186,3,"Wittevrongel, Mr. Camille",male,36,0,0,345771,9.5,,S | ||||
| 1187,3,"Angheloff, Mr. Minko",male,26,0,0,349202,7.8958,,S | ||||
| 1188,2,"Laroche, Miss. Louise",female,1,1,2,SC/Paris 2123,41.5792,,C | ||||
| 1189,3,"Samaan, Mr. Hanna",male,,2,0,2662,21.6792,,C | ||||
| 1190,1,"Loring, Mr. Joseph Holland",male,30,0,0,113801,45.5,,S | ||||
| 1191,3,"Johansson, Mr. Nils",male,29,0,0,347467,7.8542,,S | ||||
| 1192,3,"Olsson, Mr. Oscar Wilhelm",male,32,0,0,347079,7.775,,S | ||||
| 1193,2,"Malachard, Mr. Noel",male,,0,0,237735,15.0458,D,C | ||||
| 1194,2,"Phillips, Mr. Escott Robert",male,43,0,1,S.O./P.P. 2,21,,S | ||||
| 1195,3,"Pokrnic, Mr. Tome",male,24,0,0,315092,8.6625,,S | ||||
| 1196,3,"McCarthy, Miss. Catherine Katie""""",female,,0,0,383123,7.75,,Q | ||||
| 1197,1,"Crosby, Mrs. Edward Gifford (Catherine Elizabeth Halstead)",female,64,1,1,112901,26.55,B26,S | ||||
| 1198,1,"Allison, Mr. Hudson Joshua Creighton",male,30,1,2,113781,151.55,C22 C26,S | ||||
| 1199,3,"Aks, Master. Philip Frank",male,0.83,0,1,392091,9.35,,S | ||||
| 1200,1,"Hays, Mr. Charles Melville",male,55,1,1,12749,93.5,B69,S | ||||
| 1201,3,"Hansen, Mrs. Claus Peter (Jennie L Howard)",female,45,1,0,350026,14.1083,,S | ||||
| 1202,3,"Cacic, Mr. Jego Grga",male,18,0,0,315091,8.6625,,S | ||||
| 1203,3,"Vartanian, Mr. David",male,22,0,0,2658,7.225,,C | ||||
| 1204,3,"Sadowitz, Mr. Harry",male,,0,0,LP 1588,7.575,,S | ||||
| 1205,3,"Carr, Miss. Jeannie",female,37,0,0,368364,7.75,,Q | ||||
| 1206,1,"White, Mrs. John Stuart (Ella Holmes)",female,55,0,0,PC 17760,135.6333,C32,C | ||||
| 1207,3,"Hagardon, Miss. Kate",female,17,0,0,AQ/3. 30631,7.7333,,Q | ||||
| 1208,1,"Spencer, Mr. William Augustus",male,57,1,0,PC 17569,146.5208,B78,C | ||||
| 1209,2,"Rogers, Mr. Reginald Harry",male,19,0,0,28004,10.5,,S | ||||
| 1210,3,"Jonsson, Mr. Nils Hilding",male,27,0,0,350408,7.8542,,S | ||||
| 1211,2,"Jefferys, Mr. Ernest Wilfred",male,22,2,0,C.A. 31029,31.5,,S | ||||
| 1212,3,"Andersson, Mr. Johan Samuel",male,26,0,0,347075,7.775,,S | ||||
| 1213,3,"Krekorian, Mr. Neshan",male,25,0,0,2654,7.2292,F E57,C | ||||
| 1214,2,"Nesson, Mr. Israel",male,26,0,0,244368,13,F2,S | ||||
| 1215,1,"Rowe, Mr. Alfred G",male,33,0,0,113790,26.55,,S | ||||
| 1216,1,"Kreuchen, Miss. Emilie",female,39,0,0,24160,211.3375,,S | ||||
| 1217,3,"Assam, Mr. Ali",male,23,0,0,SOTON/O.Q. 3101309,7.05,,S | ||||
| 1218,2,"Becker, Miss. Ruth Elizabeth",female,12,2,1,230136,39,F4,S | ||||
| 1219,1,"Rosenshine, Mr. George (Mr George Thorne"")""",male,46,0,0,PC 17585,79.2,,C | ||||
| 1220,2,"Clarke, Mr. Charles Valentine",male,29,1,0,2003,26,,S | ||||
| 1221,2,"Enander, Mr. Ingvar",male,21,0,0,236854,13,,S | ||||
| 1222,2,"Davies, Mrs. John Morgan (Elizabeth Agnes Mary White) ",female,48,0,2,C.A. 33112,36.75,,S | ||||
| 1223,1,"Dulles, Mr. William Crothers",male,39,0,0,PC 17580,29.7,A18,C | ||||
| 1224,3,"Thomas, Mr. Tannous",male,,0,0,2684,7.225,,C | ||||
| 1225,3,"Nakid, Mrs. Said (Waika Mary"" Mowad)""",female,19,1,1,2653,15.7417,,C | ||||
| 1226,3,"Cor, Mr. Ivan",male,27,0,0,349229,7.8958,,S | ||||
| 1227,1,"Maguire, Mr. John Edward",male,30,0,0,110469,26,C106,S | ||||
| 1228,2,"de Brito, Mr. Jose Joaquim",male,32,0,0,244360,13,,S | ||||
| 1229,3,"Elias, Mr. Joseph",male,39,0,2,2675,7.2292,,C | ||||
| 1230,2,"Denbury, Mr. Herbert",male,25,0,0,C.A. 31029,31.5,,S | ||||
| 1231,3,"Betros, Master. Seman",male,,0,0,2622,7.2292,,C | ||||
| 1232,2,"Fillbrook, Mr. Joseph Charles",male,18,0,0,C.A. 15185,10.5,,S | ||||
| 1233,3,"Lundstrom, Mr. Thure Edvin",male,32,0,0,350403,7.5792,,S | ||||
| 1234,3,"Sage, Mr. John George",male,,1,9,CA. 2343,69.55,,S | ||||
| 1235,1,"Cardeza, Mrs. James Warburton Martinez (Charlotte Wardle Drake)",female,58,0,1,PC 17755,512.3292,B51 B53 B55,C | ||||
| 1236,3,"van Billiard, Master. James William",male,,1,1,A/5. 851,14.5,,S | ||||
| 1237,3,"Abelseth, Miss. Karen Marie",female,16,0,0,348125,7.65,,S | ||||
| 1238,2,"Botsford, Mr. William Hull",male,26,0,0,237670,13,,S | ||||
| 1239,3,"Whabee, Mrs. George Joseph (Shawneene Abi-Saab)",female,38,0,0,2688,7.2292,,C | ||||
| 1240,2,"Giles, Mr. Ralph",male,24,0,0,248726,13.5,,S | ||||
| 1241,2,"Walcroft, Miss. Nellie",female,31,0,0,F.C.C. 13528,21,,S | ||||
| 1242,1,"Greenfield, Mrs. Leo David (Blanche Strouse)",female,45,0,1,PC 17759,63.3583,D10 D12,C | ||||
| 1243,2,"Stokes, Mr. Philip Joseph",male,25,0,0,F.C.C. 13540,10.5,,S | ||||
| 1244,2,"Dibden, Mr. William",male,18,0,0,S.O.C. 14879,73.5,,S | ||||
| 1245,2,"Herman, Mr. Samuel",male,49,1,2,220845,65,,S | ||||
| 1246,3,"Dean, Miss. Elizabeth Gladys Millvina""""",female,0.17,1,2,C.A. 2315,20.575,,S | ||||
| 1247,1,"Julian, Mr. Henry Forbes",male,50,0,0,113044,26,E60,S | ||||
| 1248,1,"Brown, Mrs. John Murray (Caroline Lane Lamson)",female,59,2,0,11769,51.4792,C101,S | ||||
| 1249,3,"Lockyer, Mr. Edward",male,,0,0,1222,7.8792,,S | ||||
| 1250,3,"O'Keefe, Mr. Patrick",male,,0,0,368402,7.75,,Q | ||||
| 1251,3,"Lindell, Mrs. Edvard Bengtsson (Elin Gerda Persson)",female,30,1,0,349910,15.55,,S | ||||
| 1252,3,"Sage, Master. William Henry",male,14.5,8,2,CA. 2343,69.55,,S | ||||
| 1253,2,"Mallet, Mrs. Albert (Antoinette Magnin)",female,24,1,1,S.C./PARIS 2079,37.0042,,C | ||||
| 1254,2,"Ware, Mrs. John James (Florence Louise Long)",female,31,0,0,CA 31352,21,,S | ||||
| 1255,3,"Strilic, Mr. Ivan",male,27,0,0,315083,8.6625,,S | ||||
| 1256,1,"Harder, Mrs. George Achilles (Dorothy Annan)",female,25,1,0,11765,55.4417,E50,C | ||||
| 1257,3,"Sage, Mrs. John (Annie Bullen)",female,,1,9,CA. 2343,69.55,,S | ||||
| 1258,3,"Caram, Mr. Joseph",male,,1,0,2689,14.4583,,C | ||||
| 1259,3,"Riihivouri, Miss. Susanna Juhantytar Sanni""""",female,22,0,0,3101295,39.6875,,S | ||||
| 1260,1,"Gibson, Mrs. Leonard (Pauline C Boeson)",female,45,0,1,112378,59.4,,C | ||||
| 1261,2,"Pallas y Castello, Mr. Emilio",male,29,0,0,SC/PARIS 2147,13.8583,,C | ||||
| 1262,2,"Giles, Mr. Edgar",male,21,1,0,28133,11.5,,S | ||||
| 1263,1,"Wilson, Miss. Helen Alice",female,31,0,0,16966,134.5,E39 E41,C | ||||
| 1264,1,"Ismay, Mr. Joseph Bruce",male,49,0,0,112058,0,B52 B54 B56,S | ||||
| 1265,2,"Harbeck, Mr. William H",male,44,0,0,248746,13,,S | ||||
| 1266,1,"Dodge, Mrs. Washington (Ruth Vidaver)",female,54,1,1,33638,81.8583,A34,S | ||||
| 1267,1,"Bowen, Miss. Grace Scott",female,45,0,0,PC 17608,262.375,,C | ||||
| 1268,3,"Kink, Miss. Maria",female,22,2,0,315152,8.6625,,S | ||||
| 1269,2,"Cotterill, Mr. Henry Harry""""",male,21,0,0,29107,11.5,,S | ||||
| 1270,1,"Hipkins, Mr. William Edward",male,55,0,0,680,50,C39,S | ||||
| 1271,3,"Asplund, Master. Carl Edgar",male,5,4,2,347077,31.3875,,S | ||||
| 1272,3,"O'Connor, Mr. Patrick",male,,0,0,366713,7.75,,Q | ||||
| 1273,3,"Foley, Mr. Joseph",male,26,0,0,330910,7.8792,,Q | ||||
| 1274,3,"Risien, Mrs. Samuel (Emma)",female,,0,0,364498,14.5,,S | ||||
| 1275,3,"McNamee, Mrs. Neal (Eileen O'Leary)",female,19,1,0,376566,16.1,,S | ||||
| 1276,2,"Wheeler, Mr. Edwin Frederick""""",male,,0,0,SC/PARIS 2159,12.875,,S | ||||
| 1277,2,"Herman, Miss. Kate",female,24,1,2,220845,65,,S | ||||
| 1278,3,"Aronsson, Mr. Ernst Axel Algot",male,24,0,0,349911,7.775,,S | ||||
| 1279,2,"Ashby, Mr. John",male,57,0,0,244346,13,,S | ||||
| 1280,3,"Canavan, Mr. Patrick",male,21,0,0,364858,7.75,,Q | ||||
| 1281,3,"Palsson, Master. Paul Folke",male,6,3,1,349909,21.075,,S | ||||
| 1282,1,"Payne, Mr. Vivian Ponsonby",male,23,0,0,12749,93.5,B24,S | ||||
| 1283,1,"Lines, Mrs. Ernest H (Elizabeth Lindsey James)",female,51,0,1,PC 17592,39.4,D28,S | ||||
| 1284,3,"Abbott, Master. Eugene Joseph",male,13,0,2,C.A. 2673,20.25,,S | ||||
| 1285,2,"Gilbert, Mr. William",male,47,0,0,C.A. 30769,10.5,,S | ||||
| 1286,3,"Kink-Heilmann, Mr. Anton",male,29,3,1,315153,22.025,,S | ||||
| 1287,1,"Smith, Mrs. Lucien Philip (Mary Eloise Hughes)",female,18,1,0,13695,60,C31,S | ||||
| 1288,3,"Colbert, Mr. Patrick",male,24,0,0,371109,7.25,,Q | ||||
| 1289,1,"Frolicher-Stehli, Mrs. Maxmillian (Margaretha Emerentia Stehli)",female,48,1,1,13567,79.2,B41,C | ||||
| 1290,3,"Larsson-Rondberg, Mr. Edvard A",male,22,0,0,347065,7.775,,S | ||||
| 1291,3,"Conlon, Mr. Thomas Henry",male,31,0,0,21332,7.7333,,Q | ||||
| 1292,1,"Bonnell, Miss. Caroline",female,30,0,0,36928,164.8667,C7,S | ||||
| 1293,2,"Gale, Mr. Harry",male,38,1,0,28664,21,,S | ||||
| 1294,1,"Gibson, Miss. Dorothy Winifred",female,22,0,1,112378,59.4,,C | ||||
| 1295,1,"Carrau, Mr. Jose Pedro",male,17,0,0,113059,47.1,,S | ||||
| 1296,1,"Frauenthal, Mr. Isaac Gerald",male,43,1,0,17765,27.7208,D40,C | ||||
| 1297,2,"Nourney, Mr. Alfred (Baron von Drachstedt"")""",male,20,0,0,SC/PARIS 2166,13.8625,D38,C | ||||
| 1298,2,"Ware, Mr. William Jeffery",male,23,1,0,28666,10.5,,S | ||||
| 1299,1,"Widener, Mr. George Dunton",male,50,1,1,113503,211.5,C80,C | ||||
| 1300,3,"Riordan, Miss. Johanna Hannah""""",female,,0,0,334915,7.7208,,Q | ||||
| 1301,3,"Peacock, Miss. Treasteall",female,3,1,1,SOTON/O.Q. 3101315,13.775,,S | ||||
| 1302,3,"Naughton, Miss. Hannah",female,,0,0,365237,7.75,,Q | ||||
| 1303,1,"Minahan, Mrs. William Edward (Lillian E Thorpe)",female,37,1,0,19928,90,C78,Q | ||||
| 1304,3,"Henriksson, Miss. Jenny Lovisa",female,28,0,0,347086,7.775,,S | ||||
| 1305,3,"Spector, Mr. Woolf",male,,0,0,A.5. 3236,8.05,,S | ||||
| 1306,1,"Oliva y Ocana, Dona. Fermina",female,39,0,0,PC 17758,108.9,C105,C | ||||
| 1307,3,"Saether, Mr. Simon Sivertsen",male,38.5,0,0,SOTON/O.Q. 3101262,7.25,,S | ||||
| 1308,3,"Ware, Mr. Frederick",male,,0,0,359309,8.05,,S | ||||
| 1309,3,"Peter, Master. Michael J",male,,1,1,2668,22.3583,,C | ||||
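The rows in this hunk follow the Kaggle Titanic test-set layout. Names that contain commas are wrapped in double quotes, and missing `Age`, `Fare` and `Cabin` values are empty fields. Below is a minimal sketch of how `pandas.read_csv` handles both, assuming pandas is installed. It uses a few rows copied from the hunk. In practice you would pass the file path to `read_csv` instead of wrapping a string in `io.StringIO`. The median imputation at the end is one common choice, shown only for illustration. It is not necessarily what the course notebook does.

```python
import io

import pandas as pd

# A few rows copied from the CSV above (header plus four passengers).
SAMPLE = """PassengerId,Pclass,Name,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked
892,3,"Kelly, Mr. James",male,34.5,0,0,330911,7.8292,,Q
902,3,"Ilieff, Mr. Ylio",male,,0,0,349220,7.8958,,S
904,1,"Snyder, Mrs. John Pillsbury (Nelle Stevenson)",female,23,1,0,21228,82.2667,B45,S
1044,3,"Storey, Mr. Thomas",male,60.5,0,0,3701,,,S
"""

df = pd.read_csv(io.StringIO(SAMPLE))

# Quoted names keep their embedded commas, so every row parses into 11 columns.
print(df.shape)

# Empty fields are read as NaN; count the missing values per column.
missing = df.isnull().sum()
print(missing)

# Illustrative imputation (an assumption, not the course's method):
# replace missing ages with the median of the known ages.
age_median = df["Age"].median()
df["Age"] = df["Age"].fillna(age_median)
```

Imputing with the median rather than the mean keeps a single outlier age from shifting the filled-in value.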
| 
BIN
ml2/images/EscUpmPolit_p.gif
Normal file
Binary file not shown. | Size: 3.1 KiB |

BIN
ml2/images/machine-learning-process.jpg
Normal file
Binary file not shown. | Size: 237 KiB |

BIN
ml2/images/titanic.jpg
Normal file
Binary file not shown. | Size: 152 KiB |

109
ml2/plot_learning_curve.py
										Normal file
									
								
							| @@ -0,0 +1,109 @@ | ||||
| """ | ||||
| Taken from http://scikit-learn.org/stable/auto_examples/model_selection/plot_learning_curve.html | ||||
|  | ||||
| ======================== | ||||
| Plotting Learning Curves | ||||
| ======================== | ||||
|  | ||||
| The first plot shows the learning curve of a naive Bayes classifier on the | ||||
| digits dataset. Note that both the training score and the cross-validation | ||||
| score are mediocre at the end. However, this shape of curve is very often | ||||
| found in more complex datasets: the training score is very high at the | ||||
| beginning and decreases, while the cross-validation score is very low at the | ||||
| beginning and increases. The second plot shows the learning curve of an SVM | ||||
| with RBF kernel. There, the training score stays around the maximum, and the | ||||
| validation score could be increased with more training | ||||
| samples. | ||||
| """ | ||||
| #print(__doc__) | ||||
|  | ||||
| import numpy as np | ||||
| import matplotlib.pyplot as plt | ||||
| from sklearn.model_selection import ShuffleSplit | ||||
| from sklearn.naive_bayes import GaussianNB | ||||
| from sklearn.svm import SVC | ||||
| from sklearn.datasets import load_digits | ||||
| from sklearn.model_selection import learning_curve | ||||
|  | ||||
|  | ||||
| def plot_learning_curve(estimator, title, X, y, ylim=None, cv=None, | ||||
|                         n_jobs=1, train_sizes=np.linspace(.1, 1.0, 5)): | ||||
|     """ | ||||
|     Generate a simple plot of the training and cross-validation learning curves. | ||||
|  | ||||
|     Parameters | ||||
|     ---------- | ||||
|     estimator : object type that implements the "fit" and "predict" methods | ||||
|         An object of that type which is cloned for each validation. | ||||
|  | ||||
|     title : string | ||||
|         Title for the chart. | ||||
|  | ||||
|     X : array-like, shape (n_samples, n_features) | ||||
|         Training vector, where n_samples is the number of samples and | ||||
|         n_features is the number of features. | ||||
|  | ||||
|     y : array-like, shape (n_samples) or (n_samples, n_features), optional | ||||
|         Target relative to X for classification or regression; | ||||
|         None for unsupervised learning. | ||||
|  | ||||
|     ylim : tuple, shape (ymin, ymax), optional | ||||
|         Defines minimum and maximum y-values plotted. | ||||
|  | ||||
|     cv : integer, cross-validation generator, optional | ||||
|         If an integer is passed, it is the number of folds. If None, the | ||||
|         learning_curve default is used (5 folds since scikit-learn 0.22, 3 before). | ||||
|         Splitter objects from the sklearn.model_selection module can also be passed. | ||||
|  | ||||
|     n_jobs : integer, optional | ||||
|         Number of jobs to run in parallel (default 1). | ||||
|     """ | ||||
|     plt.figure() | ||||
|     plt.title(title) | ||||
|     if ylim is not None: | ||||
|         plt.ylim(*ylim) | ||||
|     plt.xlabel("Training examples") | ||||
|     plt.ylabel("Score") | ||||
|     train_sizes, train_scores, test_scores = learning_curve( | ||||
|         estimator, X, y, cv=cv, n_jobs=n_jobs, train_sizes=train_sizes) | ||||
|     train_scores_mean = np.mean(train_scores, axis=1) | ||||
|     train_scores_std = np.std(train_scores, axis=1) | ||||
|     test_scores_mean = np.mean(test_scores, axis=1) | ||||
|     test_scores_std = np.std(test_scores, axis=1) | ||||
|     plt.grid() | ||||
|  | ||||
|     plt.fill_between(train_sizes, train_scores_mean - train_scores_std, | ||||
|                      train_scores_mean + train_scores_std, alpha=0.1, | ||||
|                      color="r") | ||||
|     plt.fill_between(train_sizes, test_scores_mean - test_scores_std, | ||||
|                      test_scores_mean + test_scores_std, alpha=0.1, color="g") | ||||
|     plt.plot(train_sizes, train_scores_mean, 'o-', color="r", | ||||
|              label="Training score") | ||||
|     plt.plot(train_sizes, test_scores_mean, 'o-', color="g", | ||||
|              label="Cross-validation score") | ||||
|  | ||||
|     plt.legend(loc="best") | ||||
|     return plt | ||||
|  | ||||
|  | ||||
| #digits = load_digits() | ||||
| #X, y = digits.data, digits.target | ||||
|  | ||||
|  | ||||
| #title = "Learning Curves (Naive Bayes)" | ||||
| # Cross validation with 100 iterations to get smoother mean test and train | ||||
| # score curves, each time with 20% of the data randomly selected as a validation set. | ||||
| #cv = ShuffleSplit(n_splits=100, test_size=0.2, | ||||
| #                  random_state=0) | ||||
|  | ||||
| #estimator = GaussianNB() | ||||
| #plot_learning_curve(estimator, title, X, y, ylim=(0.7, 1.01), cv=cv, n_jobs=4) | ||||
|  | ||||
| #title = r"Learning Curves (SVM, RBF kernel, $\gamma=0.001$)" | ||||
| # SVC is more expensive so we do a lower number of CV iterations: | ||||
| #cv = ShuffleSplit(n_splits=10, test_size=0.2, | ||||
| #                  random_state=0) | ||||
| #estimator = SVC(gamma=0.001) | ||||
| #plot_learning_curve(estimator, title, X, y, (0.7, 1.01), cv=cv, n_jobs=4) | ||||
|  | ||||
| #plt.show() | ||||
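plot_learning_curve.py originally imported sklearn.cross_validation and sklearn.learning_curve. Both modules were deprecated in scikit-learn 0.18 and removed in 0.20. Below is a minimal sketch of the same computation using sklearn.model_selection. It prints the mean scores instead of plotting them, and it uses a 10-split ShuffleSplit only to keep the run short.

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.model_selection import ShuffleSplit, learning_curve
from sklearn.naive_bayes import GaussianNB

X, y = load_digits(return_X_y=True)

# 10 random splits, each holding out 20% of the samples for validation
# (fewer than the 100 used in the commented-out example, to keep it fast).
cv = ShuffleSplit(n_splits=10, test_size=0.2, random_state=0)

train_sizes, train_scores, test_scores = learning_curve(
    GaussianNB(), X, y, cv=cv, train_sizes=np.linspace(0.1, 1.0, 5))

# Each row of the score arrays is one training size, each column one split.
for size, tr, te in zip(train_sizes,
                        train_scores.mean(axis=1),
                        test_scores.mean(axis=1)):
    print(f"{int(size):5d} samples: train={tr:.3f}  cv={te:.3f}")
```

The mean and standard deviation along axis=1 are exactly what plot_learning_curve draws as lines and shaded bands.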
ml2/plot_svm.py (Normal file, 80 lines)
							| @@ -0,0 +1,80 @@ | ||||
| from patsy import dmatrices | ||||
| import matplotlib.pyplot as plt | ||||
| import numpy as np | ||||
| from sklearn import svm | ||||
|  | ||||
| #Taken from http://nbviewer.jupyter.org/github/agconti/kaggle-titanic/blob/master/Titanic.ipynb | ||||
|  | ||||
| def plot_svm(df): | ||||
| 	# set plotting parameters | ||||
| 	plt.figure(figsize=(8,6)) | ||||
|  | ||||
| 	# Create an acceptable formula for our machine learning algorithms | ||||
| 	formula_ml = 'Survived ~ C(Pclass) + C(Sex) + Age + SibSp + Parch + C(Embarked)' | ||||
| 	# create regression-friendly design matrices (patsy drops rows with missing values by default) | ||||
| 	y, x = dmatrices(formula_ml, data=df, return_type='matrix') | ||||
|  | ||||
| 	# select which features we would like to analyze (column indices into the design matrix); | ||||
| 	# try changing the selection here for different output. | ||||
| 	# Suggested pairs: [2,3] clean decision boundaries (DBs), [3,1] standard DBs, [7,3] very cool DBs, | ||||
| 	# [3,6] very long, complex DBs that could take over an hour to calculate! | ||||
| 	feature_1 = 2 | ||||
| 	feature_2 = 3 | ||||
|  | ||||
| 	X = np.asarray(x) | ||||
| 	X = X[:,[feature_1, feature_2]]   | ||||
|  | ||||
|  | ||||
| 	y = np.asarray(y) | ||||
| 	# SVC needs a 1-D target, but dmatrices returns an (n_samples, 1) matrix, so we flatten it. | ||||
| 	y = y.flatten()       | ||||
|  | ||||
| 	n_sample = len(X) | ||||
|  | ||||
| 	np.random.seed(0) | ||||
| 	order = np.random.permutation(n_sample) | ||||
|  | ||||
| 	X = X[order] | ||||
| 	y = y[order].astype(float) | ||||
|  | ||||
| 	# hold out the last 10% of the shuffled samples as a test set (a single split, not cross-validation) | ||||
| 	ninety_percent_of_sample = int(.9 * n_sample) | ||||
| 	X_train = X[:ninety_percent_of_sample] | ||||
| 	y_train = y[:ninety_percent_of_sample] | ||||
| 	X_test = X[ninety_percent_of_sample:] | ||||
| 	y_test = y[ninety_percent_of_sample:] | ||||
|  | ||||
| 	# create a list of the types of kernels we will use for our analysis | ||||
| 	types_of_kernels = ['linear', 'rbf', 'poly'] | ||||
|  | ||||
| 	# specify our color map for plotting the results | ||||
| 	color_map = plt.cm.RdBu_r | ||||
|  | ||||
| 	# fit one SVC per kernel and plot its decision function | ||||
| 	for fig_num, kernel in enumerate(types_of_kernels): | ||||
| 		clf = svm.SVC(kernel=kernel, gamma=3) | ||||
| 		clf.fit(X_train, y_train) | ||||
|  | ||||
| 		plt.figure(fig_num) | ||||
| 		plt.scatter(X[:, 0], X[:, 1], c=y, zorder=10, cmap=color_map) | ||||
|  | ||||
| 		# circle out the test data (explicit edge color so the hollow markers stay visible) | ||||
| 		plt.scatter(X_test[:, 0], X_test[:, 1], s=80, facecolors='none', edgecolors='k', zorder=10) | ||||
|  | ||||
| 		plt.axis('tight') | ||||
| 		x_min = X[:, 0].min() | ||||
| 		x_max = X[:, 0].max() | ||||
| 		y_min = X[:, 1].min() | ||||
| 		y_max = X[:, 1].max() | ||||
|  | ||||
| 		XX, YY = np.mgrid[x_min:x_max:200j, y_min:y_max:200j] | ||||
| 		Z = clf.decision_function(np.c_[XX.ravel(), YY.ravel()]) | ||||
|  | ||||
| 		# put the result into a color plot | ||||
| 		Z = Z.reshape(XX.shape) | ||||
| 		plt.pcolormesh(XX, YY, Z > 0, cmap=color_map) | ||||
| 		plt.contour(XX, YY, Z, colors=['k', 'k', 'k'], linestyles=['--', '-', '--'], | ||||
| 				levels=[-.5, 0, .5]) | ||||
|  | ||||
| 		plt.title(kernel) | ||||
| 		plt.show() | ||||
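plot_svm.py compares the linear, RBF and polynomial kernels visually, using a single 90/10 split. The sketch below scores the same three kernels with stratified 5-fold cross-validation instead. Two assumptions keep the example self-contained and fast: it uses synthetic make_moons data as a stand-in for the two selected Titanic columns, and it relies on SVC's default gamma rather than the gamma=3 used in plot_svm.

```python
from sklearn.datasets import make_moons
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.svm import SVC

# Synthetic two-feature binary problem standing in for the two Titanic
# columns that plot_svm selects (an assumption for a self-contained sketch).
X, y = make_moons(n_samples=300, noise=0.25, random_state=0)

# Shuffled, stratified folds keep the class balance similar in every fold.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

results = {}
for kernel in ['linear', 'rbf', 'poly']:
    # Default gamma ('scale' in scikit-learn >= 0.22), unlike gamma=3 above.
    scores = cross_val_score(SVC(kernel=kernel), X, y, cv=cv)
    results[kernel] = scores
    print(f"{kernel:>6}: {scores.mean():.3f} +/- {scores.std():.3f}")
```

Averaging accuracy over five folds gives a less noisy comparison between kernels than a single 10% holdout. The decision-boundary plots remain useful for seeing *why* one kernel scores better.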