{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "![](images/EscUpmPolit_p.gif \"UPM\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Course Notes for Learning Intelligent Systems" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Department of Telematic Engineering Systems, Universidad Politécnica de Madrid, © 2016 Carlos A. Iglesias" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## [Introduction to Machine Learning II](3_0_0_Intro_ML_2.ipynb)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Table of Contents\n", "\n", "* [The Titanic dataset](#The-Titanic-dataset)\n", "* [Reading Data](#Reading-Data)\n", "* [Reading Data from a File](#Reading-Data-from-a-File)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# The Titanic dataset" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In this session we will work with the Titanic dataset. This dataset is provided by [Kaggle](http://www.kaggle.com). Kaggle is a crowdsourcing platform that organizes competitions where researchers and companies post their data and users compete to obtain the best models.\n", "\n", "![Titanic](images/titanic.jpg)\n", "\n", "\n", "The main objective is predicting which passengers survived the sinking of the Titanic.\n", "\n", "The data is available [here](https://www.kaggle.com/c/titanic/data). There are two files, one for training ([train.csv](files/data-titanic/train.csv)) and another file for testing [test.csv](files/data-titanic/test.csv). 
A local copy has been included in this notebook under the folder *data-titanic*.\n", "\n", "\n", "Here follows a description of the variables.\n", "\n", "|Variable | Description| Values|\n", "|---------|------------|-------|\n", "|Survived |Survival | (0 = No; 1 = Yes)|\n", "|Pclass |Passenger Class | (1 = 1st; 2 = 2nd; 3 = 3rd)|\n", "|Name |Name ||\n", "|Sex |Sex | male, female|\n", "|Age |Age||\n", "|SibSp |Number of Siblings/Spouses Aboard||\n", "|Parch |Number of Parents/Children Aboard||\n", "|Ticket|Ticket Number||\n", "|Fare |Passenger Fare||\n", "|Cabin |Cabin||\n", "|Embarked |Port of Embarkation| (C = Cherbourg; Q = Queenstown; S = Southampton)|\n", "\n", "\n", "The definitions used for SibSp and Parch are:\n", "* *Sibling*: Brother, Sister, Stepbrother, or Stepsister of Passenger Aboard Titanic\n", "* *Spouse*: Husband or Wife of Passenger Aboard Titanic (Mistresses and Fiances Ignored)\n", "* *Parent*: Mother or Father of Passenger Aboard Titanic\n", "* *Child*: Son, Daughter, Stepson, or Stepdaughter of Passenger Aboard Titanic" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Reading Data" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In the previous notebook we loaded a dataset bundled with scikit-learn. In this notebook we are going to learn how to read data from a file or a URL using the Pandas library." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Reading Data from a File" ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
891 rows × 12 columns
" ], "text/plain": [ " PassengerId Survived Pclass \\\n", "0 1 0 3 \n", "1 2 1 1 \n", "2 3 1 3 \n", "3 4 1 1 \n", "4 5 0 3 \n", "5 6 0 3 \n", "6 7 0 1 \n", "7 8 0 3 \n", "8 9 1 3 \n", "9 10 1 2 \n", "10 11 1 3 \n", "11 12 1 1 \n", "12 13 0 3 \n", "13 14 0 3 \n", "14 15 0 3 \n", "15 16 1 2 \n", "16 17 0 3 \n", "17 18 1 2 \n", "18 19 0 3 \n", "19 20 1 3 \n", "20 21 0 2 \n", "21 22 1 2 \n", "22 23 1 3 \n", "23 24 1 1 \n", "24 25 0 3 \n", "25 26 1 3 \n", "26 27 0 3 \n", "27 28 0 1 \n", "28 29 1 3 \n", "29 30 0 3 \n", ".. ... ... ... \n", "861 862 0 2 \n", "862 863 1 1 \n", "863 864 0 3 \n", "864 865 0 2 \n", "865 866 1 2 \n", "866 867 1 2 \n", "867 868 0 1 \n", "868 869 0 3 \n", "869 870 1 3 \n", "870 871 0 3 \n", "871 872 1 1 \n", "872 873 0 1 \n", "873 874 0 3 \n", "874 875 1 2 \n", "875 876 1 3 \n", "876 877 0 3 \n", "877 878 0 3 \n", "878 879 0 3 \n", "879 880 1 1 \n", "880 881 1 2 \n", "881 882 0 3 \n", "882 883 0 3 \n", "883 884 0 2 \n", "884 885 0 3 \n", "885 886 0 3 \n", "886 887 0 2 \n", "887 888 1 1 \n", "888 889 0 3 \n", "889 890 1 1 \n", "890 891 0 3 \n", "\n", " Name Sex Age SibSp \\\n", "0 Braund, Mr. Owen Harris male 22.0 1 \n", "1 Cumings, Mrs. John Bradley (Florence Briggs Th... female 38.0 1 \n", "2 Heikkinen, Miss. Laina female 26.0 0 \n", "3 Futrelle, Mrs. Jacques Heath (Lily May Peel) female 35.0 1 \n", "4 Allen, Mr. William Henry male 35.0 0 \n", "5 Moran, Mr. James male NaN 0 \n", "6 McCarthy, Mr. Timothy J male 54.0 0 \n", "7 Palsson, Master. Gosta Leonard male 2.0 3 \n", "8 Johnson, Mrs. Oscar W (Elisabeth Vilhelmina Berg) female 27.0 0 \n", "9 Nasser, Mrs. Nicholas (Adele Achem) female 14.0 1 \n", "10 Sandstrom, Miss. Marguerite Rut female 4.0 1 \n", "11 Bonnell, Miss. Elizabeth female 58.0 0 \n", "12 Saundercock, Mr. William Henry male 20.0 0 \n", "13 Andersson, Mr. Anders Johan male 39.0 1 \n", "14 Vestrom, Miss. Hulda Amanda Adolfina female 14.0 0 \n", "15 Hewlett, Mrs. (Mary D Kingcome) female 55.0 0 \n", "16 Rice, Master. 
Eugene male 2.0 4 \n", "17 Williams, Mr. Charles Eugene male NaN 0 \n", "18 Vander Planke, Mrs. Julius (Emelia Maria Vande... female 31.0 1 \n", "19 Masselmani, Mrs. Fatima female NaN 0 \n", "20 Fynney, Mr. Joseph J male 35.0 0 \n", "21 Beesley, Mr. Lawrence male 34.0 0 \n", "22 McGowan, Miss. Anna \"Annie\" female 15.0 0 \n", "23 Sloper, Mr. William Thompson male 28.0 0 \n", "24 Palsson, Miss. Torborg Danira female 8.0 3 \n", "25 Asplund, Mrs. Carl Oscar (Selma Augusta Emilia... female 38.0 1 \n", "26 Emir, Mr. Farred Chehab male NaN 0 \n", "27 Fortune, Mr. Charles Alexander male 19.0 3 \n", "28 O'Dwyer, Miss. Ellen \"Nellie\" female NaN 0 \n", "29 Todoroff, Mr. Lalio male NaN 0 \n", ".. ... ... ... ... \n", "861 Giles, Mr. Frederick Edward male 21.0 1 \n", "862 Swift, Mrs. Frederick Joel (Margaret Welles Ba... female 48.0 0 \n", "863 Sage, Miss. Dorothy Edith \"Dolly\" female NaN 8 \n", "864 Gill, Mr. John William male 24.0 0 \n", "865 Bystrom, Mrs. (Karolina) female 42.0 0 \n", "866 Duran y More, Miss. Asuncion female 27.0 1 \n", "867 Roebling, Mr. Washington Augustus II male 31.0 0 \n", "868 van Melkebeke, Mr. Philemon male NaN 0 \n", "869 Johnson, Master. Harold Theodor male 4.0 1 \n", "870 Balkic, Mr. Cerin male 26.0 0 \n", "871 Beckwith, Mrs. Richard Leonard (Sallie Monypeny) female 47.0 1 \n", "872 Carlsson, Mr. Frans Olof male 33.0 0 \n", "873 Vander Cruyssen, Mr. Victor male 47.0 0 \n", "874 Abelson, Mrs. Samuel (Hannah Wizosky) female 28.0 1 \n", "875 Najib, Miss. Adele Kiamie \"Jane\" female 15.0 0 \n", "876 Gustafsson, Mr. Alfred Ossian male 20.0 0 \n", "877 Petroff, Mr. Nedelio male 19.0 0 \n", "878 Laleff, Mr. Kristo male NaN 0 \n", "879 Potter, Mrs. Thomas Jr (Lily Alexenia Wilson) female 56.0 0 \n", "880 Shelley, Mrs. William (Imanita Parrish Hall) female 25.0 0 \n", "881 Markun, Mr. Johann male 33.0 0 \n", "882 Dahlberg, Miss. Gerda Ulrika female 22.0 0 \n", "883 Banfield, Mr. Frederick James male 28.0 0 \n", "884 Sutehall, Mr. 
Henry Jr male 25.0 0 \n", "885 Rice, Mrs. William (Margaret Norton) female 39.0 0 \n", "886 Montvila, Rev. Juozas male 27.0 0 \n", "887 Graham, Miss. Margaret Edith female 19.0 0 \n", "888 Johnston, Miss. Catherine Helen \"Carrie\" female NaN 1 \n", "889 Behr, Mr. Karl Howell male 26.0 0 \n", "890 Dooley, Mr. Patrick male 32.0 0 \n", "\n", " Parch Ticket Fare Cabin Embarked \n", "0 0 A/5 21171 7.2500 NaN S \n", "1 0 PC 17599 71.2833 C85 C \n", "2 0 STON/O2. 3101282 7.9250 NaN S \n", "3 0 113803 53.1000 C123 S \n", "4 0 373450 8.0500 NaN S \n", "5 0 330877 8.4583 NaN Q \n", "6 0 17463 51.8625 E46 S \n", "7 1 349909 21.0750 NaN S \n", "8 2 347742 11.1333 NaN S \n", "9 0 237736 30.0708 NaN C \n", "10 1 PP 9549 16.7000 G6 S \n", "11 0 113783 26.5500 C103 S \n", "12 0 A/5. 2151 8.0500 NaN S \n", "13 5 347082 31.2750 NaN S \n", "14 0 350406 7.8542 NaN S \n", "15 0 248706 16.0000 NaN S \n", "16 1 382652 29.1250 NaN Q \n", "17 0 244373 13.0000 NaN S \n", "18 0 345763 18.0000 NaN S \n", "19 0 2649 7.2250 NaN C \n", "20 0 239865 26.0000 NaN S \n", "21 0 248698 13.0000 D56 S \n", "22 0 330923 8.0292 NaN Q \n", "23 0 113788 35.5000 A6 S \n", "24 1 349909 21.0750 NaN S \n", "25 5 347077 31.3875 NaN S \n", "26 0 2631 7.2250 NaN C \n", "27 2 19950 263.0000 C23 C25 C27 S \n", "28 0 330959 7.8792 NaN Q \n", "29 0 349216 7.8958 NaN S \n", ".. ... ... ... ... ... \n", "861 0 28134 11.5000 NaN S \n", "862 0 17466 25.9292 D17 S \n", "863 2 CA. 
2343 69.5500 NaN S \n", "864 0 233866 13.0000 NaN S \n", "865 0 236852 13.0000 NaN S \n", "866 0 SC/PARIS 2149 13.8583 NaN C \n", "867 0 PC 17590 50.4958 A24 S \n", "868 0 345777 9.5000 NaN S \n", "869 1 347742 11.1333 NaN S \n", "870 0 349248 7.8958 NaN S \n", "871 1 11751 52.5542 D35 S \n", "872 0 695 5.0000 B51 B53 B55 S \n", "873 0 345765 9.0000 NaN S \n", "874 0 P/PP 3381 24.0000 NaN C \n", "875 0 2667 7.2250 NaN C \n", "876 0 7534 9.8458 NaN S \n", "877 0 349212 7.8958 NaN S \n", "878 0 349217 7.8958 NaN S \n", "879 1 11767 83.1583 C50 C \n", "880 1 230433 26.0000 NaN S \n", "881 0 349257 7.8958 NaN S \n", "882 0 7552 10.5167 NaN S \n", "883 0 C.A./SOTON 34068 10.5000 NaN S \n", "884 0 SOTON/OQ 392076 7.0500 NaN S \n", "885 5 382652 29.1250 NaN Q \n", "886 0 211536 13.0000 NaN S \n", "887 0 112053 30.0000 B42 S \n", "888 2 W./C. 6607 23.4500 NaN S \n", "889 0 111369 30.0000 C148 C \n", "890 0 370376 7.7500 NaN Q \n", "\n", "[891 rows x 12 columns]" ] }, "execution_count": 1, "metadata": {}, "output_type": "execute_result" } ], "source": [ "import numpy as np\n", "import pandas as pd\n", "from pandas import Series, DataFrame\n", "\n", "df = pd.read_csv('data-titanic/train.csv')\n", "df" ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "(891, 12)" ] }, "execution_count": 2, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# we can get the number of samples and features\n", "df.shape" ] }, { "cell_type": "code", "execution_count": 3, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
\n", "
" ], "text/plain": [ " PassengerId Survived Pclass \\\n", "0 1 0 3 \n", "1 2 1 1 \n", "2 3 1 3 \n", "3 4 1 1 \n", "4 5 0 3 \n", "\n", " Name Sex Age SibSp \\\n", "0 Braund, Mr. Owen Harris male 22 1 \n", "1 Cumings, Mrs. John Bradley (Florence Briggs Th... female 38 1 \n", "2 Heikkinen, Miss. Laina female 26 0 \n", "3 Futrelle, Mrs. Jacques Heath (Lily May Peel) female 35 1 \n", "4 Allen, Mr. William Henry male 35 0 \n", "\n", " Parch Ticket Fare Cabin Embarked \n", "0 0 A/5 21171 7.2500 NaN S \n", "1 0 PC 17599 71.2833 C85 C \n", "2 0 STON/O2. 3101282 7.9250 NaN S \n", "3 0 113803 53.1000 C123 S \n", "4 0 373450 8.0500 NaN S " ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# We can read only the first few rows and state which row holds the header, among other options.\n", "df = pd.read_csv('data-titanic/train.csv', header=0, nrows=5)\n", "df" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Pandas provides functions for reading other formats, such as Excel (*read_excel()*), JSON (*read_json()*), or HTML (*read_html()*). See the [documentation](http://pandas.pydata.org/pandas-docs/stable/api.html#input-output) for more details." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Reading Data from a URL" ] }, { "cell_type": "code", "execution_count": 4, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
891 rows × 12 columns
" ], "text/plain": [ " PassengerId Survived Pclass \\\n", "0 1 0 3 \n", "1 2 1 1 \n", "2 3 1 3 \n", "3 4 1 1 \n", "4 5 0 3 \n", "5 6 0 3 \n", "6 7 0 1 \n", "7 8 0 3 \n", "8 9 1 3 \n", "9 10 1 2 \n", "10 11 1 3 \n", "11 12 1 1 \n", "12 13 0 3 \n", "13 14 0 3 \n", "14 15 0 3 \n", "15 16 1 2 \n", "16 17 0 3 \n", "17 18 1 2 \n", "18 19 0 3 \n", "19 20 1 3 \n", "20 21 0 2 \n", "21 22 1 2 \n", "22 23 1 3 \n", "23 24 1 1 \n", "24 25 0 3 \n", "25 26 1 3 \n", "26 27 0 3 \n", "27 28 0 1 \n", "28 29 1 3 \n", "29 30 0 3 \n", ".. ... ... ... \n", "861 862 0 2 \n", "862 863 1 1 \n", "863 864 0 3 \n", "864 865 0 2 \n", "865 866 1 2 \n", "866 867 1 2 \n", "867 868 0 1 \n", "868 869 0 3 \n", "869 870 1 3 \n", "870 871 0 3 \n", "871 872 1 1 \n", "872 873 0 1 \n", "873 874 0 3 \n", "874 875 1 2 \n", "875 876 1 3 \n", "876 877 0 3 \n", "877 878 0 3 \n", "878 879 0 3 \n", "879 880 1 1 \n", "880 881 1 2 \n", "881 882 0 3 \n", "882 883 0 3 \n", "883 884 0 2 \n", "884 885 0 3 \n", "885 886 0 3 \n", "886 887 0 2 \n", "887 888 1 1 \n", "888 889 0 3 \n", "889 890 1 1 \n", "890 891 0 3 \n", "\n", " Name Sex Age SibSp \\\n", "0 Braund, Mr. Owen Harris male 22.0 1 \n", "1 Cumings, Mrs. John Bradley (Florence Briggs Th... female 38.0 1 \n", "2 Heikkinen, Miss. Laina female 26.0 0 \n", "3 Futrelle, Mrs. Jacques Heath (Lily May Peel) female 35.0 1 \n", "4 Allen, Mr. William Henry male 35.0 0 \n", "5 Moran, Mr. James male NaN 0 \n", "6 McCarthy, Mr. Timothy J male 54.0 0 \n", "7 Palsson, Master. Gosta Leonard male 2.0 3 \n", "8 Johnson, Mrs. Oscar W (Elisabeth Vilhelmina Berg) female 27.0 0 \n", "9 Nasser, Mrs. Nicholas (Adele Achem) female 14.0 1 \n", "10 Sandstrom, Miss. Marguerite Rut female 4.0 1 \n", "11 Bonnell, Miss. Elizabeth female 58.0 0 \n", "12 Saundercock, Mr. William Henry male 20.0 0 \n", "13 Andersson, Mr. Anders Johan male 39.0 1 \n", "14 Vestrom, Miss. Hulda Amanda Adolfina female 14.0 0 \n", "15 Hewlett, Mrs. (Mary D Kingcome) female 55.0 0 \n", "16 Rice, Master. 
Eugene male 2.0 4 \n", "17 Williams, Mr. Charles Eugene male NaN 0 \n", "18 Vander Planke, Mrs. Julius (Emelia Maria Vande... female 31.0 1 \n", "19 Masselmani, Mrs. Fatima female NaN 0 \n", "20 Fynney, Mr. Joseph J male 35.0 0 \n", "21 Beesley, Mr. Lawrence male 34.0 0 \n", "22 McGowan, Miss. Anna \"Annie\" female 15.0 0 \n", "23 Sloper, Mr. William Thompson male 28.0 0 \n", "24 Palsson, Miss. Torborg Danira female 8.0 3 \n", "25 Asplund, Mrs. Carl Oscar (Selma Augusta Emilia... female 38.0 1 \n", "26 Emir, Mr. Farred Chehab male NaN 0 \n", "27 Fortune, Mr. Charles Alexander male 19.0 3 \n", "28 O'Dwyer, Miss. Ellen \"Nellie\" female NaN 0 \n", "29 Todoroff, Mr. Lalio male NaN 0 \n", ".. ... ... ... ... \n", "861 Giles, Mr. Frederick Edward male 21.0 1 \n", "862 Swift, Mrs. Frederick Joel (Margaret Welles Ba... female 48.0 0 \n", "863 Sage, Miss. Dorothy Edith \"Dolly\" female NaN 8 \n", "864 Gill, Mr. John William male 24.0 0 \n", "865 Bystrom, Mrs. (Karolina) female 42.0 0 \n", "866 Duran y More, Miss. Asuncion female 27.0 1 \n", "867 Roebling, Mr. Washington Augustus II male 31.0 0 \n", "868 van Melkebeke, Mr. Philemon male NaN 0 \n", "869 Johnson, Master. Harold Theodor male 4.0 1 \n", "870 Balkic, Mr. Cerin male 26.0 0 \n", "871 Beckwith, Mrs. Richard Leonard (Sallie Monypeny) female 47.0 1 \n", "872 Carlsson, Mr. Frans Olof male 33.0 0 \n", "873 Vander Cruyssen, Mr. Victor male 47.0 0 \n", "874 Abelson, Mrs. Samuel (Hannah Wizosky) female 28.0 1 \n", "875 Najib, Miss. Adele Kiamie \"Jane\" female 15.0 0 \n", "876 Gustafsson, Mr. Alfred Ossian male 20.0 0 \n", "877 Petroff, Mr. Nedelio male 19.0 0 \n", "878 Laleff, Mr. Kristo male NaN 0 \n", "879 Potter, Mrs. Thomas Jr (Lily Alexenia Wilson) female 56.0 0 \n", "880 Shelley, Mrs. William (Imanita Parrish Hall) female 25.0 0 \n", "881 Markun, Mr. Johann male 33.0 0 \n", "882 Dahlberg, Miss. Gerda Ulrika female 22.0 0 \n", "883 Banfield, Mr. Frederick James male 28.0 0 \n", "884 Sutehall, Mr. 
Henry Jr male 25.0 0 \n", "885 Rice, Mrs. William (Margaret Norton) female 39.0 0 \n", "886 Montvila, Rev. Juozas male 27.0 0 \n", "887 Graham, Miss. Margaret Edith female 19.0 0 \n", "888 Johnston, Miss. Catherine Helen \"Carrie\" female NaN 1 \n", "889 Behr, Mr. Karl Howell male 26.0 0 \n", "890 Dooley, Mr. Patrick male 32.0 0 \n", "\n", " Parch Ticket Fare Cabin Embarked \n", "0 0 A/5 21171 7.2500 NaN S \n", "1 0 PC 17599 71.2833 C85 C \n", "2 0 STON/O2. 3101282 7.9250 NaN S \n", "3 0 113803 53.1000 C123 S \n", "4 0 373450 8.0500 NaN S \n", "5 0 330877 8.4583 NaN Q \n", "6 0 17463 51.8625 E46 S \n", "7 1 349909 21.0750 NaN S \n", "8 2 347742 11.1333 NaN S \n", "9 0 237736 30.0708 NaN C \n", "10 1 PP 9549 16.7000 G6 S \n", "11 0 113783 26.5500 C103 S \n", "12 0 A/5. 2151 8.0500 NaN S \n", "13 5 347082 31.2750 NaN S \n", "14 0 350406 7.8542 NaN S \n", "15 0 248706 16.0000 NaN S \n", "16 1 382652 29.1250 NaN Q \n", "17 0 244373 13.0000 NaN S \n", "18 0 345763 18.0000 NaN S \n", "19 0 2649 7.2250 NaN C \n", "20 0 239865 26.0000 NaN S \n", "21 0 248698 13.0000 D56 S \n", "22 0 330923 8.0292 NaN Q \n", "23 0 113788 35.5000 A6 S \n", "24 1 349909 21.0750 NaN S \n", "25 5 347077 31.3875 NaN S \n", "26 0 2631 7.2250 NaN C \n", "27 2 19950 263.0000 C23 C25 C27 S \n", "28 0 330959 7.8792 NaN Q \n", "29 0 349216 7.8958 NaN S \n", ".. ... ... ... ... ... \n", "861 0 28134 11.5000 NaN S \n", "862 0 17466 25.9292 D17 S \n", "863 2 CA. 
2343 69.5500 NaN S \n", "864 0 233866 13.0000 NaN S \n", "865 0 236852 13.0000 NaN S \n", "866 0 SC/PARIS 2149 13.8583 NaN C \n", "867 0 PC 17590 50.4958 A24 S \n", "868 0 345777 9.5000 NaN S \n", "869 1 347742 11.1333 NaN S \n", "870 0 349248 7.8958 NaN S \n", "871 1 11751 52.5542 D35 S \n", "872 0 695 5.0000 B51 B53 B55 S \n", "873 0 345765 9.0000 NaN S \n", "874 0 P/PP 3381 24.0000 NaN C \n", "875 0 2667 7.2250 NaN C \n", "876 0 7534 9.8458 NaN S \n", "877 0 349212 7.8958 NaN S \n", "878 0 349217 7.8958 NaN S \n", "879 1 11767 83.1583 C50 C \n", "880 1 230433 26.0000 NaN S \n", "881 0 349257 7.8958 NaN S \n", "882 0 7552 10.5167 NaN S \n", "883 0 C.A./SOTON 34068 10.5000 NaN S \n", "884 0 SOTON/OQ 392076 7.0500 NaN S \n", "885 5 382652 29.1250 NaN Q \n", "886 0 211536 13.0000 NaN S \n", "887 0 112053 30.0000 B42 S \n", "888 2 W./C. 6607 23.4500 NaN S \n", "889 0 111369 30.0000 C148 C \n", "890 0 370376 7.7500 NaN Q \n", "\n", "[891 rows x 12 columns]" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "import pandas as pd\n", "# Use the URL of the raw file content (not the HTML page that displays it)\n", "url = \"https://raw.githubusercontent.com/gsi-upm/sitc/master/ml2/data-titanic/train.csv\"\n", "df = pd.read_csv(url)\n", "df" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "An alternative is to download the file with the *requests* library and then parse its content with *pandas*." ] }, { "cell_type": "code", "execution_count": 5, "metadata": { "collapsed": false, "scrolled": false }, "outputs": [ { "data": { "text/plain": [ "b'PassengerId,Survived,Pclass,Name,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked\\r\\n1,0,3,\"Braund, Mr. Owen Harris\",male,22,1,0,A/5 21171,7.25,,S\\r\\n2,1,1,\"Cumings, Mrs. John Bradley (Florence Briggs Thayer)\",female,38,1,0,PC 17599,71.2833,C85,C\\r\\n3,1,3,\"Heikkinen, Miss. Laina\",female,26,0,0,STON/O2. 
3101282,7.925,,S\\r\\n4,1,1,'" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# First we download the file; .content holds the response body as bytes\n", "import pandas as pd\n", "import io\n", "import requests\n", "url = \"https://raw.githubusercontent.com/gsi-upm/sitc/master/ml2/data-titanic/train.csv\"\n", "s = requests.get(url).content\n", "# Show the first 320 bytes to see what the raw CSV content looks like\n", "s[:320]" ] }, { "cell_type": "code", "execution_count": 6, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", 
" \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " 
\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", 
" \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
PassengerIdSurvivedPclassNameSexAgeSibSpParchTicketFareCabinEmbarked
0103Braund, Mr. Owen Harrismale22.010A/5 211717.2500NaNS
1211Cumings, Mrs. John Bradley (Florence Briggs Th...female38.010PC 1759971.2833C85C
2313Heikkinen, Miss. Lainafemale26.000STON/O2. 31012827.9250NaNS
3411Futrelle, Mrs. Jacques Heath (Lily May Peel)female35.01011380353.1000C123S
4503Allen, Mr. William Henrymale35.0003734508.0500NaNS
5603Moran, Mr. JamesmaleNaN003308778.4583NaNQ
6701McCarthy, Mr. Timothy Jmale54.0001746351.8625E46S
7803Palsson, Master. Gosta Leonardmale2.03134990921.0750NaNS
8913Johnson, Mrs. Oscar W (Elisabeth Vilhelmina Berg)female27.00234774211.1333NaNS
91012Nasser, Mrs. Nicholas (Adele Achem)female14.01023773630.0708NaNC
101113Sandstrom, Miss. Marguerite Rutfemale4.011PP 954916.7000G6S
111211Bonnell, Miss. Elizabethfemale58.00011378326.5500C103S
121303Saundercock, Mr. William Henrymale20.000A/5. 21518.0500NaNS
131403Andersson, Mr. Anders Johanmale39.01534708231.2750NaNS
141503Vestrom, Miss. Hulda Amanda Adolfinafemale14.0003504067.8542NaNS
151612Hewlett, Mrs. (Mary D Kingcome)female55.00024870616.0000NaNS
161703Rice, Master. Eugenemale2.04138265229.1250NaNQ
171812Williams, Mr. Charles EugenemaleNaN0024437313.0000NaNS
181903Vander Planke, Mrs. Julius (Emelia Maria Vande...female31.01034576318.0000NaNS
192013Masselmani, Mrs. FatimafemaleNaN0026497.2250NaNC
202102Fynney, Mr. Joseph Jmale35.00023986526.0000NaNS
212212Beesley, Mr. Lawrencemale34.00024869813.0000D56S
222313McGowan, Miss. Anna \"Annie\"female15.0003309238.0292NaNQ
232411Sloper, Mr. William Thompsonmale28.00011378835.5000A6S
242503Palsson, Miss. Torborg Danirafemale8.03134990921.0750NaNS
252613Asplund, Mrs. Carl Oscar (Selma Augusta Emilia...female38.01534707731.3875NaNS
262703Emir, Mr. Farred ChehabmaleNaN0026317.2250NaNC
272801Fortune, Mr. Charles Alexandermale19.03219950263.0000C23 C25 C27S
282913O'Dwyer, Miss. Ellen \"Nellie\"femaleNaN003309597.8792NaNQ
293003Todoroff, Mr. LaliomaleNaN003492167.8958NaNS
.......................................
86186202Giles, Mr. Frederick Edwardmale21.0102813411.5000NaNS
86286311Swift, Mrs. Frederick Joel (Margaret Welles Ba...female48.0001746625.9292D17S
86386403Sage, Miss. Dorothy Edith \"Dolly\"femaleNaN82CA. 234369.5500NaNS
86486502Gill, Mr. John Williammale24.00023386613.0000NaNS
86586612Bystrom, Mrs. (Karolina)female42.00023685213.0000NaNS
86686712Duran y More, Miss. Asuncionfemale27.010SC/PARIS 214913.8583NaNC
86786801Roebling, Mr. Washington Augustus IImale31.000PC 1759050.4958A24S
86886903van Melkebeke, Mr. PhilemonmaleNaN003457779.5000NaNS
86987013Johnson, Master. Harold Theodormale4.01134774211.1333NaNS
87087103Balkic, Mr. Cerinmale26.0003492487.8958NaNS
87187211Beckwith, Mrs. Richard Leonard (Sallie Monypeny)female47.0111175152.5542D35S
87287301Carlsson, Mr. Frans Olofmale33.0006955.0000B51 B53 B55S
87387403Vander Cruyssen, Mr. Victormale47.0003457659.0000NaNS
87487512Abelson, Mrs. Samuel (Hannah Wizosky)female28.010P/PP 338124.0000NaNC
87587613Najib, Miss. Adele Kiamie \"Jane\"female15.00026677.2250NaNC
87687703Gustafsson, Mr. Alfred Ossianmale20.00075349.8458NaNS
87787803Petroff, Mr. Nedeliomale19.0003492127.8958NaNS
87887903Laleff, Mr. KristomaleNaN003492177.8958NaNS
87988011Potter, Mrs. Thomas Jr (Lily Alexenia Wilson)female56.0011176783.1583C50C
88088112Shelley, Mrs. William (Imanita Parrish Hall)female25.00123043326.0000NaNS
88188203Markun, Mr. Johannmale33.0003492577.8958NaNS
88288303Dahlberg, Miss. Gerda Ulrikafemale22.000755210.5167NaNS
88388402Banfield, Mr. Frederick Jamesmale28.000C.A./SOTON 3406810.5000NaNS
88488503Sutehall, Mr. Henry Jrmale25.000SOTON/OQ 3920767.0500NaNS
88588603Rice, Mrs. William (Margaret Norton)female39.00538265229.1250NaNQ
88688702Montvila, Rev. Juozasmale27.00021153613.0000NaNS
88788811Graham, Miss. Margaret Edithfemale19.00011205330.0000B42S
88888903Johnston, Miss. Catherine Helen \"Carrie\"femaleNaN12W./C. 660723.4500NaNS
88989011Behr, Mr. Karl Howellmale26.00011136930.0000C148C
89089103Dooley, Mr. Patrickmale32.0003703767.7500NaNQ
\n", "

891 rows × 12 columns

\n", "
" ], "text/plain": [ " PassengerId Survived Pclass \\\n", "0 1 0 3 \n", "1 2 1 1 \n", "2 3 1 3 \n", "3 4 1 1 \n", "4 5 0 3 \n", "5 6 0 3 \n", "6 7 0 1 \n", "7 8 0 3 \n", "8 9 1 3 \n", "9 10 1 2 \n", "10 11 1 3 \n", "11 12 1 1 \n", "12 13 0 3 \n", "13 14 0 3 \n", "14 15 0 3 \n", "15 16 1 2 \n", "16 17 0 3 \n", "17 18 1 2 \n", "18 19 0 3 \n", "19 20 1 3 \n", "20 21 0 2 \n", "21 22 1 2 \n", "22 23 1 3 \n", "23 24 1 1 \n", "24 25 0 3 \n", "25 26 1 3 \n", "26 27 0 3 \n", "27 28 0 1 \n", "28 29 1 3 \n", "29 30 0 3 \n", ".. ... ... ... \n", "861 862 0 2 \n", "862 863 1 1 \n", "863 864 0 3 \n", "864 865 0 2 \n", "865 866 1 2 \n", "866 867 1 2 \n", "867 868 0 1 \n", "868 869 0 3 \n", "869 870 1 3 \n", "870 871 0 3 \n", "871 872 1 1 \n", "872 873 0 1 \n", "873 874 0 3 \n", "874 875 1 2 \n", "875 876 1 3 \n", "876 877 0 3 \n", "877 878 0 3 \n", "878 879 0 3 \n", "879 880 1 1 \n", "880 881 1 2 \n", "881 882 0 3 \n", "882 883 0 3 \n", "883 884 0 2 \n", "884 885 0 3 \n", "885 886 0 3 \n", "886 887 0 2 \n", "887 888 1 1 \n", "888 889 0 3 \n", "889 890 1 1 \n", "890 891 0 3 \n", "\n", " Name Sex Age SibSp \\\n", "0 Braund, Mr. Owen Harris male 22.0 1 \n", "1 Cumings, Mrs. John Bradley (Florence Briggs Th... female 38.0 1 \n", "2 Heikkinen, Miss. Laina female 26.0 0 \n", "3 Futrelle, Mrs. Jacques Heath (Lily May Peel) female 35.0 1 \n", "4 Allen, Mr. William Henry male 35.0 0 \n", "5 Moran, Mr. James male NaN 0 \n", "6 McCarthy, Mr. Timothy J male 54.0 0 \n", "7 Palsson, Master. Gosta Leonard male 2.0 3 \n", "8 Johnson, Mrs. Oscar W (Elisabeth Vilhelmina Berg) female 27.0 0 \n", "9 Nasser, Mrs. Nicholas (Adele Achem) female 14.0 1 \n", "10 Sandstrom, Miss. Marguerite Rut female 4.0 1 \n", "11 Bonnell, Miss. Elizabeth female 58.0 0 \n", "12 Saundercock, Mr. William Henry male 20.0 0 \n", "13 Andersson, Mr. Anders Johan male 39.0 1 \n", "14 Vestrom, Miss. Hulda Amanda Adolfina female 14.0 0 \n", "15 Hewlett, Mrs. (Mary D Kingcome) female 55.0 0 \n", "16 Rice, Master. 
Eugene male 2.0 4 \n", "17 Williams, Mr. Charles Eugene male NaN 0 \n", "18 Vander Planke, Mrs. Julius (Emelia Maria Vande... female 31.0 1 \n", "19 Masselmani, Mrs. Fatima female NaN 0 \n", "20 Fynney, Mr. Joseph J male 35.0 0 \n", "21 Beesley, Mr. Lawrence male 34.0 0 \n", "22 McGowan, Miss. Anna \"Annie\" female 15.0 0 \n", "23 Sloper, Mr. William Thompson male 28.0 0 \n", "24 Palsson, Miss. Torborg Danira female 8.0 3 \n", "25 Asplund, Mrs. Carl Oscar (Selma Augusta Emilia... female 38.0 1 \n", "26 Emir, Mr. Farred Chehab male NaN 0 \n", "27 Fortune, Mr. Charles Alexander male 19.0 3 \n", "28 O'Dwyer, Miss. Ellen \"Nellie\" female NaN 0 \n", "29 Todoroff, Mr. Lalio male NaN 0 \n", ".. ... ... ... ... \n", "861 Giles, Mr. Frederick Edward male 21.0 1 \n", "862 Swift, Mrs. Frederick Joel (Margaret Welles Ba... female 48.0 0 \n", "863 Sage, Miss. Dorothy Edith \"Dolly\" female NaN 8 \n", "864 Gill, Mr. John William male 24.0 0 \n", "865 Bystrom, Mrs. (Karolina) female 42.0 0 \n", "866 Duran y More, Miss. Asuncion female 27.0 1 \n", "867 Roebling, Mr. Washington Augustus II male 31.0 0 \n", "868 van Melkebeke, Mr. Philemon male NaN 0 \n", "869 Johnson, Master. Harold Theodor male 4.0 1 \n", "870 Balkic, Mr. Cerin male 26.0 0 \n", "871 Beckwith, Mrs. Richard Leonard (Sallie Monypeny) female 47.0 1 \n", "872 Carlsson, Mr. Frans Olof male 33.0 0 \n", "873 Vander Cruyssen, Mr. Victor male 47.0 0 \n", "874 Abelson, Mrs. Samuel (Hannah Wizosky) female 28.0 1 \n", "875 Najib, Miss. Adele Kiamie \"Jane\" female 15.0 0 \n", "876 Gustafsson, Mr. Alfred Ossian male 20.0 0 \n", "877 Petroff, Mr. Nedelio male 19.0 0 \n", "878 Laleff, Mr. Kristo male NaN 0 \n", "879 Potter, Mrs. Thomas Jr (Lily Alexenia Wilson) female 56.0 0 \n", "880 Shelley, Mrs. William (Imanita Parrish Hall) female 25.0 0 \n", "881 Markun, Mr. Johann male 33.0 0 \n", "882 Dahlberg, Miss. Gerda Ulrika female 22.0 0 \n", "883 Banfield, Mr. Frederick James male 28.0 0 \n", "884 Sutehall, Mr. 
Henry Jr male 25.0 0 \n", "885 Rice, Mrs. William (Margaret Norton) female 39.0 0 \n", "886 Montvila, Rev. Juozas male 27.0 0 \n", "887 Graham, Miss. Margaret Edith female 19.0 0 \n", "888 Johnston, Miss. Catherine Helen \"Carrie\" female NaN 1 \n", "889 Behr, Mr. Karl Howell male 26.0 0 \n", "890 Dooley, Mr. Patrick male 32.0 0 \n", "\n", " Parch Ticket Fare Cabin Embarked \n", "0 0 A/5 21171 7.2500 NaN S \n", "1 0 PC 17599 71.2833 C85 C \n", "2 0 STON/O2. 3101282 7.9250 NaN S \n", "3 0 113803 53.1000 C123 S \n", "4 0 373450 8.0500 NaN S \n", "5 0 330877 8.4583 NaN Q \n", "6 0 17463 51.8625 E46 S \n", "7 1 349909 21.0750 NaN S \n", "8 2 347742 11.1333 NaN S \n", "9 0 237736 30.0708 NaN C \n", "10 1 PP 9549 16.7000 G6 S \n", "11 0 113783 26.5500 C103 S \n", "12 0 A/5. 2151 8.0500 NaN S \n", "13 5 347082 31.2750 NaN S \n", "14 0 350406 7.8542 NaN S \n", "15 0 248706 16.0000 NaN S \n", "16 1 382652 29.1250 NaN Q \n", "17 0 244373 13.0000 NaN S \n", "18 0 345763 18.0000 NaN S \n", "19 0 2649 7.2250 NaN C \n", "20 0 239865 26.0000 NaN S \n", "21 0 248698 13.0000 D56 S \n", "22 0 330923 8.0292 NaN Q \n", "23 0 113788 35.5000 A6 S \n", "24 1 349909 21.0750 NaN S \n", "25 5 347077 31.3875 NaN S \n", "26 0 2631 7.2250 NaN C \n", "27 2 19950 263.0000 C23 C25 C27 S \n", "28 0 330959 7.8792 NaN Q \n", "29 0 349216 7.8958 NaN S \n", ".. ... ... ... ... ... \n", "861 0 28134 11.5000 NaN S \n", "862 0 17466 25.9292 D17 S \n", "863 2 CA. 
2343 69.5500 NaN S \n", "864 0 233866 13.0000 NaN S \n", "865 0 236852 13.0000 NaN S \n", "866 0 SC/PARIS 2149 13.8583 NaN C \n", "867 0 PC 17590 50.4958 A24 S \n", "868 0 345777 9.5000 NaN S \n", "869 1 347742 11.1333 NaN S \n", "870 0 349248 7.8958 NaN S \n", "871 1 11751 52.5542 D35 S \n", "872 0 695 5.0000 B51 B53 B55 S \n", "873 0 345765 9.0000 NaN S \n", "874 0 P/PP 3381 24.0000 NaN C \n", "875 0 2667 7.2250 NaN C \n", "876 0 7534 9.8458 NaN S \n", "877 0 349212 7.8958 NaN S \n", "878 0 349217 7.8958 NaN S \n", "879 1 11767 83.1583 C50 C \n", "880 1 230433 26.0000 NaN S \n", "881 0 349257 7.8958 NaN S \n", "882 0 7552 10.5167 NaN S \n", "883 0 C.A./SOTON 34068 10.5000 NaN S \n", "884 0 SOTON/OQ 392076 7.0500 NaN S \n", "885 5 382652 29.1250 NaN Q \n", "886 0 211536 13.0000 NaN S \n", "887 0 112053 30.0000 B42 S \n", "888 2 W./C. 6607 23.4500 NaN S \n", "889 0 111369 30.0000 C148 C \n", "890 0 370376 7.7500 NaN Q \n", "\n", "[891 rows x 12 columns]" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df = pd.read_csv(io.StringIO(s.decode('utf-8')))\n", "df" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## References" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "* [Pandas API input-output](http://pandas.pydata.org/pandas-docs/stable/api.html#input-output)\n", "* [Pandas API - pandas.read_csv](http://pandas.pydata.org/pandas-docs/stable/generated/pandas.read_csv.html)\n", "* [DataFrame](http://pandas.pydata.org/pandas-docs/stable/dsintro.html)\n", "* [An introduction to NumPy and SciPy](http://www.engr.ucsb.edu/~shell/che210d/numpy.pdf)\n", "* [NumPy tutorial](https://docs.scipy.org/doc/numpy-dev/user/quickstart.html)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Licence" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The notebook is freely licensed under the [Creative Commons Attribution Share-Alike 
license](https://creativecommons.org/licenses/by/2.0/). \n", "\n", "© 2016 Carlos A. Iglesias, Universidad Politécnica de Madrid." ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.5.2" } }, "nbformat": 4, "nbformat_minor": 0 }