mirror of https://github.com/gsi-upm/sitc synced 2025-09-18 12:52:20 +00:00

Compare commits

3 Commits

Author SHA1 Message Date
Dani Vera
19ea5dff09 Update 4_1_Lexical_Processing.ipynb 2019-11-26 15:14:40 +01:00
Carlos A. Iglesias
e70689072f Merge pull request #4 from gsi-upm/dveni-patch-1
Update 3_3_Data_Munging_with_Pandas.ipynb
2019-09-19 10:46:19 +02:00
Dani Vera
344e054ba4 Update 3_3_Data_Munging_with_Pandas.ipynb
np.size is used in the last column. This computes the size of the series (of non-null values, I believe), but what I think is intended is to count the number of survivors, for which np.sum could be used.
2019-09-18 15:39:16 +02:00
2 changed files with 2 additions and 2 deletions

View File: 3_3_Data_Munging_with_Pandas.ipynb

@@ -437,7 +437,7 @@
"\n",
"#Show mean Age, mean SibSp, and number of passengers older than 25 that survived, grouped by Passenger Class and Sex\n",
"df[(df.Age > 25 & (df.Survived == 1))].groupby(['Pclass', 'Sex'])['Age','SibSp','Survived'].agg({'Age': np.mean, \n",
" 'SibSp': np.mean, 'Survived': np.size})"
" 'SibSp': np.mean, 'Survived': np.sum})"
]
},
{
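The hunk above swaps `np.size` for `np.sum` in the `'Survived'` aggregation. A minimal sketch with made-up passenger rows of why that matters: `np.size` counts every row in a group, survivors or not, while `np.sum` adds up the 0/1 `Survived` flags and so counts only survivors. Note also that the unchanged context line writes the mask as `df.Age > 25 & (df.Survived == 1)`; since `&` binds tighter than `>` in Python, each comparison needs its own parentheses, as below.

```python
import numpy as np
import pandas as pd

# Hypothetical miniature of the Titanic DataFrame used in the notebook.
df = pd.DataFrame({
    'Pclass':   [1, 1, 3, 3],
    'Sex':      ['female', 'male', 'female', 'male'],
    'Age':      [30, 40, 28, 22],
    'Survived': [1, 0, 1, 1],
})

# Explicit parentheses around each comparison: & binds tighter than >.
mask = (df.Age > 25) & (df.Survived == 1)

grouped = df.groupby('Pclass')['Survived']
sizes = grouped.agg(np.size)  # rows per class: counts non-survivors too
sums = grouped.agg(np.sum)    # 0/1 flags summed: survivors only
print(sizes.tolist())  # [2, 2]
print(sums.tolist())   # [1, 2]
```

With these toy rows, class 1 has two passengers but only one survivor, so `np.size` reports 2 where `np.sum` correctly reports 1.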

View File: 4_1_Lexical_Processing.ipynb

@@ -326,7 +326,7 @@
"def preprocess(words, type='doc'):\n",
" if (type == 'tweet'):\n",
" tknzr = TweetTokenizer(strip_handles=True, reduce_len=True)\n",
" tokens = tknzr.tokenize(tweet)\n",
" tokens = tknzr.tokenize(words)\n",
" else:\n",
" tokens = nltk.word_tokenize(words.lower())\n",
" porter = nltk.PorterStemmer()\n",