1
0
mirror of https://github.com/gsi-upm/sitc synced 2025-03-12 17:16:59 +00:00

Update notebook with pivot_table examples

This commit is contained in:
cif 2025-03-06 16:05:16 +01:00
parent 3f7694e330
commit c49c866a2e

View File

@ -451,7 +451,10 @@
"cell_type": "markdown", "cell_type": "markdown",
"metadata": {}, "metadata": {},
"source": [ "source": [
"Pivot tables are an intuitive way to analyze data, and alternative to group columns." "Pivot tables are an intuitive way to analyze data, and an alternative to group columns.\n",
"\n",
"This command makes a table with rows Sex and columns Pclass, and\n",
"averages the result of the column Survived, thereby giving the percentage of survivors in each grouping."
] ]
}, },
{ {
@ -460,7 +463,14 @@
"metadata": {}, "metadata": {},
"outputs": [], "outputs": [],
"source": [ "source": [
"pd.pivot_table(df, index='Sex')" "pd.pivot_table(df, index='Sex', columns='Pclass', values=['Survived'])"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Now we want to analyze multi-index, the percentage of survivoers, given sex and age, and distributed by Pclass."
] ]
}, },
{ {
@ -469,7 +479,14 @@
"metadata": {}, "metadata": {},
"outputs": [], "outputs": [],
"source": [ "source": [
"pd.pivot_table(df, index=['Sex', 'Pclass'])" "pd.pivot_table(df, index=['Sex', 'Age'], columns=['Pclass'], values=['Survived'])"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Nevertheless, this is not very useful since we have a row per age. Thus, we define a partition."
] ]
}, },
{ {
@ -478,7 +495,8 @@
"metadata": {}, "metadata": {},
"outputs": [], "outputs": [],
"source": [ "source": [
"pd.pivot_table(df, index=['Sex', 'Pclass'], values=['Age', 'SibSp'])" "# Partition each of the passengers into 3 categories based on their age\n",
"age = pd.cut(df['Age'], [0,12,18,80])"
] ]
}, },
{ {
@ -487,7 +505,14 @@
"metadata": {}, "metadata": {},
"outputs": [], "outputs": [],
"source": [ "source": [
"pd.pivot_table(df, index=['Sex', 'Pclass'], values=['Age', 'SibSp'], aggfunc=np.mean)" "pd.pivot_table(df, index=['Sex', age], columns=['Pclass'], values=['Survived'])"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We can change the function used for aggregating each group."
] ]
}, },
{ {
@ -496,8 +521,18 @@
"metadata": {}, "metadata": {},
"outputs": [], "outputs": [],
"source": [ "source": [
"# Try np.sum, np.size, len\n", "# default\n",
"pd.pivot_table(df, index=['Sex', 'Pclass'], values=['Age', 'SibSp'], aggfunc=[np.mean, np.sum])" "pd.pivot_table(df, index=['Sex', age], columns=['Pclass'], values=['Survived'], aggfunc=np.mean)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Two agg functions\n",
"pd.pivot_table(df, index=['Sex', age], columns=['Pclass'], values=['Survived'], aggfunc=[np.mean, np.sum])"
] ]
}, },
{ {
@ -972,7 +1007,7 @@
"name": "python", "name": "python",
"nbconvert_exporter": "python", "nbconvert_exporter": "python",
"pygments_lexer": "ipython3", "pygments_lexer": "ipython3",
"version": "3.8.12" "version": "3.11.5"
}, },
"latex_envs": { "latex_envs": {
"LaTeX_envs_menu_present": true, "LaTeX_envs_menu_present": true,