What is the difference between
df[['column1', 'column2']].groupby('column1').agg(['mean', 'count'])
and
df[['column1', 'column2']].groupby('column1').agg({'column2': 'mean', 'column2': 'count'})
In the first example, mean and count are computed on column2, the only column that is not in the groupby.
In the second example, the logic is the same, but I explicitly named column2 in agg.
Why do I not see the same result for both?
The problem with the second statement has to do with overwriting: the dictionary repeats the key 'column2', so its 'mean' entry is overwritten by 'count'.
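A minimal reproduction of the two calls from the question, sketched on a small made-up DataFrame (the column1/column2 values are hypothetical, chosen only so the results are easy to check by hand):

```python
import pandas as pd

# Hypothetical data using the column names from the question.
df = pd.DataFrame({
    'column1': ['a', 'b', 'a', 'b', 'a'],
    'column2': [1, 2, 3, 4, 5],
})

# First call: the list of functions is applied to every non-grouping column,
# giving MultiIndex columns ('column2', 'mean') and ('column2', 'count').
first = df[['column1', 'column2']].groupby('column1').agg(['mean', 'count'])

# Second call: the dict literal repeats the key 'column2', so only the last
# entry ('count') reaches pandas and the mean is never computed.
second = df[['column1', 'column2']].groupby('column1').agg({'column2': 'mean', 'column2': 'count'})

print(first)
print(second)
```

The second result has a single flat column named column2 holding the counts, which is why the two calls look different.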
There are at least three ways to write this aggregation correctly.
First, let's load a test dataset (seaborn's tips data):
import pandas as pd
from seaborn import load_dataset
df_tips = load_dataset('tips')
df_tips.head()
df_tips[['sex','size']].groupby(['sex']).agg(['mean','count'])
Output:
            size
            mean count
sex
Male    2.630573   157
Female  2.459770    87
This gives a DataFrame with MultiIndex columns: size on level 0 and the two aggregations, mean and count, on level 1.
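To see what that two-level header looks like in practice, here is a sketch on a small hypothetical stand-in for the tips data (the values are made up, not the real tips numbers). It shows how to select one aggregation with a tuple and, optionally, how to flatten the header into single strings; the underscore naming scheme is just one illustrative choice.

```python
import pandas as pd

# Small hypothetical stand-in for the tips data (values made up for illustration).
df = pd.DataFrame({
    'sex': ['Male', 'Female', 'Male', 'Female', 'Male'],
    'size': [2, 3, 4, 2, 3],
})

res = df[['sex', 'size']].groupby('sex').agg(['mean', 'count'])

# The columns form a two-level MultiIndex: ('size', 'mean'), ('size', 'count').
print(res.columns.tolist())

# Select one aggregation with a (level 0, level 1) tuple.
print(res[('size', 'mean')])

# Optionally flatten the header into strings such as 'size_mean'.
flat = res.copy()
flat.columns = ['_'.join(col) for col in flat.columns]
print(flat.columns.tolist())
```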
df_tips[['sex','size']].groupby(['sex']).agg({'size':['mean','count']})
Output (same as above):
            size
            mean count
sex
Male    2.630573   157
Female  2.459770    87
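A quick way to check that the list form and the dict-of-lists form really agree, sketched on hypothetical stand-in data so it runs without downloading the tips dataset:

```python
import pandas as pd

# Hypothetical stand-in data (not the real tips values).
df = pd.DataFrame({
    'sex': ['Male', 'Female', 'Male', 'Female', 'Male'],
    'size': [2, 3, 4, 2, 3],
})

# List of functions applied to every non-grouping column.
by_list = df[['sex', 'size']].groupby('sex').agg(['mean', 'count'])

# Dict mapping one column to a list of functions.
by_dict = df[['sex', 'size']].groupby('sex').agg({'size': ['mean', 'count']})

# Same MultiIndex columns, same dtypes, same values.
print(by_list.equals(by_dict))
```

The dict form becomes more useful once there are several value columns, because each column can get its own list of functions.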
df_tips[['sex','size']].groupby(['sex']).agg(mean_size=('size','mean'),count_size=('size','count'))
Output:
        mean_size  count_size
sex
Male     2.630573         157
Female   2.459770          87
This gives a DataFrame with a flat column header whose names you choose yourself. Because the names are passed as keyword arguments, they must be valid Python identifiers (no spaces or special characters).
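If you do want spaces in the output names, one option is to build the keyword arguments as a dict and unpack it with `**`; `pd.NamedAgg` is an equivalent, more explicit spelling of the `(column, aggfunc)` tuple. A sketch on hypothetical stand-in data (the DataFrame values and the spaced names are illustrative only):

```python
import pandas as pd

# Hypothetical stand-in data (not the real tips values).
df = pd.DataFrame({
    'sex': ['Male', 'Female', 'Male', 'Female', 'Male'],
    'size': [2, 3, 4, 2, 3],
})

# Keyword form: each output name is a keyword argument, so it must be an identifier.
named = df.groupby('sex').agg(mean_size=('size', 'mean'), count_size=('size', 'count'))

# Names with spaces: collect them in a dict and unpack it as keyword arguments.
spaced = df.groupby('sex').agg(**{'mean size': ('size', 'mean'),
                                  'count size': ('size', 'count')})

# pd.NamedAgg spells out the (column, aggfunc) pair explicitly.
explicit = df.groupby('sex').agg(
    mean_size=pd.NamedAgg(column='size', aggfunc='mean'),
    count_size=pd.NamedAgg(column='size', aggfunc='count'),
)

print(named)
print(spaced.columns.tolist())
```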
df_tips[['sex','size']].groupby(['sex']).agg({'size':'mean','size':'count'})
Output:
        size
sex
Male     157
Female    87
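What is happening here is that the dictionary passed to agg has the key 'size' twice. When Python builds a dict literal with a repeated key, only the last value is kept, so {'size':'mean','size':'count'} has already become {'size':'count'} before pandas sees it, and only the count is computed. To apply several functions to one column, put them in a list under a single key (the second method above) or use named aggregation (the third method).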
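The collapse happens in plain Python, before `agg` is even called, as this short sketch shows:

```python
# Python evaluates the dict literal before agg() is called.
# A repeated key silently keeps only the last value assigned to it.
spec = {'size': 'mean', 'size': 'count'}
print(spec)

# Listing both functions under one key keeps them both.
fixed_spec = {'size': ['mean', 'count']}
print(fixed_spec)
```

Linters such as flake8 (via pyflakes) typically flag repeated dictionary keys, which can catch this mistake early.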