
Groupby, count and calculate medians in Pandas

I have this dataframe:

df:
  type  size  margin  height
0    A     2       5       1
1    A     3       4       1
2    B     1       1       3

I want to group by type, count the number of companies in each type, and calculate the median of each of the other columns.

I know that the count can be computed like this:

df = df.groupby('type').count()

But is there a way to do this in a one-liner and put everything in the same df?

Something like:

df=df.groupby('type').calculate_medians_and_counts

It should come out looking like this:

type    count   size   margin   height
   A        2    2.5      4.5        1
   B        1      1        1        3

(size, margin and height are the medians from df)

Use agg with a dictionary that maps each column to its aggregation function:

d = {'size':'median', 'margin':'median', 'height':'median', 'type':'size'}

Or, if there are many columns, build the dictionary dynamically:

d = dict.fromkeys(df.columns.difference(['type']), 'median')
d['type'] = 'size'

df = df.groupby('type').agg(d).rename(columns={'type':'count'}).reset_index()
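The same result can also be sketched with named aggregation (available in pandas 0.25 and later), which names the output columns directly and so avoids the rename step. This is a minimal sketch, not part of the original answer: the sample frame is rebuilt from the question, and `value_cols`, `named` and `out` are illustrative names.

```python
import pandas as pd

# Sample frame rebuilt from the question
df = pd.DataFrame({
    'type': ['A', 'A', 'B'],
    'size': [2, 3, 1],
    'margin': [5, 4, 1],
    'height': [1, 1, 3],
})

# Every column except the grouping key gets a median
value_cols = [c for c in df.columns if c != 'type']

# Named aggregation: output_name=(input_column, function)
named = {c: (c, 'median') for c in value_cols}

out = (
    df.groupby('type')
      # 'size' counts the rows in each group; any column can serve as input
      .agg(count=(value_cols[0], 'size'), **named)
      .reset_index()
)
print(out)
```

Here the count column comes first, which matches the layout requested in the question.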

An alternative using join:

df = df.groupby('type').median().join(df.type.value_counts().rename('count')).reset_index()

print (df)
  type  margin  size  height  count
0    A     4.5   2.5       1      2
1    B     1.0   1.0       3      1
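If this is needed in more than one place, the count-plus-medians idea could be wrapped in a small helper. The sketch below is only one possible shape: the function name `summarize_by` is hypothetical, and passing `numeric_only=True` is an assumption added so that any non-numeric columns are skipped rather than causing an error.

```python
import pandas as pd

def summarize_by(df: pd.DataFrame, key: str) -> pd.DataFrame:
    """Return one row per group: the row count followed by per-column medians."""
    grouped = df.groupby(key)
    medians = grouped.median(numeric_only=True)  # skip non-numeric columns
    counts = grouped.size().rename('count')      # number of rows in each group
    # Both results are indexed by the group key, so concat aligns them by label.
    return pd.concat([counts, medians], axis=1).reset_index()

# Sample frame rebuilt from the question
df = pd.DataFrame({
    'type': ['A', 'A', 'B'],
    'size': [2, 3, 1],
    'margin': [5, 4, 1],
    'height': [1, 1, 3],
})

result = summarize_by(df, 'type')
print(result)
```

The helper returns a new frame and leaves the input untouched, so the original df stays available for further grouping.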

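Another option is to take the median per index level and concatenate it with value_counts. The level argument of median and the positional axis argument of pd.concat were removed in pandas 2.0, so the version below groups on the index level and passes axis by keyword:

pd.concat([df.set_index('type').groupby(level=0).median(), df['type'].value_counts().rename('count')], axis=1)
Out[787]: 
      size  margin  height  count
type                             
A      2.5     4.5     1.0      2
B      1.0     1.0     3.0      1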


 