I am using the pandas groupby function and trying to get the description of the grouped results, but without each group's maximum and minimum rows. I can't find the right answer to my question.
import pandas as pd

data = {'class': ['a', 'a', 'a', 'a', 'a', 'b', 'b', 'b', 'b', 'b'],
        'num': [-10, 18, 12, 15, 50, 10, 60, 51, 54, 100]}
df = pd.DataFrame(data)
df.groupby('class').describe()
Output:
        num
      count  mean        std   min   25%   50%   75%    max
class
a       5.0  17.0  21.494185 -10.0  12.0  15.0  18.0   50.0
b       5.0  55.0  31.984371  10.0  51.0  54.0  60.0  100.0
The result that I want is:
        num
      count  mean       std   min   25%   50%   75%   max
class
a       3.0  15.0  3.000000  12.0  13.5  15.0  16.5  18.0
b       3.0  55.0  4.582576  51.0  52.5  54.0  57.0  60.0
Using transform and masking:
df['max']=df.groupby('class')['num'].transform('max')
df['min']=df.groupby('class')['num'].transform('min')
mask = df['num'].ne(df['min'])&df['num'].ne(df['max'])
df.loc[mask,:].groupby('class')['num'].describe()
       count  mean       std   min   25%   50%   75%   max
class
a        3.0  15.0  3.000000  12.0  13.5  15.0  16.5  18.0
b        3.0  55.0  4.582576  51.0  52.5  54.0  57.0  60.0
Or:
df.loc[mask, ['class', 'num']].groupby('class').describe()
        num
      count  mean       std   min   25%   50%   75%   max
class
a       3.0  15.0  3.000000  12.0  13.5  15.0  16.5  18.0
b       3.0  55.0  4.582576  51.0  52.5  54.0  57.0  60.0
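For anyone who would rather not add helper columns to df, here is a minimal self-contained sketch of the same transform-and-mask idea. The names grouped, group_max, group_min and trimmed are only illustrative. One caveat: because the mask compares values, every row tied for a group's min or max is dropped, not just one of them.

```python
import pandas as pd

data = {'class': ['a', 'a', 'a', 'a', 'a', 'b', 'b', 'b', 'b', 'b'],
        'num': [-10, 18, 12, 15, 50, 10, 60, 51, 54, 100]}
df = pd.DataFrame(data)

grouped = df.groupby('class')['num']

# transform broadcasts each group's extreme back onto its rows,
# so both Series share df's index; df itself is left unchanged.
group_max = grouped.transform('max')
group_min = grouped.transform('min')

# Keep only rows whose value is neither the group's min nor its max.
# Rows tied for the min or max are all removed.
mask = df['num'].ne(group_min) & df['num'].ne(group_max)

trimmed = df.loc[mask].groupby('class')['num'].describe()
print(trimmed)
```

With the sample data this leaves three rows per class, so it should match the table above.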
Another method using apply(), idxmax() and idxmin():
(df.groupby('class')
   .apply(lambda x: x.drop([x['num'].idxmax(), x['num'].idxmin()]))
   .rename_axis([None, None])
   .groupby('class')
   .describe())
        num
      count  mean       std   min   25%   50%   75%   max
class
a       3.0  15.0  3.000000  12.0  13.5  15.0  16.5  18.0
b       3.0  55.0  4.582576  51.0  52.5  54.0  57.0  60.0
Explanation: Do a groupby on class and drop the rows at the index of the max and min values in each group. Then do a groupby on class again and call the describe() function.
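The same idea can probably be written without apply(), which may be worth trying since the way groupby.apply treats the grouping column has changed between pandas versions. This is only a sketch, not the answerer's code; drop_labels and result are illustrative names. Unlike a value-based mask, idxmax() and idxmin() return a single index label per group (the first occurrence), so only one row is removed for each extreme even when there are ties.

```python
import pandas as pd

data = {'class': ['a', 'a', 'a', 'a', 'a', 'b', 'b', 'b', 'b', 'b'],
        'num': [-10, 18, 12, 15, 50, 10, 60, 51, 54, 100]}
df = pd.DataFrame(data)

grouped = df.groupby('class')['num']

# One index label per group for the max and one for the min.
drop_labels = pd.concat([grouped.idxmax(), grouped.idxmin()]).tolist()

# Drop those rows from the original frame, then describe per class.
result = df.drop(index=drop_labels).groupby('class').describe()
print(result)
```

Here result has two-level columns (num, then the statistic), like the output of the apply() version.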