简体   繁体   中英

Group by values across two columns and filter in Pandas

I have a DataFrame like this:

    name    sex births  year
0   Mary    F   7433    2000
1   John    M   6542    2000
2   Emma    F   2342    2000
3   Ron     M   5432    2001
4   Bessie  F   4234    2001
5   Jennie  F   2413    2002
6   Nick    M   2343    2002
7   Ron     M   4342    2002

I need to get new DataFrame where data will be grouped by year and sex, and last two columns will be name with max births and max (births) value, like this:

    year   sex  name     births
0   2000   F    Mary     7433
1   2000   M    John     6542
2   2001   F    Bessie   4234
3   2001   M    Ron      5432   
4   2002   F    Jennie   2413
5   2002   M    Ron      4342

It can be done using the following groupby operation:

>>> df.groupby(['year', 'sex'], as_index=False).max()
   year sex    name  births
0  2000   F    Mary    7433
1  2000   M    John    6542
2  2001   F  Bessie    4234
3  2001   M     Ron    5432
4  2002   F  Jennie    2413
5  2002   M     Ron    4342

as_index=False stops the groupby keys from becoming the index in the returned DataFrame.

Alternatively, to get the desired output you may need to to sort the 'births' column and then use groupby.first() :

df = df.sort_values(by='births', ascending=False)
df.groupby(['year', 'sex'], as_index=False).first()

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM