
Group by values across two columns and filter in Pandas

I have a DataFrame like this:

    name    sex births  year
0   Mary    F   7433    2000
1   John    M   6542    2000
2   Emma    F   2342    2000
3   Ron     M   5432    2001
4   Bessie  F   4234    2001
5   Jennie  F   2413    2002
6   Nick    M   2343    2002
7   Ron     M   4342    2002

I need a new DataFrame in which the data is grouped by year and sex, and the last two columns are the name with the most births and that maximum births value, like this:

    year   sex  name     births
0   2000   F    Mary     7433
1   2000   M    John     6542
2   2001   F    Bessie   4234
3   2001   M    Ron      5432   
4   2002   F    Jennie   2413
5   2002   M    Ron      4342

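For this sample data, the following groupby operation produces the expected output. Note that .max() is applied to each column independently, so name holds the alphabetically largest name in each group, not necessarily the name on the row with the most births. It matches here only because of the sample values: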

>>> df.groupby(['year', 'sex'], as_index=False).max()
   year sex    name  births
0  2000   F    Mary    7433
1  2000   M    John    6542
2  2001   F  Bessie    4234
3  2001   M     Ron    5432
4  2002   F  Jennie    2413
5  2002   M     Ron    4342
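To see why the result above is a coincidence, here is a minimal sketch with two hypothetical rows (not from the question) in which the alphabetically largest name is not the one with the most births. The same .max() call then mixes values from different rows.

```python
import pandas as pd

# Hypothetical data (not from the question): "Zoe" sorts after "Anna",
# but "Anna" has far more births.
df = pd.DataFrame({
    "name": ["Anna", "Zoe"],
    "sex": ["F", "F"],
    "births": [9000, 100],
    "year": [2000, 2000],
})

# .max() is computed separately for every non-key column.
result = df.groupby(["year", "sex"], as_index=False).max()
print(result)
# name is "Zoe" (string maximum) while births is 9000 (Anna's value),
# so the returned row does not correspond to any real input row.
```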

as_index=False stops the groupby keys from becoming the index of the returned DataFrame.
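A short sketch of the difference, using the question's sample data: with the default as_index=True, year and sex end up in a MultiIndex, while as_index=False keeps them as ordinary columns. For this aggregation, calling reset_index() on the indexed result gives the same frame.

```python
import pandas as pd

# The sample data from the question.
df = pd.DataFrame({
    "name": ["Mary", "John", "Emma", "Ron", "Bessie", "Jennie", "Nick", "Ron"],
    "sex": ["F", "M", "F", "M", "F", "F", "M", "M"],
    "births": [7433, 6542, 2342, 5432, 4234, 2413, 2343, 4342],
    "year": [2000, 2000, 2000, 2001, 2001, 2002, 2002, 2002],
})

# Default as_index=True: the group keys become a MultiIndex.
indexed = df.groupby(["year", "sex"]).max()

# as_index=False: the group keys stay as regular columns.
flat = df.groupby(["year", "sex"], as_index=False).max()

print(list(indexed.index.names))  # ['year', 'sex']
print(list(flat.columns))         # ['year', 'sex', 'name', 'births']
print(flat.equals(indexed.reset_index()))
```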

Alternatively, to get the desired output, sort by the 'births' column in descending order and then use groupby.first(), which then takes its values from the top row of each group:

df = df.sort_values(by='births', ascending=False)
df.groupby(['year', 'sex'], as_index=False).first()
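A self-contained version of the sort-then-first approach on the question's data is sketched below. As a cross-check it also uses idxmax, a different technique not used in the answer above, which returns the row label of the maximum births in each group so that whole rows are selected together. Both should reproduce the desired table.

```python
import pandas as pd

# The sample data from the question.
df = pd.DataFrame({
    "name": ["Mary", "John", "Emma", "Ron", "Bessie", "Jennie", "Nick", "Ron"],
    "sex": ["F", "M", "F", "M", "F", "F", "M", "M"],
    "births": [7433, 6542, 2342, 5432, 4234, 2413, 2343, 4342],
    "year": [2000, 2000, 2000, 2001, 2001, 2002, 2002, 2002],
})

# Sort so the row with the most births comes first within every group;
# groupby keeps that within-group order, so first() takes the top row.
# (first() skips NaN per column, which is irrelevant here: no missing values.)
top = (
    df.sort_values(by="births", ascending=False)
      .groupby(["year", "sex"], as_index=False)
      .first()
)
print(top[["year", "sex", "name", "births"]])

# Cross-check with idxmax: pick the full row holding each group's maximum.
rows = df.loc[df.groupby(["year", "sex"])["births"].idxmax()]
rows = rows[["year", "sex", "name", "births"]].reset_index(drop=True)
print(rows)
```

If two rows in a group tie for the most births, both approaches return just one of them. The default sort is not stable, so which tied row first() picks is not guaranteed.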
