Group by values across two columns and filter in Pandas
I have a DataFrame like this:
name sex births year
0 Mary F 7433 2000
1 John M 6542 2000
2 Emma F 2342 2000
3 Ron M 5432 2001
4 Bessie F 4234 2001
5 Jennie F 2413 2002
6 Nick M 2343 2002
7 Ron M 4342 2002
I need to get a new DataFrame in which the data is grouped by year and sex, and the last two columns hold the name with the most births and that maximum births value, like this:
year sex name births
0 2000 F Mary 7433
1 2000 M John 6542
2 2001 F Bessie 4234
3 2001 M Ron 5432
4 2002 F Jennie 2413
5 2002 M Ron 4342
For this data, the following groupby operation produces that output:
>>> df.groupby(['year', 'sex'], as_index=False).max()
year sex name births
0 2000 F Mary 7433
1 2000 M John 6542
2 2001 F Bessie 4234
3 2001 M Ron 5432
4 2002 F Jennie 2413
5 2002 M Ron 4342
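as_index=False stops the groupby keys from becoming the index in the returned DataFrame. Note that max() is computed for each column independently, so the name it returns is the alphabetically greatest name in the group, not necessarily the name belonging to the largest births value. It matches the desired output here only because, in this data, the two happen to coincide in every group.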
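To see why a column-wise max() can pair the wrong name with the births value, here is a minimal sketch. The two rows are hypothetical data (not from the question), chosen so that the alphabetically last name is not the one with the most births:

```python
import pandas as pd

# Hypothetical data: "Zoe" sorts after "Mary" but has far fewer births.
df = pd.DataFrame({
    "name": ["Mary", "Zoe"],
    "sex": ["F", "F"],
    "births": [7433, 100],
    "year": [2000, 2000],
})

# max() is taken per column, so name and births can come from different rows.
col_max = df.groupby(["year", "sex"], as_index=False).max()
print(col_max)
```

The single result row combines the name "Zoe" with 7433 births, a pairing that never occurs in the input.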
Alternatively, to keep each name paired with its own births value, sort by the 'births' column in descending order and then use groupby.first():
df = df.sort_values(by='births', ascending=False)
df.groupby(['year', 'sex'], as_index=False).first()
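Another option, shown here as a sketch rather than as part of the original answer, is to look up the row label of the largest births value in each group with idxmax() and then select those rows. The DataFrame is rebuilt from the question's data so the snippet runs on its own; the variable names top_rows and result are only illustrative:

```python
import pandas as pd

# The question's data, rebuilt so the example is self-contained.
df = pd.DataFrame({
    "name": ["Mary", "John", "Emma", "Ron", "Bessie", "Jennie", "Nick", "Ron"],
    "sex": ["F", "M", "F", "M", "F", "F", "M", "M"],
    "births": [7433, 6542, 2342, 5432, 4234, 2413, 2343, 4342],
    "year": [2000, 2000, 2000, 2001, 2001, 2002, 2002, 2002],
})

# Row label of the largest births value within each (year, sex) group.
top_rows = df.groupby(["year", "sex"])["births"].idxmax()

# Select those whole rows, so name and births always come from the same record.
result = (
    df.loc[top_rows, ["year", "sex", "name", "births"]]
      .reset_index(drop=True)
)
print(result)
```

Because whole rows are selected, this does not depend on the sort order of the input. When several rows in a group tie for the maximum, idxmax() returns the first one.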