Group by values in two columns and filter in Pandas

Question

I have a DataFrame like this:

    name    sex births  year
0   Mary    F   7433    2000
1   John    M   6542    2000
2   Emma    F   2342    2000
3   Ron     M   5432    2001
4   Bessie  F   4234    2001
5   Jennie  F   2413    2002
6   Nick    M   2343    2002
7   Ron     M   4342    2002

I need to get a new DataFrame in which the data is grouped by year and sex, and the last two columns hold the name with the most births in each group and that maximum births value, like this:

    year   sex  name     births
0   2000   F    Mary     7433
1   2000   M    John     6542
2   2001   F    Bessie   4234
3   2001   M    Ron      5432   
4   2002   F    Jennie   2413
5   2002   M    Ron      4342

Answer 1

It can be done using the following groupby operation:

>>> df.groupby(['year', 'sex'], as_index=False).max()
   year sex    name  births
0  2000   F    Mary    7433
1  2000   M    John    6542
2  2001   F  Bessie    4234
3  2001   M     Ron    5432
4  2002   F  Jennie    2413
5  2002   M     Ron    4342

as_index=False stops the groupby keys from becoming the index in the returned DataFrame.
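To make the effect of as_index concrete, here is a minimal sketch that rebuilds the sample data from the question and compares the two settings. The variable names (indexed, flat) are only illustrative.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "name": ["Mary", "John", "Emma", "Ron", "Bessie", "Jennie", "Nick", "Ron"],
    "sex": ["F", "M", "F", "M", "F", "F", "M", "M"],
    "births": [7433, 6542, 2342, 5432, 4234, 2413, 2343, 4342],
    "year": [2000, 2000, 2000, 2001, 2001, 2002, 2002, 2002],
})

# Default (as_index=True): the group keys become a MultiIndex on the result.
indexed = df.groupby(["year", "sex"])[["births"]].max()

# as_index=False: the group keys stay as ordinary columns.
flat = df.groupby(["year", "sex"], as_index=False)[["births"]].max()

print(indexed.index.names)
print(list(flat.columns))
```

Calling indexed.reset_index() should give the same layout as flat, so the flag is mostly a convenience.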

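Note that max() is computed for each column separately, so the 'name' it returns is the alphabetically last name in each group, which only happens to match the row with the most births in this sample. To get the desired output reliably, sort by the 'births' column in descending order and then use groupby.first():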

df = df.sort_values(by='births', ascending=False)
df.groupby(['year', 'sex'], as_index=False).first()
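Another way to keep 'name' and 'births' paired is to pick whole rows with idxmax() and loc. The sketch below is not part of the original answer. It adds one hypothetical row ("Zoe") to the sample so that the difference from a plain max() becomes visible, and the variable names are illustrative.

```python
import pandas as pd

# Sample data from the question, plus one hypothetical row ("Zoe", F, 100, 2000)
# added only to show where a per-column max() goes wrong.
df = pd.DataFrame({
    "name": ["Mary", "John", "Emma", "Ron", "Bessie", "Jennie", "Nick", "Ron", "Zoe"],
    "sex": ["F", "M", "F", "M", "F", "F", "M", "M", "F"],
    "births": [7433, 6542, 2342, 5432, 4234, 2413, 2343, 4342, 100],
    "year": [2000, 2000, 2000, 2001, 2001, 2002, 2002, 2002, 2000],
})

keys = ["year", "sex"]

# max() reduces each column on its own: 'name' becomes the alphabetically
# last name in the group, not the name belonging to the largest 'births'.
per_column = df.groupby(keys, as_index=False).max()

# idxmax() returns the index label of the top-'births' row in each group;
# loc then pulls whole rows, so 'name' and 'births' come from the same record.
top_rows = df.groupby(keys)["births"].idxmax()
paired = df.loc[top_rows, keys + ["name", "births"]].reset_index(drop=True)

print(per_column)
print(paired)
```

For the (2000, F) group, per_column pairs the name Zoe with Mary's birth count, while paired keeps Mary's row intact.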

Answer 1: 4, accepted, 2015-11-18 15:42:54
