How to filter a dataframe based on groupby + nlargest results?

Question

I have a dataframe with all FIFA 19 players. I've used groupby to get the top 10 countries with the best players (best meaning the highest mean 'Overall' rating), including only countries with more than 250 players in the dataframe:

# Keep nationalities with more than 250 players, then take the 10 with the highest mean 'Overall'
df[df.groupby('Nationality')['Overall'].transform('size') > 250].groupby(['Nationality'])['Overall'].mean().nlargest(10)

Now I want to get the entire dataframe, with all columns included, but only for these top 10 countries. How can I do this?

UPDATE:

Here is an example to better illustrate:

import pandas as pd

df = pd.DataFrame({'user': ['Bob', 'Jane', 'Alice', 'Rick'],
                   'income': [40000, 50000, 42000, 10000],
                   'country': ['Brazil', 'USA', 'Brazil', 'Canada']})

# Keep countries with more than 1 row, then take the 2 highest mean incomes
df[df.groupby('country')['income'].transform('size') > 1].groupby(['country'])['income'].mean().nlargest(2)

In this example, I would like to filter the dataframe down to the Brazil rows only.
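To see why only Brazil survives, it may help to run the example pipeline one step at a time. This is a minimal sketch reusing the example dataframe above; the intermediate names `sizes`, `kept` and `top` are illustrative only:

```python
import pandas as pd

df = pd.DataFrame({'user': ['Bob', 'Jane', 'Alice', 'Rick'],
                   'income': [40000, 50000, 42000, 10000],
                   'country': ['Brazil', 'USA', 'Brazil', 'Canada']})

# Step 1: broadcast each country's row count back onto every row
sizes = df.groupby('country')['income'].transform('size')
print(sizes.tolist())  # [2, 1, 2, 1]

# Step 2: keep only rows whose country appears more than once (Bob and Alice)
kept = df[sizes > 1]

# Step 3: mean income per remaining country, then the 2 largest
top = kept.groupby('country')['income'].mean().nlargest(2)
print(top)
```

Because USA and Canada each have a single row, the size filter removes them before the mean is computed, so `nlargest(2)` can only return one country here.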

Answer 1

You can use the country values in your "top N" result to subset the original dataframe:

import pandas as pd

df = pd.DataFrame({'user': ['Bob', 'Jane', 'Alice', 'Rick'],
                   'income': [40000, 50000, 42000, 10000],
                   'country': ['Brazil', 'USA', 'Brazil', 'Canada']})

# Series of the top countries by mean income, indexed by country
top = df[df.groupby('country')['income'].transform('size') > 1].groupby(['country'])['income'].mean().nlargest(2)

# Keep every row (all columns) whose country appears in the top result
df_top = df.loc[df['country'].isin(top.reset_index()['country'])]
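The same idea can be wrapped in a small reusable function. This is a sketch under assumptions: `filter_top_groups` and its parameters are hypothetical names, not a pandas API. Since the grouped result is a Series indexed by the group key, it passes `top.index` to `isin` directly, which should be equivalent to `top.reset_index()['country']`:

```python
import pandas as pd


def filter_top_groups(df, key, value, min_size, n):
    """Hypothetical helper: return all rows (all columns) of df whose `key`
    group has more than `min_size` rows and is among the `n` groups with
    the highest mean `value`."""
    # Drop groups that are too small, as in the transform('size') filter above
    large = df[df.groupby(key)[value].transform('size') > min_size]
    # Rank the remaining groups by mean value; the result is indexed by `key`
    top = large.groupby(key)[value].mean().nlargest(n)
    # The index already holds the group labels, so no reset_index() is needed
    return df[df[key].isin(top.index)]


df = pd.DataFrame({'user': ['Bob', 'Jane', 'Alice', 'Rick'],
                   'income': [40000, 50000, 42000, 10000],
                   'country': ['Brazil', 'USA', 'Brazil', 'Canada']})

df_top = filter_top_groups(df, 'country', 'income', min_size=1, n=2)
print(df_top)
```

For the FIFA data in the question, the corresponding call would presumably be `filter_top_groups(df, 'Nationality', 'Overall', min_size=250, n=10)`. Note that `isin` keeps rows in their original order, not in ranking order.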

(Answer 1 accepted, 2019-06-19 17:02:46)