
How can I filter my dataframe based on a groupby + nlargest result?

I have a dataframe with all FIFA 19 players. I've used groupby to get the top 10 countries with the best players (best meaning the highest mean Overall rating), including only countries with more than 250 players in the dataframe.

df[df.groupby('Nationality')['Overall'].transform('size') > 250].groupby(['Nationality'])['Overall'].mean().nlargest(10)

Now I want to get the entire dataframe, all columns included, but only for the players from these top 10 countries. How can I do this?

UPDATE:

Here is a small example to better illustrate the problem:

import pandas as pd

df = pd.DataFrame({'user': ['Bob', 'Jane', 'Alice', 'Rick'],
                   'income': [40000, 50000, 42000, 10000],
                   'country': ['Brazil', 'USA', 'Brazil', 'Canada']})

df[df.groupby('country')['income'].transform('size') > 1].groupby(['country'])['income'].mean().nlargest(2)

In this example, I would like to filter the dataframe so that only the Brazil rows remain.

You can use the country values from your "top N" result to subset the original dataframe with isin.

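import pandas as pd

df = pd.DataFrame({'user': ['Bob', 'Jane', 'Alice', 'Rick'],
                   'income': [40000, 50000, 42000, 10000],
                   'country': ['Brazil', 'USA', 'Brazil', 'Canada']})

# Mean income per country, restricted to countries with more than one row
top = df[df.groupby('country')['income'].transform('size') > 1].groupby(['country'])['income'].mean().nlargest(2)

# top is a Series indexed by country, so its index holds the country names.
# Keep every column, but only the rows whose country appears in that index.
df_top = df.loc[df['country'].isin(top.index)]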
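
The same idea can be wrapped in a small helper so it applies to both the toy example and the FIFA data. This is only a sketch: the function name `filter_top_groups` and its parameters are made up for illustration and are not part of pandas.

```python
import pandas as pd


def filter_top_groups(df, group_col, value_col, min_size, n):
    """Hypothetical helper: keep the rows of df whose group is among the
    n groups with the highest mean of value_col, considering only groups
    that have more than min_size rows."""
    # Number of rows in each row's group, aligned with df's index
    sizes = df.groupby(group_col)[value_col].transform('size')
    # Mean per group among the sufficiently large groups, then the n largest
    top = df[sizes > min_size].groupby(group_col)[value_col].mean().nlargest(n)
    # top is indexed by group label, so isin(top.index) selects matching rows
    return df[df[group_col].isin(top.index)]


df = pd.DataFrame({'user': ['Bob', 'Jane', 'Alice', 'Rick'],
                   'income': [40000, 50000, 42000, 10000],
                   'country': ['Brazil', 'USA', 'Brazil', 'Canada']})

df_top = filter_top_groups(df, 'country', 'income', min_size=1, n=2)
print(df_top)
```

In the toy data only Brazil has more than one row, so df_top contains just Bob and Alice. For the original question, the call would presumably look like filter_top_groups(df, 'Nationality', 'Overall', min_size=250, n=10), assuming the column names used in the question.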
