
Select top 3 categories in each group after groupby in pandas dataframe

So my dataframe looks like this now:

| Name | Type | Class   | Amount |
|------|------|---------|--------|
| Abel | A    | Chinese | 2      |
| Abel | B    | English | 5      |
| Abel | C    | Science | -1     |
| Abel | D    | Physics | -10    |
| Cain | C    | Chinese | -5     |
| Cain | B    | Science | 0      |
| Cain | A    | English | 30     |
| Cain | D    | Chinese | 2      |

Data sample:

```python
import pandas as pd
data = {'Name': ['Abel', 'Abel', 'Abel', 'Abel', 'Cain', 'Cain', 'Cain', 'Cain'],
        'Type': ['A', 'B', 'C', 'D', 'C', 'B', 'A', 'D'],
        'Class': ['Chinese', 'English', 'Science', 'Physics', 'Chinese', 'Science', 'English', 'Chinese'],
        'Amount': [2, 5, -1, -10, -5, 0, 30, 2]}
df = pd.DataFrame(data)
```

For each name, I'm trying to find the top n types and the top n classes based on `Amount`.

I tried `df.groupby(["Name","Type"]).sum()`, which gives me the groupings, but how can I select the top 5 so that I can transpose them into 5 different columns?
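A minimal sketch of one way to continue from that `groupby(...).sum()` attempt, assuming pandas and the sample data above: flatten the sums with `as_index=False`, sort each name's rows from largest to smallest `Amount`, and keep the first n rows of every name with `groupby(...).head(n)`. The result is still long-format (one row per name/type pair); turning it into `Type 1` ... `Type n` columns is a separate reshaping step.

```python
import pandas as pd

data = {
    "Name": ["Abel", "Abel", "Abel", "Abel", "Cain", "Cain", "Cain", "Cain"],
    "Type": ["A", "B", "C", "D", "C", "B", "A", "D"],
    "Amount": [2, 5, -1, -10, -5, 0, 30, 2],
}
df = pd.DataFrame(data)

# Sum Amount per (Name, Type) as in the groupby attempt, keeping Name/Type as columns
summed = df.groupby(["Name", "Type"], as_index=False)["Amount"].sum()

# Order rows by Name, then by Amount from largest to smallest within each name
ranked = summed.sort_values(["Name", "Amount"], ascending=[True, False])

# groupby().head(n) keeps the first n rows of each group, so the grouping survives
top3 = ranked.groupby("Name").head(3)
print(top3)
```

Changing the `3` in `head(3)` gives the top n for any n.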

E.g., the final output for the top 3 types should look something like this; the top 3 classes would be similar, except that the columns are Class 1 to Class 3:

| Name | Type 1 | Type 2 | Type 3 |
|------|--------|--------|--------|
| Abel | B      |   A    |   C    |
| Cain | A      |   D    |   B    |

I also tried `sort_values` followed by `.head(5)`, but somehow the sort treats negative numbers as the biggest. Besides, the results also fall out of the grouping. Any help? Thanks!
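The question does not show the exact `sort_values` call, so this is only an assumption about the cause: `sort_values` sorts in ascending order by default, which puts the most negative amounts first, and a plain `.head(n)` looks at the whole frame rather than at each name. The sketch below compares the two sort directions on the sample data and shows `groupby("Name").head(n)` keeping n rows per name.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Abel", "Abel", "Abel", "Abel", "Cain", "Cain", "Cain", "Cain"],
    "Type": ["A", "B", "C", "D", "C", "B", "A", "D"],
    "Amount": [2, 5, -1, -10, -5, 0, 30, 2],
})

# Default ascending=True: the most negative amounts come first
ascending_head = df.sort_values("Amount").head(3)["Amount"].tolist()

# ascending=False: the largest amounts come first
descending_head = df.sort_values("Amount", ascending=False).head(3)["Amount"].tolist()

# .head(n) on the whole frame ignores names; groupby("Name").head(n) keeps n rows per name
per_name = df.sort_values("Amount", ascending=False).groupby("Name").head(1)

print(ascending_head)   # [-10, -5, -1]
print(descending_head)  # [30, 5, 2]
print(per_name)
```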

Use:

```python
# sort by both columns: Name ascending, Amount descending
df1 = df.sort_values(['Name', 'Amount'], ascending=[True, False])
# create a counter column, used later for the new column names
df1['g'] = df1.groupby('Name').cumcount().add(1)
# keep the top 3 per name
df1 = df1[df1['g'] <= 3]
# reshape with pivot (pandas 2.x accepts only keyword arguments here)
df2 = (df1.pivot(index='Name', columns='g', values='Type')
          .add_prefix('Type ')
          .reset_index()
          .rename_axis(None, axis=1))
print(df2)
```

```
   Name Type 1 Type 2 Type 3
0  Abel      B      A      C
1  Cain      A      D      B
```
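The question also asks for the top classes. A minimal sketch that wraps the same sort, `cumcount`, and `pivot` steps in a hypothetical helper `top_n_wide` so it works for either `Type` or `Class`. It assumes that repeated (name, category) pairs, such as Cain's two `Chinese` rows, should be summed first, as the `groupby(...).sum()` attempt in the question suggests; without that step Cain's class ranking would differ. If a name has fewer than n categories, the missing cells come out as `NaN`.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Abel", "Abel", "Abel", "Abel", "Cain", "Cain", "Cain", "Cain"],
    "Type": ["A", "B", "C", "D", "C", "B", "A", "D"],
    "Class": ["Chinese", "English", "Science", "Physics",
              "Chinese", "Science", "English", "Chinese"],
    "Amount": [2, 5, -1, -10, -5, 0, 30, 2],
})


def top_n_wide(frame: pd.DataFrame, column: str, n: int = 3) -> pd.DataFrame:
    """Hypothetical helper: one row per Name holding the n highest-Amount values of `column`.

    Assumes duplicate (Name, column) pairs should be summed before ranking.
    """
    # Collapse duplicates such as Cain's two "Chinese" rows into one total
    summed = frame.groupby(["Name", column], as_index=False)["Amount"].sum()
    # Largest amount first within each name
    summed = summed.sort_values(["Name", "Amount"], ascending=[True, False])
    # Rank 1, 2, 3, ... inside each name and keep the first n
    summed["rank"] = summed.groupby("Name").cumcount() + 1
    summed = summed[summed["rank"] <= n]
    # One column per rank, e.g. "Class 1", "Class 2", "Class 3"
    wide = summed.pivot(index="Name", columns="rank", values=column)
    return wide.add_prefix(f"{column} ").reset_index().rename_axis(None, axis=1)


print(top_n_wide(df, "Type"))
print(top_n_wide(df, "Class"))
```

Passing a different `n` changes how many ranked columns are produced.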

I'm not 100% sure I understand your question correctly, but you can use

```python
df.nlargest(5, ["Amount"])
```

This selects the 5 rows with the largest amounts across the whole DataFrame (not per group). You can adjust the 5.
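To apply the same idea per name instead of across the whole frame, one option is `SeriesGroupBy.nlargest`. A minimal sketch on the sample data: the grouped result is indexed by (Name, original row label), so the second index level can be used to pull the full rows back out of `df`. It ranks raw rows and does not sum duplicate categories first.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Abel", "Abel", "Abel", "Abel", "Cain", "Cain", "Cain", "Cain"],
    "Type": ["A", "B", "C", "D", "C", "B", "A", "D"],
    "Amount": [2, 5, -1, -10, -5, 0, 30, 2],
})

# The 3 largest Amounts within each Name; index levels are (Name, original row label)
top = df.groupby("Name")["Amount"].nlargest(3)

# Use the original row labels (index level 1) to recover the full rows
rows = df.loc[top.index.get_level_values(1)]
print(rows)
```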

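Disclaimer: the technical posts on this site are licensed under CC BY-SA 4.0. If you repost, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.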

 
Guangdong ICP No. 18138465  © 2020-2024 STACKOOM.COM