
How to group by two columns, sum the values, sort by that sum, and get the n highest per group in pandas

I have a DataFrame and I'm trying to group by the Name and Destination columns, sum the Sales for each Name and Destination pair, and then get the top 2 destinations for each name.

data=
Name    Origin  Destination Sales
John    Italy   China        2
Dan     UK      China        3
Dan     UK      India        2
Sam     UK      India        5
Sam     Italy   Malaysia     1
John    Italy   Malaysia     1
Dan     France  India        4
Dan     Italy   China        2
Sam     Italy   Malaysia     2
John    France  Malaysia     1
Sam     Italy   China        2
Dan     UK      Malaysia     4
Dan     France  India        2
John    France  Malaysia     4
John    Italy   China        4
John    UK      Malaysia     1
Sam     UK      China        4
Sam     France  China        5
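
To reproduce the problem, the table above can be loaded into a DataFrame roughly as follows. This is a minimal sketch: it assumes the table is plain whitespace-separated text, and reading it through io.StringIO with pd.read_csv is just one convenient option, not necessarily how the data was loaded originally. The variable name raw is introduced here for illustration.

```python
import io

import pandas as pd

# The table from the question, pasted as whitespace-separated text.
raw = """Name Origin Destination Sales
John Italy China 2
Dan UK China 3
Dan UK India 2
Sam UK India 5
Sam Italy Malaysia 1
John Italy Malaysia 1
Dan France India 4
Dan Italy China 2
Sam Italy Malaysia 2
John France Malaysia 1
Sam Italy China 2
Dan UK Malaysia 4
Dan France India 2
John France Malaysia 4
John Italy China 4
John UK Malaysia 1
Sam UK China 4
Sam France China 5
"""

# sep=r"\s+" splits on any run of whitespace.
data = pd.read_csv(io.StringIO(raw), sep=r"\s+")
print(data.shape)
```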

I tried the code below, but the result keeps coming out sorted by Destination instead of by Sales.

data.groupby(['Name', 'Destination'])['Sales'].sum().groupby(level=0).head(2).reset_index(name='Total_Sales')

This code gives me the following DataFrame:

Name    Destination Total_Sales
Dan        China       5
Dan        India       8
John       China       6
John       Malaysia    7
Sam        China       11
Sam        India       5

This is sorted on the wrong column (Destination); I would like each name's rows sorted by the sum of the sales (Total_Sales), largest first.

The result I want to achieve is:

Name    Destination Total_Sales
Dan        India       8
Dan        China       5
John       Malaysia    7
John       China       6
Sam        China       11
Sam        India       5

Your code:

grouped_df = data.groupby(['Name', 'Destination'])['Sales'].sum().groupby(level=0).head(2).reset_index(name='Total_Sales')

To sort the result by Name (ascending) and then by Total_Sales (descending):

sorted_df = grouped_df.sort_values(by=['Name', 'Total_Sales'], ascending=[True, False])

print(sorted_df)

Output:

   Name Destination  Total_Sales
1   Dan       India            8
0   Dan       China            5
3  John    Malaysia            7
2  John       China            6
4   Sam       China           11
5   Sam       India            5
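
One caveat: calling head(2) before sorting keeps the first two destinations per name in alphabetical order. For this data that happens to coincide with the two largest totals (Dan's Malaysia total is 4 and Sam's is 3), but it will not hold in general. A more general sketch sorts by the total first and only then takes the first n rows of each group. The helper name top_n_per_group and the toy data (including the name Ann) are hypothetical and only there for illustration.

```python
import pandas as pd


def top_n_per_group(df, n=2):
    """Sum Sales per (Name, Destination), then keep the n largest totals per Name."""
    totals = (
        df.groupby(['Name', 'Destination'])['Sales']
          .sum()
          .reset_index(name='Total_Sales')
    )
    # Sort first, so that head(n) picks the largest totals within each Name.
    ordered = totals.sort_values(['Name', 'Total_Sales'], ascending=[True, False])
    return ordered.groupby('Name').head(n).reset_index(drop=True)


# Hypothetical toy data: the alphabetically last destination has the largest total,
# so taking head(2) before sorting would miss it.
toy = pd.DataFrame({
    'Name': ['Ann', 'Ann', 'Ann', 'Ann'],
    'Destination': ['China', 'India', 'Malaysia', 'Malaysia'],
    'Sales': [1, 2, 5, 4],
})

result = top_n_per_group(toy, n=2)
print(result)
```

Applied to the data in the question, this should return the same six rows in the same order as the output above, only with a fresh 0 to 5 index because of reset_index(drop=True).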


 