[英]How to use python to group by two columns, sum them and use one of the columns to sort and get the n highest per group in pandas
I have a dataframe and I'm trying to group by the Name and Destination columns and calculate the sum of the sales for that Destination for the particular Name and then get the top 2 for each name. 我有一个数据框,我试图按“名称”和“目的地”列进行分组,并为特定名称计算该目的地的销售额之和,然后为每个名称获得前2名。
data=
Name Origin Destination Sales
John Italy China 2
Dan UK China 3
Dan UK India 2
Sam UK India 5
Sam Italy Malaysia 1
John Italy Malaysia 1
Dan France India 4
Dan Italy China 2
Sam Italy Malaysia 2
John France Malaysia 1
Sam Italy China 2
Dan UK Malaysia 4
Dan France India 2
John France Malaysia 4
John Italy China 4
John UK Malaysia 1
Sam UK China 4
Sam France China 5
I have tried to do this but I keep getting it sorted by the Destination and not the Sales. 我曾尝试这样做,但我一直按目的地而不是按销售额进行排序。 Below is the code I tried.
下面是我尝试的代码。
data.groupby(['Name', 'Destination'])['Sales'].sum().groupby(level=0).head(2).reset_index(name='Total_Sales')
This code gives me this dataframe: 这段代码给了我这个数据框:
Name Destination Total_Sales
Dan China 5
Dan India 8
John China 6
John Malaysia 7
Sam China 11
Sam India 5
But it is sorted on the wrong column (Destination) but I would like to sort by the sum of the sales (Total_Sales). 但这是在错误的列(目标)上排序的,但是我想按销售额的总和(Total_Sales)进行排序。
The expected result I want I want to achieve is: 我想要达到的预期结果是:
Name Destination Total_Sales
Dan India 8
Dan China 5
John Malaysia 7
John China 6
Sam China 11
Sam India 5
Your code: 您的代码:
grouped_df = data.groupby(['Name', 'Destination'])['Sales'].sum().groupby(level=0).head(2).reset_index(name='Total_Sales')
To sort the result: 要对结果进行排序:
sorted_df = grouped_df.sort_values(by=['Name','Total_Sales'], ascending=(True,False))
print(sorted_df)
Output: 输出:
Name Destination Total_Sales
1 Dan India 8
0 Dan China 5
3 John Malaysia 7
2 John China 6
4 Sam China 11
5 Sam India 5
声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.