I have a dataframe and I want to group by the Name and Destination columns, sum the Sales for each Name/Destination pair, and then keep the top 2 destinations by total sales for each Name.
data=
Name Origin Destination Sales
John Italy China 2
Dan UK China 3
Dan UK India 2
Sam UK India 5
Sam Italy Malaysia 1
John Italy Malaysia 1
Dan France India 4
Dan Italy China 2
Sam Italy Malaysia 2
John France Malaysia 1
Sam Italy China 2
Dan UK Malaysia 4
Dan France India 2
John France Malaysia 4
John Italy China 4
John UK Malaysia 1
Sam UK China 4
Sam France China 5
I tried the code below, but the result comes out sorted by Destination rather than by Sales.
data.groupby(['Name', 'Destination'])['Sales'].sum().groupby(level=0).head(2).reset_index(name='Total_Sales')
This code gives me this dataframe:
Name Destination Total_Sales
Dan China 5
Dan India 8
John China 6
John Malaysia 7
Sam China 11
Sam India 5
The result is sorted by the wrong column (Destination); I would like it sorted by the sum of the sales (Total_Sales), largest first within each Name.
The result I want to achieve is:
Name Destination Total_Sales
Dan India 8
Dan China 5
John Malaysia 7
John China 6
Sam China 11
Sam India 5
Your code:
grouped_df = data.groupby(['Name', 'Destination'])['Sales'].sum().groupby(level=0).head(2).reset_index(name='Total_Sales')
To sort the result by Name, and by Total_Sales in descending order within each Name:
sorted_df = grouped_df.sort_values(by=['Name', 'Total_Sales'], ascending=[True, False])
print(sorted_df)
Output:
Name Destination Total_Sales
1 Dan India 8
0 Dan China 5
3 John Malaysia 7
2 John China 6
4 Sam China 11
5 Sam India 5
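One caveat: the code above calls head(2) before sorting, so it keeps the first two destinations in alphabetical order for each Name and only then sorts them. With this sample data those happen to be the two largest totals, but that is not guaranteed in general. A minimal sketch of a variant that sorts first and then takes the top 2 is shown below; the variable names rows and top2 are illustrative, and the data is copied from the question.

```python
import pandas as pd

# Sample data copied from the question.
rows = [
    ("John", "Italy", "China", 2),
    ("Dan", "UK", "China", 3),
    ("Dan", "UK", "India", 2),
    ("Sam", "UK", "India", 5),
    ("Sam", "Italy", "Malaysia", 1),
    ("John", "Italy", "Malaysia", 1),
    ("Dan", "France", "India", 4),
    ("Dan", "Italy", "China", 2),
    ("Sam", "Italy", "Malaysia", 2),
    ("John", "France", "Malaysia", 1),
    ("Sam", "Italy", "China", 2),
    ("Dan", "UK", "Malaysia", 4),
    ("Dan", "France", "India", 2),
    ("John", "France", "Malaysia", 4),
    ("John", "Italy", "China", 4),
    ("John", "UK", "Malaysia", 1),
    ("Sam", "UK", "China", 4),
    ("Sam", "France", "China", 5),
]
data = pd.DataFrame(rows, columns=["Name", "Origin", "Destination", "Sales"])

top2 = (
    # Total sales per (Name, Destination) pair, kept as ordinary columns.
    data.groupby(["Name", "Destination"], as_index=False)["Sales"].sum()
    .rename(columns={"Sales": "Total_Sales"})
    # Sort BEFORE selecting, so head(2) sees the largest totals first.
    .sort_values(["Name", "Total_Sales"], ascending=[True, False])
    # head(2) keeps the first two rows of each Name in the sorted order.
    .groupby("Name")
    .head(2)
    .reset_index(drop=True)
)
print(top2)
```

For the sample data this yields the same six rows as the expected result in the question, with a fresh 0 to 5 index because of reset_index(drop=True).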