How to group by two columns, sum a third, and get the n highest totals per group in pandas

Question

I have a dataframe and I want to group by the Name and Destination columns, sum the Sales for each Name/Destination pair, and then keep the top 2 destinations for each Name.

data=
Name    Origin  Destination Sales
John    Italy   China        2
Dan     UK      China        3
Dan     UK      India        2
Sam     UK      India        5
Sam     Italy   Malaysia     1
John    Italy   Malaysia     1
Dan     France  India        4
Dan     Italy   China        2
Sam     Italy   Malaysia     2
John    France  Malaysia     1
Sam     Italy   China        2
Dan     UK      Malaysia     4
Dan     France  India        2
John    France  Malaysia     4
John    Italy   China        4
John    UK      Malaysia     1
Sam     UK      China        4
Sam     France  China        5

I tried the code below, but the result comes out sorted by Destination rather than by Sales.

data.groupby(['Name', 'Destination'])['Sales'].sum().groupby(level=0).head(2).reset_index(name='Total_Sales')

This code gives me this dataframe:

Name    Destination Total_Sales
Dan        China       5
Dan        India       8
John       China       6
John       Malaysia    7
Sam        China       11
Sam        India       5

This is sorted on the wrong column (Destination); I would like it sorted by the sum of the sales (Total_Sales).

The result I want to achieve is:

Name    Destination Total_Sales
Dan        India       8
Dan        China       5
John       Malaysia    7
John       China       6
Sam        China       11
Sam        India       5

Answer 1

Your code:

grouped_df = data.groupby(['Name', 'Destination'])['Sales'].sum().groupby(level=0).head(2).reset_index(name='Total_Sales')

To sort the result by Name, then by Total_Sales in descending order within each Name:

sorted_df = grouped_df.sort_values(by=['Name','Total_Sales'], ascending=(True,False))

print(sorted_df)

Output:

   Name Destination  Total_Sales
1   Dan       India            8
0   Dan       China            5
3  John    Malaysia            7
2  John       China            6
4   Sam       China           11
5   Sam       India            5
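
One caveat about the approach above: head(2) runs before the sort, so it keeps the first two destinations in groupby order (alphabetical), not the two with the highest totals. With this data the two happen to coincide, but with n=1 it would return Dan/China (5) rather than Dan/India (8). A minimal sketch that sorts first and only then takes the first n rows per Name is shown below. The function name top_n_destinations and its parameter n are made up for illustration, and the DataFrame is rebuilt from the table in the question.

```python
import pandas as pd

# Sample data copied from the table in the question.
rows = [
    ("John", "Italy", "China", 2), ("Dan", "UK", "China", 3),
    ("Dan", "UK", "India", 2), ("Sam", "UK", "India", 5),
    ("Sam", "Italy", "Malaysia", 1), ("John", "Italy", "Malaysia", 1),
    ("Dan", "France", "India", 4), ("Dan", "Italy", "China", 2),
    ("Sam", "Italy", "Malaysia", 2), ("John", "France", "Malaysia", 1),
    ("Sam", "Italy", "China", 2), ("Dan", "UK", "Malaysia", 4),
    ("Dan", "France", "India", 2), ("John", "France", "Malaysia", 4),
    ("John", "Italy", "China", 4), ("John", "UK", "Malaysia", 1),
    ("Sam", "UK", "China", 4), ("Sam", "France", "China", 5),
]
data = pd.DataFrame(rows, columns=["Name", "Origin", "Destination", "Sales"])


def top_n_destinations(df, n=2):
    """Hypothetical helper: top-n destinations per Name by summed Sales."""
    # Sum Sales for every (Name, Destination) pair.
    totals = (
        df.groupby(["Name", "Destination"], as_index=False)["Sales"]
        .sum()
        .rename(columns={"Sales": "Total_Sales"})
    )
    # Sort by Name, then by Total_Sales descending, BEFORE taking the head.
    ranked = totals.sort_values(["Name", "Total_Sales"], ascending=[True, False])
    # head(n) keeps the first n rows of each Name in the sorted order.
    return ranked.groupby("Name").head(n).reset_index(drop=True)


result = top_n_destinations(data, n=2)
print(result)
```

If two destinations tie on Total_Sales for the same Name, which one is kept is not specified by this sketch; adding Destination as a third sort key would make the choice deterministic.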

solution1
0 2019-09-03 14:57:52
