Group by multi-level category and return sum of n-largest in each category (n is different for each category)

Question

I have a pandas dataframe (df) representing monthly expenses by different individuals. The first column is the person ID, the second column is the expense category, and the third column is the amount spent. See the example data below:

import pandas as pd

d = {'PersonID': ['A','A','A','A','A','A','A','A','B','B','B','B','B','B'], 'Category': ['Food','Food','Food','Food','Travel','Travel','Travel','Travel','Food','Food','Food','Travel','Travel','Travel'], 'Expenditure':[10,15,5,20,500,100,1000,2000,10,30,10,800,1000,400]}
df = pd.DataFrame(data=d)

For each person, I'd like to get the sum of the THREE largest expenses in the Food category, and the sum of the TWO largest expenses in the Travel category.
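To pin down exactly what "sum of the n largest" means here, a minimal plain-Python sketch of the target computation (no pandas involved) may help. It rebuilds the sample data as three parallel lists, and `top_n` is a hypothetical name for the per-category n.

```python
from collections import defaultdict

# Same sample data as the question's dict `d`, as parallel lists.
person = ['A'] * 8 + ['B'] * 6
category = ['Food'] * 4 + ['Travel'] * 4 + ['Food'] * 3 + ['Travel'] * 3
expenditure = [10, 15, 5, 20, 500, 100, 1000, 2000, 10, 30, 10, 800, 1000, 400]

# Hypothetical mapping: how many of the largest expenses to keep per category.
top_n = {'Food': 3, 'Travel': 2}

# Collect the expenses of each (person, category) pair.
groups = defaultdict(list)
for p, c, x in zip(person, category, expenditure):
    groups[(p, c)].append(x)

# Sum the n largest values in each group, with n looked up by category.
result = {
    key: sum(sorted(values, reverse=True)[:top_n[key[1]]])
    for key, values in sorted(groups.items())
}
print(result)
# {('A', 'Food'): 45, ('A', 'Travel'): 3000, ('B', 'Food'): 50, ('B', 'Travel'): 1800}
```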

For the example table above, I want the following table:

I am trying to use the following code, but the problem is that it applies the same N to every category, and I cannot specify a different N for each category.

df.groupby(['PersonID','Category'])['Expenditure'].nlargest(2).groupby(level=0).sum()

Answer 1

One way to do it is to filter your dataframe by category first, take the n largest expenses per (PersonID, Category) group and sum them, and then concatenate the results:

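# .groupby(level=[0, 1]).sum() replaces the older .sum(level=[0, 1]),
# which pandas 2.0 and later no longer accept.
pd.concat([
    # Top 3 Food expenses per person
    df.query('Category == "Food"').groupby(['PersonID', 'Category'])['Expenditure'].nlargest(3).groupby(level=[0, 1]).sum(),
    # Top 2 Travel expenses per person
    df.query('Category == "Travel"').groupby(['PersonID', 'Category'])['Expenditure'].nlargest(2).groupby(level=[0, 1]).sum()
])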

Output:

PersonID  Category
A         Food          45
B         Food          50
A         Travel      3000
B         Travel      1800
Name: Expenditure, dtype: int64
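As an alternative that avoids filtering and concatenating, the sketch below (a swapped-in technique, not the answerer's code) sorts by expenditure, numbers the rows inside each (PersonID, Category) group with `groupby(...).cumcount()`, and keeps only rows whose position is below the n for their category. `top_n` is a hypothetical name; a category missing from it maps to NaN, so its rows are dropped.

```python
import pandas as pd

d = {'PersonID': ['A'] * 8 + ['B'] * 6,
     'Category': ['Food'] * 4 + ['Travel'] * 4 + ['Food'] * 3 + ['Travel'] * 3,
     'Expenditure': [10, 15, 5, 20, 500, 100, 1000, 2000, 10, 30, 10, 800, 1000, 400]}
df = pd.DataFrame(data=d)

# Hypothetical mapping: how many of the largest expenses to keep per category.
top_n = {'Food': 3, 'Travel': 2}

# Sort so the largest expenses come first within every group.
ranked = df.sort_values('Expenditure', ascending=False)

# Position of each row inside its (PersonID, Category) group: 0 = largest.
position = ranked.groupby(['PersonID', 'Category']).cumcount()

# Keep a row only if its position is below the n for its category.
keep = position < ranked['Category'].map(top_n)

result = ranked[keep].groupby(['PersonID', 'Category'])['Expenditure'].sum()
print(result)
```

This runs one vectorized groupby instead of one per category, so adding a category only means adding an entry to `top_n`.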

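Using a dictionary and a list comprehension, so each category's n is defined in one place:

# Number of largest expenses to sum per category
n_largest = {'Food': 3,
             'Travel': 2}

pd.concat([df[df['Category'] == c].groupby(['PersonID', 'Category'])['Expenditure'].nlargest(n).groupby(level=[0, 1]).sum() for c, n in n_largest.items()])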
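Another option, again not from the original answer, is a single `groupby` with `apply` that looks up n for each group. It relies on the fact that, inside `SeriesGroupBy.apply`, each group Series carries its group key in `.name`, here a `(PersonID, Category)` tuple. `top_n` and `sum_top_n` are hypothetical names.

```python
import pandas as pd

d = {'PersonID': ['A'] * 8 + ['B'] * 6,
     'Category': ['Food'] * 4 + ['Travel'] * 4 + ['Food'] * 3 + ['Travel'] * 3,
     'Expenditure': [10, 15, 5, 20, 500, 100, 1000, 2000, 10, 30, 10, 800, 1000, 400]}
df = pd.DataFrame(data=d)

# Hypothetical mapping: how many of the largest expenses to keep per category.
top_n = {'Food': 3, 'Travel': 2}

def sum_top_n(s):
    # s.name is the group key (PersonID, Category), so s.name[1] is the category.
    return s.nlargest(top_n[s.name[1]]).sum()

result = df.groupby(['PersonID', 'Category'])['Expenditure'].apply(sum_top_n)
print(result)
```

Because `apply` calls a Python function once per group, it is likely slower than the vectorized approaches when there are many groups, but it keeps the logic in one readable function.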

solution1 2 ACCEPTED 2019-07-10 19:50:34

Group by multi-level category and return sum of n-largest in each category (n is different for each category)

Question

1 answers

solution1 2 ACCPTED 2019-07-10 19:50:34

solution1
2 ACCPTED 2019-07-10 19:50:34