I have a pandas dataframe (df) representing monthly expenses by different individuals. The first column in the dataframe refers to the person ID, the second column refers to the expense category, and the third column refers to the amount being spent. See the example table below:
import pandas as pd

d = {'PersonID': ['A','A','A','A','A','A','A','A','B','B','B','B','B','B'],
     'Category': ['Food','Food','Food','Food','Travel','Travel','Travel','Travel','Food','Food','Food','Travel','Travel','Travel'],
     'Expenditure': [10,15,5,20,500,100,1000,2000,10,30,10,800,1000,400]}
df = pd.DataFrame(data=d)
For each person, I'd like to get the sum of the THREE largest expenses in the Food category, and the sum of the TWO largest expenses in the Travel category.
For the example table above, I want the following result: person A has 45 for Food and 3000 for Travel, and person B has 50 for Food and 1800 for Travel.
I am trying to use the following code, but the problem is that I cannot specify a different N for each category.
df.groupby(['PersonID','Category'])['Expenditure'].nlargest(2).sum(level=0)
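One way to do it is to split the dataframe by category first, sum the N largest values per group in each part, and concatenate the results afterwards:
pd.concat([
    # Food: sum of the 3 largest expenses per person.
    # groupby(level=[0,1]).sum() replaces sum(level=[0,1]), which was removed in pandas 2.0.
    df.query('Category == "Food"').groupby(['PersonID','Category'])['Expenditure'].nlargest(3).groupby(level=[0,1]).sum(),
    # Travel: sum of the 2 largest expenses per person.
    df.query('Category == "Travel"').groupby(['PersonID','Category'])['Expenditure'].nlargest(2).groupby(level=[0,1]).sum()
])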
Output:
PersonID  Category
A         Food          45
B         Food          50
A         Travel      3000
B         Travel      1800
Name: Expenditure, dtype: int64
Using dictionary and list comprehension:
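# Number of largest expenses to keep per category (3 for Food, 2 for Travel, as asked).
d = {'Food':3,
     'Travel':2}
# groupby(level=[0,1]).sum() replaces sum(level=[0,1]), which was removed in pandas 2.0.
pd.concat([df[df['Category'] == c].groupby(['PersonID','Category'])['Expenditure'].nlargest(n).groupby(level=[0,1]).sum() for c,n in d.items()])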
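If you would rather avoid one groupby per category, here is a rough sketch of a single-pass variant. It is a different technique from the concat approach above: sort by amount, number the rows within each (PersonID, Category) group with cumcount, and keep only rows whose position is below that category's limit. The function name top_n_sums and the parameter n_by_category are made up for illustration, and it assumes the column names from the question.

```python
import pandas as pd

df = pd.DataFrame({
    'PersonID': ['A'] * 8 + ['B'] * 6,
    'Category': ['Food'] * 4 + ['Travel'] * 4 + ['Food'] * 3 + ['Travel'] * 3,
    'Expenditure': [10, 15, 5, 20, 500, 100, 1000, 2000, 10, 30, 10, 800, 1000, 400],
})

def top_n_sums(df, n_by_category):
    """Sum the n largest Expenditure values per (PersonID, Category).

    n_by_category (hypothetical name) maps a category to how many of its
    largest expenses to keep; categories missing from the mapping are dropped.
    """
    # Largest expenses first, so position 0 in each group is the maximum.
    ordered = df.sort_values('Expenditure', ascending=False)
    # Position of each row inside its (PersonID, Category) group.
    position = ordered.groupby(['PersonID', 'Category']).cumcount()
    # Per-row limit looked up from the mapping (NaN for unmapped categories,
    # which makes the comparison below False).
    limit = ordered['Category'].map(n_by_category)
    kept = ordered[position < limit]
    return kept.groupby(['PersonID', 'Category'])['Expenditure'].sum()

result = top_n_sums(df, {'Food': 3, 'Travel': 2})
print(result)
```

The result is a Series indexed by (PersonID, Category). If you want one row per person and one column per category, result.unstack('Category') should reshape it that way.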