I have a Pandas dataframe that looks like this:
date item amount
201901 Apple 1.03
201901 Potato 1.04
201901 Orange 1.00
I'm trying to find the sales of fruits and vegetables by month:
date item amount
201901 Fruit 2.03
201901 Vegetables 1.04
What's the best way to do this? I'm familiar with df.groupby(['date','item'])['amount'].sum(), but this does not conditionally combine the fruits and veggies.
One way is to create another column, type, based on the value in item and then group on that; is there a better way?
As Manakin said, you need to manually classify your items.
Build a mapping dictionary with item: category pairs and pass it to Series.map or Series.replace.
map will change all the items that are in the dictionary, and fill with NaN otherwise. replace will replace all matching items, but will leave items not in the dictionary keys as they are (e.g. if the dataframe contains 'brussel sprouts' but that key is not in the dictionary, then it will keep the item name). It is up to you to decide which behavior you need.
Here's an example with Series.map:
categories = {'Apple': 'Fruit', 'Potato': 'Vegetable', 'Orange': 'Fruit'}
df['category'] = df['item'].map(categories)
summary = df.groupby(['date', 'category'])['amount'].sum().reset_index()
print(summary)
Output
date category amount
0 201901 Fruit 2.03
1 201901 Vegetable 1.04
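The difference between map and replace described above only shows up for an item that is missing from the dictionary. A minimal sketch, assuming a made-up 'Kiwi' item that is not part of the original data:

```python
import pandas as pd

categories = {'Apple': 'Fruit', 'Potato': 'Vegetable', 'Orange': 'Fruit'}

# 'Kiwi' is a hypothetical item with no entry in the dictionary
items = pd.Series(['Apple', 'Potato', 'Kiwi'])

mapped = items.map(categories)        # unmatched values become NaN
replaced = items.replace(categories)  # unmatched values are left unchanged

print(mapped.tolist())
print(replaced.tolist())
```

Keep in mind that groupby drops rows whose key is NaN by default, so with map an unclassified item silently disappears from the summary, while with replace it shows up as its own group.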
You should probably have 2 lists or a dictionary of what you consider a fruit or a vegetable but when you do...
mapping = {'Apple': 'Fruit', 'Potato': 'Vegetable', 'Orange': 'Fruit'}
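This could give you what you want without the need to add a column, computing the grouping on the fly. Note that a function passed to groupby is called with each index label, not with the row, so it has to look the item up itself (this assumes the index of df is unique):
def grouper(label):
    # label is an index value of df; find that row's item and map it to its category
    return mapping[df.loc[label, 'item']]
group_earnings = df.groupby(['date', grouper])['amount'].sum().rename_axis(['date', 'category']).reset_index()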
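Another way to group on the fly, sketched here as an alternative rather than as this answer's own method, is to pass the mapped Series directly to groupby alongside the date column. The dataframe below is rebuilt from the sample in the question, and the name category is only an illustrative label for the resulting column:

```python
import pandas as pd

# Sample data reconstructed from the question
df = pd.DataFrame({
    'date': [201901, 201901, 201901],
    'item': ['Apple', 'Potato', 'Orange'],
    'amount': [1.03, 1.04, 1.00],
})
mapping = {'Apple': 'Fruit', 'Potato': 'Vegetable', 'Orange': 'Fruit'}

# Build the group key without storing it on df; rename so the output column is labelled
category = df['item'].map(mapping).rename('category')
summary = df.groupby(['date', category])['amount'].sum().reset_index()
print(summary)
```

Because the key is a Series aligned with df by index, no helper function or index lookup is needed, and df itself is left unchanged.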