I have a dataframe that looks like the following:
df
Out[327]:
         date  store  property_name  property_value
0  2013-06-20      1          price             101
1  2013-06-20      2          price             201
2  2013-06-21      1          price             301
3  2013-06-21      2          price             401
4  2013-06-20      1       quantity            1000
5  2013-06-20      2       quantity            2000
6  2013-06-21      1       quantity            3000
7  2013-06-21      2       quantity            4000
I would like to calculate the revenue for each date and each store, then append those rows to the bottom of the dataframe. For example, for 2013-06-20 and store 2: revenue = 201 * 2000 = 402000.
Below is my code, but I know it is not efficient for larger dataframes:
import pandas as pd
dates = df['date'].unique()
stores = df['store'].unique()
df_len = len(df)
for date in dates:
    for store in stores:
        mask_price = (df['date']==date) & (df['store']==store) & (df['property_name']=='price')
        mask_quantity = (df['date']==date) & (df['store']==store) & (df['property_name']=='quantity')
        price = df.loc[mask_price,'property_value'].iloc[0]
        quantity = df.loc[mask_quantity,'property_value'].iloc[0]
        df.loc[df_len,'date'] = date
        df.loc[df_len,'store'] = store
        df.loc[df_len,'property_name'] = 'revenue'
        df.loc[df_len,'property_value'] = price*quantity
        df_len=df_len+1
Thank you in advance for your help :)
This is one way.
price = df[df['property_name'] == 'price'].set_index(['date', 'store'])['property_value']
quantity = df[df['property_name'] == 'quantity'].set_index(['date', 'store'])['property_value']
rev = (price * quantity).reset_index().assign(property_name='revenue')
df = pd.concat([df, rev], ignore_index=True)
Explanation
Create the price and quantity Series via slicing, indexed by date and store.
Compute rev via price * quantity, which aligns on the index; then add the property_name column.
Concatenate with pd.concat, which uses axis=0 (the index) by default.
Result
          date  property_name  property_value  store
 0  2013-06-20          price             101      1
 1  2013-06-20          price             201      2
 2  2013-06-21          price             301      1
 3  2013-06-21          price             401      2
 4  2013-06-20       quantity            1000      1
 5  2013-06-20       quantity            2000      2
 6  2013-06-21       quantity            3000      1
 7  2013-06-21       quantity            4000      2
 8  2013-06-20        revenue          101000      1
 9  2013-06-20        revenue          402000      2
10  2013-06-21        revenue          903000      1
11  2013-06-21        revenue         1604000      2
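To illustrate why indexing by date and store matters, here is a small self-contained sketch. It rebuilds the example frame from the question and deliberately reverses the row order of the quantity slice; the product should still pair each price with the matching quantity, because pandas aligns on the (date, store) index rather than on row position. The names quantity_reversed and out are only illustrative and do not appear in the answer.

```python
import pandas as pd

# Rebuild the example frame from the question.
df = pd.DataFrame({
    "date": ["2013-06-20", "2013-06-20", "2013-06-21", "2013-06-21"] * 2,
    "store": [1, 2, 1, 2] * 2,
    "property_name": ["price"] * 4 + ["quantity"] * 4,
    "property_value": [101, 201, 301, 401, 1000, 2000, 3000, 4000],
})

# Slice out each property and index it by (date, store).
price = df[df["property_name"] == "price"].set_index(["date", "store"])["property_value"]
quantity = df[df["property_name"] == "quantity"].set_index(["date", "store"])["property_value"]

# Reverse the quantity rows on purpose (illustrative step, not in the answer).
quantity_reversed = quantity.iloc[::-1]

# The multiplication matches rows by index label, not by position.
rev = (price * quantity_reversed).reset_index().assign(property_name="revenue")

# Append the revenue rows below the original rows.
out = pd.concat([df, rev], ignore_index=True)
print(out[out["property_name"] == "revenue"])
```

If the two slices did not share the same (date, store) keys, the unmatched keys would come out as NaN rather than raising an error, so it can be worth checking for missing values after the multiplication.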
Another way of doing it:
prices = df[df['property_name'] == 'price']
quantities = df[df['property_name'] == 'quantity']
res = prices.merge(quantities,on=['date','store'],how='left')
res['property_value'] = res['property_value_x']*res['property_value_y']
res['property_name'] = 'revenue'
res = res[['date','store','property_name','property_value']]
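# DataFrame.append was removed in pandas 2.0; pd.concat does the same job.
# ignore_index=True renumbers the rows 0..11 instead of repeating index labels.
res = pd.concat([prices, quantities, res], ignore_index=True)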
This follows the same logic as the first answer.
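As a hedged variation on the merge approach, the sketch below passes explicit suffixes so the two value columns get readable names instead of the default _x and _y, and uses merge's validate argument to fail early if a (date, store) pair appears more than once on either side. The suffix strings and the names merged, revenue and result are my own choices, not part of the answer.

```python
import pandas as pd

# Rebuild the example frame from the question.
df = pd.DataFrame({
    "date": ["2013-06-20", "2013-06-20", "2013-06-21", "2013-06-21"] * 2,
    "store": [1, 2, 1, 2] * 2,
    "property_name": ["price"] * 4 + ["quantity"] * 4,
    "property_value": [101, 201, 301, 401, 1000, 2000, 3000, 4000],
})

prices = df[df["property_name"] == "price"]
quantities = df[df["property_name"] == "quantity"]

# Join price rows to quantity rows on the shared keys.
merged = prices.merge(
    quantities,
    on=["date", "store"],
    how="left",
    suffixes=("_price", "_quantity"),  # illustrative names for the overlapping columns
    validate="one_to_one",             # raise if a key is duplicated on either side
)

# Build the revenue rows with the same column layout as df.
revenue = merged[["date", "store"]].assign(
    property_name="revenue",
    property_value=merged["property_value_price"] * merged["property_value_quantity"],
)

# Append the revenue rows below the original rows.
result = pd.concat([df, revenue], ignore_index=True)
print(result.tail(4))
```

With how='left', a price row that has no matching quantity is kept and its revenue becomes NaN; how='inner' would drop such rows instead.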
Hope that helps.