
Python pandas: calculate revenue from price and quantity

I have a dataframe that looks like the following:

df
Out[327]: 
        date  store property_name  property_value
0 2013-06-20      1         price             101
1 2013-06-20      2         price             201
2 2013-06-21      1         price             301
3 2013-06-21      2         price             401
4 2013-06-20      1      quantity            1000
5 2013-06-20      2      quantity            2000
6 2013-06-21      1      quantity            3000
7 2013-06-21      2      quantity            4000
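To make the question reproducible, here is a minimal sketch that rebuilds the sample frame above. It assumes `date` is stored as plain strings; the original may well hold `datetime64` values, in which case the date list can be wrapped in `pd.to_datetime`.

```python
import pandas as pd

# Sample long-format frame from the question: one row per (date, store, property).
# Assumption: dates are kept as strings for simplicity.
df = pd.DataFrame({
    "date": ["2013-06-20", "2013-06-20", "2013-06-21", "2013-06-21"] * 2,
    "store": [1, 2, 1, 2] * 2,
    "property_name": ["price"] * 4 + ["quantity"] * 4,
    "property_value": [101, 201, 301, 401, 1000, 2000, 3000, 4000],
})
print(df)
```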

I would like to calculate the revenue for each date and each store, then add those rows to the bottom of the dataframe. For example, for 2013-06-20 and store 2: revenue = 201 * 2000 = 402000.

Below is my code, but I know it's not efficient for larger dataframes:

import pandas as pd

dates = df['date'].unique()
stores = df['store'].unique()
df_len = len(df)  # next free row label for the appended revenue rows
for date in dates:
    for store in stores:
        # Boolean masks selecting the price row and the quantity row for this (date, store)
        mask_price = (df['date']==date) & (df['store']==store) & (df['property_name']=='price')
        mask_quantity = (df['date']==date) & (df['store']==store) & (df['property_name']=='quantity')
        price = df.loc[mask_price,'property_value'].iloc[0]
        quantity = df.loc[mask_quantity,'property_value'].iloc[0]

        # Assigning via .loc to a label that does not exist yet appends a new row
        df.loc[df_len,'date'] = date
        df.loc[df_len,'store'] = store
        df.loc[df_len,'property_name'] = 'revenue'
        df.loc[df_len,'property_value'] = price*quantity

        df_len=df_len+1

Thanks in advance for your help :)

This is one way:

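import pandas as pd

# Price and quantity as Series indexed by (date, store)
price = df[df['property_name'] == 'price'].set_index(['date', 'store'])['property_value']
quantity = df[df['property_name'] == 'quantity'].set_index(['date', 'store'])['property_value']

# Multiplication aligns on the (date, store) index; then go back to long format
rev = (price * quantity).reset_index().assign(property_name='revenue')

# Append the revenue rows below the original ones
df = pd.concat([df, rev], ignore_index=True)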

Explanation

  • Derive price and quantity Series by boolean slicing, indexed by date and store.
  • Calculate rev as price * quantity, which pairs values by matching (date, store) index labels; then reset the index and add a property_name column.
  • Concatenate along axis=0 (the index), which is the default.
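The alignment in the second step is what makes this approach safe: pandas pairs values by their (date, store) labels, not by row position. The sketch below rebuilds the sample data inline (dates assumed to be strings) and shuffles the rows first, to show that the products still come out right.

```python
import pandas as pd

# Sample data from the question (assumption: dates stored as strings).
df = pd.DataFrame({
    "date": ["2013-06-20", "2013-06-20", "2013-06-21", "2013-06-21"] * 2,
    "store": [1, 2, 1, 2] * 2,
    "property_name": ["price"] * 4 + ["quantity"] * 4,
    "property_value": [101, 201, 301, 401, 1000, 2000, 3000, 4000],
})

# Shuffle the rows so price and quantity rows are no longer in matching order.
shuffled = df.sample(frac=1, random_state=0)

price = shuffled[shuffled["property_name"] == "price"].set_index(["date", "store"])["property_value"]
quantity = shuffled[shuffled["property_name"] == "quantity"].set_index(["date", "store"])["property_value"]

# Multiplication matches labels in the (date, store) MultiIndex, not row positions.
revenue = price * quantity
print(revenue.sort_index())
```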

Result

          date property_name  property_value  store
0   2013-06-20         price             101      1
1   2013-06-20         price             201      2
2   2013-06-21         price             301      1
3   2013-06-21         price             401      2
4   2013-06-20      quantity            1000      1
5   2013-06-20      quantity            2000      2
6   2013-06-21      quantity            3000      1
7   2013-06-21      quantity            4000      2
8   2013-06-20       revenue          101000      1
9   2013-06-20       revenue          402000      2
10  2013-06-21       revenue          903000      1
11  2013-06-21       revenue         1604000      2
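The alphabetical column order in this output is what older pandas versions produced when concatenating frames whose columns are in different orders. Since pandas 1.0, `pd.concat` defaults to `sort=False`, so newer versions keep the original order instead. A minimal sketch that selects the columns explicitly, so the layout does not depend on the installed version (sample data rebuilt inline, dates assumed to be strings):

```python
import pandas as pd

# Sample data from the question (assumption: dates stored as strings).
df = pd.DataFrame({
    "date": ["2013-06-20", "2013-06-20", "2013-06-21", "2013-06-21"] * 2,
    "store": [1, 2, 1, 2] * 2,
    "property_name": ["price"] * 4 + ["quantity"] * 4,
    "property_value": [101, 201, 301, 401, 1000, 2000, 3000, 4000],
})

price = df[df["property_name"] == "price"].set_index(["date", "store"])["property_value"]
quantity = df[df["property_name"] == "quantity"].set_index(["date", "store"])["property_value"]
rev = (price * quantity).reset_index().assign(property_name="revenue")

# Re-select the original columns so the result always has the input's layout.
out = pd.concat([df, rev], ignore_index=True)[df.columns]
print(out)
```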

Another way of doing it:

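import pandas as pd

prices = df[df['property_name'] == 'price']
quantities = df[df['property_name'] == 'quantity']

# Pair each price row with the quantity row for the same (date, store)
res = prices.merge(quantities, on=['date','store'], how='left')
res['property_value'] = res['property_value_x']*res['property_value_y']
res['property_name'] = 'revenue'
res = res[['date','store','property_name','property_value']]

# DataFrame.append was removed in pandas 2.0; pd.concat does the same job
res = pd.concat([prices, quantities, res], ignore_index=True)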

Same logic as the first answer:

  1. Separate prices and quantities
  2. Merge both tables using date and store as the key
  3. Compute the revenue column in a third table
  4. Concatenate everything
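The four steps map onto code as in the sketch below, a variant of the merge approach rather than a replacement for it. It passes `suffixes` so the merged value columns get readable names instead of `_x`/`_y`, and `validate="one_to_one"` so the merge raises if a (date, store) pair occurs more than once. Sample data is rebuilt inline with dates assumed to be strings.

```python
import pandas as pd

# Sample data from the question (assumption: dates stored as strings).
df = pd.DataFrame({
    "date": ["2013-06-20", "2013-06-20", "2013-06-21", "2013-06-21"] * 2,
    "store": [1, 2, 1, 2] * 2,
    "property_name": ["price"] * 4 + ["quantity"] * 4,
    "property_value": [101, 201, 301, 401, 1000, 2000, 3000, 4000],
})

# 1. Separate prices and quantities.
prices = df[df["property_name"] == "price"]
quantities = df[df["property_name"] == "quantity"]

# 2. Merge on (date, store); named suffixes keep the two value columns apart.
res = prices.merge(
    quantities,
    on=["date", "store"],
    how="left",
    suffixes=("_price", "_quantity"),
    validate="one_to_one",  # raise if a (date, store) pair is duplicated on either side
)

# 3. Compute revenue in the merged table and keep only the original columns.
res["property_value"] = res["property_value_price"] * res["property_value_quantity"]
res["property_name"] = "revenue"
res = res[["date", "store", "property_name", "property_value"]]

# 4. Stack everything back into one long frame.
out = pd.concat([prices, quantities, res], ignore_index=True)
print(out)
```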

Hope that helps.
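For comparison, here is a third route that neither answer uses: reshape to wide format with `set_index` plus `unstack`, compute revenue as an ordinary column, then turn that column back into long rows. This is a sketch with the sample data rebuilt inline (dates assumed to be strings). A side benefit is that `unstack` raises a `ValueError` if a (date, store, property_name) combination is duplicated, which doubles as a sanity check.

```python
import pandas as pd

# Sample data from the question (assumption: dates stored as strings).
df = pd.DataFrame({
    "date": ["2013-06-20", "2013-06-20", "2013-06-21", "2013-06-21"] * 2,
    "store": [1, 2, 1, 2] * 2,
    "property_name": ["price"] * 4 + ["quantity"] * 4,
    "property_value": [101, 201, 301, 401, 1000, 2000, 3000, 4000],
})

# Long -> wide: one row per (date, store), one column per property name.
wide = df.set_index(["date", "store", "property_name"])["property_value"].unstack()
wide["revenue"] = wide["price"] * wide["quantity"]

# Wide -> long for the new column only, then append below the original rows.
rev = (
    wide["revenue"]
    .rename("property_value")
    .reset_index()
    .assign(property_name="revenue")
)
out = pd.concat([df, rev], ignore_index=True)[df.columns]
print(out)
```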

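Notice: technical posts on this site are licensed under CC BY-SA 4.0. If you republish this page, please credit this site's URL or the original source. For any questions, contact: yoyou2525@163.com.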

 
© 2020-2024 STACKOOM.COM