
Calculate the mean value using two columns in pandas

I have a deal dataframe with three columns, sorted by type and date. It looks like this:

  type    date      price
   A    2020-05-01   4
   A    2020-06-04   6
   A    2020-06-08   8
   A    2020-07-03   5
   B    2020-02-01   3
   B    2020-04-02   4

There are many types (A, B, C, D, E, …). For each row, I want to calculate the mean of the previous prices of the same type of product. For example, the pre_mean_price value of the third row of type A is (4+6)/2=5. I want to get a dataframe like this:

   type    date      price  pre_mean_price
   A    2020-05-01   4       .
   A    2020-06-04   6       4
   A    2020-06-08   8       5 
   A    2020-07-03   5       6
   B    2020-02-01   3       .
   B    2020-04-02   4       3

How can I calculate the pre_mean_price? Thanks a lot!

You can use expanding().mean() within each group after groupby, then shift the values down by one row.

# transform keeps the original row index, so the result can be assigned directly
df['pre_mean_price'] = df.groupby("type")['price'].transform(lambda x:
                                                             x.expanding().mean().shift())
print(df)

  type        date  price  pre_mean_price
0    A  2020-05-01      4             NaN
1    A  2020-06-04      6             4.0
2    A  2020-06-08      8             5.0
3    A  2020-07-03      5             6.0
4    B  2020-02-01      3             NaN
5    B  2020-04-02      4             3.0
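
For reference, here is a minimal self-contained sketch of this approach. The DataFrame construction is an assumption reconstructed from the question's table (dates kept as plain strings), and transform is used so that the result is aligned to the original row index.

```python
import pandas as pd

# Sample data reconstructed from the question's table (an assumption)
df = pd.DataFrame({
    "type": ["A", "A", "A", "A", "B", "B"],
    "date": ["2020-05-01", "2020-06-04", "2020-06-08",
             "2020-07-03", "2020-02-01", "2020-04-02"],
    "price": [4, 6, 8, 5, 3, 4],
})

# Expanding mean within each type, shifted down one row so that each row
# only sees the prices that came before it in the same group
df["pre_mean_price"] = (
    df.groupby("type")["price"]
      .transform(lambda s: s.expanding().mean().shift())
)
print(df)
```

The first row of every type has no earlier prices, so its pre_mean_price is NaN.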

Something like

# select 'price' first so the non-numeric date column is not averaged
df['pre_mean_price'] = df.groupby('type')[['price']].expanding().mean().groupby(level='type').shift(1)['price'].values

which produces

  type        date  price  pre_mean_price
0    A  2020-05-01      4             NaN
1    A  2020-06-04      6             4.0
2    A  2020-06-08      8             5.0
3    A  2020-07-03      5             6.0
4    B  2020-02-01      3             NaN
5    B  2020-04-02      4             3.0

Short explanation

The idea is to

  • First, group by "type" with .groupby(). This is needed because we want to calculate the (incremental) means within each "type" group.
  • Then, calculate the incremental mean with expanding().mean(). The output at this point is
        price
type
A    0   4.00
     1   5.00
     2   6.00
     3   5.75
B    4   3.00
     5   3.50
  • Then, group by "type" again and shift the elements inside each group by one row with shift(1).
  • Then, just extract the values of the price column (the incremental means).
  • Note: This assumes your data is sorted by type and then by date, as in the question, because .values assigns by position. If it is not, call df.sort_values(['type', 'date'], inplace=True) first.
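
The steps above can be sketched with one intermediate variable per step. This is only an illustration: the sample rows are the question's data, deliberately shuffled, and the names running and shifted are hypothetical helpers introduced here.

```python
import pandas as pd

# Question's sample data in shuffled order (an assumption, to show the sort)
df = pd.DataFrame({
    "type": ["B", "A", "A", "B", "A", "A"],
    "date": ["2020-04-02", "2020-06-08", "2020-05-01",
             "2020-02-01", "2020-07-03", "2020-06-04"],
    "price": [4, 8, 4, 3, 5, 6],
})

# Sort by type, then date, so the positional assignment below lines up
df = df.sort_values(["type", "date"]).reset_index(drop=True)

# Steps 1 and 2: expanding (incremental) mean of price within each type
running = df.groupby("type")[["price"]].expanding().mean()

# Step 3: shift by one row inside each type
shifted = running.groupby(level="type").shift(1)

# Step 4: take the raw values; their order matches df because df is sorted by type
df["pre_mean_price"] = shifted["price"].values
print(df)
```

ISO-formatted date strings sort chronologically as text, which is why the sketch does not convert them to datetimes first.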
