Calculate a mean using two columns in pandas

Question

I have a deal dataframe with three columns, sorted by type and date. It looks like:

  type    date      price
   A    2020-05-01   4
   A    2020-06-04   6
   A    2020-06-08   8
   A    2020-07-03   5
   B    2020-02-01   3
   B    2020-04-02   4

There are many types (A, B, C, D, E, …). I want to calculate the mean of the previous prices for the same type of product. For example, the pre_mean_price value of the third row of type A is (4+6)/2=5. I want to get a dataframe like this:

   type    date      price  pre_mean_price
   A    2020-05-01   4       .
   A    2020-06-04   6       4
   A    2020-06-08   8       5 
   A    2020-07-03   5       6
   B    2020-02-01   3       .
   B    2020-04-02   4       3

How can I calculate pre_mean_price? Thanks a lot!

Answer 1

You can use expanding().mean() within each group after groupby, then shift the values down by one row so that each row only sees the earlier prices.

# transform keeps the original row index, so the result aligns with df
df['pre_mean_price'] = df.groupby("type")['price'].transform(
    lambda x: x.expanding().mean().shift())
print(df)

  type        date  price  pre_mean_price
0    A  2020-05-01      4             NaN
1    A  2020-06-04      6             4.0
2    A  2020-06-08      8             5.0
3    A  2020-07-03      5             6.0
4    B  2020-02-01      3             NaN
5    B  2020-04-02      4             3.0
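For anyone who wants to reproduce the table above, here is a minimal self-contained sketch. The DataFrame is rebuilt by hand from the question's sample rows, and it uses transform rather than apply (a substitution on my part, on the assumption that you want the result to keep the original row index on recent pandas versions).

```python
import pandas as pd

# Sample data copied from the question (already sorted by type, then date).
df = pd.DataFrame({
    "type": ["A", "A", "A", "A", "B", "B"],
    "date": ["2020-05-01", "2020-06-04", "2020-06-08",
             "2020-07-03", "2020-02-01", "2020-04-02"],
    "price": [4, 6, 8, 5, 3, 4],
})

# Within each type: running mean up to and including the current row,
# then shifted down one row so each row only sees earlier prices.
df["pre_mean_price"] = df.groupby("type")["price"].transform(
    lambda s: s.expanding().mean().shift()
)

print(df)
```

The first row of every type has no earlier price, so it comes out as NaN rather than the "." shown in the question.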

Answer 2

Something like:

# select the numeric column first so the string 'date' column is not averaged
df['pre_mean_price'] = df.groupby('type')[['price']].expanding().mean().groupby('type').shift(1)['price'].values

which produces:

  type        date  price  pre_mean_price
0    A  2020-05-01      4             NaN
1    A  2020-06-04      6             4.0
2    A  2020-06-08      8             5.0
3    A  2020-07-03      5             6.0
4    B  2020-02-01      3             NaN
5    B  2020-04-02      4             3.0
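As a cross-check on the numbers above, the same column can be derived from running sums instead of expanding(). This is a different technique from the one in this answer, shown only as a sketch: the mean of the previous prices is (running sum minus the current price) divided by the number of earlier rows in the group. The sample DataFrame and the names prev_sum and prev_count are assumptions made for illustration.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "type": ["A", "A", "A", "A", "B", "B"],
    "date": ["2020-05-01", "2020-06-04", "2020-06-08",
             "2020-07-03", "2020-02-01", "2020-04-02"],
    "price": [4, 6, 8, 5, 3, 4],
})

grouped = df.groupby("type")["price"]

# Sum of the prices that come before the current row within its type.
prev_sum = grouped.cumsum() - df["price"]
# Number of rows that come before the current row within its type (0, 1, 2, ...).
prev_count = grouped.cumcount()

# Mask the zero counts so the first row of each type becomes NaN.
df["pre_mean_price"] = prev_sum / prev_count.where(prev_count > 0)

print(df)
```

For the third row of type A this gives (18 - 8) / 2 = 5, matching the (4+6)/2 example in the question.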

Short explanation

The idea is to:

First, group by "type" with .groupby(). This is necessary because we want to calculate the (incremental) means within each "type" group.
Then, calculate the incremental mean with expanding().mean(). The output at this point is:

        price
type
A    0   4.00
     1   5.00
     2   6.00
     3   5.75
B    4   3.00
     5   3.50

Then, group by "type" again and shift the elements inside each group down by one row with shift(1).
Then, just extract the values of the price column (the incremental means).
Note: This assumes your data is sorted by type and then by date, as in the question, because the values are assigned back by position. If it is not, call df.sort_values(['type', 'date'], inplace=True) first.
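The steps above can be sketched one at a time so the intermediate results are visible. This is an illustrative breakdown, not a definitive implementation: the sample DataFrame is rebuilt from the question, the names step1 and step2 are made up for the example, and the price column is selected before expanding() on the assumption that the string date column should be left out of the average.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "type": ["A", "A", "A", "A", "B", "B"],
    "date": ["2020-05-01", "2020-06-04", "2020-06-08",
             "2020-07-03", "2020-02-01", "2020-04-02"],
    "price": [4, 6, 8, 5, 3, 4],
})

# Make the sort-order assumption explicit; ISO date strings sort correctly as text.
df = df.sort_values(["type", "date"]).reset_index(drop=True)

# Steps 1 and 2: group by type, then take the incremental (expanding) mean.
# The result is indexed by (type, original row label).
step1 = df.groupby("type")[["price"]].expanding().mean()
print(step1)

# Step 3: group by the "type" index level again and shift down one row,
# so the first row of each type becomes NaN.
step2 = step1.groupby(level="type").shift(1)

# Step 4: assign the values back by position.
df["pre_mean_price"] = step2["price"].values
print(df)
```

Because the last step assigns by position, it only lines up when df is ordered by type first, which is why the sort is done up front.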

Answer 1: 5, accepted, 2020-11-11 12:03:04

Answer 2: 2, 2020-11-11 12:13:39
