
Calculate the mean value using two columns in pandas

I have a DataFrame of deals with three columns, sorted by type and date. It looks like this:

  type  date        price
  A     2020-05-01  4
  A     2020-06-04  6
  A     2020-06-08  8
  A     2020-07-03  5
  B     2020-02-01  3
  B     2020-04-02  4

There are many types (A, B, C, D, E, …). For each row, I want to calculate the mean price of all previous rows of the same type. For example, the pre_mean_price of the third row of type A is (4+6)/2 = 5. I want to get a DataFrame like this:

  type  date        price  pre_mean_price
  A     2020-05-01  4      .
  A     2020-06-04  6      4
  A     2020-06-08  8      5
  A     2020-07-03  5      6
  B     2020-02-01  3      .
  B     2020-04-02  4      3

How can I calculate the pre_mean_price? Thanks a lot!

You can apply expanding().mean() within each group after a groupby, then shift the values down by one row so that each row only sees the earlier prices.

# group_keys=False keeps the original row index, so the result aligns with df
df['pre_mean_price'] = df.groupby("type", group_keys=False)['price'].apply(
    lambda x: x.expanding().mean().shift())
print(df)

  type        date  price  pre_mean_price
0    A  2020-05-01      4             NaN
1    A  2020-06-04      6             4.0
2    A  2020-06-08      8             5.0
3    A  2020-07-03      5             6.0
4    B  2020-02-01      3             NaN
5    B  2020-04-02      4             3.0
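For a version that can be pasted and run as-is, here is a minimal sketch that rebuilds the sample data from the question and uses transform instead of apply. The switch to transform is my own substitution, not part of the answer above: transform always returns a result aligned to the original row index, so no group_keys handling is needed. The explicit sort is an assumption that "previous" means "earlier date within the same type".

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "type": ["A", "A", "A", "A", "B", "B"],
    "date": ["2020-05-01", "2020-06-04", "2020-06-08",
             "2020-07-03", "2020-02-01", "2020-04-02"],
    "price": [4, 6, 8, 5, 3, 4],
})

# Make sure rows are ordered by type, then by date (ISO date strings sort correctly).
df = df.sort_values(["type", "date"]).reset_index(drop=True)

# Expanding mean within each type, shifted by one row so that
# each row only sees the prices that came before it.
df["pre_mean_price"] = df.groupby("type")["price"].transform(
    lambda s: s.expanding().mean().shift()
)

print(df)
```

The first row of each type has no earlier prices, so its pre_mean_price is NaN.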

Something like

# Select the price column first so the string date column is not passed to mean()
df['pre_mean_price'] = df.groupby('type')[['price']].expanding().mean().groupby('type').shift(1)['price'].values

which produces

  type        date  price  pre_mean_price
0    A  2020-05-01      4             NaN
1    A  2020-06-04      6             4.0
2    A  2020-06-08      8             5.0
3    A  2020-07-03      5             6.0
4    B  2020-02-01      3             NaN
5    B  2020-04-02      4             3.0

Short explanation

The idea is to

  • First, group by "type" with .groupby() . This is needed because the (incremental) means must be calculated within each "type" group.
  • Then, calculate the incremental mean with expanding().mean() . The output at this point is
        price
type
A    0   4.00
     1   5.00
     2   6.00
     3   5.75
B    4   3.00
     5   3.50
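  • Then, group by "type" again and shift the elements inside each group by one row with shift(1) , so that each row holds the mean of the previous rows only.
  • Then, extract the values of the price column (the shifted incremental means) and assign them to the new column.
  • Note : Because .values assigns by position, this assumes your data is sorted by type and then by date. If it is not, call df.sort_values(['type', 'date'], inplace=True) before.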
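The steps above can be sketched one at a time, which makes the intermediate results easy to inspect. This is only an illustration on the question's sample data; the variable names inc_mean and shifted are made up here, and groupby(level="type") is used for the second grouping to make explicit that "type" is an index level at that point.

```python
import pandas as pd

# Sample data copied from the question (already sorted by type, then date).
df = pd.DataFrame({
    "type": ["A", "A", "A", "A", "B", "B"],
    "date": ["2020-05-01", "2020-06-04", "2020-06-08",
             "2020-07-03", "2020-02-01", "2020-04-02"],
    "price": [4, 6, 8, 5, 3, 4],
})

# Steps 1 and 2: group by type, then take the incremental (expanding) mean of price.
# The result is indexed by (type, original row index).
inc_mean = df.groupby("type")[["price"]].expanding().mean()

# Step 3: group by the "type" index level again and shift by one row within each group.
shifted = inc_mean.groupby(level="type").shift(1)

# Step 4: take the raw values and assign them by position.
# This only lines up because df is sorted by type.
df["pre_mean_price"] = shifted["price"].values

print(inc_mean)
print(df)
```

Printing inc_mean reproduces the intermediate table shown in the explanation, and the final df matches the output above.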


 