简体   繁体   中英

How to apply a simple mathematical formula based on the previous row in Pandas?

I have the following data set:

import pandas as pd
data = [['2020-01-01', 'A', 0.05], ['2020-01-02', 'A', 0.06], ['2020-01-03', 'A', 0.12], ['2020-01-04', 'A', 0.09], ['2020-01-05', 'A', 0.07],   ['2020-01-01', 'B', 0.10], ['2020-01-02', 'B', 0.20], ['2020-01-03', 'B', 0.15], ['2020-01-04', 'B', 0.12], ['2020-01-05', 'B', 0.18],    ['2020-01-01', 'C', 0.05], ['2020-01-02', 'C', 0.11], ['2020-01-03', 'C', 0.18], ['2020-01-04', 'C', 0.09], ['2020-01-05', 'C', 0.22]]
df = pd.DataFrame(data, columns = ['DATE', 'Stock', 'Return'])
df

Out[1]:
          DATE Stock  Return
0   2020-01-01     A    0.05
1   2020-01-02     A    0.06
2   2020-01-03     A    0.12
3   2020-01-04     A    0.09
4   2020-01-05     A    0.07
5   2020-01-01     B    0.10
6   2020-01-02     B    0.20
7   2020-01-03     B    0.15
8   2020-01-04     B    0.12
9   2020-01-05     B    0.18
10  2020-01-01     C    0.05
11  2020-01-02     C    0.11
12  2020-01-03     C    0.18
13  2020-01-04     C    0.09
14  2020-01-05     C    0.22

For each stock, I want to normalize the stock price of the time-series to 100 at t=-1 and apply the following formula for t=0, 1, 2, ..., n:

Pt = Pt-1 * (1+rt) , where Pt = Price in period t and rt = Return in period t, respectively.

Eventually, I would like to receive the following:

Out[3]:
          DATE Stock  Return   Price
0   2020-01-01     A    0.05  105.00
1   2020-01-02     A    0.06  111.30
2   2020-01-03     A    0.12  124.66
3   2020-01-04     A    0.09  135.88
4   2020-01-05     A    0.07  145.39
5   2020-01-01     B    0.10  110.00
6   2020-01-02     B    0.20  132.00
7   2020-01-03     B    0.15  151.80
8   2020-01-04     B    0.12  170.02
9   2020-01-05     B    0.18  200.62
10  2020-01-01     C    0.05  105.00
11  2020-01-02     C    0.11  116.55
12  2020-01-03     C    0.18  137.53
13  2020-01-04     C    0.09  149.91
14  2020-01-05     C    0.22  182.89

For instance, at t=0 for stock A, the price would be: 100*(1+0.05) = 105. Similarly, for t=1, the price would be: 105*(1+0.06) = 111.30 etc. Seems quite straightforward, I know, but somehow I cannot figure how to properly set it with pandas. Is there a for loop required? Thanks for any suggestions!

Seems like you'll need something iterative. Let's keep it simple with a for loop:

pt = [100]

for rt in df['Return'].tolist(): 
    pt.append(pt[-1] * (1 + rt))

df['Price'] = pt[1:]

df
          DATE Stock  Return       Price
0   2020-01-01     A    0.05  105.000000
1   2020-01-02     A    0.06  111.300000
2   2020-01-03     A    0.12  124.656000
3   2020-01-04     A    0.09  135.875040
4   2020-01-05     A    0.07  145.386293
5   2020-01-01     B    0.10  159.924922
6   2020-01-02     B    0.20  191.909906
7   2020-01-03     B    0.15  220.696392
8   2020-01-04     B    0.12  247.179960
9   2020-01-05     B    0.18  291.672352

This is quite fast, but if you need something faster there is always the option of numba or cython.


To do this per group, we can wrap the loop into a function and use groupby.apply :

def calculate_price(group):
    pt = [100]
    
    for rt in group['Return'].tolist(): 
        pt.append(pt[-1] * (1 + rt))

    return pd.Series(pt[1:], index=group.index)
df['Price'] = df.groupby('Stock', group_keys=False).apply(calculate_price)

df
          DATE Stock  Return       Price
0   2020-01-01     A    0.05  105.000000
1   2020-01-02     A    0.06  111.300000
2   2020-01-03     A    0.12  124.656000
3   2020-01-04     A    0.09  135.875040
4   2020-01-05     A    0.07  145.386293
5   2020-01-01     B    0.10  110.000000
6   2020-01-02     B    0.20  132.000000
7   2020-01-03     B    0.15  151.800000
8   2020-01-04     B    0.12  170.016000
9   2020-01-05     B    0.18  200.618880
10  2020-01-01     C    0.05  105.000000
11  2020-01-02     C    0.11  116.550000
12  2020-01-03     C    0.18  137.529000
13  2020-01-04     C    0.09  149.906610
14  2020-01-05     C    0.22  182.886064

Try using pandas cumprod and groupby methods:

df['Price'] = (df.assign(Return = df.Return+1)
               .groupby('Stock')['Return']
               .cumprod()
               .mul(100)
               )

result:

          DATE Stock  Return       Price
0   2020-01-01     A    0.05  105.000000
1   2020-01-02     A    0.06  111.300000
2   2020-01-03     A    0.12  124.656000
3   2020-01-04     A    0.09  135.875040
4   2020-01-05     A    0.07  145.386293
5   2020-01-01     B    0.10  110.000000
6   2020-01-02     B    0.20  132.000000
7   2020-01-03     B    0.15  151.800000
8   2020-01-04     B    0.12  170.016000
9   2020-01-05     B    0.18  200.618880
10  2020-01-01     C    0.05  105.000000
11  2020-01-02     C    0.11  116.550000
12  2020-01-03     C    0.18  137.529000
13  2020-01-04     C    0.09  149.906610
14  2020-01-05     C    0.22  182.886064

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM