How to apply a simple mathematical formula based on the previous row in Pandas?

Question

I have the following data set:

import pandas as pd
data = [
    ['2020-01-01', 'A', 0.05], ['2020-01-02', 'A', 0.06], ['2020-01-03', 'A', 0.12], ['2020-01-04', 'A', 0.09], ['2020-01-05', 'A', 0.07],
    ['2020-01-01', 'B', 0.10], ['2020-01-02', 'B', 0.20], ['2020-01-03', 'B', 0.15], ['2020-01-04', 'B', 0.12], ['2020-01-05', 'B', 0.18],
    ['2020-01-01', 'C', 0.05], ['2020-01-02', 'C', 0.11], ['2020-01-03', 'C', 0.18], ['2020-01-04', 'C', 0.09], ['2020-01-05', 'C', 0.22],
]
df = pd.DataFrame(data, columns = ['DATE', 'Stock', 'Return'])
df

Out[1]:
          DATE Stock  Return
0   2020-01-01     A    0.05
1   2020-01-02     A    0.06
2   2020-01-03     A    0.12
3   2020-01-04     A    0.09
4   2020-01-05     A    0.07
5   2020-01-01     B    0.10
6   2020-01-02     B    0.20
7   2020-01-03     B    0.15
8   2020-01-04     B    0.12
9   2020-01-05     B    0.18
10  2020-01-01     C    0.05
11  2020-01-02     C    0.11
12  2020-01-03     C    0.18
13  2020-01-04     C    0.09
14  2020-01-05     C    0.22

For each stock, I want to normalize the price of the time series to 100 at t=-1 and apply the following formula for t=0, 1, 2, ..., n:

P_t = P_{t-1} * (1 + r_t), where P_t is the price in period t and r_t is the return in period t.

Eventually, I would like to get the following:

Out[3]:
          DATE Stock  Return   Price
0   2020-01-01     A    0.05  105.00
1   2020-01-02     A    0.06  111.30
2   2020-01-03     A    0.12  124.66
3   2020-01-04     A    0.09  135.88
4   2020-01-05     A    0.07  145.39
5   2020-01-01     B    0.10  110.00
6   2020-01-02     B    0.20  132.00
7   2020-01-03     B    0.15  151.80
8   2020-01-04     B    0.12  170.02
9   2020-01-05     B    0.18  200.62
10  2020-01-01     C    0.05  105.00
11  2020-01-02     C    0.11  116.55
12  2020-01-03     C    0.18  137.53
13  2020-01-04     C    0.09  149.91
14  2020-01-05     C    0.22  182.89

For instance, at t=0 for stock A, the price would be 100*(1+0.05) = 105. Similarly, for t=1, the price would be 105*(1+0.06) = 111.30, and so on. It seems quite straightforward, I know, but somehow I cannot figure out how to set it up properly with pandas. Is a for loop required? Thanks for any suggestions!

Answer 1

Seems like you'll need something iterative. Let's keep it simple with a for loop:

pt = [100]

for rt in df['Return'].tolist(): 
    pt.append(pt[-1] * (1 + rt))

df['Price'] = pt[1:]

df
          DATE Stock  Return       Price
0   2020-01-01     A    0.05  105.000000
1   2020-01-02     A    0.06  111.300000
2   2020-01-03     A    0.12  124.656000
3   2020-01-04     A    0.09  135.875040
4   2020-01-05     A    0.07  145.386293
5   2020-01-01     B    0.10  159.924922
6   2020-01-02     B    0.20  191.909906
7   2020-01-03     B    0.15  220.696392
8   2020-01-04     B    0.12  247.179960
9   2020-01-05     B    0.18  291.672352

This is quite fast, but if you need something faster there is always the option of numba or cython. Note that this loop runs straight through the whole column, so the prices for B continue from A's last price instead of restarting at 100.
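The running product in the loop above can also be written with the standard library's `itertools.accumulate`, which carries the previous price forward for you. This is only a sketch on stock A's returns; the names `returns`, `start` and `prices` are illustrative and not part of the original answer.

```python
from itertools import accumulate

# Returns for stock A from the question's data set.
returns = [0.05, 0.06, 0.12, 0.09, 0.07]
start = 100.0  # normalized price at t=-1

# accumulate() feeds the previous price and the next return into the lambda.
# `initial=start` (Python 3.8+) seeds the sequence, so the first element is
# the t=-1 price; drop it to keep only t=0..n.
prices = list(accumulate(returns, lambda price, r: price * (1 + r), initial=start))[1:]

print(prices)
```

Like the plain loop, this does not know about groups, so it would have to be applied to each stock's returns separately.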

To do this per group, we can wrap the loop into a function and use groupby.apply:

def calculate_price(group):
    pt = [100]
    
    for rt in group['Return'].tolist(): 
        pt.append(pt[-1] * (1 + rt))

    return pd.Series(pt[1:], index=group.index)

df['Price'] = df.groupby('Stock', group_keys=False).apply(calculate_price)

df
          DATE Stock  Return       Price
0   2020-01-01     A    0.05  105.000000
1   2020-01-02     A    0.06  111.300000
2   2020-01-03     A    0.12  124.656000
3   2020-01-04     A    0.09  135.875040
4   2020-01-05     A    0.07  145.386293
5   2020-01-01     B    0.10  110.000000
6   2020-01-02     B    0.20  132.000000
7   2020-01-03     B    0.15  151.800000
8   2020-01-04     B    0.12  170.016000
9   2020-01-05     B    0.18  200.618880
10  2020-01-01     C    0.05  105.000000
11  2020-01-02     C    0.11  116.550000
12  2020-01-03     C    0.18  137.529000
13  2020-01-04     C    0.09  149.906610
14  2020-01-05     C    0.22  182.886064
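As a possible variation, the same per-group loop can be run on the `Return` column alone with `groupby(...)['Return'].transform`, so the function only ever sees one Series per stock. In recent pandas versions this should also avoid the deprecation warning that `groupby.apply` raises about operating on the grouping columns. The helper name `compound` and its `start` parameter are made up for this sketch, and the data is a shortened version of the question's.

```python
import pandas as pd

def compound(returns, start=100.0):
    """Compound a Series of returns from `start`, keeping the original index."""
    prices = []
    price = start
    for r in returns:
        price = price * (1 + r)
        prices.append(price)
    return pd.Series(prices, index=returns.index)

# Shortened version of the question's data: three days for two stocks.
df = pd.DataFrame({
    'DATE': ['2020-01-01', '2020-01-02', '2020-01-03'] * 2,
    'Stock': ['A', 'A', 'A', 'B', 'B', 'B'],
    'Return': [0.05, 0.06, 0.12, 0.10, 0.20, 0.15],
})

# transform() calls compound() once per stock and aligns the results
# back to the original row order.
df['Price'] = df.groupby('Stock')['Return'].transform(compound)

print(df)
```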

Answer 2

Try using the pandas cumprod and groupby methods:

df['Price'] = (df.assign(Return = df.Return+1)
               .groupby('Stock')['Return']
               .cumprod()
               .mul(100)
               )

Result:

          DATE Stock  Return       Price
0   2020-01-01     A    0.05  105.000000
1   2020-01-02     A    0.06  111.300000
2   2020-01-03     A    0.12  124.656000
3   2020-01-04     A    0.09  135.875040
4   2020-01-05     A    0.07  145.386293
5   2020-01-01     B    0.10  110.000000
6   2020-01-02     B    0.20  132.000000
7   2020-01-03     B    0.15  151.800000
8   2020-01-04     B    0.12  170.016000
9   2020-01-05     B    0.18  200.618880
10  2020-01-01     C    0.05  105.000000
11  2020-01-02     C    0.11  116.550000
12  2020-01-03     C    0.18  137.529000
13  2020-01-04     C    0.09  149.906610
14  2020-01-05     C    0.22  182.886064
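The reason cumprod gives the same numbers as a loop is that the recurrence unrolls: P_t = 100 * (1 + r_0) * (1 + r_1) * ... * (1 + r_t), which is a cumulative product of (1 + r) within each stock. The sketch below checks that on a shortened copy of the data and rounds to two decimals to match the output asked for in the question. The names `vectorized` and `looped` are illustrative, and the comparison assumes rows are already sorted by stock, as they are here.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Stock': ['A', 'A', 'A', 'B', 'B', 'B'],
    'Return': [0.05, 0.06, 0.12, 0.10, 0.20, 0.15],
})

# Vectorized: cumulative product of (1 + r) within each stock, scaled to 100.
vectorized = df['Return'].add(1).groupby(df['Stock']).cumprod().mul(100)

# Reference: explicit loop that restarts at 100 for every stock.
looped = []
for _, returns in df.groupby('Stock')['Return']:
    price = 100.0
    for r in returns:
        price *= 1 + r
        looped.append(price)

# Both approaches agree up to floating-point rounding.
assert np.allclose(vectorized.to_numpy(), looped)

# Round only for display, to match the two-decimal prices in the question.
df['Price'] = vectorized.round(2)
print(df)
```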
