How to apply a simple mathematical formula based on the previous row in Pandas?

I have the following data set:

import pandas as pd
data = [['2020-01-01', 'A', 0.05], ['2020-01-02', 'A', 0.06], ['2020-01-03', 'A', 0.12],
        ['2020-01-04', 'A', 0.09], ['2020-01-05', 'A', 0.07],
        ['2020-01-01', 'B', 0.10], ['2020-01-02', 'B', 0.20], ['2020-01-03', 'B', 0.15],
        ['2020-01-04', 'B', 0.12], ['2020-01-05', 'B', 0.18],
        ['2020-01-01', 'C', 0.05], ['2020-01-02', 'C', 0.11], ['2020-01-03', 'C', 0.18],
        ['2020-01-04', 'C', 0.09], ['2020-01-05', 'C', 0.22]]
df = pd.DataFrame(data, columns=['DATE', 'Stock', 'Return'])
df

Out[1]:
          DATE Stock  Return
0   2020-01-01     A    0.05
1   2020-01-02     A    0.06
2   2020-01-03     A    0.12
3   2020-01-04     A    0.09
4   2020-01-05     A    0.07
5   2020-01-01     B    0.10
6   2020-01-02     B    0.20
7   2020-01-03     B    0.15
8   2020-01-04     B    0.12
9   2020-01-05     B    0.18
10  2020-01-01     C    0.05
11  2020-01-02     C    0.11
12  2020-01-03     C    0.18
13  2020-01-04     C    0.09
14  2020-01-05     C    0.22

For each stock, I want to normalize the price series to 100 at t=-1 and apply the following formula for t=0, 1, 2, ..., n:

P_t = P_{t-1} * (1 + r_t), where P_t is the price in period t and r_t is the return in period t.

Eventually, I would like to obtain the following:

Out[3]:
          DATE Stock  Return   Price
0   2020-01-01     A    0.05  105.00
1   2020-01-02     A    0.06  111.30
2   2020-01-03     A    0.12  124.66
3   2020-01-04     A    0.09  135.88
4   2020-01-05     A    0.07  145.39
5   2020-01-01     B    0.10  110.00
6   2020-01-02     B    0.20  132.00
7   2020-01-03     B    0.15  151.80
8   2020-01-04     B    0.12  170.02
9   2020-01-05     B    0.18  200.62
10  2020-01-01     C    0.05  105.00
11  2020-01-02     C    0.11  116.55
12  2020-01-03     C    0.18  137.53
13  2020-01-04     C    0.09  149.91
14  2020-01-05     C    0.22  182.89

For instance, at t=0 for stock A, the price would be 100*(1+0.05) = 105. Similarly, for t=1, the price would be 105*(1+0.06) = 111.30, and so on. It seems quite straightforward, I know, but somehow I cannot figure out how to set it up properly with pandas. Is a for loop required? Thanks for any suggestions!

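Seems like you'll need something iterative. Let's keep it simple with a for loop. Note that this first version compounds over the whole column and ignores the stock boundaries, which is why stock B starts from A's last price in the output below: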

pt = [100]

for rt in df['Return'].tolist(): 
    pt.append(pt[-1] * (1 + rt))

df['Price'] = pt[1:]

df
          DATE Stock  Return       Price
0   2020-01-01     A    0.05  105.000000
1   2020-01-02     A    0.06  111.300000
2   2020-01-03     A    0.12  124.656000
3   2020-01-04     A    0.09  135.875040
4   2020-01-05     A    0.07  145.386293
5   2020-01-01     B    0.10  159.924922
6   2020-01-02     B    0.20  191.909906
7   2020-01-03     B    0.15  220.696392
8   2020-01-04     B    0.12  247.179960
9   2020-01-05     B    0.18  291.672352

This is quite fast, but if you need something faster, numba or cython are always an option.
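If the plain loop ever becomes a bottleneck, the numba route mentioned above could look roughly like the sketch below. It assumes numba is installed and falls back to the uncompiled function otherwise. The names `compound_prices` and `start` are hypothetical and not part of the original answer.

```python
import numpy as np

try:
    from numba import njit  # optional dependency; assumed to be installed
except ImportError:
    def njit(func):
        # Fallback: run the function uncompiled if numba is missing
        return func


@njit
def compound_prices(returns, start):
    # P_t = P_(t-1) * (1 + r_t), seeded with `start` at t = -1
    prices = np.empty(returns.shape[0], dtype=np.float64)
    prev = start
    for i in range(returns.shape[0]):
        prev = prev * (1.0 + returns[i])
        prices[i] = prev
    return prices


returns_a = np.array([0.05, 0.06, 0.12, 0.09, 0.07])  # stock A from the question
prices_a = compound_prices(returns_a, 100.0)
```

With a DataFrame, the input array would come from something like `df['Return'].to_numpy()`.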


To do this per group, we can wrap the loop in a function and use groupby.apply:

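def calculate_price(group):
    # Start each stock at 100 (the price at t = -1)
    pt = [100]

    # Compound the returns in row order: P_t = P_(t-1) * (1 + r_t)
    for rt in group['Return'].tolist():
        pt.append(pt[-1] * (1 + rt))

    # Drop the seed value and keep the group's index so the result aligns with df
    return pd.Series(pt[1:], index=group.index)

df['Price'] = df.groupby('Stock', group_keys=False).apply(calculate_price)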

df
          DATE Stock  Return       Price
0   2020-01-01     A    0.05  105.000000
1   2020-01-02     A    0.06  111.300000
2   2020-01-03     A    0.12  124.656000
3   2020-01-04     A    0.09  135.875040
4   2020-01-05     A    0.07  145.386293
5   2020-01-01     B    0.10  110.000000
6   2020-01-02     B    0.20  132.000000
7   2020-01-03     B    0.15  151.800000
8   2020-01-04     B    0.12  170.016000
9   2020-01-05     B    0.18  200.618880
10  2020-01-01     C    0.05  105.000000
11  2020-01-02     C    0.11  116.550000
12  2020-01-03     C    0.18  137.529000
13  2020-01-04     C    0.09  149.906610
14  2020-01-05     C    0.22  182.886064
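As a variation on the same idea, the loop could also be run on the Return column alone through groupby.transform, which expects one output value per input row. This is only a sketch on a shortened version of the question's data. The helper name `compound` and its `start` parameter are hypothetical.

```python
import pandas as pd

data = [['2020-01-01', 'A', 0.05], ['2020-01-02', 'A', 0.06],
        ['2020-01-01', 'B', 0.10], ['2020-01-02', 'B', 0.20]]
df = pd.DataFrame(data, columns=['DATE', 'Stock', 'Return'])


def compound(returns, start=100.0):
    # `returns` holds the 'Return' values of one stock, in row order
    prices = []
    prev = start
    for rt in returns.tolist():
        prev = prev * (1 + rt)
        prices.append(prev)
    # Reuse the group's index so the result aligns with the original rows
    return pd.Series(prices, index=returns.index)


df['Price'] = df.groupby('Stock')['Return'].transform(compound)
```

Because the function only ever sees the Return values, it does not depend on how apply treats the grouping column.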

Try using the pandas cumprod and groupby methods:

df['Price'] = (df.assign(Return = df.Return+1)
               .groupby('Stock')['Return']
               .cumprod()
               .mul(100)
               )

Result:

          DATE Stock  Return       Price
0   2020-01-01     A    0.05  105.000000
1   2020-01-02     A    0.06  111.300000
2   2020-01-03     A    0.12  124.656000
3   2020-01-04     A    0.09  135.875040
4   2020-01-05     A    0.07  145.386293
5   2020-01-01     B    0.10  110.000000
6   2020-01-02     B    0.20  132.000000
7   2020-01-03     B    0.15  151.800000
8   2020-01-04     B    0.12  170.016000
9   2020-01-05     B    0.18  200.618880
10  2020-01-01     C    0.05  105.000000
11  2020-01-02     C    0.11  116.550000
12  2020-01-03     C    0.18  137.529000
13  2020-01-04     C    0.09  149.906610
14  2020-01-05     C    0.22  182.886064
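The cumprod version works because unrolling the recurrence gives P_t = 100 * (1 + r_0) * (1 + r_1) * ... * (1 + r_t), that is, 100 times a running product within each stock. The sketch below checks this against the prices requested in the question. The `expected` list is copied from the question's desired output, and `matches` is a name introduced here for illustration.

```python
import numpy as np
import pandas as pd

data = [['2020-01-01', 'A', 0.05], ['2020-01-02', 'A', 0.06], ['2020-01-03', 'A', 0.12],
        ['2020-01-04', 'A', 0.09], ['2020-01-05', 'A', 0.07],
        ['2020-01-01', 'B', 0.10], ['2020-01-02', 'B', 0.20], ['2020-01-03', 'B', 0.15],
        ['2020-01-04', 'B', 0.12], ['2020-01-05', 'B', 0.18],
        ['2020-01-01', 'C', 0.05], ['2020-01-02', 'C', 0.11], ['2020-01-03', 'C', 0.18],
        ['2020-01-04', 'C', 0.09], ['2020-01-05', 'C', 0.22]]
df = pd.DataFrame(data, columns=['DATE', 'Stock', 'Return'])

# Running product of (1 + r_t) within each stock, scaled to start from 100
df['Price'] = df['Return'].add(1).groupby(df['Stock']).cumprod().mul(100)

# Prices from the question's desired output, rounded to two decimals
expected = [105.00, 111.30, 124.66, 135.88, 145.39,
            110.00, 132.00, 151.80, 170.02, 200.62,
            105.00, 116.55, 137.53, 149.91, 182.89]

matches = np.allclose(df['Price'].round(2), expected)
```

Both the loop and cumprod compound in row order, so the rows of each stock are assumed to be sorted by date, as they are in the question.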
