How to apply a simple mathematical formula based on the previous row in Pandas?
I have the following dataset:
import pandas as pd
data = [['2020-01-01', 'A', 0.05], ['2020-01-02', 'A', 0.06], ['2020-01-03', 'A', 0.12], ['2020-01-04', 'A', 0.09], ['2020-01-05', 'A', 0.07], ['2020-01-01', 'B', 0.10], ['2020-01-02', 'B', 0.20], ['2020-01-03', 'B', 0.15], ['2020-01-04', 'B', 0.12], ['2020-01-05', 'B', 0.18], ['2020-01-01', 'C', 0.05], ['2020-01-02', 'C', 0.11], ['2020-01-03', 'C', 0.18], ['2020-01-04', 'C', 0.09], ['2020-01-05', 'C', 0.22]]
df = pd.DataFrame(data, columns = ['DATE', 'Stock', 'Return'])
df
Out[1]:
DATE Stock Return
0 2020-01-01 A 0.05
1 2020-01-02 A 0.06
2 2020-01-03 A 0.12
3 2020-01-04 A 0.09
4 2020-01-05 A 0.07
5 2020-01-01 B 0.10
6 2020-01-02 B 0.20
7 2020-01-03 B 0.15
8 2020-01-04 B 0.12
9 2020-01-05 B 0.18
10 2020-01-01 C 0.05
11 2020-01-02 C 0.11
12 2020-01-03 C 0.18
13 2020-01-04 C 0.09
14 2020-01-05 C 0.22
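For each stock, I want to normalize the price time series to 100 at t=-1 and apply the following formula for t = 0, 1, 2, ..., n:
P_t = P_{t-1} * (1 + r_t), where P_t is the price in period t and r_t is the return in period t.
Ultimately, I would like to get the following: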
Out[3]:
DATE Stock Return Price
0 2020-01-01 A 0.05 105.00
1 2020-01-02 A 0.06 111.30
2 2020-01-03 A 0.12 124.66
3 2020-01-04 A 0.09 135.88
4 2020-01-05 A 0.07 145.39
5 2020-01-01 B 0.10 110.00
6 2020-01-02 B 0.20 132.00
7 2020-01-03 B 0.15 151.80
8 2020-01-04 B 0.12 170.02
9 2020-01-05 B 0.18 200.62
10 2020-01-01 C 0.05 105.00
11 2020-01-02 C 0.11 116.55
12 2020-01-03 C 0.18 137.53
13 2020-01-04 C 0.09 149.91
14 2020-01-05 C 0.22 182.89
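For example, at t=0 the price of stock A is 100*(1+0.05) = 105. Likewise, for t=1 the price is 105*(1+0.06) = 111.30, and so on. It seems simple, I know, but somehow I can't figure out how to set it up properly with pandas. Is a for loop needed? Thanks for any suggestions!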
It looks like you need some iteration. Let's keep it simple with a for loop:
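# Start from the normalized price of 100 at t=-1
pt = [100]
for rt in df['Return'].tolist():
    # P_t = P_{t-1} * (1 + r_t)
    pt.append(pt[-1] * (1 + rt))
# Drop the seed value so the list lines up with the rows
df['Price'] = pt[1:]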
df
DATE Stock Return Price
0 2020-01-01 A 0.05 105.000000
1 2020-01-02 A 0.06 111.300000
2 2020-01-03 A 0.12 124.656000
3 2020-01-04 A 0.09 135.875040
4 2020-01-05 A 0.07 145.386293
5 2020-01-01 B 0.10 159.924922
6 2020-01-02 B 0.20 191.909906
7 2020-01-03 B 0.15 220.696392
8 2020-01-04 B 0.12 247.179960
9 2020-01-05 B 0.18 291.672352
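This is reasonably fast, but if you need something faster, numba or cython are always an option. Note that this version ignores the Stock column, so the price of B continues from the last price of A.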
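The same loop can be written over a plain NumPy array, which is the form a compiler such as numba usually expects. This is only a sketch: `compound_prices` and its `start` parameter are hypothetical names, no numba decorator is applied here, and whether compiling it helps for your data is an assumption to check by measuring.

```python
import numpy as np

def compound_prices(returns, start=100.0):
    """Apply P_t = P_{t-1} * (1 + r_t) over a 1-D array of returns."""
    prices = np.empty(len(returns), dtype=np.float64)
    prev = start
    for i in range(len(returns)):
        prev = prev * (1.0 + returns[i])
        prices[i] = prev
    return prices

# Returns of stock A from the question
returns = np.array([0.05, 0.06, 0.12, 0.09, 0.07])
prices = compound_prices(returns)
print(prices)
```

Because the function only touches a float array and scalars, it stays independent of pandas and can be called per stock on `group['Return'].to_numpy()`.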
To do this per group, we can wrap the loop in a function and use groupby.apply:
def calculate_price(group):
    # Each stock restarts from the normalized price of 100
    pt = [100]
    for rt in group['Return'].tolist():
        pt.append(pt[-1] * (1 + rt))
    # Keep the group's index so the result aligns with df
    return pd.Series(pt[1:], index=group.index)

df['Price'] = df.groupby('Stock', group_keys=False).apply(calculate_price)
df
DATE Stock Return Price
0 2020-01-01 A 0.05 105.000000
1 2020-01-02 A 0.06 111.300000
2 2020-01-03 A 0.12 124.656000
3 2020-01-04 A 0.09 135.875040
4 2020-01-05 A 0.07 145.386293
5 2020-01-01 B 0.10 110.000000
6 2020-01-02 B 0.20 132.000000
7 2020-01-03 B 0.15 151.800000
8 2020-01-04 B 0.12 170.016000
9 2020-01-05 B 0.18 200.618880
10 2020-01-01 C 0.05 105.000000
11 2020-01-02 C 0.11 116.550000
12 2020-01-03 C 0.18 137.529000
13 2020-01-04 C 0.09 149.906610
14 2020-01-05 C 0.22 182.886064
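As a possible variation, the per-group loop can be passed to `groupby(...)['Return'].transform`, so the function only ever sees the Return column of one stock. This is a sketch under assumptions: `compound` is a hypothetical helper name, and the DataFrame is rebuilt here from the question's values only to keep the example self-contained.

```python
import pandas as pd

# Rebuild the question's data: five daily returns per stock
returns = {
    'A': [0.05, 0.06, 0.12, 0.09, 0.07],
    'B': [0.10, 0.20, 0.15, 0.12, 0.18],
    'C': [0.05, 0.11, 0.18, 0.09, 0.22],
}
dates = pd.date_range('2020-01-01', periods=5).strftime('%Y-%m-%d')
df = pd.DataFrame(
    [(d, s, r) for s, rs in returns.items() for d, r in zip(dates, rs)],
    columns=['DATE', 'Stock', 'Return'],
)

def compound(group_returns, start=100.0):
    """Apply P_t = P_{t-1} * (1 + r_t) to one stock's returns (a Series)."""
    prices = []
    prev = start
    for r in group_returns:
        prev *= 1 + r
        prices.append(prev)
    # Same index as the input, so transform can align it with df
    return pd.Series(prices, index=group_returns.index)

df['Price'] = df.groupby('Stock')['Return'].transform(compound)
print(df)
```

`transform` requires the function to return one value per input row, which is the case here, so the result can be assigned directly as a new column.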
Try using the pandas cumprod and groupby methods:
df['Price'] = (df.assign(Return=df.Return + 1)
                 .groupby('Stock')['Return']
                 .cumprod()
                 .mul(100)
              )
Result:
DATE Stock Return Price
0 2020-01-01 A 0.05 105.000000
1 2020-01-02 A 0.06 111.300000
2 2020-01-03 A 0.12 124.656000
3 2020-01-04 A 0.09 135.875040
4 2020-01-05 A 0.07 145.386293
5 2020-01-01 B 0.10 110.000000
6 2020-01-02 B 0.20 132.000000
7 2020-01-03 B 0.15 151.800000
8 2020-01-04 B 0.12 170.016000
9 2020-01-05 B 0.18 200.618880
10 2020-01-01 C 0.05 105.000000
11 2020-01-02 C 0.11 116.550000
12 2020-01-03 C 0.18 137.529000
13 2020-01-04 C 0.09 149.906610
14 2020-01-05 C 0.22 182.886064
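The cumulative product works because the recurrence P_t = P_{t-1} * (1 + r_t) with a starting price of 100 unrolls to P_t = 100 * (1 + r_0) * (1 + r_1) * ... * (1 + r_t). The sketch below checks that against an explicit loop on the question's data and then rounds to two decimals, as in the table the question asks for. The names `vectorized` and `expected` are illustrative only.

```python
import numpy as np
import pandas as pd

# Rebuild the question's data: five daily returns per stock
returns = {
    'A': [0.05, 0.06, 0.12, 0.09, 0.07],
    'B': [0.10, 0.20, 0.15, 0.12, 0.18],
    'C': [0.05, 0.11, 0.18, 0.09, 0.22],
}
dates = pd.date_range('2020-01-01', periods=5).strftime('%Y-%m-%d')
df = pd.DataFrame(
    [(d, s, r) for s, rs in returns.items() for d, r in zip(dates, rs)],
    columns=['DATE', 'Stock', 'Return'],
)

# Vectorized: cumulative product of (1 + r) within each stock, scaled by 100
vectorized = df['Return'].add(1).groupby(df['Stock']).cumprod().mul(100)

# Reference: the explicit recurrence, restarted at 100 for every stock
expected = []
for _, group in df.groupby('Stock', sort=False):
    price = 100.0
    for r in group['Return']:
        price *= 1 + r
        expected.append(price)

# Both approaches should agree up to floating-point error
assert np.allclose(vectorized.to_numpy(), expected)

# Round for display, as in the question's desired output
df['Price'] = vectorized.round(2)
print(df)
```

Rounding only at the end keeps the compounding itself at full precision; rounding inside the loop would let the rounding error accumulate from row to row.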