How to calculate with previous values in a Pandas MultiIndex DataFrame?
I have the following MultiIndex dataframe.
                   Close  ATR
Date       Symbol
1990-01-01 A          24  2
1990-01-01 B          72  7
1990-01-01 C          40  3.4
1990-01-02 A          21  1.5
1990-01-02 B          65  6
1990-01-02 C          45  4.2
1990-01-03 A          19  2.5
1990-01-03 B          70  6.3
1990-01-03 C          51  5
I want to calculate three columns:
Shares = previous day's Equity * 0.02 / ATR, rounded down to a whole number
Profit = Shares * Close
Equity = previous day's Equity + the sum of that day's Profit over all Symbols
Equity has an initial value of 10,000.
The expected output is:
                   Close  ATR  Shares  Profit  Equity
Date       Symbol
1990-01-01 A          24  2         0       0   10000
1990-01-01 B          72  7         0       0   10000
1990-01-01 C          40  3.4       0       0   10000
1990-01-02 A          21  1.5     133    2793   17053
1990-01-02 B          65  6        33    2145   17053
1990-01-02 C          45  4.2      47    2115   17053
1990-01-03 A          19  2.5     136    2584   26885
1990-01-03 B          70  6.3      54    3780   26885
1990-01-03 C          51  5        68    3468   26885
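As a check on the definitions, the 1990-01-02 rows of the expected output can be reproduced by hand. The snippet below is only a sketch of that arithmetic using the sample values; the variable names are made up for illustration.

```python
import math

equity_prev = 10_000                      # Equity on 1990-01-01
atr = {"A": 1.5, "B": 6, "C": 4.2}        # ATR on 1990-01-02
close = {"A": 21, "B": 65, "C": 45}       # Close on 1990-01-02

# Shares = previous day's Equity * 0.02 / ATR, rounded down
shares = {s: math.floor(equity_prev * 0.02 / atr[s]) for s in atr}
# Profit = Shares * Close
profit = {s: shares[s] * close[s] for s in atr}
# Equity = previous day's Equity + that day's Profit summed over all Symbols
equity = equity_prev + sum(profit.values())

print(shares, profit, equity)
```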
I suppose I need a for loop or a function applied to each row, but I have two issues with that. First, I'm not sure how to write a for loop for this logic on a MultiIndex dataframe. Second, my dataframe is pretty large (something like 10 million rows), so I'm not sure a for loop would be a good idea. But then how can I create these columns?
This solution can surely be cleaned up, but it will produce your desired output. I've included your initial conditions in the construction of your sample dataframe:
import pandas as pd
import numpy as np

df = pd.DataFrame({
    'Date': ['1990-01-01', '1990-01-01', '1990-01-01',
             '1990-01-02', '1990-01-02', '1990-01-02',
             '1990-01-03', '1990-01-03', '1990-01-03'],
    'Symbol': ['A', 'B', 'C', 'A', 'B', 'C', 'A', 'B', 'C'],
    'Close': [24, 72, 40, 21, 65, 45, 19, 70, 51],
    'ATR': [2, 7, 3.4, 1.5, 6, 4.2, 2.5, 6.3, 5],
    'Shares': [0, 0, 0, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan],
    'Profit': [0, 0, 0, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan],
})
Gives:
         Date Symbol  Close  ATR  Shares  Profit
0  1990-01-01      A     24  2.0     0.0     0.0
1  1990-01-01      B     72  7.0     0.0     0.0
2  1990-01-01      C     40  3.4     0.0     0.0
3  1990-01-02      A     21  1.5     NaN     NaN
4  1990-01-02      B     65  6.0     NaN     NaN
5  1990-01-02      C     45  4.2     NaN     NaN
6  1990-01-03      A     19  2.5     NaN     NaN
7  1990-01-03      B     70  6.3     NaN     NaN
8  1990-01-03      C     51  5.0     NaN     NaN
Then use groupby() with apply() and track your Equity globally. It took me a second to realize that the nature of this problem requires you to group on two separate columns individually (Symbol and Date):
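start = 10000
Equity = start  # running equity, updated as each date is processed

def calcs(x):
    global Equity
    if x.index[0] == 0:
        return x  # skip the first date, whose Shares and Profit are preset to 0
    x = x.copy()  # avoid mutating the group object handed over by groupby
    x['Shares'] = np.floor(Equity * 0.02 / x['ATR'])
    x['Profit'] = x['Shares'] * x['Close']
    Equity += x['Profit'].sum()
    return x

# group_keys=False keeps the original row index instead of adding Date as an index level
df = df.groupby('Date', group_keys=False).apply(calcs)
df['Equity'] = df.groupby('Date')['Profit'].transform('sum')
df['Equity'] = df.groupby('Symbol')['Equity'].cumsum() + start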
This yields:
         Date Symbol  Close  ATR  Shares  Profit   Equity
0  1990-01-01      A     24  2.0     0.0     0.0  10000.0
1  1990-01-01      B     72  7.0     0.0     0.0  10000.0
2  1990-01-01      C     40  3.4     0.0     0.0  10000.0
3  1990-01-02      A     21  1.5   133.0  2793.0  17053.0
4  1990-01-02      B     65  6.0    33.0  2145.0  17053.0
5  1990-01-02      C     45  4.2    47.0  2115.0  17053.0
6  1990-01-03      A     19  2.5   136.0  2584.0  26885.0
7  1990-01-03      B     70  6.3    54.0  3780.0  26885.0
8  1990-01-03      C     51  5.0    68.0  3468.0  26885.0
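Since each day's Shares depend on the previous day's computed Equity, some sequential step seems unavoidable, but the loop can run once per date rather than once per row. The following is a rough sketch of that idea, not the approach used above: it assumes the original (Date, Symbol) MultiIndex, and the function name add_equity_columns and its parameters start_equity and risk are hypothetical names chosen for illustration.

```python
import numpy as np
import pandas as pd

def add_equity_columns(df, start_equity=10_000.0, risk=0.02):
    """Add Shares, Profit and Equity, looping once per date (not per row)."""
    out = df.sort_index(level="Date").copy()
    # Integer code per row identifying its date; rows are sorted by date
    codes, _ = pd.factorize(out.index.get_level_values("Date"))
    atr = out["ATR"].to_numpy(dtype=float)
    close = out["Close"].to_numpy(dtype=float)
    shares = np.zeros(len(out))
    profit = np.zeros(len(out))
    equity_col = np.empty(len(out))
    # Row positions where each date's block starts and ends
    bounds = np.flatnonzero(np.diff(codes)) + 1
    starts = np.r_[0, bounds]
    ends = np.r_[bounds, len(out)]
    equity = float(start_equity)
    for i, (lo, hi) in enumerate(zip(starts, ends)):
        if i > 0:  # the first date keeps Shares = Profit = 0
            shares[lo:hi] = np.floor(equity * risk / atr[lo:hi])
            profit[lo:hi] = shares[lo:hi] * close[lo:hi]
            equity += profit[lo:hi].sum()
        equity_col[lo:hi] = equity
    out["Shares"] = shares
    out["Profit"] = profit
    out["Equity"] = equity_col
    return out

# Sample data from the question, indexed by (Date, Symbol)
sample = pd.DataFrame({
    "Date": ["1990-01-01"] * 3 + ["1990-01-02"] * 3 + ["1990-01-03"] * 3,
    "Symbol": ["A", "B", "C"] * 3,
    "Close": [24, 72, 40, 21, 65, 45, 19, 70, 51],
    "ATR": [2, 7, 3.4, 1.5, 6, 4.2, 2.5, 6.3, 5],
}).set_index(["Date", "Symbol"])

result = add_equity_columns(sample)
print(result)
```

The per-date work is done on NumPy slices, so the Python-level loop length equals the number of distinct dates, and no global variable is needed.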
Can you try using shift and groupby? Once you have the value of the previous row, all column operations are straightforward.
table2['previous'] = table2.groupby(level='symbol')['close'].shift(1)
table2
                   close  atr  previous
date       symbol
1990-01-01 A          24  2         NaN
           B          72  7         NaN
           C          40  3.4       NaN
1990-01-02 A          21  1.5        24
           B          65  6          72
           C          45  4.2        40
1990-01-03 A          19  2.5        21
           B          70  6.3        65
           C          51  5          45
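For reference, here is a self-contained version of the shift idea on the sample data, assuming table2 is indexed by lowercase (date, symbol) levels as in the output above. Note that this retrieves the previous value of an existing column such as close; it does not by itself resolve the Equity recurrence, because each day's Equity depends on the previous day's computed Equity.

```python
import pandas as pd

# Sample data from the question, with lowercase names as assumed above
table2 = pd.DataFrame({
    "date": ["1990-01-01"] * 3 + ["1990-01-02"] * 3 + ["1990-01-03"] * 3,
    "symbol": ["A", "B", "C"] * 3,
    "close": [24, 72, 40, 21, 65, 45, 19, 70, 51],
    "atr": [2, 7, 3.4, 1.5, 6, 4.2, 2.5, 6.3, 5],
}).set_index(["date", "symbol"])

# Previous close of the same symbol; NaN on each symbol's first date
table2["previous"] = table2.groupby(level="symbol")["close"].shift(1)
print(table2)
```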