![](/img/trans.png)
[英]Pandas: Apply rolling window on complex function (Hurst Exponent)
[英]How to apply the Hurst Exponent in Python in a rolling window
我正在尝试在滚动 window 上应用 SPY 收盘价的赫斯特指数。下面的代码(我从这里得到的: https://www.quantstart.com/articles/Basics-of-Statistical-Mean-Reversion-Testing ) 如果我将它应用于收盘价列,效果很好。 然而,这给了我一个 static 值。 考虑到最后 200 个收盘价,我想在滚动 window 上应用赫斯特指数。 我的目标是获得一个列,其中考虑到最后 200 个收盘价,在每一行中更新 Hurst 指数。
from numpy import cumsum, log, polyfit, sqrt, std, subtract
from numpy.random import randn
import pandas_datareader as dr
from datetime import date
df = dr.data.get_data_yahoo('SPY',start='23-01-1991',end=date.today())
def hurst(ts):
"""Returns the Hurst Exponent of the time series vector ts"""
# Create the range of lag values
lags = range(2, 100)
# Calculate the array of the variances of the lagged differences
tau = [sqrt(std(subtract(ts[lag:], ts[:-lag]))) for lag in lags]
# Use a linear fit to estimate the Hurst Exponent
poly = polyfit(log(lags), log(tau), 1)
# Return the Hurst exponent from the polyfit output
return poly[0]*2.0
print ("Hurst(SPY): %s" % hurst(df['Close']))
## I've tried the next lines of code but unfortunately they are not working:
df['Hurst_Column']= [0]
for aRowINDEX in range( 1, 200 ):
df['Hurst_Column'][-aRowINDEX] = hurst (df[u'Close'][:-aRowINDEX])
我是 Python 的新手,我尝试过不同的东西但没有运气。 有人可以帮我吗? 任何帮助都将非常受欢迎。 谢谢!
让我为您提供两步前进的方法:
第 1 步:使用测试数据实现更健壮的 Hurst 指数
第 2 步:产生类似“滑动窗口”的计算的简单方法
第 3 步:稍微复杂一点的方法 - 如果必须使用 ROLLING WINDOW ...
奖励:我应该在我的问题代码下写什么来完成它?
在这里,我将按原样发布来自QuantFX
模块的函数实现(Py2.7 在大多数地方不会造成麻烦,但是任何xrange()
都应该在 Py3.x 中替换为range()
)。
此代码包含一些改进和某种自我修复,如果测试表明数据段存在问题( QuantFX
使用时间自然流的约定,其中data[0]
是“最旧”时间-series 单元格和data[-1]
是“最新”的一个)。
不带任何参数调用HurstEXP()
将产生一个演示运行,显示一些测试和主题的解释。
print( HurstEXP.__doc__ )
也是不言自明的:
def HurstEXP( ts = [ None, ] ): # TESTED: HurstEXP() Hurst exponent ( Browninan Motion & other observations measure ) 100+ BARs back(!)
""" __doc__
USAGE:
HurstEXP( ts = [ None, ] )
Returns the Hurst Exponent of the time series vector ts[]
PARAMETERS:
ts[,] a time-series, with 100+ elements
( or [ None, ] that produces a demo run )
RETURNS:
float - a Hurst Exponent approximation,
as a real value
or
an explanatory string on an empty call
THROWS:
n/a
EXAMPLE:
>>> HurstEXP() # actual numbers will vary, as per np.random.randn() generator used
HurstEXP( Geometric Browian Motion ): 0.49447454
HurstEXP( Mean-Reverting Series ): -0.00016013
HurstEXP( Trending Series ): 0.95748937
'SYNTH series demo ( on HurstEXP( ts == [ None, ] ) ) # actual numbers vary, as per np.random.randn() generator'
>>> HurstEXP( rolling_window( aDSEG[:,idxC], 100 ) )
REF.s:
>>> www.quantstart.com/articles/Basics-of-Statistical-Mean-Reversion-Testing
"""
#---------------------------------------------------------------------------------------------------------------------------<self-reflective>
if ( ts[0] == None ): # DEMO: Create a SYNTH Geometric Brownian Motion, Mean-Reverting and Trending Series:
gbm = np.log( 1000 + np.cumsum( np.random.randn( 100000 ) ) ) # a Geometric Brownian Motion[log(1000 + rand), log(1000 + rand + rand ), log(1000 + rand + rand + rand ),... log( 1000 + rand + ... )]
mr = np.log( 1000 + np.random.randn( 100000 ) ) # a Mean-Reverting Series [log(1000 + rand), log(1000 + rand ), log(1000 + rand ),... log( 1000 + rand )]
tr = np.log( 1000 + np.cumsum( 1 + np.random.randn( 100000 ) ) ) # a Trending Series [log(1001 + rand), log(1002 + rand + rand ), log(1003 + rand + rand + rand ),... log(101000 + rand + ... )]
# Output the Hurst Exponent for each of the above SYNTH series
print ( "HurstEXP( Geometric Browian Motion ): {0: > 12.8f}".format( HurstEXP( gbm ) ) )
print ( "HurstEXP( Mean-Reverting Series ): {0: > 12.8f}".format( HurstEXP( mr ) ) )
print ( "HurstEXP( Trending Series ): {0: > 12.8f}".format( HurstEXP( tr ) ) )
return ( "SYNTH series demo ( on HurstEXP( ts == [ None, ] ) ) # actual numbers vary, as per np.random.randn() generator" )
""" # FIX:
===================================================================================================================
|
|>>> QuantFX.HurstEXP( QuantFX.DATA[ :1000,QuantFX.idxH].tolist() )
0.47537688039105963
|
|>>> QuantFX.HurstEXP( QuantFX.DATA[ :101,QuantFX.idxH].tolist() )
-0.31081076640420308
|
|>>> QuantFX.HurstEXP( QuantFX.DATA[ :100,QuantFX.idxH].tolist() )
nan
|
|>>> QuantFX.HurstEXP( QuantFX.DATA[ :99,QuantFX.idxH].tolist() )
Intel MKL ERROR: Parameter 6 was incorrect on entry to DGELSD.
C:\Python27.anaconda\lib\site-packages\numpy\lib\polynomial.py:594: RankWarning: Polyfit may be poorly conditioned
warnings.warn(msg, RankWarning)
0.026867491053098096
"""
pass; too_short_list = 101 - len( ts ) # MUST HAVE 101+ ELEMENTS
if ( 0 < too_short_list ): # IF NOT:
ts = too_short_list * ts[:1] + ts # PRE-PEND SUFFICIENT NUMBER of [ts[0],]-as-list REPLICAS TO THE LIST-HEAD
#---------------------------------------------------------------------------------------------------------------------------
lags = range( 2, 100 ) # Create the range of lag values
tau = [ np.sqrt( np.std( np.subtract( ts[lag:], ts[:-lag] ) ) ) for lag in lags ] # Calculate the array of the variances of the lagged differences
#oly = np.polyfit( np.log( lags ), np.log( tau ), 1 ) # Use a linear fit to estimate the Hurst Exponent
#eturn ( 2.0 * poly[0] ) # Return the Hurst exponent from the polyfit output
""" ********************************************************************************************************************************************************************* DONE:[MS]:ISSUE / FIXED ABOVE
|>>> QuantFX.HurstEXP( QuantFX.DATA[ : QuantFX.aMinPTR,QuantFX.idxH] )
C:\Python27.anaconda\lib\site-packages\numpy\core\_methods.py:82: RuntimeWarning: Degrees of freedom <= 0 for slice
warnings.warn("Degrees of freedom <= 0 for slice", RuntimeWarning)
C:\Python27.anaconda\lib\site-packages\numpy\core\_methods.py:94: RuntimeWarning: invalid value encountered in true_divide
arrmean, rcount, out=arrmean, casting='unsafe', subok=False)
C:\Python27.anaconda\lib\site-packages\numpy\core\_methods.py:114: RuntimeWarning: invalid value encountered in true_divide
ret, rcount, out=ret, casting='unsafe', subok=False)
QuantFX.py:23034: RuntimeWarning: divide by zero encountered in log
return ( 2.0 * np.polyfit( np.log( lags ), np.log( tau ), 1 )[0] ) # Return the Hurst exponent from the polyfit output ( a linear fit to estimate the Hurst Exponent )
Intel MKL ERROR: Parameter 6 was incorrect on entry to DGELSD.
C:\Python27.anaconda\lib\site-packages\numpy\lib\polynomial.py:594: RankWarning: Polyfit may be poorly conditioned
warnings.warn(msg, RankWarning)
0.028471879418359915
|
|
|# DATA:
|
|>>> QuantFX.DATA[ : QuantFX.aMinPTR,QuantFX.idxH]
memmap([ 1763.31005859, 1765.01000977, 1765.44995117, 1764.80004883,
1765.83996582, 1768.91003418, 1771.04003906, 1769.43994141,
1771.4699707 , 1771.61999512, 1774.76000977, 1769.55004883,
1773.4699707 , 1773.32995605, 1770.08996582, 1770.20996094,
1768.34997559, 1768.02001953, 1767.59997559, 1767.23999023,
1768.41003418, 1769.06994629, 1769.56994629, 1770.7800293 ,
1770.56994629, 1769.7800293 , 1769.90002441, 1770.44995117,
1770.9699707 , 1771.04003906, 1771.16003418, 1769.81005859,
1768.76000977, 1769.39001465, 1773.23999023, 1771.91003418,
1766.92004395, 1765.56994629, 1762.65002441, 1760.18005371,
1755. , 1756.67004395, 1753.48999023, 1753.7199707 ,
1751.92004395, 1745.44995117, 1745.44995117, 1744.54003906,
1744.54003906, 1744.84997559, 1744.84997559, 1744.34997559,
1744.34997559, 1743.75 , 1743.75 , 1745.23999023,
1745.23999023, 1745.15002441, 1745.31005859, 1745.47998047,
1745.47998047, 1749.06994629, 1749.06994629, 1748.29003906,
1748.29003906, 1747.42004395, 1747.42004395, 1746.98999023,
1747.61999512, 1748.79003906, 1748.79003906, 1748.38000488,
1748.38000488, 1744.81005859, 1744.81005859, 1736.80004883,
1736.80004883, 1735.43005371, 1735.43005371, 1737.9699707
], dtype=float32
)
|
|
| # CONVERTED .tolist() to avoid .memmap-type artifacts:
|
|>>> QuantFX.DATA[ : QuantFX.aMinPTR,QuantFX.idxH].tolist()
[1763.31005859375, 1765.010009765625, 1765.449951171875, 1764.800048828125, 1765.8399658203125, 1768.9100341796875, 1771.0400390625, 1769.43994140625, 1771.469970703125, 1771.6199951171875, 1774.760
859375, 1743.75, 1743.75, 1745.239990234375, 1745.239990234375, 1745.1500244140625, 1745.31005859375, 1745.47998046875, 1745.47998046875, 1749.0699462890625, 1749.0699462890625, 1748.2900390625, 174
|
|>>> QuantFX.HurstEXP( QuantFX.DATA[ : QuantFX.aMinPTR,QuantFX.idxH].tolist() )
C:\Python27.anaconda\lib\site-packages\numpy\core\_methods.py:116: RuntimeWarning: invalid value encountered in double_scalars
ret = ret.dtype.type(ret / rcount)
Intel MKL ERROR: Parameter 6 was incorrect on entry to DGELSD.
C:\Python27.anaconda\lib\site-packages\numpy\lib\polynomial.py:594: RankWarning: Polyfit may be poorly conditioned
warnings.warn(msg, RankWarning)
0.028471876494884543
===================================================================================================================
|
|>>> QuantFX.HurstEXP( QuantFX.DATA[ :1000,QuantFX.idxH].tolist() )
0.47537688039105963
|
|>>> QuantFX.HurstEXP( QuantFX.DATA[ :101,QuantFX.idxH].tolist() )
-0.31081076640420308
|
|>>> QuantFX.HurstEXP( QuantFX.DATA[ :100,QuantFX.idxH].tolist() )
nan
|
|>>> QuantFX.HurstEXP( QuantFX.DATA[ :99,QuantFX.idxH].tolist() )
Intel MKL ERROR: Parameter 6 was incorrect on entry to DGELSD.
C:\Python27.anaconda\lib\site-packages\numpy\lib\polynomial.py:594: RankWarning: Polyfit may be poorly conditioned
warnings.warn(msg, RankWarning)
0.026867491053098096
"""
return ( 2.0 * np.polyfit( np.log( lags ), np.log( tau ), 1 )[0] ) # Return the Hurst exponent from the polyfit output ( a linear fit to estimate the Hurst Exponent )
[ ( -i, HurstEXP( ts = df['Close'][:-i] ) ) for i in range( 1, 200 ) ] # should call the HurstEXP for the last 200 days
>>> df[u'Close']
Date
1993-01-29 43.937500
1993-02-01 44.250000
...
2019-07-17 297.739990
2019-07-18 297.429993
Name: Close, Length: 6665, dtype: float64
>>>
>>> [ ( -i,
HurstEXP( df[u'Close'][:-i] )
) for i in range( 1, 10 )
]
[ ( -1, 0.4489364467179827 ),
( -2, 0.4489306967683502 ),
( -3, 0.44892205577752986 ),
( -4, 0.448931424819551 ),
( -5, 0.44895272101162326 ),
( -6, 0.44896713741862954 ),
( -7, 0.44898211557287204 ),
( -8, 0.4489941656580211 ),
( -9, 0.4490116318052649 )
]
虽然没有多少内存/处理效率,但“滚动窗口”技巧可能会被注入游戏,而没有内存,这样做的处理效率收益越少(你在语法上合理的代码上花费了很多,但处理效率这样做并没有带来任何好处,因为HurstEXP()
的卷积性质无济于事,如果不尝试重新矢量化其内部代码(为什么和永远是什么?),这会更好......只是如果教授或老板仍然希望你这样做......):
def rolling_window( aMatrix, aRollingWindowLENGTH ): #
""" __doc__
USAGE: rolling_window( aMatrix, aRollingWindowLENGTH )
PARAMS: aMatrix a numpy array
aRollingWindowLENGTH a LENGTH of a rolling window
RETURNS: a stride_trick'ed numpy array with rolling windows
THROWS: n/a
EXAMPLE: >>> x = np.arange( 10 ).reshape( ( 2, 5 ) )
>>> rolling_window( x, 3 )
array([[[0, 1, 2], [1, 2, 3], [2, 3, 4]],
[[5, 6, 7], [6, 7, 8], [7, 8, 9]]])
>>> np.mean( rolling_window( x, 3 ), -1 )
array([[ 1., 2., 3.],
[ 6., 7., 8.]])
"""
new_shape = aMatrix.shape[:-1] + ( aMatrix.shape[-1] - aRollingWindowLENGTH + 1, aRollingWindowLENGTH )
new_strides = aMatrix.strides + ( aMatrix.strides[-1], )
return np.lib.stride_tricks.as_strided( aMatrix,
shape = new_shape,
strides = new_strides
)
>>> rolling_window( df[u'Close'], 100 ).shape
(6566, 100)
>>> rolling_window( df[u'Close'], 100 ).flags
C_CONTIGUOUS : False
F_CONTIGUOUS : False
OWNDATA : False <---------------- a VIEW, not a replica
WRITEABLE : True
ALIGNED : True
WRITEBACKIFCOPY : False
UPDATEIFCOPY : False
你得到一个包含 6566 个向量的数组,其中包含“rolling_window”-ed 100 天的 SPY[Close]-s 块
>>> rolling_window( df[u'Close'], 100 )
array([[ 43.9375 , 44.25 , 44.34375 , ..., 44.5 , 44.59375 , 44.625 ],
[ 44.25 , 44.34375 , 44.8125 , ..., 44.59375 , 44.625 , 44.21875 ],
[ 44.34375 , 44.8125 , 45. , ..., 44.625 , 44.21875 , 44.8125 ],
...,
[279.14001465, 279.51998901, 279.32000732, ..., 300.6499939 , 300.75 , 299.77999878],
[279.51998901, 279.32000732, 279.20001221, ..., 300.75 , 299.77999878, 297.73999023],
[279.32000732, 279.20001221, 278.67999268, ..., 299.77999878, 297.73999023, 297.42999268]])
问:我应该在我的问题代码下写什么来完成它?
for aRowINDEX in range( 1, 200 ):
df[u'HurstEXP_COLUMN'][-aRowINDEX] = HurstEXP( df[u'Close'][:-aRowINDEX] )
print( "[{0:>4d}]: DIFF( hurst() - HurstEXP() ) == {1:}".format( aRowINDEX,
( hurst( df[u'Close'][:-aRowINDEX] )
- HurstEXP( df[u'Close'][:-aRowINDEX] )
)
)
有点晚了,但似乎没有一个干净的在线解决方案。 这似乎对我有用 - 你需要先安装 hurst 包。
from hurst import compute_Hc
H = lambda x: compute_Hc(x)[0]
window = 200
df['Hurst_Columns'] = df['Close'].rolling(window).apply(H)
稍后...
start = 200
# Get df length
samples = len(df)
for i in range(start,samples):
end = i + start
if end <= samples:
ts = df.iloc[i:end]
hurst = HurstEXP(ts['Close'].values)
print(ts['Date'].loc[end-1], ",", ts['Time'].loc[end-1], ",", hurst, file=h_file)
声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.