[英]Pandas: Apply rolling window on complex function (Hurst Exponent)
In a nutshell: I need to calculate the Hurst Exponent (HE) across a rolling window inside a pandas dataframe and assign the values to its own column.简而言之:我需要在 pandas dataframe 内的滚动 window 上计算赫斯特指数 (HE),并将这些值分配给它自己的列。
The HE function I use was lifted from here as it seemed more robust.我使用的 HE function 是从这里取出的,因为它看起来更坚固。 For convenience it's posted below:为方便起见,贴在下面:
def HurstEXP( ts = [ None, ] ):
# TESTED: HurstEXP() Hurst exponent ( Browninan Motion & other observations measure ) 100+ BARs back(!)
""" __doc__
USAGE:
HurstEXP( ts = [ None, ] )
Returns the Hurst Exponent of the time series vector ts[]
PARAMETERS:
ts[,] a time-series, with 100+ elements
( or [ None, ] that produces a demo run )
RETURNS:
float - a Hurst Exponent approximation,
as a real value
or
an explanatory string on an empty call
THROWS:
n/a
EXAMPLE:
>>> HurstEXP() # actual numbers will vary, as per np.random.randn() generator used
HurstEXP( Geometric Browian Motion ): 0.49447454
HurstEXP( Mean-Reverting Series ): -0.00016013
HurstEXP( Trending Series ): 0.95748937
'SYNTH series demo ( on HurstEXP( ts == [ None, ] ) ) # actual numbers vary, as per np.random.randn() generator'
>>> HurstEXP( rolling_window( aDSEG[:,idxC], 100 ) )
REF.s:
>>> www.quantstart.com/articles/Basics-of-Statistical-Mean-Reversion-Testing
"""
#---------------------------------------------------------------------------------------------------------------------------<self-reflective>
if ( ts[0] == None ): # DEMO: Create a SYNTH Geometric Brownian Motion, Mean-Reverting and Trending Series:
gbm = np.log( 1000 + np.cumsum( np.random.randn( 100000 ) ) ) # a Geometric Brownian Motion[log(1000 + rand), log(1000 + rand + rand ), log(1000 + rand + rand + rand ),... log( 1000 + rand + ... )]
mr = np.log( 1000 + np.random.randn( 100000 ) ) # a Mean-Reverting Series [log(1000 + rand), log(1000 + rand ), log(1000 + rand ),... log( 1000 + rand )]
tr = np.log( 1000 + np.cumsum( 1 + np.random.randn( 100000 ) ) ) # a Trending Series [log(1001 + rand), log(1002 + rand + rand ), log(1003 + rand + rand + rand ),... log(101000 + rand + ... )]
# Output the Hurst Exponent for each of the above SYNTH series
print ( "HurstEXP( Geometric Browian Motion ): {0: > 12.8f}".format( HurstEXP( gbm ) ) )
print ( "HurstEXP( Mean-Reverting Series ): {0: > 12.8f}".format( HurstEXP( mr ) ) )
print ( "HurstEXP( Trending Series ): {0: > 12.8f}".format( HurstEXP( tr ) ) )
return ( "SYNTH series demo ( on HurstEXP( ts == [ None, ] ) ) # actual numbers vary, as per np.random.randn() generator" )
""" # FIX:
===================================================================================================================
|
|>>> QuantFX.HurstEXP( QuantFX.DATA[ :1000,QuantFX.idxH].tolist() )
0.47537688039105963
|
|>>> QuantFX.HurstEXP( QuantFX.DATA[ :101,QuantFX.idxH].tolist() )
-0.31081076640420308
|
|>>> QuantFX.HurstEXP( QuantFX.DATA[ :100,QuantFX.idxH].tolist() )
nan
|
|>>> QuantFX.HurstEXP( QuantFX.DATA[ :99,QuantFX.idxH].tolist() )
Intel MKL ERROR: Parameter 6 was incorrect on entry to DGELSD.
C:\Python27.anaconda\lib\site-packages\numpy\lib\polynomial.py:594: RankWarning: Polyfit may be poorly conditioned
warnings.warn(msg, RankWarning)
0.026867491053098096
"""
pass; too_short_list = 101 - len( ts ) # MUST HAVE 101+ ELEMENTS
if ( 0 < too_short_list ): # IF NOT:
ts = too_short_list * ts[:1] + ts # PRE-PEND SUFFICIENT NUMBER of [ts[0],]-as-list REPLICAS TO THE LIST-HEAD
#---------------------------------------------------------------------------------------------------------------------------
lags = range( 2, 100 ) # Create the range of lag values
tau = [ np.sqrt( np.std( np.subtract( ts[lag:], ts[:-lag] ) ) ) for lag in lags ] # Calculate the array of the variances of the lagged differences
#oly = np.polyfit( np.log( lags ), np.log( tau ), 1 ) # Use a linear fit to estimate the Hurst Exponent
#eturn ( 2.0 * poly[0] ) # Return the Hurst exponent from the polyfit output
""" ********************************************************************************************************************************************************************* DONE:[MS]:ISSUE / FIXED ABOVE
|>>> QuantFX.HurstEXP( QuantFX.DATA[ : QuantFX.aMinPTR,QuantFX.idxH] )
C:\Python27.anaconda\lib\site-packages\numpy\core\_methods.py:82: RuntimeWarning: Degrees of freedom <= 0 for slice
warnings.warn("Degrees of freedom <= 0 for slice", RuntimeWarning)
C:\Python27.anaconda\lib\site-packages\numpy\core\_methods.py:94: RuntimeWarning: invalid value encountered in true_divide
arrmean, rcount, out=arrmean, casting='unsafe', subok=False)
C:\Python27.anaconda\lib\site-packages\numpy\core\_methods.py:114: RuntimeWarning: invalid value encountered in true_divide
ret, rcount, out=ret, casting='unsafe', subok=False)
QuantFX.py:23034: RuntimeWarning: divide by zero encountered in log
return ( 2.0 * np.polyfit( np.log( lags ), np.log( tau ), 1 )[0] ) # Return the Hurst exponent from the polyfit output ( a linear fit to estimate the Hurst Exponent )
Intel MKL ERROR: Parameter 6 was incorrect on entry to DGELSD.
C:\Python27.anaconda\lib\site-packages\numpy\lib\polynomial.py:594: RankWarning: Polyfit may be poorly conditioned
warnings.warn(msg, RankWarning)
0.028471879418359915
|
|
|# DATA:
|
|>>> QuantFX.DATA[ : QuantFX.aMinPTR,QuantFX.idxH]
memmap([ 1763.31005859, 1765.01000977, 1765.44995117, 1764.80004883,
1765.83996582, 1768.91003418, 1771.04003906, 1769.43994141,
1771.4699707 , 1771.61999512, 1774.76000977, 1769.55004883,
1773.4699707 , 1773.32995605, 1770.08996582, 1770.20996094,
1768.34997559, 1768.02001953, 1767.59997559, 1767.23999023,
1768.41003418, 1769.06994629, 1769.56994629, 1770.7800293 ,
1770.56994629, 1769.7800293 , 1769.90002441, 1770.44995117,
1770.9699707 , 1771.04003906, 1771.16003418, 1769.81005859,
1768.76000977, 1769.39001465, 1773.23999023, 1771.91003418,
1766.92004395, 1765.56994629, 1762.65002441, 1760.18005371,
1755. , 1756.67004395, 1753.48999023, 1753.7199707 ,
1751.92004395, 1745.44995117, 1745.44995117, 1744.54003906,
1744.54003906, 1744.84997559, 1744.84997559, 1744.34997559,
1744.34997559, 1743.75 , 1743.75 , 1745.23999023,
1745.23999023, 1745.15002441, 1745.31005859, 1745.47998047,
1745.47998047, 1749.06994629, 1749.06994629, 1748.29003906,
1748.29003906, 1747.42004395, 1747.42004395, 1746.98999023,
1747.61999512, 1748.79003906, 1748.79003906, 1748.38000488,
1748.38000488, 1744.81005859, 1744.81005859, 1736.80004883,
1736.80004883, 1735.43005371, 1735.43005371, 1737.9699707
], dtype=float32
)
|
|
| # CONVERTED .tolist() to avoid .memmap-type artifacts:
|
|>>> QuantFX.DATA[ : QuantFX.aMinPTR,QuantFX.idxH].tolist()
[1763.31005859375, 1765.010009765625, 1765.449951171875, 1764.800048828125, 1765.8399658203125, 1768.9100341796875, 1771.0400390625, 1769.43994140625, 1771.469970703125, 1771.6199951171875, 1774.760
859375, 1743.75, 1743.75, 1745.239990234375, 1745.239990234375, 1745.1500244140625, 1745.31005859375, 1745.47998046875, 1745.47998046875, 1749.0699462890625, 1749.0699462890625, 1748.2900390625, 174
|
|>>> QuantFX.HurstEXP( QuantFX.DATA[ : QuantFX.aMinPTR,QuantFX.idxH].tolist() )
C:\Python27.anaconda\lib\site-packages\numpy\core\_methods.py:116: RuntimeWarning: invalid value encountered in double_scalars
ret = ret.dtype.type(ret / rcount)
Intel MKL ERROR: Parameter 6 was incorrect on entry to DGELSD.
C:\Python27.anaconda\lib\site-packages\numpy\lib\polynomial.py:594: RankWarning: Polyfit may be poorly conditioned
warnings.warn(msg, RankWarning)
0.028471876494884543
===================================================================================================================
|
|>>> QuantFX.HurstEXP( QuantFX.DATA[ :1000,QuantFX.idxH].tolist() )
0.47537688039105963
|
|>>> QuantFX.HurstEXP( QuantFX.DATA[ :101,QuantFX.idxH].tolist() )
-0.31081076640420308
|
|>>> QuantFX.HurstEXP( QuantFX.DATA[ :100,QuantFX.idxH].tolist() )
nan
|
|>>> QuantFX.HurstEXP( QuantFX.DATA[ :99,QuantFX.idxH].tolist() )
Intel MKL ERROR: Parameter 6 was incorrect on entry to DGELSD.
C:\Python27.anaconda\lib\site-packages\numpy\lib\polynomial.py:594: RankWarning: Polyfit may be poorly conditioned
warnings.warn(msg, RankWarning)
0.026867491053098096
"""
return ( 2.0 * np.polyfit( np.log( lags ), np.log( tau ), 1 )[0] )
Now in order to test the function let's grab some TSLA data from Yahoo finance:现在为了测试 function,让我们从雅虎财经获取一些 TSLA 数据:
import pandas as pd
import yfinance as yf
from datetime import datetime
from dateutil.relativedelta import relativedelta
years = 5
today = datetime.today().strftime('%Y-%m-%d')
lastyeartoday = (datetime.today() - relativedelta(years=years)).strftime('%Y-%m-%d')
df = yf.download('TSLA',
start=lastyeartoday,
end=today,
progress=False)
df = df.dropna()
df = df[[u'Close']]
df
Output: Output:
Date Close
2016-02-16 31.034000
2016-02-17 33.736000
2016-02-18 33.354000
2016-02-19 33.316002
2016-02-22 35.548000
... ...
2021-02-08 863.419983
2021-02-09 849.460022
2021-02-10 804.820007
2021-02-11 811.659973
2021-02-12 816.119995
1259 rows × 1 columns
So far so good.到目前为止,一切都很好。 We have the function and the data to test it with.我们有 function 和测试它的数据。 Now let's do a sanity test, ie run the function against a subsample of the data:现在让我们做一个健全性测试,即针对数据的子样本运行 function:
import numpy as np
window = 20
hurst = lambda x: (HurstEXP(ts = df[u'Close'][:-x].to_numpy()))
hurst(window)
Output: Output:
0.5163981260143369
Excellent.出色的。
Now to the meaty part.现在到肉的部分。 Applying the lambda across a rolling window and assigning the result to its own column.在滚动 window 上应用 lambda 并将结果分配给它自己的列。 I've pretty much tried every trick I was able to dig up but cannot make it work.我几乎尝试了我能够挖掘的每一个技巧,但无法让它发挥作用。
The vanilla approach:香草方法:
df.Close.rolling(window).apply(hurst, engine='cython', raw=True)
Gives me the following error:给我以下错误:
TypeError: Cannot convert input [[-31.0340004 -33.73600006 -33.35400009 -33.31600189 -35.54800034
-35.44200134 -35.79999924 -37.48600006 -38.06800079 -38.38600159
-37.27000046 -37.66799927 -39.14799881 -40.20800018 -41.05799866
-40.52000046 -41.74399948 -41.0359993 -41.5 -43.02999878]] of type <class 'numpy.ndarray'> to Timestamp
Then I tried to get clever:然后我试着变得聪明:
hurst = df.apply(lambda x: pd.Series(x.index).rolling(window).agg({'Hurst': lambda window: HurstEXP(x.loc[window])})).reset_index()['Hurst']
df.assign(Hurst=hurst)
Also failed ignominiously.也惨遭失败。 So at this point - half a day later - I'm pretty much stumped.所以在这一点上 - 半天后 - 我几乎被难住了。 Do any of you hardcore python aficionados know of a way to do this?你们中的任何一个铁杆 python 爱好者都知道这样做的方法吗?
Thanks a lot in advance for any insights and pointers.非常感谢您提供任何见解和指示。
I think your problem is that your window is too short.我认为你的问题是你的 window 太短了。 It says in the docstring that the window length has to be 100+ elements, and the Hurst code isn't handling it properly, resulting in a failure of the SVD.它在文档字符串中说 window 长度必须是 100+ 个元素,并且 Hurst 代码没有正确处理它,导致 SVD 失败。
Separately, your test is actually slicing everything but the last 20 elements, so is actually a long array, which is why it didn't fail:另外,您的测试实际上是对除最后 20 个元素之外的所有内容进行切片,因此实际上是一个长数组,这就是它没有失败的原因:
tmp = df[u'Close'][:-20].to_numpy()
print(tmp.shape, HurstEXP(ts = tmp))
(1239,) 0.5163981260143368
If you test a window < 100 length, it throws a LinAlg exception:如果您测试 window < 100 长度,则会引发 LinAlg 异常:
tmp = df[u'Close'][:20].to_numpy()
print(tmp.shape, HurstEXP(ts = tmp))
(fails)
It should work if increase your rolling window length or repair the code in the Hurst function to pad out the array if it's too short.如果增加滚动 window 长度或修复 Hurst function 中的代码以填充数组(如果它太短),它应该可以工作。
window = 500
df.Close.rolling(window).apply(lambda x: HurstEXP(ts = x), raw=True)
The code in the HurstEXP
function for handling lists shorter than 100 elements won't work for values of ts
that are np.ndarray
objects like those being provided from the .rolling(raw=True)
. HurstEXP
function 中用于处理少于 100 个元素的列表的代码不适用于 ts 的值,这些ts
是np.ndarray
对象,例如从.rolling(raw=True)
提供的对象。
You could modify the function to start with the following, and it will work for windows under 100 elements:您可以修改 function 以从以下开始,它将适用于 100 个元素以下的 windows:
def HurstEXP( ts= [ None, ] ):
if isinstance(ts, np.ndarray):
ts = ts.tolist()
...alternatively, if you're always going to have numpy arrays, you could change the line that fixes it: ...或者,如果您总是要使用 numpy arrays,您可以更改修复它的行:
ts = too_short_list * ts[:1] + ts # PRE-PEND SUFFICIENT NUMBER of [ts[0],]-as-list REPLICAS TO THE LIST-HEAD
to至
ts = np.pad(ts, pad_width=(too_short_list,0), mode='edge')
声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.