
Pandas: Apply rolling window on complex function (Hurst Exponent)

In a nutshell: I need to calculate the Hurst Exponent (HE) across a rolling window inside a pandas dataframe and assign the values to their own column.

The HE function I use was lifted from here, as it seemed more robust. For convenience, it's posted below:

import numpy as np

def HurstEXP( ts = [ None, ] ):
# TESTED: HurstEXP()                Hurst exponent ( Brownian Motion & other observations measure ) 100+ BARs back(!)
        """                                                         __doc__
        USAGE:
                    HurstEXP( ts = [ None, ] )

                    Returns the Hurst Exponent of the time series vector ts[]

        PARAMETERS:
                    ts[,]   a time-series, with 100+ elements
                            ( or [ None, ] that produces a demo run )

        RETURNS:
                    float - a Hurst Exponent approximation,
                            as a real value
                            or
                            an explanatory string on an empty call
        THROWS:
                    n/a
        EXAMPLE:
                    >>> HurstEXP()                                        # actual numbers will vary, as per np.random.randn() generator used
                    HurstEXP( Geometric Brownian Motion ):    0.49447454
                    HurstEXP(     Mean-Reverting Series ):   -0.00016013
                    HurstEXP(           Trending Series ):    0.95748937
                    'SYNTH series demo ( on HurstEXP( ts == [ None, ] ) ) # actual numbers vary, as per np.random.randn() generator'

                    >>> HurstEXP( rolling_window( aDSEG[:,idxC], 100 ) )
        REF.s:
                    >>> www.quantstart.com/articles/Basics-of-Statistical-Mean-Reversion-Testing
        """
        #---------------------------------------------------------------------------------------------------------------------------<self-reflective>
        if ( ts[0] is None ):                                       # DEMO: Create a SYNTH Geometric Brownian Motion, Mean-Reverting and Trending Series:

             gbm = np.log( 1000 + np.cumsum(     np.random.randn( 100000 ) ) )  # a Geometric Brownian Motion[log(1000 + rand), log(1000 + rand + rand ), log(1000 + rand + rand + rand ),... log(  1000 + rand + ... )]
             mr  = np.log( 1000 +                np.random.randn( 100000 )   )  # a Mean-Reverting Series    [log(1000 + rand), log(1000 + rand        ), log(1000 + rand               ),... log(  1000 + rand       )]
             tr  = np.log( 1000 + np.cumsum( 1 + np.random.randn( 100000 ) ) )  # a Trending Series          [log(1001 + rand), log(1002 + rand + rand ), log(1003 + rand + rand + rand ),... log(101000 + rand + ... )]

                                                                    # Output the Hurst Exponent for each of the above SYNTH series
             print ( "HurstEXP( Geometric Brownian Motion ):   {0: > 12.8f}".format( HurstEXP( gbm ) ) )
             print ( "HurstEXP(     Mean-Reverting Series ):   {0: > 12.8f}".format( HurstEXP( mr  ) ) )
             print ( "HurstEXP(           Trending Series ):   {0: > 12.8f}".format( HurstEXP( tr  ) ) )

             return ( "SYNTH series demo ( on HurstEXP( ts == [ None, ] ) ) # actual numbers vary, as per np.random.randn() generator" )
        """                                                         # FIX:
        ===================================================================================================================
        |
        |>>> QuantFX.HurstEXP( QuantFX.DATA[ :1000,QuantFX.idxH].tolist() )
        0.47537688039105963
        |
        |>>> QuantFX.HurstEXP( QuantFX.DATA[ :101,QuantFX.idxH].tolist() )
        -0.31081076640420308
        |
        |>>> QuantFX.HurstEXP( QuantFX.DATA[ :100,QuantFX.idxH].tolist() )
        nan
        |
        |>>> QuantFX.HurstEXP( QuantFX.DATA[ :99,QuantFX.idxH].tolist() )

        Intel MKL ERROR: Parameter 6 was incorrect on entry to DGELSD.
        C:\Python27.anaconda\lib\site-packages\numpy\lib\polynomial.py:594: RankWarning: Polyfit may be poorly conditioned
        warnings.warn(msg, RankWarning)
        0.026867491053098096
        """
        pass;     too_short_list = 101 - len( ts )                  # MUST HAVE 101+ ELEMENTS
        if ( 0 <  too_short_list ):                                 # IF NOT:
             ts = too_short_list * ts[:1] + ts                      #    PRE-PEND SUFFICIENT NUMBER of [ts[0],]-as-list REPLICAS TO THE LIST-HEAD
        #---------------------------------------------------------------------------------------------------------------------------
        lags = range( 2, 100 )                                                              # Create the range of lag values
        tau  = [ np.sqrt( np.std( np.subtract( ts[lag:], ts[:-lag] ) ) ) for lag in lags ]  # Calculate the array of the variances of the lagged differences
        #oly = np.polyfit( np.log( lags ), np.log( tau ), 1 )                               # Use a linear fit to estimate the Hurst Exponent
        #eturn ( 2.0 * poly[0] )                                                            # Return the Hurst exponent from the polyfit output
        """ ********************************************************************************************************************************************************************* DONE:[MS]:ISSUE / FIXED ABOVE
        |>>> QuantFX.HurstEXP( QuantFX.DATA[ : QuantFX.aMinPTR,QuantFX.idxH] )
        C:\Python27.anaconda\lib\site-packages\numpy\core\_methods.py:82: RuntimeWarning: Degrees of freedom <= 0 for slice
          warnings.warn("Degrees of freedom <= 0 for slice", RuntimeWarning)
        C:\Python27.anaconda\lib\site-packages\numpy\core\_methods.py:94: RuntimeWarning: invalid value encountered in true_divide
          arrmean, rcount, out=arrmean, casting='unsafe', subok=False)
        C:\Python27.anaconda\lib\site-packages\numpy\core\_methods.py:114: RuntimeWarning: invalid value encountered in true_divide
          ret, rcount, out=ret, casting='unsafe', subok=False)
        QuantFX.py:23034: RuntimeWarning: divide by zero encountered in log
          return ( 2.0 * np.polyfit( np.log( lags ), np.log( tau ), 1 )[0] )                  # Return the Hurst exponent from the polyfit output ( a linear fit to estimate the Hurst Exponent )

        Intel MKL ERROR: Parameter 6 was incorrect on entry to DGELSD.
        C:\Python27.anaconda\lib\site-packages\numpy\lib\polynomial.py:594: RankWarning: Polyfit may be poorly conditioned
          warnings.warn(msg, RankWarning)
        0.028471879418359915
        |
        |
        |# DATA:
        |
        |>>> QuantFX.DATA[ : QuantFX.aMinPTR,QuantFX.idxH]
        memmap([ 1763.31005859,  1765.01000977,  1765.44995117,  1764.80004883,
                 1765.83996582,  1768.91003418,  1771.04003906,  1769.43994141,
                 1771.4699707 ,  1771.61999512,  1774.76000977,  1769.55004883,
                 1773.4699707 ,  1773.32995605,  1770.08996582,  1770.20996094,
                 1768.34997559,  1768.02001953,  1767.59997559,  1767.23999023,
                 1768.41003418,  1769.06994629,  1769.56994629,  1770.7800293 ,
                 1770.56994629,  1769.7800293 ,  1769.90002441,  1770.44995117,
                 1770.9699707 ,  1771.04003906,  1771.16003418,  1769.81005859,
                 1768.76000977,  1769.39001465,  1773.23999023,  1771.91003418,
                 1766.92004395,  1765.56994629,  1762.65002441,  1760.18005371,
                 1755.        ,  1756.67004395,  1753.48999023,  1753.7199707 ,
                 1751.92004395,  1745.44995117,  1745.44995117,  1744.54003906,
                 1744.54003906,  1744.84997559,  1744.84997559,  1744.34997559,
                 1744.34997559,  1743.75      ,  1743.75      ,  1745.23999023,
                 1745.23999023,  1745.15002441,  1745.31005859,  1745.47998047,
                 1745.47998047,  1749.06994629,  1749.06994629,  1748.29003906,
                 1748.29003906,  1747.42004395,  1747.42004395,  1746.98999023,
                 1747.61999512,  1748.79003906,  1748.79003906,  1748.38000488,
                 1748.38000488,  1744.81005859,  1744.81005859,  1736.80004883,
                 1736.80004883,  1735.43005371,  1735.43005371,  1737.9699707
                 ], dtype=float32
                )
        |
        |
        | # CONVERTED .tolist() to avoid .memmap-type artifacts:
        |
        |>>> QuantFX.DATA[ : QuantFX.aMinPTR,QuantFX.idxH].tolist()
        [1763.31005859375, 1765.010009765625, 1765.449951171875, 1764.800048828125, 1765.8399658203125, 1768.9100341796875, 1771.0400390625, 1769.43994140625, 1771.469970703125, 1771.6199951171875, 1774.760
        859375, 1743.75, 1743.75, 1745.239990234375, 1745.239990234375, 1745.1500244140625, 1745.31005859375, 1745.47998046875, 1745.47998046875, 1749.0699462890625, 1749.0699462890625, 1748.2900390625, 174
        |
        |>>> QuantFX.HurstEXP( QuantFX.DATA[ : QuantFX.aMinPTR,QuantFX.idxH].tolist() )
        C:\Python27.anaconda\lib\site-packages\numpy\core\_methods.py:116: RuntimeWarning: invalid value encountered in double_scalars
          ret = ret.dtype.type(ret / rcount)

        Intel MKL ERROR: Parameter 6 was incorrect on entry to DGELSD.
        C:\Python27.anaconda\lib\site-packages\numpy\lib\polynomial.py:594: RankWarning: Polyfit may be poorly conditioned
          warnings.warn(msg, RankWarning)
        0.028471876494884543
        """
        return ( 2.0 * np.polyfit( np.log( lags ), np.log( tau ), 1 )[0] )  
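Stripped of the demo branch and the debugging notes, the estimator in HurstEXP() boils down to a few lines. The sketch below uses a hypothetical hurst_compact() helper (not part of the original code) that computes the same quantity: since log(sqrt(std)) is half of log(std), doubling the slope of the original fit is the same as fitting log(std) against log(lag) directly. It assumes a 1-D numeric input with comfortably more than max_lag points.

```python
import numpy as np

def hurst_compact(ts, max_lag=100):
    """Hypothetical helper: Hurst exponent estimate from the std of lagged differences.

    Assumes len(ts) > max_lag so that every lag has enough differences.
    """
    ts = np.asarray(ts, dtype=float)
    lags = np.arange(2, max_lag)                     # same lags as range(2, 100)
    # std of x[t + lag] - x[t] for each lag; grows roughly like lag ** H
    tau = np.array([np.std(ts[lag:] - ts[:-lag]) for lag in lags])
    # slope of log(tau) vs log(lag) is the Hurst exponent estimate
    return np.polyfit(np.log(lags), np.log(tau), 1)[0]

rng = np.random.default_rng(0)
random_walk = np.cumsum(rng.standard_normal(10_000))   # expected H near 0.5
white_noise = rng.standard_normal(10_000)              # mean-reverting, expected H near 0
print(hurst_compact(random_walk), hurst_compact(white_noise))
```

The demo inside HurstEXP() does the same kind of check with 100 000 points; exact numbers depend on the random draw.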

Now, in order to test the function, let's grab some TSLA data from Yahoo Finance:

import pandas as pd
import yfinance as yf
from datetime import datetime
from dateutil.relativedelta import relativedelta

years = 5
today = datetime.today().strftime('%Y-%m-%d')
lastyeartoday = (datetime.today() - relativedelta(years=years)).strftime('%Y-%m-%d')

df = yf.download('TSLA', 
                      start=lastyeartoday, 
                      end=today, 
                      progress=False)
df = df.dropna()
df = df[[u'Close']]
df

Output:

Date        Close
2016-02-16  31.034000
2016-02-17  33.736000
2016-02-18  33.354000
2016-02-19  33.316002
2016-02-22  35.548000
... ...
2021-02-08  863.419983
2021-02-09  849.460022
2021-02-10  804.820007
2021-02-11  811.659973
2021-02-12  816.119995
1259 rows × 1 columns
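If Yahoo Finance is unreachable, or you want reproducible numbers, a synthetic series with the same layout is enough to exercise the rolling code. This is an assumption-laden stand-in, not real TSLA data: the start date and row count simply mirror the output above, business days replace actual trading days, and the prices are a random walk.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the TSLA download: a positive random-walk "price"
# on business days, shaped like the df above (one 'Close' column, 1259 rows).
rng = np.random.default_rng(42)
dates = pd.bdate_range("2016-02-16", periods=1259, name="Date")
close = 100 * np.exp(np.cumsum(0.02 * rng.standard_normal(len(dates))))
df = pd.DataFrame({"Close": close}, index=dates)
print(df.head())
```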

So far so good. We have the function and the data to test it with. Now let's do a sanity test, i.e. run the function against a subsample of the data:

import numpy as np
window = 20

hurst = lambda x: (HurstEXP(ts = df[u'Close'][:-x].to_numpy()))
hurst(window)

Output:

0.5163981260143369

Excellent.

Now to the meaty part: applying the lambda across a rolling window and assigning the result to its own column. I've tried pretty much every trick I was able to dig up, but I cannot make it work.

The vanilla approach:

df.Close.rolling(window).apply(hurst, engine='cython', raw=True)

This gives me the following error:

TypeError: Cannot convert input [[-31.0340004  -33.73600006 -33.35400009 -33.31600189 -35.54800034
 -35.44200134 -35.79999924 -37.48600006 -38.06800079 -38.38600159
 -37.27000046 -37.66799927 -39.14799881 -40.20800018 -41.05799866
 -40.52000046 -41.74399948 -41.0359993  -41.5        -43.02999878]] of type <class 'numpy.ndarray'> to Timestamp
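The TypeError above comes from how .rolling().apply() calls the function: with raw=True it passes each window's values as a NumPy array, whereas the hurst lambda expects an integer offset. So df[u'Close'][:-x] ends up slicing a date-indexed Series with an array of negated prices, which pandas tries (and fails) to interpret as a Timestamp bound. A small probe on a toy series, with a hypothetical record() callback, makes the calling convention visible:

```python
import numpy as np
import pandas as pd

s = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0],
              index=pd.date_range("2021-01-01", periods=5, name="Date"))

seen = []  # records exactly what .rolling().apply() hands to the function

def record(window_values):
    # copy, because pandas may pass a view into its internal buffer
    seen.append(window_values.copy())
    return window_values.sum()

result = s.rolling(3).apply(record, raw=True)
print(type(seen[0]), seen[0])   # a numpy array of the 3 window values, not an int
print(result.tolist())          # the first two rows are NaN (incomplete windows)
```

Any function passed to .rolling().apply() therefore has to accept the window's values themselves, which is why the working version further down uses lambda x: HurstEXP(ts = x).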

Then I tried to get clever:

hurst = df.apply(lambda x: pd.Series(x.index).rolling(window).agg({'Hurst': lambda window: HurstEXP(x.loc[window])})).reset_index()['Hurst']
df.assign(Hurst=hurst) 

This also failed ignominiously. So at this point, half a day later, I'm pretty much stumped. Do any of you hardcore Python aficionados know of a way to do this?

Thanks a lot in advance for any insights and pointers.

I think your problem is that your window is too short. The docstring says the window length has to be 100+ elements, and the Hurst code doesn't handle shorter input properly, which results in a failure of the SVD.

Separately, your test actually slices everything but the last 20 elements, so it operates on a long array, which is why it didn't fail:

tmp = df[u'Close'][:-20].to_numpy()

print(tmp.shape, HurstEXP(ts = tmp))
(1239,) 0.5163981260143368

If you test a window shorter than 100 elements, it throws a LinAlgError exception:

tmp = df[u'Close'][:20].to_numpy()

print(tmp.shape, HurstEXP(ts = tmp))
(fails)
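To see why a short window breaks the fit, it helps to count how many lagged differences each lag actually gets. The hypothetical lagged_stds() helper below repeats the tau loop from HurstEXP() (without the sqrt) on a 20-element random window: every lag of 20 or more has no differences at all, so np.std returns NaN, and lag 19 has a single difference whose std is 0, whose log is -inf. Those values are what np.polyfit then chokes on.

```python
import warnings
import numpy as np

def lagged_stds(ts, lags=range(2, 100)):
    """Hypothetical diagnostic: std of x[t + lag] - x[t] for each lag, as in HurstEXP()."""
    ts = np.asarray(ts, dtype=float)
    out = []
    with warnings.catch_warnings():
        # np.std of an empty slice warns and returns NaN
        warnings.simplefilter("ignore", RuntimeWarning)
        for lag in lags:
            out.append(np.std(ts[lag:] - ts[:-lag]))
    return np.array(out)

rng = np.random.default_rng(1)
window = np.cumsum(rng.standard_normal(20))   # a 20-element window, like rolling(20)
tau = lagged_stds(window)
print("lags with no data (NaN):", np.isnan(tau).sum())   # every lag >= 20
print("lag 19 std:", tau[17])                             # one difference -> std 0.0
```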

It should work if you increase your rolling window length, or repair the code in the Hurst function so that it pads out the array when it's too short.

window = 500
df.Close.rolling(window).apply(lambda x: HurstEXP(ts = x), raw=True)
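As an alternative to .rolling().apply(), the same column can be built with NumPy's sliding_window_view (available since NumPy 1.20), which makes the window alignment explicit. This is a sketch under assumptions: hurst_compact() is a hypothetical condensed version of the estimator in HurstEXP() (equivalent only when the window is longer than 100), and the data is a synthetic random walk rather than the TSLA download.

```python
import numpy as np
import pandas as pd
from numpy.lib.stride_tricks import sliding_window_view  # NumPy >= 1.20

def hurst_compact(ts, max_lag=100):
    """Hypothetical helper: same estimator as HurstEXP() for inputs longer than max_lag."""
    lags = np.arange(2, max_lag)
    tau = np.array([np.std(ts[lag:] - ts[:-lag]) for lag in lags])
    return np.polyfit(np.log(lags), np.log(tau), 1)[0]

# Synthetic stand-in for df (assumption: random-walk prices, not real TSLA data)
rng = np.random.default_rng(7)
idx = pd.bdate_range("2016-02-16", periods=1259, name="Date")
df = pd.DataFrame({"Close": 100 + np.cumsum(rng.standard_normal(len(idx)))}, index=idx)

window = 500
values = df["Close"].to_numpy()
windows = sliding_window_view(values, window)   # shape (n - window + 1, window), a view
estimates = np.array([hurst_compact(w) for w in windows])
# Left-pad with NaN so each estimate lines up with the last row of its window,
# which is the same alignment .rolling(window) uses.
df["Hurst"] = np.concatenate([np.full(window - 1, np.nan), estimates])
print(df.tail())
```

Performance is roughly the same as .rolling().apply(), since the fit still runs once per window; the main benefit is that the NaN padding and alignment are spelled out.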

The code in the HurstEXP function for handling lists shorter than 101 elements won't work when ts is an np.ndarray, such as the windows provided by .rolling(raw=True): for arrays, * and + perform element-wise arithmetic instead of list repetition and concatenation, so the series never gets padded.
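A minimal illustration of the difference, using the same expression as the padding line in HurstEXP() with too_short_list = 2: on a list the result grows, on an array it keeps its length and every value is shifted instead.

```python
import numpy as np

too_short_list = 2

ts_list = [1.0, 2.0, 3.0]
# With lists, * repeats and + concatenates: this is the padding HurstEXP() intends.
padded_list = too_short_list * ts_list[:1] + ts_list
print(padded_list)   # [1.0, 1.0, 1.0, 2.0, 3.0]

ts_arr = np.array([1.0, 2.0, 3.0])
# With arrays, ts_arr[:1] * 2 is [2.0], which is then broadcast-added to every element.
padded_arr = too_short_list * ts_arr[:1] + ts_arr
print(padded_arr)    # [3. 4. 5.]
```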

You could modify the function to start with the following, and it will then work for windows under 100 elements:

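def HurstEXP( ts= [ None, ] ):
        if isinstance(ts, np.ndarray):          # .rolling(raw=True) passes np.ndarray windows
            ts = ts.tolist()                    # convert so the list-based padding below works
        # ... rest of HurstEXP() unchanged ...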

...alternatively, if you're always going to pass numpy arrays, you could change the line that pads the series:

     ts = too_short_list * ts[:1] + ts                      #    PRE-PEND SUFFICIENT NUMBER of [ts[0],]-as-list REPLICAS TO THE LIST-HEAD

to

     ts = np.pad(ts, pad_width=(too_short_list,0), mode='edge')
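For completeness, here is a sketch of the np.pad route end to end. hurst_padded() is a hypothetical, trimmed-down version of HurstEXP() (demo branch and debugging notes removed) that applies the edge-padding fix to either lists or arrays, and the data is a synthetic random walk. Keep in mind that with a 20-row window, 81 of the 101 points are copies of the first value, so the estimate is likely dominated by the padding; the docstring's advice of 100+ elements still applies.

```python
import numpy as np
import pandas as pd

# np.pad with mode='edge' repeats the first value, matching the list-based intent.
print(np.pad(np.array([1.0, 2.0, 3.0]), pad_width=(2, 0), mode="edge"))  # [1. 1. 1. 2. 3.]

def hurst_padded(ts, min_len=101, max_lag=100):
    """Hypothetical patched estimator: accepts lists or arrays, edge-pads short input."""
    ts = np.asarray(ts, dtype=float)
    too_short = min_len - len(ts)
    if too_short > 0:
        ts = np.pad(ts, pad_width=(too_short, 0), mode="edge")
    lags = np.arange(2, max_lag)
    tau = [np.sqrt(np.std(ts[lag:] - ts[:-lag])) for lag in lags]
    return 2.0 * np.polyfit(np.log(lags), np.log(tau), 1)[0]

# Synthetic stand-in for df (assumption: random-walk prices, not real TSLA data)
rng = np.random.default_rng(3)
idx = pd.bdate_range("2016-02-16", periods=300, name="Date")
df = pd.DataFrame({"Close": 100 + np.cumsum(rng.standard_normal(len(idx)))}, index=idx)

window = 20
df["Hurst"] = df["Close"].rolling(window).apply(hurst_padded, raw=True)
print(df.tail())
```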
