Pandas：在复合体 function（赫斯特指数）上应用滚动 window

Question

In a nutshell: I need to calculate the Hurst Exponent (HE) across a rolling window inside a pandas dataframe and assign the values to its own column.简而言之：我需要在 pandas dataframe 内的滚动 window 上计算赫斯特指数 (HE)，并将这些值分配给它自己的列。

The HE function I use was lifted from here as it seemed more robust.我使用的 HE function 是从这里取出的，因为它看起来更坚固。 For convenience it's posted below:为方便起见，贴在下面：

def HurstEXP( ts = [ None, ] ):                                         
# TESTED: HurstEXP()                Hurst exponent ( Browninan Motion & other observations measure ) 100+ BARs back(!)
        """                                                         __doc__
        USAGE:
                    HurstEXP( ts = [ None, ] )

                    Returns the Hurst Exponent of the time series vector ts[]

        PARAMETERS:
                    ts[,]   a time-series, with 100+ elements
                            ( or [ None, ] that produces a demo run )

        RETURNS:
                    float - a Hurst Exponent approximation,
                            as a real value
                            or
                            an explanatory string on an empty call
        THROWS:
                    n/a
        EXAMPLE:
                    >>> HurstEXP()                                        # actual numbers will vary, as per np.random.randn() generator used
                    HurstEXP( Geometric Browian Motion ):    0.49447454
                    HurstEXP(    Mean-Reverting Series ):   -0.00016013
                    HurstEXP(          Trending Series ):    0.95748937
                    'SYNTH series demo ( on HurstEXP( ts == [ None, ] ) ) # actual numbers vary, as per np.random.randn() generator'

                    >>> HurstEXP( rolling_window( aDSEG[:,idxC], 100 ) )
        REF.s:
                    >>> www.quantstart.com/articles/Basics-of-Statistical-Mean-Reversion-Testing
        """
        #---------------------------------------------------------------------------------------------------------------------------<self-reflective>
        if ( ts[0] == None ):                                       # DEMO: Create a SYNTH Geometric Brownian Motion, Mean-Reverting and Trending Series:

             gbm = np.log( 1000 + np.cumsum(     np.random.randn( 100000 ) ) )  # a Geometric Brownian Motion[log(1000 + rand), log(1000 + rand + rand ), log(1000 + rand + rand + rand ),... log(  1000 + rand + ... )]
             mr  = np.log( 1000 +                np.random.randn( 100000 )   )  # a Mean-Reverting Series    [log(1000 + rand), log(1000 + rand        ), log(1000 + rand               ),... log(  1000 + rand       )]
             tr  = np.log( 1000 + np.cumsum( 1 + np.random.randn( 100000 ) ) )  # a Trending Series          [log(1001 + rand), log(1002 + rand + rand ), log(1003 + rand + rand + rand ),... log(101000 + rand + ... )]

                                                                    # Output the Hurst Exponent for each of the above SYNTH series
             print ( "HurstEXP( Geometric Browian Motion ):   {0: > 12.8f}".format( HurstEXP( gbm ) ) )
             print ( "HurstEXP(    Mean-Reverting Series ):   {0: > 12.8f}".format( HurstEXP( mr  ) ) )
             print ( "HurstEXP(          Trending Series ):   {0: > 12.8f}".format( HurstEXP( tr  ) ) )

             return ( "SYNTH series demo ( on HurstEXP( ts == [ None, ] ) ) # actual numbers vary, as per np.random.randn() generator" )
        """                                                         # FIX:
        ===================================================================================================================
        |
        |>>> QuantFX.HurstEXP( QuantFX.DATA[ :1000,QuantFX.idxH].tolist() )
        0.47537688039105963
        |
        |>>> QuantFX.HurstEXP( QuantFX.DATA[ :101,QuantFX.idxH].tolist() )
        -0.31081076640420308
        |
        |>>> QuantFX.HurstEXP( QuantFX.DATA[ :100,QuantFX.idxH].tolist() )
        nan
        |
        |>>> QuantFX.HurstEXP( QuantFX.DATA[ :99,QuantFX.idxH].tolist() )

        Intel MKL ERROR: Parameter 6 was incorrect on entry to DGELSD.
        C:\Python27.anaconda\lib\site-packages\numpy\lib\polynomial.py:594: RankWarning: Polyfit may be poorly conditioned
        warnings.warn(msg, RankWarning)
        0.026867491053098096
        """
        pass;     too_short_list = 101 - len( ts )                  # MUST HAVE 101+ ELEMENTS
        if ( 0 <  too_short_list ):                                 # IF NOT:
             ts = too_short_list * ts[:1] + ts                      #    PRE-PEND SUFFICIENT NUMBER of [ts[0],]-as-list REPLICAS TO THE LIST-HEAD
        #---------------------------------------------------------------------------------------------------------------------------
        lags = range( 2, 100 )                                                              # Create the range of lag values
        tau  = [ np.sqrt( np.std( np.subtract( ts[lag:], ts[:-lag] ) ) ) for lag in lags ]  # Calculate the array of the variances of the lagged differences
        #oly = np.polyfit( np.log( lags ), np.log( tau ), 1 )                               # Use a linear fit to estimate the Hurst Exponent
        #eturn ( 2.0 * poly[0] )                                                            # Return the Hurst exponent from the polyfit output
        """ ********************************************************************************************************************************************************************* DONE:[MS]:ISSUE / FIXED ABOVE
        |>>> QuantFX.HurstEXP( QuantFX.DATA[ : QuantFX.aMinPTR,QuantFX.idxH] )
        C:\Python27.anaconda\lib\site-packages\numpy\core\_methods.py:82: RuntimeWarning: Degrees of freedom <= 0 for slice
          warnings.warn("Degrees of freedom <= 0 for slice", RuntimeWarning)
        C:\Python27.anaconda\lib\site-packages\numpy\core\_methods.py:94: RuntimeWarning: invalid value encountered in true_divide
          arrmean, rcount, out=arrmean, casting='unsafe', subok=False)
        C:\Python27.anaconda\lib\site-packages\numpy\core\_methods.py:114: RuntimeWarning: invalid value encountered in true_divide
          ret, rcount, out=ret, casting='unsafe', subok=False)
        QuantFX.py:23034: RuntimeWarning: divide by zero encountered in log
          return ( 2.0 * np.polyfit( np.log( lags ), np.log( tau ), 1 )[0] )                  # Return the Hurst exponent from the polyfit output ( a linear fit to estimate the Hurst Exponent )

        Intel MKL ERROR: Parameter 6 was incorrect on entry to DGELSD.
        C:\Python27.anaconda\lib\site-packages\numpy\lib\polynomial.py:594: RankWarning: Polyfit may be poorly conditioned
          warnings.warn(msg, RankWarning)
        0.028471879418359915
        |
        |
        |# DATA:
        |
        |>>> QuantFX.DATA[ : QuantFX.aMinPTR,QuantFX.idxH]
        memmap([ 1763.31005859,  1765.01000977,  1765.44995117,  1764.80004883,
                 1765.83996582,  1768.91003418,  1771.04003906,  1769.43994141,
                 1771.4699707 ,  1771.61999512,  1774.76000977,  1769.55004883,
                 1773.4699707 ,  1773.32995605,  1770.08996582,  1770.20996094,
                 1768.34997559,  1768.02001953,  1767.59997559,  1767.23999023,
                 1768.41003418,  1769.06994629,  1769.56994629,  1770.7800293 ,
                 1770.56994629,  1769.7800293 ,  1769.90002441,  1770.44995117,
                 1770.9699707 ,  1771.04003906,  1771.16003418,  1769.81005859,
                 1768.76000977,  1769.39001465,  1773.23999023,  1771.91003418,
                 1766.92004395,  1765.56994629,  1762.65002441,  1760.18005371,
                 1755.        ,  1756.67004395,  1753.48999023,  1753.7199707 ,
                 1751.92004395,  1745.44995117,  1745.44995117,  1744.54003906,
                 1744.54003906,  1744.84997559,  1744.84997559,  1744.34997559,
                 1744.34997559,  1743.75      ,  1743.75      ,  1745.23999023,
                 1745.23999023,  1745.15002441,  1745.31005859,  1745.47998047,
                 1745.47998047,  1749.06994629,  1749.06994629,  1748.29003906,
                 1748.29003906,  1747.42004395,  1747.42004395,  1746.98999023,
                 1747.61999512,  1748.79003906,  1748.79003906,  1748.38000488,
                 1748.38000488,  1744.81005859,  1744.81005859,  1736.80004883,
                 1736.80004883,  1735.43005371,  1735.43005371,  1737.9699707
                 ], dtype=float32
                )
        |
        |
        | # CONVERTED .tolist() to avoid .memmap-type artifacts:
        |
        |>>> QuantFX.DATA[ : QuantFX.aMinPTR,QuantFX.idxH].tolist()
        [1763.31005859375, 1765.010009765625, 1765.449951171875, 1764.800048828125, 1765.8399658203125, 1768.9100341796875, 1771.0400390625, 1769.43994140625, 1771.469970703125, 1771.6199951171875, 1774.760
        859375, 1743.75, 1743.75, 1745.239990234375, 1745.239990234375, 1745.1500244140625, 1745.31005859375, 1745.47998046875, 1745.47998046875, 1749.0699462890625, 1749.0699462890625, 1748.2900390625, 174
        |
        |>>> QuantFX.HurstEXP( QuantFX.DATA[ : QuantFX.aMinPTR,QuantFX.idxH].tolist() )
        C:\Python27.anaconda\lib\site-packages\numpy\core\_methods.py:116: RuntimeWarning: invalid value encountered in double_scalars
          ret = ret.dtype.type(ret / rcount)

        Intel MKL ERROR: Parameter 6 was incorrect on entry to DGELSD.
        C:\Python27.anaconda\lib\site-packages\numpy\lib\polynomial.py:594: RankWarning: Polyfit may be poorly conditioned
          warnings.warn(msg, RankWarning)
        0.028471876494884543
        ===================================================================================================================
        |
        |>>> QuantFX.HurstEXP( QuantFX.DATA[ :1000,QuantFX.idxH].tolist() )
        0.47537688039105963
        |
        |>>> QuantFX.HurstEXP( QuantFX.DATA[ :101,QuantFX.idxH].tolist() )
        -0.31081076640420308
        |
        |>>> QuantFX.HurstEXP( QuantFX.DATA[ :100,QuantFX.idxH].tolist() )
        nan
        |
        |>>> QuantFX.HurstEXP( QuantFX.DATA[ :99,QuantFX.idxH].tolist() )

        Intel MKL ERROR: Parameter 6 was incorrect on entry to DGELSD.
        C:\Python27.anaconda\lib\site-packages\numpy\lib\polynomial.py:594: RankWarning: Polyfit may be poorly conditioned
        warnings.warn(msg, RankWarning)
        0.026867491053098096
        """
        return ( 2.0 * np.polyfit( np.log( lags ), np.log( tau ), 1 )[0] )

Now in order to test the function let's grab some TSLA data from Yahoo finance:现在为了测试 function，让我们从雅虎财经获取一些 TSLA 数据：

import pandas as pd
import yfinance as yf
from datetime import datetime
from dateutil.relativedelta import relativedelta

years = 5
today = datetime.today().strftime('%Y-%m-%d')
lastyeartoday = (datetime.today() - relativedelta(years=years)).strftime('%Y-%m-%d')

df = yf.download('TSLA', 
                      start=lastyeartoday, 
                      end=today, 
                      progress=False)
df = df.dropna()
df = df[[u'Close']]
df

Output: Output：

Date        Close
2016-02-16  31.034000
2016-02-17  33.736000
2016-02-18  33.354000
2016-02-19  33.316002
2016-02-22  35.548000
... ...
2021-02-08  863.419983
2021-02-09  849.460022
2021-02-10  804.820007
2021-02-11  811.659973
2021-02-12  816.119995
1259 rows × 1 columns

So far so good.到目前为止，一切都很好。 We have the function and the data to test it with.我们有 function 和测试它的数据。 Now let's do a sanity test, ie run the function against a subsample of the data:现在让我们做一个健全性测试，即针对数据的子样本运行 function：

import numpy as np
window = 20

hurst = lambda x: (HurstEXP(ts = df[u'Close'][:-x].to_numpy()))
hurst(window)

Output: Output：

0.5163981260143369

Excellent.出色的。

Now to the meaty part.现在到肉的部分。 Applying the lambda across a rolling window and assigning the result to its own column.在滚动 window 上应用 lambda 并将结果分配给它自己的列。 I've pretty much tried every trick I was able to dig up but cannot make it work.我几乎尝试了我能够挖掘的每一个技巧，但无法让它发挥作用。

The vanilla approach:香草方法：

df.Close.rolling(window).apply(hurst, engine='cython', raw=True)

Gives me the following error:给我以下错误：

TypeError: Cannot convert input [[-31.0340004  -33.73600006 -33.35400009 -33.31600189 -35.54800034
 -35.44200134 -35.79999924 -37.48600006 -38.06800079 -38.38600159
 -37.27000046 -37.66799927 -39.14799881 -40.20800018 -41.05799866
 -40.52000046 -41.74399948 -41.0359993  -41.5        -43.02999878]] of type <class 'numpy.ndarray'> to Timestamp

Then I tried to get clever:然后我试着变得聪明：

hurst = df.apply(lambda x: pd.Series(x.index).rolling(window).agg({'Hurst': lambda window: HurstEXP(x.loc[window])})).reset_index()['Hurst']
df.assign(Hurst=hurst)

Also failed ignominiously.也惨遭失败。 So at this point - half a day later - I'm pretty much stumped.所以在这一点上 - 半天后 - 我几乎被难住了。 Do any of you hardcore python aficionados know of a way to do this?你们中的任何一个铁杆 python 爱好者都知道这样做的方法吗？

Thanks a lot in advance for any insights and pointers.非常感谢您提供任何见解和指示。

Answer 1

I think your problem is that your window is too short.我认为你的问题是你的 window 太短了。 It says in the docstring that the window length has to be 100+ elements, and the Hurst code isn't handling it properly, resulting in a failure of the SVD.它在文档字符串中说 window 长度必须是 100+ 个元素，并且 Hurst 代码没有正确处理它，导致 SVD 失败。

Separately, your test is actually slicing everything but the last 20 elements, so is actually a long array, which is why it didn't fail:另外，您的测试实际上是对除最后 20 个元素之外的所有内容进行切片，因此实际上是一个长数组，这就是它没有失败的原因：

tmp = df[u'Close'][:-20].to_numpy()

print(tmp.shape, HurstEXP(ts = tmp))
(1239,) 0.5163981260143368

If you test a window < 100 length, it throws a LinAlg exception:如果您测试 window < 100 长度，则会引发 LinAlg 异常：

tmp = df[u'Close'][:20].to_numpy()

print(tmp.shape, HurstEXP(ts = tmp))
(fails)

It should work if increase your rolling window length or repair the code in the Hurst function to pad out the array if it's too short.如果增加滚动 window 长度或修复 Hurst function 中的代码以填充数组（如果它太短），它应该可以工作。

window = 500
df.Close.rolling(window).apply(lambda x: HurstEXP(ts = x), raw=True)

The code in the HurstEXP function for handling lists shorter than 100 elements won't work for values of ts that are np.ndarray objects like those being provided from the .rolling(raw=True) . HurstEXP function 中用于处理少于 100 个元素的列表的代码不适用于 ts 的值，这些ts是np.ndarray对象，例如从.rolling(raw=True)提供的对象。

You could modify the function to start with the following, and it will work for windows under 100 elements:您可以修改 function 以从以下开始，它将适用于 100 个元素以下的 windows：

def HurstEXP( ts= [ None, ] ):   
        if isinstance(ts, np.ndarray):
            ts = ts.tolist()

...alternatively, if you're always going to have numpy arrays, you could change the line that fixes it: ...或者，如果您总是要使用 numpy arrays，您可以更改修复它的行：

     ts = too_short_list * ts[:1] + ts                      #    PRE-PEND SUFFICIENT NUMBER of [ts[0],]-as-list REPLICAS TO THE LIST-HEAD

to至

     ts = np.pad(ts, pad_width=(too_short_list,0), mode='edge')

Pandas：在复合体 function（赫斯特指数）上应用滚动 window

问题描述

1 个解决方案

解决方案1
1 已采纳 2021-02-15 17:17:38

Pandas：在复合体 function（赫斯特指数）上应用滚动 window

问题描述

1 个解决方案

解决方案1 1 已采纳 2021-02-15 17:17:38

解决方案1
1 已采纳 2021-02-15 17:17:38