
Issue with pandas exponentially weighted moving average

I've run into a small issue while trying to calculate an exponentially weighted moving average (ewma) with pandas:

Here, I'm using pandas' function to calculate the 9-period ewma of my price:

df = m1["open"].ewm(min_periods=9,span=9).mean()

The formula used to calculate an ewma is the following:

ewm(t+1) = alpha * price + (1-alpha) * ewm(t) 

with alpha = 2/(period+1)
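
For reference, the recursion above can be sketched in plain Python. This is only an illustration, not pandas' implementation: the helper name `ewma_recursive` is hypothetical, and seeding the average with the first price is an assumption (it matches the adjust=False behaviour quoted from the pandas documentation further down).

```python
def ewma_recursive(prices, span):
    """Recursive EWMA seeded with the first price (hypothetical helper)."""
    alpha = 2 / (span + 1)
    averages = [prices[0]]  # assumption: the first average is the first price
    for price in prices[1:]:
        # The current price is combined with the previous average.
        averages.append(alpha * price + (1 - alpha) * averages[-1])
    return averages


# First five values of the m1["open"] sample shown later in the question.
sample = [1.19548, 1.19388, 1.19102, 1.18826, 1.19085]
result = ewma_recursive(sample, span=9)
```

Note that each step uses the price at the same time step as the average being computed, together with the average from the previous step.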

The results pandas gave me didn't seem correct, so I tried to verify them:

df = m1["open"].ewm(min_periods=9,span=9).mean()
alpha = 2/(9+1)
df_bis = alpha*m1["open"].shift(1) + (1-alpha)*df.shift(1)
bol_array = df == df_bis

df is the Series returned by pandas; df_bis is the one computed with the formula from the values of df and the price.

bol_array should always be True, since df and df_bis should be equal.

However, that's not the case. Here is a random part of bol_array:

2015-01-09 21:32:00    False
2015-01-09 21:33:00    False
2015-01-09 21:34:00     True
2015-01-09 21:35:00    False
2015-01-09 21:36:00    False
2015-01-09 21:37:00    False
2015-01-09 21:38:00    False
2015-01-09 21:39:00    False
2015-01-09 21:40:00    False
2015-01-09 21:41:00    False
2015-01-09 21:42:00    False
2015-01-09 21:43:00     True
2015-01-09 21:44:00    False
2015-01-09 21:45:00    False

Sometimes, they are equal, sometimes not.
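
As a side note, an exact == comparison between floating-point values can return False even when two numbers agree to the precision that matters, so the True/False pattern alone does not say how large the discrepancy is. A minimal sketch using the standard library's `math.isclose`, with an absolute tolerance of 1e-5 chosen here only because the question asks for 5 digits:

```python
import math

a = 0.1 + 0.2
b = 0.3

exact_equal = (a == b)  # False: the two floats differ in the last bits
close_enough = math.isclose(a, b, abs_tol=1e-5)  # True: equal to 5 digits
```

A tolerance-based check does not explain the mismatch above by itself; it only separates rounding noise from a genuine difference.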

I checked at a specific time to see if it might be a rounding problem:

m1["open"][15]*alpha + df[15]*(1-alpha)
Out[7]: 1.1931468623934722

df[16]
Out[8]: 1.1930652375329887

The results are really different, so it's not a rounding problem (I need a precision of 5 digits).

Does anyone know what the problem is? I can't seem to find it.

Edit: To see if it might be a rounding problem, I'm adding an array measuring the difference between pandas' result and my own calculation:

diff_array = df-df_bis

I need a precision of 5 digits, so I'm multiplying this array by 10^5 to visualize better the magnitude of the difference:

diff_array*10**5
Out[16]: 
DateTime
2015-01-04 22:00:00          NaN
2015-01-04 22:01:00          NaN
2015-01-04 22:02:00          NaN
2015-01-04 22:03:00          NaN
2015-01-04 22:04:00          NaN
2015-01-04 22:05:00          NaN
2015-01-04 22:06:00          NaN
2015-01-04 22:07:00          NaN
2015-01-04 22:08:00          NaN
2015-01-04 22:09:00    -2.610024
2015-01-04 22:10:00   -60.325142
2015-01-04 22:11:00    27.044649
2015-01-04 22:12:00    32.072310
2015-01-04 22:13:00   -25.944314
2015-01-04 22:14:00     8.201273
2015-01-04 22:15:00     8.358418
2015-01-04 22:16:00    -8.162486
2015-01-04 22:17:00   -11.381701
2015-01-04 22:18:00     8.610419
2015-01-04 22:19:00     4.862610
2015-01-04 22:20:00     3.875236
2015-01-04 22:21:00     1.659843

The difference is sometimes very large. I'm looking for a precision of 5 digits, so a difference of 10^(-5), or even twice that, would be acceptable, but I get a difference of up to 60*10^(-5) just among the first values, which is far too much.

Edit 2: I looked at every parameter of ewm. I get the same result when I directly specify the alpha parameter corresponding to a 9-period ewma, which means my span parameter is indeed correct.

The adjust parameter is set to True by default. According to the pandas documentation:

When adjust is False, weighted averages are calculated recursively as: weighted_average[0] = arg[0]; weighted_average[i] = (1-alpha) * weighted_average[i-1] + alpha * arg[i].
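
To make the two modes concrete, here is a small sketch comparing them on the first five prices of the sample. It assumes numpy is available alongside pandas. The manual weighted average follows the adjust=True definition in the pandas documentation (weights (1-alpha)^i over all past values); the variable names are illustrative only.

```python
import numpy as np
import pandas as pd

prices = pd.Series([1.19548, 1.19388, 1.19102, 1.18826, 1.19085])
alpha = 0.2  # 2 / (9 + 1), i.e. span=9

# adjust=False: plain recursion, seeded with the first value.
recursive = prices.ewm(alpha=alpha, adjust=False).mean()

# adjust=True (default): weighted average over all values seen so far.
adjusted = prices.ewm(alpha=alpha, adjust=True).mean()

# Manual adjust=True value for the last point: the newest price has
# weight 1, the one before it (1 - alpha), then (1 - alpha)**2, and so on.
weights = (1 - alpha) ** np.arange(len(prices))[::-1]
manual_last = (weights * prices.to_numpy()).sum() / weights.sum()
```

With adjust=True the early values do not follow the simple recursion exactly, which is why the check against the recursive formula only makes sense with adjust=False.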

I set the adjust parameter to False, which should have resolved the problem:

df = m1["open"].ewm(min_periods=9,alpha=0.2,adjust=False).mean()
alpha = 2/(9+1)
df_bis = alpha*m1["open"].shift(1) + (1-alpha)*df.shift(1)
diff_array = df-df_bis

diff_array*10**5
Out[11]: 
DateTime
2015-01-04 22:09:00    -5.6
2015-01-04 22:10:00   -56.8
2015-01-04 22:11:00    27.2
2015-01-04 22:12:00    30.4
2015-01-04 22:13:00   -25.8
2015-01-04 22:14:00     8.0
2015-01-04 22:15:00     8.0
2015-01-04 22:16:00    -8.2
2015-01-04 22:17:00   -11.2
2015-01-04 22:18:00     8.6
2015-01-04 22:19:00     4.8

But it doesn't solve my issue. I can't think of anything that would explain such disparities. Does anyone have a clue why there's a difference?

Here's the documentation from pandas on the subject, where the formulas I used earlier come from: http://pandas.pydata.org/pandas-docs/stable/computation.html#exponentially-weighted-windows

The dataset I'm using is very large; here's a sample:

m1["open"]
Out[10]: 
DateTime
2015-01-04 22:00:00    1.19548
2015-01-04 22:01:00    1.19388
2015-01-04 22:02:00    1.19102
2015-01-04 22:03:00    1.18826
2015-01-04 22:04:00    1.19085
2015-01-04 22:05:00    1.19257
2015-01-04 22:06:00    1.19270
2015-01-04 22:07:00    1.19350
2015-01-04 22:08:00    1.19427
2015-01-04 22:09:00    1.19399
2015-01-04 22:10:00    1.19115
2015-01-04 22:11:00    1.19251
2015-01-04 22:12:00    1.19403
2015-01-04 22:13:00    1.19274
2015-01-04 22:14:00    1.19314
2015-01-04 22:15:00    1.19354
2015-01-04 22:16:00    1.19313
2015-01-04 22:17:00    1.19257
2015-01-04 22:18:00    1.19300
2015-01-04 22:19:00    1.19324
2015-01-04 22:20:00    1.19343
2015-01-04 22:21:00    1.19351
2015-01-04 22:22:00    1.19353
2015-01-04 22:23:00    1.19376
2015-01-04 22:24:00    1.19408
2015-01-04 22:25:00    1.19370
2015-01-04 22:26:00    1.19381
2015-01-04 22:27:00    1.19439
2015-01-04 22:28:00    1.19435
2015-01-04 22:29:00    1.19419

Solved the issue, I'm just an idiot.

I used the formula: df_bis = alpha*m1["open"].shift(1) + (1-alpha)*df.shift(1)

But the price shouldn't be shifted here: the ewma at time t combines the price at time t with the ewma at time t-1. The correct calculation is:

df_bis = alpha*m1["open"] + (1-alpha)*df.shift(1)

Which yields:

diff_array
Out[24]: 
DateTime
2015-01-04 22:09:00    0.0
2015-01-04 22:10:00    0.0
2015-01-04 22:11:00    0.0
2015-01-04 22:12:00    0.0
2015-01-04 22:13:00    0.0
2015-01-04 22:14:00    0.0
2015-01-04 22:15:00    0.0
2015-01-04 22:16:00    0.0
2015-01-04 22:17:00    0.0
2015-01-04 22:18:00    0.0
2015-01-04 22:19:00    0.0
2015-01-04 22:20:00    0.0
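
The fix can be reproduced on the first rows of the sample from the question. The sketch below is a simplified check rather than the exact session above: it assumes adjust=False, omits min_periods so that no leading NaN values appear, uses a default integer index instead of the DateTime index, and compares with `np.allclose` instead of ==.

```python
import numpy as np
import pandas as pd

# First twelve values of the m1["open"] sample.
open_prices = pd.Series([1.19548, 1.19388, 1.19102, 1.18826, 1.19085,
                         1.19257, 1.19270, 1.19350, 1.19427, 1.19399,
                         1.19115, 1.19251])
alpha = 2 / (9 + 1)

ewma = open_prices.ewm(alpha=alpha, adjust=False).mean()

# Original attempt: price and previous average are both shifted.
shifted_price = alpha * open_prices.shift(1) + (1 - alpha) * ewma.shift(1)
# Corrected: current price, previous average.
current_price = alpha * open_prices + (1 - alpha) * ewma.shift(1)

# Row 0 is NaN after shift(1), so compare from row 1 onward.
matches_corrected = np.allclose(ewma.iloc[1:], current_price.iloc[1:])
matches_original = np.allclose(ewma.iloc[1:], shifted_price.iloc[1:])
```

Here `matches_corrected` is True and `matches_original` is False, consistent with the zero differences shown above.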
