I've run into a small issue trying to calculate an exponentially weighted moving average (ewma) with pandas.
Here, I'm using pandas' function to calculate the 9-period ewma of my price:
df = m1["open"].ewm(min_periods=9,span=9).mean()
The formula used to calculate an ewma is the following:
ewm(t+1) = alpha * price + (1-alpha) * ewm(t)
with alpha = 2/(period+1)
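Written out as a loop, the recursion above might look like the sketch below. This is only an illustration: the helper name ewma_recursive is hypothetical, it assumes the average is seeded with the first price, and it reads "price" as the price at the new time step t+1.

```python
def ewma_recursive(prices, span):
    """Hypothetical helper: ewm(t+1) = alpha * price(t+1) + (1 - alpha) * ewm(t)."""
    alpha = 2 / (span + 1)
    averages = [prices[0]]  # assumption: seed the recursion with the first price
    for price in prices[1:]:
        # each new average mixes the *current* price with the *previous* average
        averages.append(alpha * price + (1 - alpha) * averages[-1])
    return averages


sample = [1.19548, 1.19388, 1.19102]
result = ewma_recursive(sample, span=9)
```

Note that this plain recursion is not necessarily what pandas computes with its default settings, so it is meant as a reading aid for the formula rather than a reimplementation of ewm.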
The results the pandas function gave me didn't seem correct, so I tried to verify them:
df = m1["open"].ewm(min_periods=9,span=9).mean()
alpha = 2/(9+1)
df_bis = alpha*m1["open"].shift(1) + (1-alpha)*df.shift(1)
bol_array = df == df_bis
Here df is the Series returned by the pandas function, and df_bis is the one computed from the formula using the data from df and the price. bol_array should always be True, as df and df_bis should be equal.
However, that's not the case. Here is a random part of bol_array:
2015-01-09 21:32:00 False
2015-01-09 21:33:00 False
2015-01-09 21:34:00 True
2015-01-09 21:35:00 False
2015-01-09 21:36:00 False
2015-01-09 21:37:00 False
2015-01-09 21:38:00 False
2015-01-09 21:39:00 False
2015-01-09 21:40:00 False
2015-01-09 21:41:00 False
2015-01-09 21:42:00 False
2015-01-09 21:43:00 True
2015-01-09 21:44:00 False
2015-01-09 21:45:00 False
Sometimes, they are equal, sometimes not.
I checked at a specific time to see if it might be a rounding problem:
m1["open"][15]*alpha + df[15]*(1-alpha)
Out[7]: 1.1931468623934722
df[16]
Out[8]: 1.1930652375329887
The results are really different, so it's not a rounding problem (I need a precision of 5 digits).
Would anyone know what the problem is here? I can't seem to find it.
Edit: To see if it might be a rounding problem, I'm adding an array measuring the difference between pandas' result and the formula's:
diff_array = df-df_bis
I need a precision of 5 digits, so I'm multiplying this array by 10^5 to visualize better the magnitude of the difference:
diff_array*10**5
Out[16]:
DateTime
2015-01-04 22:00:00 NaN
2015-01-04 22:01:00 NaN
2015-01-04 22:02:00 NaN
2015-01-04 22:03:00 NaN
2015-01-04 22:04:00 NaN
2015-01-04 22:05:00 NaN
2015-01-04 22:06:00 NaN
2015-01-04 22:07:00 NaN
2015-01-04 22:08:00 NaN
2015-01-04 22:09:00 -2.610024
2015-01-04 22:10:00 -60.325142
2015-01-04 22:11:00 27.044649
2015-01-04 22:12:00 32.072310
2015-01-04 22:13:00 -25.944314
2015-01-04 22:14:00 8.201273
2015-01-04 22:15:00 8.358418
2015-01-04 22:16:00 -8.162486
2015-01-04 22:17:00 -11.381701
2015-01-04 22:18:00 8.610419
2015-01-04 22:19:00 4.862610
2015-01-04 22:20:00 3.875236
2015-01-04 22:21:00 1.659843
The magnitude of the difference is sometimes very large. Even though I'm looking for a precision of 5 digits, it could be okay for me to have a difference of 10^(-5), or twice that, but I have a difference of up to 60*(10^-5) just among the first values, which is far too large.
Edit 2: I looked at every parameter of ewm. I get the same result when directly specifying the alpha parameter corresponding to a 9-period ewma, meaning that my parameter is indeed correct.
The adjust parameter is by default set to True, looking at pandas documentation:
When adjust is False, weighted averages are calculated recursively as: weighted_average[0] = arg[0]; weighted_average[i] = (1-alpha)*weighted_average[i-1] + alpha*arg[i].
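The quoted recursion can be checked against pandas on a handful of values. The sketch below is only an illustration: it uses the first few prices from the sample data, omits min_periods (which only masks the first outputs with NaN), and the variable names are made up for the example.

```python
import numpy as np
import pandas as pd

prices = pd.Series([1.19548, 1.19388, 1.19102, 1.18826, 1.19085, 1.19257])
alpha = 0.2  # 2 / (9 + 1), i.e. a 9-period span

# pandas result with adjust=False
pandas_ewm = prices.ewm(alpha=alpha, adjust=False).mean()

# the documented recursion, written out by hand
manual = [prices.iloc[0]]  # weighted_average[0] = arg[0]
for p in prices.iloc[1:]:
    manual.append((1 - alpha) * manual[-1] + alpha * p)
manual = pd.Series(manual, index=prices.index)

# compare with a tolerance rather than ==, since these are floats
matches = np.allclose(pandas_ewm, manual, rtol=0, atol=1e-12)
```

Here matches should come out True, which suggests that adjust=False itself behaves as documented.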
I turned the adjust parameter to False, which should have resolved the problem:
df = m1["open"].ewm(min_periods=9,alpha=0.2,adjust=False).mean()
alpha = 2/(9+1)
df_bis = alpha*m1["open"].shift(1) + (1-alpha)*df.shift(1)
diff_array = df-df_bis
diff_array*10**5
Out[11]:
DateTime
2015-01-04 22:09:00 -5.6
2015-01-04 22:10:00 -56.8
2015-01-04 22:11:00 27.2
2015-01-04 22:12:00 30.4
2015-01-04 22:13:00 -25.8
2015-01-04 22:14:00 8.0
2015-01-04 22:15:00 8.0
2015-01-04 22:16:00 -8.2
2015-01-04 22:17:00 -11.2
2015-01-04 22:18:00 8.6
2015-01-04 22:19:00 4.8
But it doesn't solve my issue here. I can't think of anything that would explain such disparities. Would anyone have a clue as to why there's a difference?
Here's the documentation from pandas on the subject, where the formulas I used earlier come from: http://pandas.pydata.org/pandas-docs/stable/computation.html#exponentially-weighted-windows
The data I'm using is really big, here's a sample:
m1["open"]
Out[10]:
DateTime
2015-01-04 22:00:00 1.19548
2015-01-04 22:01:00 1.19388
2015-01-04 22:02:00 1.19102
2015-01-04 22:03:00 1.18826
2015-01-04 22:04:00 1.19085
2015-01-04 22:05:00 1.19257
2015-01-04 22:06:00 1.19270
2015-01-04 22:07:00 1.19350
2015-01-04 22:08:00 1.19427
2015-01-04 22:09:00 1.19399
2015-01-04 22:10:00 1.19115
2015-01-04 22:11:00 1.19251
2015-01-04 22:12:00 1.19403
2015-01-04 22:13:00 1.19274
2015-01-04 22:14:00 1.19314
2015-01-04 22:15:00 1.19354
2015-01-04 22:16:00 1.19313
2015-01-04 22:17:00 1.19257
2015-01-04 22:18:00 1.19300
2015-01-04 22:19:00 1.19324
2015-01-04 22:20:00 1.19343
2015-01-04 22:21:00 1.19351
2015-01-04 22:22:00 1.19353
2015-01-04 22:23:00 1.19376
2015-01-04 22:24:00 1.19408
2015-01-04 22:25:00 1.19370
2015-01-04 22:26:00 1.19381
2015-01-04 22:27:00 1.19439
2015-01-04 22:28:00 1.19435
2015-01-04 22:29:00 1.19419
Solved the issue, I'm just an idiot.
I used the formula: df_bis = alpha*m1["open"].shift(1) + (1-alpha)*df.shift(1)
But the price shouldn't be shifted here, the correct calculation is:
df_bis = alpha*m1["open"] + (1-alpha)*df.shift(1)
Which yields:
diff_array
Out[24]:
DateTime
2015-01-04 22:09:00 0.0
2015-01-04 22:10:00 0.0
2015-01-04 22:11:00 0.0
2015-01-04 22:12:00 0.0
2015-01-04 22:13:00 0.0
2015-01-04 22:14:00 0.0
2015-01-04 22:15:00 0.0
2015-01-04 22:16:00 0.0
2015-01-04 22:17:00 0.0
2015-01-04 22:18:00 0.0
2015-01-04 22:19:00 0.0
2015-01-04 22:20:00 0.0
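For reference, the two versions of the check can be put side by side. This is a small sketch on the first few sample prices, assuming adjust=False and no min_periods; the names wrong and right are just labels for the example, and np.isclose is used instead of == because exact float equality is fragile.

```python
import numpy as np
import pandas as pd

open_prices = pd.Series([1.19548, 1.19388, 1.19102, 1.18826, 1.19085, 1.19257])
alpha = 2 / (9 + 1)
ewm = open_prices.ewm(alpha=alpha, adjust=False).mean()

# original check: both the price and the previous average are shifted
wrong = alpha * open_prices.shift(1) + (1 - alpha) * ewm.shift(1)
# corrected check: current price, previous average
right = alpha * open_prices + (1 - alpha) * ewm.shift(1)

# the first row is NaN after shift(1), so compare from the second row on
wrong_ok = np.isclose(ewm.iloc[1:], wrong.iloc[1:], rtol=0, atol=1e-5)
right_ok = np.isclose(ewm.iloc[1:], right.iloc[1:], rtol=0, atol=1e-12)
```

With the shifted price, the error at each step is alpha times the price change between two consecutive rows, which is why the differences in the question follow the price moves rather than looking like rounding noise.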