
Possible bug in pandas rolling mean when window = 1

To keep the notation in my code generic, I want to express my original time series as a moving average over 1 period. Quite unexpectedly, when I use the pandas pd.rolling_mean function, the two are not exactly the same:

import pandas as pd
import numpy as np

np.random.seed(1)

ts = pd.Series(np.random.rand(1000))

mavg = pd.rolling_mean(ts, 1)

(ts - mavg).describe()
Out[120]: 
count    1.000000e+03
mean     6.284973e-16
std      3.877250e-16
min     -3.330669e-16
25%      3.330669e-16
50%      5.551115e-16
75%      8.881784e-16
max      1.554312e-15
dtype: float64

any((ts - mavg).dropna()>0)
Out[121]: True

Should this be considered a bug, or am I missing something?

The differences are very small, well within the numerical "noise" that floating-point arithmetic produces. Floats cannot represent every number exactly, so calculations with them often leave small "residuals". Instead of testing for exact equality, check against a small epsilon:

>>> any((ts - mavg).dropna().abs() > 1e-14)
False
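The same check can be sketched with the newer method-style API, since pd.rolling_mean is no longer available in later pandas releases. This is a minimal sketch, not a definitive recipe: the tolerance of 1e-14 is simply the epsilon used above, and np.allclose is one of several ways to compare within a tolerance.

```python
import numpy as np
import pandas as pd

np.random.seed(1)
ts = pd.Series(np.random.rand(1000))

# Series.rolling(...).mean() is the method-style spelling of pd.rolling_mean(ts, 1)
mavg = ts.rolling(window=1).mean()

# Compare within an absolute tolerance instead of testing for exact equality.
# atol=1e-14 is an assumption taken from the epsilon used in the answer above.
same = np.allclose(ts, mavg, rtol=0.0, atol=1e-14)
print(same)
```

Depending on the pandas version, the residuals may be smaller than in the question or vanish entirely, so a tolerance-based check is the more portable comparison.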

The difference comes from the floating-point calculations. Because of the way floats are represented internally, the result of a calculation is not always exactly the value you would expect. Within these "rounding errors", your numbers are identical.
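To illustrate how such residuals can arise even with a window of 1, here is a small pure-Python sketch. It assumes the rolling mean is computed with a running sum (add the value entering the window, subtract the value leaving it). That is a common technique, but it is an assumption here, not a confirmed description of what pandas does internally. The helper name running_sum_mean is hypothetical.

```python
import random


def running_sum_mean(values, window):
    """Rolling mean via a running sum (hypothetical helper, not pandas code)."""
    out = []
    total = 0.0
    for i, x in enumerate(values):
        total += x  # add the value entering the window
        if i >= window:
            total -= values[i - window]  # drop the value leaving the window
        # Emit NaN until the first full window is available
        out.append(total / window if i >= window - 1 else float("nan"))
    return out


random.seed(1)
data = [random.random() for _ in range(1000)]

means = running_sum_mean(data, 1)

# With window=1 each step computes (total + x[i]) - x[i-1], which in floating
# point is not guaranteed to round back to exactly x[i].
residuals = [m - x for m, x in zip(means, data)]
max_residual = max(abs(r) for r in residuals)
print(max_residual)
```

Under this scheme the result for each position depends on earlier additions and subtractions, so tiny rounding errors can remain even though the mathematically exact answer is the original value.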
