pandas rolling_quantile bug?
I recently ran into an unexpected issue with the pandas rolling functions. Take rolling_quantile for example:
>> import numpy as np
>> import pandas as pd
>> row = 10
>> col = 5
>> idx = pd.date_range('20100101',periods=row,freq='B')
>> a = pd.DataFrame(np.random.rand(row*col).reshape((row,-1)),index=idx)
>> a
0 1 2 3 4
2010-01-01 0.341434 0.497274 0.596341 0.259909 0.872207
2010-01-04 0.222653 0.056723 0.064019 0.936307 0.785647
2010-01-05 0.179067 0.647165 0.931266 0.557698 0.713282
2010-01-06 0.049766 0.259756 0.945736 0.380948 0.282667
2010-01-07 0.385036 0.517609 0.575958 0.050758 0.850735
2010-01-08 0.628169 0.510453 0.325973 0.263361 0.444959
2010-01-11 0.099133 0.976571 0.602235 0.181185 0.506316
2010-01-12 0.987344 0.902289 0.080000 0.254695 0.753325
2010-01-13 0.759198 0.014548 0.139858 0.822900 0.251972
2010-01-14 0.404149 0.349788 0.038714 0.280568 0.197865
>> a.quantile([0.25,0.5,0.75],axis=0)
0 1 2 3 4
0.25 0.189963 0.282264 0.094964 0.255999 0.323240
0.50 0.363235 0.503864 0.450966 0.271964 0.609799
0.75 0.572164 0.614776 0.600761 0.513510 0.777567
>> np.percentile(a,[25,50,75],axis=0)
[array([ 0.18996316, 0.28226404, 0.09496441, 0.25599853, 0.32323997]),
array([ 0.36323529, 0.50386356, 0.45096554, 0.27196429, 0.60979881]),
array([ 0.57216415, 0.61477607, 0.6007611 , 0.51351021, 0.7775667 ])]
>> pd.rolling_quantile(a,row,0.25).tail(1)
0 1 2 3 4
2010-01-14 0.179067 0.259756 0.08 0.254695 0.282667
It looks like the pandas.DataFrame.quantile method is consistent with numpy.percentile. However, pandas.rolling_quantile returns different results. If I reduce the number of rows to 5, the problem goes away (all three methods return the same results).
Any thoughts?
PS: I also tested rolling_std, which seemingly at random produces errors on the order of 10^-7 to 10^-8 for long (many-row) pandas DataFrames.
Python environment:
As described here, the problem seems to be that the rolling_quantile() function (rolling().quantile() as of pandas 0.18) does not interpolate; it simply uses the nearest point.
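To make the difference concrete, here is a minimal pure-Python sketch using column 0 of the frame from the question. The helper names quantile_linear and quantile_no_interp are hypothetical and only for illustration; the second one picks the sorted point just below the fractional position, which is an assumption about the old behaviour (for this data, picking the nearest point gives the same value).

```python
import math

# Column 0 of the 10-row frame from the question (values copied from the post).
col0 = [0.341434, 0.222653, 0.179067, 0.049766, 0.385036,
        0.628169, 0.099133, 0.987344, 0.759198, 0.404149]


def quantile_linear(values, q):
    """Quantile with linear interpolation between neighbouring sorted points."""
    s = sorted(values)
    pos = q * (len(s) - 1)            # fractional position in the sorted data
    lo = math.floor(pos)
    hi = min(lo + 1, len(s) - 1)
    return s[lo] + (pos - lo) * (s[hi] - s[lo])


def quantile_no_interp(values, q):
    """Quantile that returns an existing data point (the one at or below pos)."""
    s = sorted(values)
    return s[math.floor(q * (len(s) - 1))]


# 10 rows: position 0.25 * 9 = 2.25 lies between two points, so the results differ.
full_linear = quantile_linear(col0, 0.25)      # about 0.18996, as DataFrame.quantile reports
full_picked = quantile_no_interp(col0, 0.25)   # 0.179067, as rolling_quantile reported

# 5 rows: position 0.25 * 4 = 1 is exactly a data point, so both agree.
short_linear = quantile_linear(col0[:5], 0.25)
short_picked = quantile_no_interp(col0[:5], 0.25)
```

This would also explain why the discrepancy disappears with 5 rows: the 0.25, 0.5 and 0.75 quantiles then fall exactly on sorted positions 1, 2 and 3, so no interpolation is needed.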
A workaround is to apply the numpy percentile function after rolling:
a.rolling(row).apply(func=np.percentile, args=(25,)).tail(1)
which gives the correct interpolated results.
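A self-contained version of this workaround might look like the sketch below. It assumes a reasonably recent pandas (where rolling().apply() accepts raw=True) and uses a fixed random seed, which is not in the original post, so the numbers will differ from the ones in the question.

```python
import numpy as np
import pandas as pd

row, col = 10, 5
idx = pd.date_range("2010-01-01", periods=row, freq="B")
rng = np.random.default_rng(0)  # fixed seed: an assumption, for reproducibility only
a = pd.DataFrame(rng.random((row, col)), index=idx)

# Apply np.percentile to each full window; raw=True passes plain ndarrays.
rolled = a.rolling(row).apply(np.percentile, raw=True, args=(25,)).tail(1)

# Reference: the whole-frame quantile, which interpolates linearly.
expected = a.quantile(0.25)

# The last window covers every row, so the two should agree.
matches = np.allclose(rolled.iloc[0].to_numpy(), expected.to_numpy())
```

Calling a Python function per window is slower than the built-in rolling quantile, so this is best kept for cases where the installed pandas lacks the fix.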
This has been fixed in pandas 0.21.0; I just tried it. BTW, 0.20.3 does not have the fix.
The fix is here: https://github.com/pandas-dev/pandas/pull/16247
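A quick way to check this on your own installation is sketched below. It assumes pandas 0.23 or later, where (to my knowledge) rolling().quantile() also accepts an interpolation keyword; the seed and variable names are illustrative and not from the original post.

```python
import numpy as np
import pandas as pd

row, col = 10, 5
idx = pd.date_range("2010-01-01", periods=row, freq="B")
a = pd.DataFrame(np.random.default_rng(1).random((row, col)), index=idx)

whole = a.quantile(0.25)                                # linear interpolation
rolling_default = a.rolling(row).quantile(0.25).iloc[-1]

# With the fix, the default rolling quantile matches DataFrame.quantile.
fixed = np.allclose(rolling_default.to_numpy(), whole.to_numpy())

# interpolation="lower" returns an existing data point instead of interpolating:
# position 0.25 * 9 = 2.25, so it picks sorted index 2 in each column.
rolling_lower = a.rolling(row).quantile(0.25, interpolation="lower").iloc[-1]
third_smallest = np.sort(a.to_numpy(), axis=0)[2]
no_interp = np.allclose(rolling_lower.to_numpy(), third_smallest)
```

If fixed comes out False, the installed pandas most likely predates the fix.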