I'm trying to do anomaly detection in Python, using a rolling window of 8 days of values to calculate the interquartile range for two metrics (err_precent and fail_precent). The examples I have found seem to have only one value per timestamp/index, whereas in my case I have many.
My data looks like this:
customerID err_precent fail_precent
end_date
2019-05-02 29616 0.857143 1.000000
2019-05-02 277023 1.000000 1.000000
2019-05-02 150560 1.000000 1.000000
2019-05-02 88778 1.000000 1.000000
... ... ... ...
2019-06-10 67311 1.000000 1.000000
2019-06-10 128116 1.000000 1.000000
2019-06-10 264288 0.935484 1.000000
2019-06-10 199984 0.941176 1.000000
2019-06-10 444105 0.952381 0.857143
2019-06-10 388703 0.894737 0.947368
2019-06-10 138986 1.000000 1.00000
After applying the rolling window to the data columns, I can see there are many values for each day. The question is: can I use all the values in each 8-day window to calculate a single quantile value per day, instead of getting a quantile for each customer row, like the following?
err_precent fail_precent
end_date
2019-05-02 0.857143 1.000000
2019-05-03 0.900000 0.880000
2019-05-04 0.900000 0.880000
...
2019-06-10 0.857143 0.941176
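# Index by date so that a time-based '8D' window can be used; dataColumn holds the metric column names.
df.index = pd.to_datetime(df.end_date, format='%m/%d/%Y')
df[dataColumn].rolling('8D', min_periods=1).quantile(.25, interpolation='lower')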
This returns the undesired result below: as you can see, many quantile values are returned for each day.
err_precent fail_precent
end_date
2019-05-02 0.857143 1.000000
2019-05-02 0.857143 1.000000
2019-05-02 0.857143 1.000000
2019-05-02 0.857143 1.000000
2019-05-02 1.000000 1.000000
2019-05-02 0.941176 1.000000
2019-05-02 0.941176 1.000000
2019-05-02 0.857143 0.941176
2019-05-02 0.923077 1.000
... ... ...
2019-06-10 0.900000 0.880000
2019-06-10 0.900000 0.880000
2019-06-10 0.900000 0.880000
2019-06-10 0.900000 0.880000
2019-06-10 0.900000 0.880000
2019-06-10 0.900000 0.880000
2019-06-10 0.900000 0.880000
2019-06-10 0.900000 0.880000
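I worked out a workaround for this problem using resample: first compute the quantile for each day by resampling, then take a rolling mean over the previous 8 days.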
lb = df[dataColumn].resample("1d").quantile(.25).fillna(0).rolling(window=8).mean()
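Note that the workaround above averages the daily quantiles, which is not the same as one quantile taken over all rows in the 8-day window. In addition, fillna(0) pulls the mean toward zero on days without data, and rolling(window=8) without min_periods gives NaN for the first 7 days. If a single pooled quantile per day is what is wanted, one possible sketch is below. The function name pooled_rolling_quantile and the small sample frame are hypothetical and only for illustration. The sketch assumes the index holds dates normalized to midnight, as in the data shown.

```python
import pandas as pd


def pooled_rolling_quantile(df, columns, q=0.25, window_days=8):
    """Return one quantile per calendar day, pooled over all rows
    (all customers) whose date falls in the trailing window (day - window_days, day]."""
    df = df.sort_index()
    dates = df.index
    days = pd.date_range(dates.min(), dates.max(), freq="D")
    rows = {}
    for day in days:
        # Same window convention as rolling('8D'): open on the left, closed on the right.
        mask = (dates > day - pd.Timedelta(days=window_days)) & (dates <= day)
        window = df.loc[mask, columns]
        if not window.empty:
            rows[day] = window.quantile(q, interpolation="lower")
    result = pd.DataFrame(rows).T  # one row per day, one column per metric
    result.index.name = "end_date"
    return result


# Hypothetical sample data with several rows per day.
sample = pd.DataFrame(
    {"err_precent": [0.8, 1.0, 0.9, 0.7], "fail_precent": [1.0, 1.0, 0.5, 0.9]},
    index=pd.to_datetime(["2019-05-02", "2019-05-02", "2019-05-03", "2019-05-11"]),
)
result = pooled_rolling_quantile(sample, ["err_precent", "fail_precent"])
print(result)
```

Calling the function a second time with q=0.75 would give the upper quartile, and the difference between the two results is the interquartile range per day. The explicit loop is simple rather than fast, so it may need to be optimized for large frames.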