
pandas rolling with multiple values per time step

I'm trying to do anomaly detection in Python, using a rolling 8-day window to calculate the interquartile range of two metrics (err_precent and fail_precent). The examples I have found seem to have only one value per timestamp/index, whereas in my case I have many values per date.
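The interquartile-range rule mentioned above is commonly applied with Tukey's fences: a value is flagged when it lies below Q1 - k*IQR or above Q3 + k*IQR, with k = 1.5 by convention. A minimal sketch, assuming you already have one Q1 row and one Q3 row per date (for example from an 8-day window); `flag_iqr_outliers` is a hypothetical helper name, k = 1.5 is an assumed default, and the numbers are made up:

```python
import pandas as pd

def flag_iqr_outliers(df, q1, q3, cols, k=1.5):
    """Return a boolean frame, True where a value falls outside
    [Q1 - k*IQR, Q3 + k*IQR] for that row's date.

    df     : one row per customer, indexed by date (dates repeat)
    q1, q3 : one row per date (e.g. 8-day rolling quartiles), unique index
    k      : fence multiplier; 1.5 is Tukey's convention, an assumption here
    """
    # Copy each date's quartiles onto every row that has that date.
    lower_q = q1[cols].reindex(df.index).to_numpy()
    upper_q = q3[cols].reindex(df.index).to_numpy()
    iqr = upper_q - lower_q
    values = df[cols].to_numpy()
    outside = (values < lower_q - k * iqr) | (values > upper_q + k * iqr)
    return pd.DataFrame(outside, index=df.index, columns=cols)

# Made-up toy data and quartiles, for illustration only.
rows = pd.to_datetime(["2019-05-02", "2019-05-02", "2019-05-03"])
df = pd.DataFrame({"err_precent": [0.9, 0.2, 1.0]}, index=rows)
days = pd.to_datetime(["2019-05-02", "2019-05-03"])
q1 = pd.DataFrame({"err_precent": [0.85, 0.90]}, index=days)
q3 = pd.DataFrame({"err_precent": [1.00, 1.00]}, index=days)

flags = flag_iqr_outliers(df, q1, q3, ["err_precent"])
```

Dates present in df but missing from q1/q3 come back as NaN after `reindex`, so those rows are simply not flagged.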

My data looks like this:


        customerID   err_precent   fail_precent
end_date            
2019-05-02  29616   0.857143    1.000000
2019-05-02  277023  1.000000    1.000000
2019-05-02  150560  1.000000    1.000000
2019-05-02  88778   1.000000    1.000000
... ... ... ...
2019-06-10  67311   1.000000    1.000000
2019-06-10  128116  1.000000    1.000000
2019-06-10  264288  0.935484    1.000000
2019-06-10  199984  0.941176    1.000000
2019-06-10  444105  0.952381    0.857143
2019-06-10  388703  0.894737    0.947368
2019-06-10  138986  1.000000    1.00000

After applying rolling to the data columns, I get many values for each day. Can I use all the values in each 8-day window to calculate a single quantile per day, instead of one quantile per customer, like the following?


         err_precent    fail_precent
end_date        
2019-05-02  0.857143    1.000000
2019-05-03  0.900000    0.880000
2019-05-04  0.900000    0.880000
...
2019-06-10  0.857143    0.941176
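One direct way to get exactly this shape (one row per date, pooling every customer's values) is to loop over the distinct dates and take the quantile of all rows inside each 8-day window. A minimal sketch, assuming the index is a DatetimeIndex of date-only timestamps; `pooled_rolling_quantile` is a hypothetical helper and the data are made up. The window is [d - 7 days, end of day d], which is the same 8 calendar days that rolling('8D') covers for a midnight timestamp:

```python
import pandas as pd

def pooled_rolling_quantile(df, cols, q=0.25, days=8):
    """One row per calendar date: the q-quantile of every row (all
    customers) dated within the `days`-day window ending on that date."""
    idx = df.index  # assumed DatetimeIndex; dates may repeat
    dates = idx.normalize().unique().sort_values()
    one_day = pd.Timedelta(days=1)
    rows = []
    for d in dates:
        # rows from d - (days - 1) days up to the end of day d
        in_window = (idx >= d - (days - 1) * one_day) & (idx < d + one_day)
        rows.append(df.loc[in_window, cols].quantile(q, interpolation="lower"))
    out = pd.DataFrame(rows)
    out.index = dates.rename("end_date")
    return out

# Made-up toy data: several customers per day, as in the question.
idx = pd.to_datetime(["2019-05-01", "2019-05-01", "2019-05-02",
                      "2019-05-10", "2019-05-10"])
df = pd.DataFrame({"err_precent": [0.5, 1.0, 0.8, 0.9, 0.6]}, index=idx)
result = pooled_rolling_quantile(df, ["err_precent"])
```

The loop costs one boolean scan per distinct date, which is cheap for a few dozen dates like the range shown here.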

This is what I tried:

import pandas as pd
df.index = pd.to_datetime(df.end_date, format='%m/%d/%Y')
# time-based 8-day window; returns one value per input row
df[dataColumn].rolling('8D', min_periods=1).quantile(.25, interpolation='lower')

The result is not what I want: it returns a quantile value for every row, so there are many values for each day.


          err_precent   fail_precent
end_date        
2019-05-02  0.857143    1.000000
2019-05-02  0.857143    1.000000
2019-05-02  0.857143    1.000000
2019-05-02  0.857143    1.000000
2019-05-02  1.000000    1.000000
2019-05-02  0.941176    1.000000
2019-05-02  0.941176    1.000000
2019-05-02  0.857143    0.941176
2019-05-02  0.923077    1.000
... ... ...
2019-06-10  0.900000    0.880000
2019-06-10  0.900000    0.880000
2019-06-10  0.900000    0.880000
2019-06-10  0.900000    0.880000
2019-06-10  0.900000    0.880000
2019-06-10  0.900000    0.880000
2019-06-10  0.900000    0.880000
2019-06-10  0.900000    0.880000
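The repeated values come from how a time-based window treats repeated timestamps: rolling('8D') produces one output per input row, and each row's window runs from 8 days back up to and including that row, so later rows of the same day see more data than earlier ones. The last row of each date therefore already sees every row of that date plus the previous seven days. A sketch, assuming the index is sorted (offset-based rolling requires a monotonic index) and that dataColumn is a list of column names, keeps only that last row per date; the data are made up:

```python
import pandas as pd

# Made-up toy data: several customers per day, already sorted by date.
# (Call df.sort_index() first if your index is not sorted.)
idx = pd.to_datetime(["2019-05-01", "2019-05-01", "2019-05-02",
                      "2019-05-10", "2019-05-10"])
df = pd.DataFrame({"err_precent": [0.5, 1.0, 0.8, 0.9, 0.6]}, index=idx)
dataColumn = ["err_precent"]  # assumption: a list of column names

# One value per row: the first 2019-05-10 row only sees itself (0.9),
# the second one sees both 2019-05-10 rows.
per_row = (df[dataColumn]
           .rolling("8D", min_periods=1)
           .quantile(.25, interpolation="lower"))

# Keep the last row of each date: its window covers the whole 8-day span.
per_day = per_row[~per_row.index.duplicated(keep="last")]
```

Unlike groupby(level=0).last(), the duplicated() mask keeps the last row even if it holds NaN, so each date is represented by exactly the row whose window is complete.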

I found a workaround using resample: first compute the quantile for each day, then take a rolling mean of those daily values over the previous 8 days.

lb = df[dataColumn].resample("1d").quantile(.25).fillna(0).rolling(window=8).mean()
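Note that this workaround approximates the quantity asked for rather than computing it: the mean of eight daily 25th percentiles is generally not the 25th percentile of all values pooled over those eight days. In addition, fillna(0) turns days without data into 0, which pulls the mean down; rolling(window=8) without min_periods returns NaN for the first seven days; and resample(...).quantile uses linear interpolation by default rather than interpolation='lower'. A small check on made-up data (two days, so the 8-day window spans everything):

```python
import pandas as pd

# Made-up toy data: two customers per day over two days.
idx = pd.to_datetime(["2019-05-01", "2019-05-01", "2019-05-02", "2019-05-02"])
s = pd.Series([0.2, 1.0, 0.9, 1.0], index=idx, name="err_precent")

# Workaround from the post: daily 25th percentile, then an 8-day rolling mean.
# min_periods=1 is added only so the two-day toy series is not all NaN.
daily_q = s.resample("1d").quantile(.25).fillna(0)
approx = daily_q.rolling(window=8, min_periods=1).mean()

# Pooled 25th percentile of every value in the same window (both days).
exact = s.quantile(.25)

print(approx.iloc[-1], exact)  # mean of daily quartiles vs. pooled quartile
```

Here the two numbers differ (about 0.66 vs. about 0.73), so the workaround is fine as a smoothed threshold but should not be read as the exact 8-day quartile.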
