
Pandas: moving sum over time window with condition

I have a DataFrame that records boolean failures, each with a timestamp, for specific machines. I'd like to add a column holding a moving sum, per machine, of all failures within a time window relative to each row's timestamp. For example: for each row, count how many failures that machine had between 8 days and 1 day before the row's timestamp.

This creates an example of the initial dataframe:

import pandas as pd

df1 = pd.DataFrame({
    "Machine": ["M0", "M2", "M3", "M0", "M2", "M3"],
    "Failure": [0, 0, 1, 1, 1, 1],
    "Date-time": ["2014-02-20 11:00:19.0", "2014-02-21 12:29:55.0",
                  "2014-02-20 11:00:21.0", "2014-02-19 09:10:19.0",
                  "2014-02-18 12:19:47.0", "2014-02-20 1:33:00.0"],
})

This creates an example output dataframe:

df1 = pd.DataFrame({
    "Machine": ["M0", "M2", "M3", "M0", "M2", "M3"],
    "Number of failures, d-8 to d-1": [1, 1, 0, 0, 0, 0],
    "Failure": [0, 0, 1, 1, 1, 1],
    "Date-time": ["2014-02-20 11:00:19.0", "2014-02-21 12:29:55.0",
                  "2014-02-20 11:00:21.0", "2014-02-19 09:10:19.0",
                  "2014-02-18 12:19:47.0", "2014-02-20 1:33:00.0"],
})
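For reference, here is one possible way to compute that column. It is only a sketch, not necessarily the approach taken in the linked answer. It assumes both ends of the window are inclusive and that "Date-time" can be parsed with pd.to_datetime. The helper name failures_in_window and its parameters start_days and end_days are made up for illustration. Within each machine, rows are sorted by time, a running total of failures is built, and np.searchsorted locates the window boundaries.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({
    "Machine": ["M0", "M2", "M3", "M0", "M2", "M3"],
    "Failure": [0, 0, 1, 1, 1, 1],
    "Date-time": ["2014-02-20 11:00:19.0", "2014-02-21 12:29:55.0",
                  "2014-02-20 11:00:21.0", "2014-02-19 09:10:19.0",
                  "2014-02-18 12:19:47.0", "2014-02-20 1:33:00.0"],
})
df1["Date-time"] = pd.to_datetime(df1["Date-time"])


def failures_in_window(group, start_days=8, end_days=1):
    """Hypothetical helper: per row, count failures in [t - start_days, t - end_days]."""
    group = group.sort_values("Date-time")
    times = group["Date-time"].to_numpy()
    # cum[i] is the number of failures among the first i rows (in time order)
    cum = np.concatenate(([0], group["Failure"].to_numpy().cumsum()))
    # First row at or after the window start, and first row after the window end
    lo = np.searchsorted(times, times - np.timedelta64(start_days, "D"), side="left")
    hi = np.searchsorted(times, times - np.timedelta64(end_days, "D"), side="right")
    # Keep the original row labels so the result aligns back onto df1
    return pd.Series(cum[hi] - cum[lo], index=group.index)


counts = [failures_in_window(g) for _, g in df1.groupby("Machine")]
df1["Number of failures, d-8 to d-1"] = pd.concat(counts)
print(df1)
```

Because the window ends one day before each row's timestamp, a row never counts its own failure. On the sample data this reproduces the counts in the example output.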

I've found a similar question, answered here.

Pandas temporal cumulative sum by group

It's probably worth keeping both threads, since they are phrased very differently, which may help in searching.
