
Check continuity of time series

I have data stored in a DataFrame that contains a column of timestamps. An example is attached.

We are trying to check for continuity in the time_split column.

[Image: example_data]

What I am essentially trying to do is split the DataFrame as soon as continuity is lost: check whether each row of the time column is exactly 1 minute after the previous one, and if not, split the DataFrame at that point. I tried grouping by hour, but that did not work because some continuous runs lasted longer than an hour and crossed into the next hour.
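Because the attached file isn't available, here is a small sketch on made-up timestamps showing why grouping by hour fails and what grouping by runs looks like instead. The run id comes from `diff()` (gap to the previous row) compared against one minute, then `cumsum()`, so it increments wherever continuity breaks. The sample values are hypothetical.

```python
import pandas as pd

# Hypothetical sample: one continuous run that crosses 11:00, then a 5-minute gap
times = pd.to_datetime(pd.Series([
    "2020-01-01 10:58", "2020-01-01 10:59", "2020-01-01 11:00",  # run 1
    "2020-01-01 11:05", "2020-01-01 11:06",                      # run 2
]))

# Grouping by hour cuts run 1 at 11:00 and merges its last row into run 2
by_hour = times.groupby(times.dt.hour).size().tolist()

# Grouping by runs: a new id starts wherever the step is not exactly one minute
# (the first row's diff is NaT, which compares unequal, so it starts run 1)
run_id = (times.diff() != pd.Timedelta(minutes=1)).cumsum()
by_run = times.groupby(run_id).size().tolist()

print(by_hour)  # [2, 3]
print(by_run)   # [3, 2]
```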

I would really appreciate some help.

Thank you.

This code assigns a group id to each row based on the time difference between the current and the previous sample; a new group starts whenever that difference is not exactly one minute.

import datetime
import pandas as pd

df['Time_split'] = pd.to_datetime(df['Time_split'])  # convert the strings to datetime objects

default = datetime.timedelta(minutes=1)  # expected separation between consecutive samples

group = [0]  # the first sample is in group 0
prev = df['Time_split'].iloc[0]  # first sample to compare against (iloc works with any index)
for i in range(1, len(df['Time_split'])):  # second sample onwards
    delta = df['Time_split'].iloc[i] - prev  # time since the previous sample
    if delta == default:  # exactly one minute later: continuity holds
        group.append(group[-1])  # same group as the previous sample
    else:
        group.append(group[-1] + 1)  # continuity broken: start a new group

    prev = df['Time_split'].iloc[i]  # update previous

df['group_number'] = group  # add the list to the dataframe


# optional: split into one DataFrame per group
frames = [df[df['group_number'] == x] for x in range(group[-1] + 1)]
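The loop above can probably be replaced by a vectorized sketch that yields the same 0-based `group_number`: flag rows whose gap to the predecessor is not exactly one minute, then take a cumulative sum. The sample data and its non-default index are hypothetical; `diff()` works by position, so index labels don't matter. Splitting with `groupby` also avoids re-filtering the whole frame once per group.

```python
import pandas as pd

# Hypothetical sample with a non-default index; diff() is positional, so labels don't matter
df = pd.DataFrame(
    {"Time_split": ["2021-03-01 08:00", "2021-03-01 08:01", "2021-03-01 08:03",
                    "2021-03-01 08:04", "2021-03-01 08:05", "2021-03-01 09:00"]},
    index=[10, 11, 12, 13, 14, 15],
)
df["Time_split"] = pd.to_datetime(df["Time_split"])

one_minute = pd.Timedelta(minutes=1)
# True where a row is NOT exactly one minute after its predecessor
# (the first row's diff is NaT, which compares unequal, so it is True)
starts_new_group = df["Time_split"].diff() != one_minute
# cumsum numbers the runs 1, 2, 3, ...; subtract 1 so ids start at 0 like the loop
df["group_number"] = starts_new_group.cumsum() - 1

# One pass over the data instead of one boolean filter per group
frames = [frame for _, frame in df.groupby("group_number")]

print(df["group_number"].tolist())  # [0, 0, 1, 1, 1, 2]
```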


I had problems with Izaak Comelis' code in Python 3. I find this slight modification more reliable and readable.

from datetime import timedelta

def _time_continuity(input_df, datetime_col='datetime', minutes=10):
    '''
    Add a 'time_group' column; a new group starts whenever the gap
    to the previous row exceeds `minutes`.

    Assumes that the datetime column has already been sorted:
        df.sort_values(by=datetime_col)
    '''

    default = timedelta(minutes=minutes)

    group = [0]  # the first row is in group 0
    grp_ctr = 0

    dt_iter = iter(input_df[datetime_col])
    prev = next(dt_iter)  # consume the first row as the initial reference

    for i in dt_iter:  # second row onwards
        delta = abs(i - prev)  # absolute time difference to the previous row

        if delta <= default:  # within tolerance
            group.append(grp_ctr)  # same group as the previous row
        else:
            grp_ctr += 1  # gap too large: start a new group
            group.append(grp_ctr)

        prev = i  # update previous

    input_df['time_group'] = group  # add the list to the dataframe (modifies input_df in place)

    if len(set(group)) > 1:
        print(f'There are {len(set(group))} time groups')

    return input_df
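For larger frames, a vectorized sketch of the same idea may be faster. `time_continuity_vectorized` is a hypothetical name, and unlike `_time_continuity` it returns a copy rather than modifying `input_df`. It assumes `datetime_col` already holds datetime64 values. `NaT > tolerance` evaluates to False, so the first row lands in group 0, matching the loop.

```python
import pandas as pd

def time_continuity_vectorized(input_df, datetime_col="datetime", minutes=10):
    """Hypothetical vectorized counterpart of _time_continuity (returns a copy)."""
    tolerance = pd.Timedelta(minutes=minutes)
    # Absolute gap to the previous row; the first row's gap is NaT
    gaps = input_df[datetime_col].diff().abs()
    out = input_df.copy()
    # A new group starts where the gap exceeds the tolerance;
    # NaT > tolerance is False, so group ids start at 0
    out["time_group"] = (gaps > tolerance).cumsum()
    n_groups = out["time_group"].nunique()
    if n_groups > 1:
        print(f"There are {n_groups} time groups")
    return out

# Hypothetical sample: a 20-minute gap between 00:10 and 00:30
df = pd.DataFrame({"datetime": pd.to_datetime([
    "2022-05-01 00:00", "2022-05-01 00:05", "2022-05-01 00:10",
    "2022-05-01 00:30", "2022-05-01 00:35",
])})
result = time_continuity_vectorized(df)
print(result["time_group"].tolist())  # [0, 0, 0, 1, 1]
```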

Notes: you can control how the time delta creates groups by changing the comparison in:

if (delta <= default):

You can change it to whatever comparison you need (delta == default, delta >= default, ...); this determines when a new group is started. In my use case, any gap of up to 10 minutes still counts as continuous. Keep in mind that with <=, duplicate timestamps (delta == 0) stay in the same group; if you want a duplicate to start a new group, use == instead.
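To see how the choice of comparison treats a duplicate timestamp, here is a sketch of the same loop with the comparison passed in as a function. `assign_groups` is a hypothetical helper, not part of the answer, and the timestamps are made up. Note that `>=` also splits on a duplicate, because a zero gap is not `>=` 10 minutes.

```python
import operator
from datetime import datetime, timedelta

def assign_groups(times, tolerance, same_group=operator.le):
    """Hypothetical helper: the answer's loop with a pluggable comparison."""
    if not times:
        return []
    groups = [0]
    counter = 0
    for prev, cur in zip(times, times[1:]):
        delta = abs(cur - prev)
        # same_group(delta, tolerance) plays the role of `delta <= default`
        if not same_group(delta, tolerance):
            counter += 1
        groups.append(counter)
    return groups

start = datetime(2022, 5, 1, 12, 0)
ten = timedelta(minutes=10)
times = [start, start + ten, start + ten, start + 2 * ten]  # third timestamp is a duplicate

print(assign_groups(times, ten))               # [0, 0, 0, 0]  (<=: duplicate stays)
print(assign_groups(times, ten, operator.eq))  # [0, 0, 1, 1]  (==: duplicate splits)
```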

Using abs(i - prev) ensures that the result does not depend on whether the datetime series is sorted in ascending or descending order.
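A small sketch of why `abs` matters: on a descending series every gap is negative, so without `abs` a `<=` tolerance check is always satisfied and everything collapses into one group. The timestamps and the `group_ids` helper are hypothetical.

```python
import pandas as pd

tolerance = pd.Timedelta(minutes=10)
asc = pd.to_datetime(pd.Series(["2022-05-01 00:00", "2022-05-01 00:05",
                                "2022-05-01 00:30", "2022-05-01 00:35"]))
desc = asc[::-1].reset_index(drop=True)  # same timestamps, descending

def group_ids(times, use_abs):
    """Hypothetical helper: new group where the gap exceeds the tolerance."""
    gaps = times.diff()
    if use_abs:
        gaps = gaps.abs()
    # NaT on the first row compares False, so ids start at 0
    return (gaps > tolerance).cumsum().tolist()

print(group_ids(asc, use_abs=True))    # [0, 0, 1, 1]
print(group_ids(desc, use_abs=True))   # [0, 0, 1, 1]
print(group_ids(desc, use_abs=False))  # [0, 0, 0, 0]  (negative gaps never exceed it)
```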

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM