I have data stored in a data frame. It contains a column with time instances. Please find the file attached showing an example.
We are trying to check for continuity in the time_split column.
Essentially, I want to split the data frame as soon as continuity is lost: check whether consecutive rows of the time column increment by 1 minute and, if they do not, split the data frame at that point. I tried grouping by hour, but that did not work because some continuous runs last longer than an hour and carry over into the next hour.
I would really appreciate some help.
Thank you.
This code generates a list of group ids, based on the time difference between the current and the previous sample.
    import datetime
    import pandas as pd

    df.Time_split = pd.to_datetime(df.Time_split)  # convert the strings to datetime objects
    default = datetime.timedelta(minutes=1)  # expected separation, to check against
    group = [0]  # initial group number
    prev = df.Time_split.iloc[0]  # initial sample to compare (positional, so any index works)
    for i in range(1, len(df.Time_split)):  # for second entry and up
        delta = df.Time_split.iloc[i] - prev  # time difference to the previous sample
        if delta == default:  # exactly one minute apart
            group.append(group[-1])  # current sample belongs to the same group as previous sample
        else:
            group.append(group[-1] + 1)  # create a new group
        prev = df.Time_split.iloc[i]  # update previous
    df['group_number'] = group  # add the list to the dataframe
    # optional split by group:
    frames = [df[df['group_number'] == x] for x in range(group[-1] + 1)]
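For larger frames, the same grouping can probably be done without an explicit loop. The following is only a sketch, not the method from the answer above: it uses the common diff/cumsum idiom, and the sample timestamps are made up for illustration. Only the column name Time_split is taken from the code above.

```python
import pandas as pd

# Hypothetical sample data; only the column name Time_split comes from the thread.
df = pd.DataFrame({
    "Time_split": pd.to_datetime([
        "2019-01-01 10:58", "2019-01-01 10:59", "2019-01-01 11:00",
        "2019-01-01 11:05", "2019-01-01 11:06",
        "2019-01-01 12:30",
    ])
})

# True wherever the gap to the previous row is not exactly one minute.
# The first row has no predecessor (its diff is NaT), so it is flagged too.
breaks = df["Time_split"].diff().ne(pd.Timedelta(minutes=1))

# A running count of the breaks gives a group id; subtract 1 so ids start at 0.
df["group_number"] = breaks.cumsum() - 1

# Optional split into one data frame per continuous run.
frames = [g for _, g in df.groupby("group_number")]
```

Note that the run from 10:58 to 11:00 stays in one group even though it crosses the hour boundary, which is the case that grouping by hour could not handle.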
I had problems with Izaak Comelis' code in Python 3. I find this slight modification to be more reliable and readable.
    from datetime import timedelta

    def _time_continuity(input_df, datetime_col='datetime', minutes=10):
        '''
        Assumes that the datetime column has already been sorted, e.g. with
        df.sort_values(by=datetime_col)
        '''
        default = timedelta(minutes=minutes)  # largest gap allowed within a group
        group = [0]  # group number of the first row
        grp_ctr = 0
        dt_iter = iter(input_df[datetime_col])
        prev = next(dt_iter)  # first row, already assigned to group 0
        for i in dt_iter:  # for second entry and up
            delta = abs(i - prev)  # time difference to the previous row
            if (delta <= default):  # gap is within the tolerance
                group.append(grp_ctr)  # current sample belongs to the same group as previous sample
            else:
                grp_ctr += 1
                group.append(grp_ctr)  # start a new group
            prev = i  # update previous
        input_df['time_group'] = group  # add the list to the dataframe
        if len(set(group)) > 1:
            print(f'There are {len(set(group))} time groups')
        return input_df
Notes: you can control how the timedelta value is used to create groups. In the line
    if (delta <= default):
you can change the comparison to whatever you need (delta == default, delta >= default, ...), which determines when a new group is started. In my use case, I don't care if the time delta is less than 10 minutes. Keep in mind that with <= duplicate timestamps (delta == 0) stay in the same group as the previous row. If you want duplicates to start a new group, use ==.
abs(i - prev) makes the result independent of whether the datetime series is sorted in ascending or descending order.
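To make the effect of the comparison concrete, here is a small sketch that mirrors the loop in _time_continuity but takes the comparison as an argument. The helper name group_ids, its parameters, and the timestamps are all hypothetical and only serve the illustration.

```python
import operator
from datetime import timedelta

import pandas as pd


def group_ids(times, same_group, tolerance):
    """Return one group id per timestamp.

    same_group(delta, tolerance) decides whether a row stays in the
    current group (hypothetical helper, not part of the answer above).
    """
    ids = [0]
    it = iter(times)
    prev = next(it)
    for t in it:
        delta = abs(t - prev)  # independent of ascending/descending order
        ids.append(ids[-1] if same_group(delta, tolerance) else ids[-1] + 1)
        prev = t
    return ids


# Made-up timestamps with gaps of 10, 0 (duplicate), 5 and 45 minutes.
times = pd.to_datetime([
    "2020-01-01 00:00", "2020-01-01 00:10", "2020-01-01 00:10",
    "2020-01-01 00:15", "2020-01-01 01:00",
])
tol = timedelta(minutes=10)

le_ids = group_ids(times, operator.le, tol)  # delta <= tolerance keeps the group
eq_ids = group_ids(times, operator.eq, tol)  # only delta == tolerance keeps the group

print(le_ids)  # [0, 0, 0, 0, 1]
print(eq_ids)  # [0, 0, 1, 2, 3]
```

With <= the duplicate and the 5-minute gap are absorbed into the first group, while with == every gap that is not exactly 10 minutes, including the duplicate, starts a new group.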