I have a dataset as below where each ID can check in and check out at any given time and for any duration.
ID checkin_datetime checkout_datetime
4 04-01-2019 13:07 04-01-2019 13:09
4 04-01-2019 13:09 04-01-2019 13:12
4 04-01-2019 14:06 04-01-2019 14:07
4 04-01-2019 14:55 04-01-2019 15:06
22 04-01-2019 20:23 04-01-2019 21:32
22 04-01-2019 21:38 04-01-2019 21:42
25 04-01-2019 23:22 04-02-2019 00:23
29 04-02-2019 01:00 04-02-2019 06:15
The checked-in minutes computed from this need to be divided into hourly buckets, as in the following table, so that I can compute the cumulative totals by hour for each ID across hours and days, even when a check-in and check-out span two days.
Help appreciated :)
ID checkin_datetime checkout_datetime day HR Minutes
4 04-01-2019 13:07 04-01-2019 13:09 04-01-2019 13 2
4 04-01-2019 13:09 04-01-2019 13:12 04-01-2019 13 3
4 04-01-2019 14:06 04-01-2019 14:07 04-01-2019 14 1
4 04-01-2019 14:55 04-01-2019 15:06 04-01-2019 14 5
4 04-01-2019 14:55 04-01-2019 15:06 04-01-2019 15 6
22 04-01-2019 20:23 04-01-2019 21:32 04-01-2019 20 37
22 04-01-2019 20:23 04-01-2019 21:32 04-01-2019 21 32
22 04-01-2019 21:38 04-01-2019 21:42 04-01-2019 21 4
25 04-01-2019 23:22 04-02-2019 00:23 04-01-2019 23 38
25 04-01-2019 23:22 04-02-2019 00:23 04-02-2019 0 23
29 04-02-2019 01:00 04-02-2019 06:15 04-02-2019 1 60
29 04-02-2019 01:00 04-02-2019 06:15 04-02-2019 2 60
29 04-02-2019 01:00 04-02-2019 06:15 04-02-2019 3 60
29 04-02-2019 01:00 04-02-2019 06:15 04-02-2019 4 60
29 04-02-2019 01:00 04-02-2019 06:15 04-02-2019 5 60
29 04-02-2019 01:00 04-02-2019 06:15 04-02-2019 6 15
Code to create the dataframe:
import pandas as pd
data={'ID':[4,4,4,4,22,22,25,29],
'checkin_datetime':['04-01-2019 13:07','04-01-2019 13:09','04-01-2019 14:06','04-01-2019 14:55','04-01-2019 20:23'
,'04-01-2019 21:38','04-01-2019 23:22','04-02-2019 01:00'],
'checkout_datetime':['04-01-2019 13:09','04-01-2019 13:12','04-01-2019 14:07','04-01-2019 15:06','04-01-2019 21:32'
,'04-01-2019 21:42','04-02-2019 00:23'
,'04-02-2019 06:15']
}
df = pd.DataFrame(data,columns= ['ID', 'checkin_datetime','checkout_datetime'])
df['checkout_datetime'] = pd.to_datetime(df['checkout_datetime'])
df['checkin_datetime'] = pd.to_datetime(df['checkin_datetime'])
Pretty simple:
- For the duration, subtract the check-in from the checkout (datetime can do that).
- To get it in minutes, divide it by a timedelta of one minute (I'll use the pandas built-in one).
- To get the hour from a datetime, use .hour, and similarly call .date() for the date (the first is an attribute, the second is a method, so watch the parentheses).
df['Hour'] = df['checkin_datetime'].apply(lambda x: x.hour)
df['Date'] = df['checkin_datetime'].apply(lambda x: x.date())
df['duration'] = df['checkout_datetime']-df['checkin_datetime']
df['duration_in_minutes'] = (df['checkout_datetime']-df['checkin_datetime'])/pd.Timedelta(minutes=1)
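For reference, the same columns can probably be derived without apply by going through the .dt accessor. This is only a sketch on a two-row sample taken from the question's data, and the explicit format string assumes the dates are month-day-year. Note that, like the snippet above, it only tags each stay with its check-in hour and does not yet split a stay across hours.

```python
import pandas as pd

# Two sample stays from the question; format assumes MM-DD-YYYY.
fmt = '%m-%d-%Y %H:%M'
df = pd.DataFrame({
    'ID': [4, 25],
    'checkin_datetime': pd.to_datetime(['04-01-2019 14:55', '04-01-2019 23:22'], format=fmt),
    'checkout_datetime': pd.to_datetime(['04-01-2019 15:06', '04-02-2019 00:23'], format=fmt),
})

# Vectorized equivalents of the apply/lambda calls
df['Hour'] = df['checkin_datetime'].dt.hour
df['Date'] = df['checkin_datetime'].dt.date
df['duration'] = df['checkout_datetime'] - df['checkin_datetime']
df['duration_in_minutes'] = df['duration'].dt.total_seconds() / 60

print(df)
```

Here the first stay comes out as 11 minutes tagged with hour 14, even though 6 of those minutes fall in hour 15.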
[Edited]: I have a solution to split the duration into hours, but it's not the most elegant...
# pd.DatetimeIndex(start=..., end=...) was removed in pandas 1.0; pd.date_range builds the same minute grid
df2 = pd.DataFrame(
    index=pd.date_range(
        start=df['checkin_datetime'].min(),
        end=df['checkout_datetime'].max(), freq='min'),
    columns=['is_checked_in', 'ID'], data=0)
one_minute = pd.Timedelta(minutes=1)
for index, row in df.iterrows():
    # .loc label slices include both endpoints, so stop one minute before checkout
    span = slice(row['checkin_datetime'], row['checkout_datetime'] - one_minute)
    df2.loc[span, 'is_checked_in'] = 1
    df2.loc[span, 'ID'] = row['ID']
# Sum the checked-in minutes per clock hour
df3 = df2.resample('h').aggregate({'is_checked_in': 'sum', 'ID': 'max'})
df3['Hour'] = df3.index.hour
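One possible alternative to the minute-by-minute grid is to walk each stay from check-in to checkout and cut it at every hour boundary. The sketch below is not the method from the answer above; the helper name split_into_hourly_buckets and the cumulative_minutes column are made up for illustration, and it assumes check-in and checkout times fall on whole minutes, as in the question's data.

```python
import pandas as pd

fmt = '%m-%d-%Y %H:%M'  # assumes MM-DD-YYYY, as the consecutive dates suggest
df = pd.DataFrame({
    'ID': [4, 4, 4, 4, 22, 22, 25, 29],
    'checkin_datetime': pd.to_datetime(
        ['04-01-2019 13:07', '04-01-2019 13:09', '04-01-2019 14:06', '04-01-2019 14:55',
         '04-01-2019 20:23', '04-01-2019 21:38', '04-01-2019 23:22', '04-02-2019 01:00'],
        format=fmt),
    'checkout_datetime': pd.to_datetime(
        ['04-01-2019 13:09', '04-01-2019 13:12', '04-01-2019 14:07', '04-01-2019 15:06',
         '04-01-2019 21:32', '04-01-2019 21:42', '04-02-2019 00:23', '04-02-2019 06:15'],
        format=fmt),
})


def split_into_hourly_buckets(frame):
    """Hypothetical helper: one row per (stay, clock hour) with minutes spent in that hour."""
    rows = []
    for rec in frame.itertuples(index=False):
        start = rec.checkin_datetime
        while start < rec.checkout_datetime:
            # End of the current clock hour, capped at the checkout time
            end = min(start.floor('h') + pd.Timedelta(hours=1), rec.checkout_datetime)
            rows.append({
                'ID': rec.ID,
                'checkin_datetime': rec.checkin_datetime,
                'checkout_datetime': rec.checkout_datetime,
                'day': start.normalize(),
                'HR': start.hour,
                'Minutes': (end - start) // pd.Timedelta(minutes=1),
            })
            start = end
    return pd.DataFrame(rows)


buckets = split_into_hourly_buckets(df)

# Total minutes per ID and clock hour, then a running total per ID
hourly = buckets.groupby(['ID', 'day', 'HR'], as_index=False)['Minutes'].sum()
hourly['cumulative_minutes'] = hourly.groupby('ID')['Minutes'].cumsum()

print(buckets)
print(hourly)
```

On the question's data this produces 16 bucket rows. The stay from 20:23 to 21:32 is split into 37 minutes in hour 20 and 32 minutes in hour 21, and the stay from 23:22 to 00:23 into 38 minutes on 04-01 and 23 minutes on 04-02. The Python loop is easy to follow but may be slow on large frames.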
import pandas as pd
data={'ID':[4,4,4,4,22,22,25,29],
'checkin_datetime':['04-01-2019 13:07','04-01-2019 13:09','04-01-2019 14:06','04-01-2019 14:55','04-01-2019 20:23'
,'04-01-2019 21:38','04-01-2019 23:22','04-02-2019 01:00'],
'checkout_datetime':['04-01-2019 13:09','04-01-2019 13:12','04-01-2019 14:07','04-01-2019 15:06','04-01-2019 21:32'
,'04-01-2019 21:42','04-02-2019 00:23'
,'04-02-2019 06:15']
}
df = pd.DataFrame(data,columns= ['ID', 'checkin_datetime','checkout_datetime'])
df['checkout_datetime'] = pd.to_datetime(df['checkout_datetime'])
df['checkin_datetime'] = pd.to_datetime(df['checkin_datetime'])
df['Hour'] = df['checkin_datetime'].apply(lambda x: x.hour)
df['Date'] = df['checkin_datetime'].apply(lambda x: x.date())
df['duration'] = df['checkout_datetime']-df['checkin_datetime']
df['duration_in_minutes'] = (df['checkout_datetime']-df['checkin_datetime'])/pd.Timedelta(minutes=1)
with pd.option_context('display.max_rows', None, 'display.max_columns', None): # more options can be specified also
    print(df)
I think the previous answer given by Itamar Muskhkin is absolutely correct.