How do I get the difference between two dates and also sum up the hours by day?

I'm a little new to Python and I'm not sure where to start.
I have a pandas DataFrame that contains shift information like the below:

EmployeeID  ShiftType BeginDate            EndDate
10          Holiday   2020-01-01 21:00:00  2020-01-02 07:00:00
10          Regular   2020-01-02 21:00:00  2020-01-03 07:00:00
10          Regular   2020-01-03 21:00:00  2020-01-04 07:00:00
10          Regular   2020-01-04 21:00:00  2020-01-05 07:00:00
20          Regular   2020-02-01 09:00:00  2020-02-01 17:00:00
20          Regular   2020-02-02 09:00:00  2020-02-02 17:00:00
20          Regular   2020-02-03 09:00:00  2020-02-03 17:00:00
20          Regular   2020-02-04 09:00:00  2020-02-04 17:00:00

I'd like to break each shift down and summarize the hours worked on each calendar day. The desired output is below:

EmployeeID  ShiftType Date       HoursWorked
10          Holiday   2020-01-01 3
10          Regular   2020-01-02 10
10          Regular   2020-01-03 10
10          Regular   2020-01-04 10
10          Regular   2020-01-05 7
20          Regular   2020-02-01 8
20          Regular   2020-02-02 8
20          Regular   2020-02-03 8
20          Regular   2020-02-04 8

I know how to get the hours for each shift, as shown below, but I would like to break out each calendar day and the hours worked on that day. Retaining 'ShiftType' is not that important here.

df['HoursWorked'] = ((pd.to_datetime(df['EndDate']) - pd.to_datetime(df['BeginDate'])).dt.total_seconds() / 3600)

Any suggestion would be appreciated.

You could calculate the working hours for each date (BeginDate and EndDate) separately. That gives you three pd.Series: two for the parts of the night shifts that fall on different dates (before and after midnight), and one for the day shifts that start and end on the same date. Use the corresponding date as the index of each Series.

import pandas as pd

# make sure the columns are datetime64, not strings
df["BeginDate"] = pd.to_datetime(df["BeginDate"])
df["EndDate"] = pd.to_datetime(df["EndDate"])

# get a mask where there is a night shift; end date = start date +1
m = df["BeginDate"].dt.date != df["EndDate"].dt.date
# extract the night shifts
s0 = pd.Series((df["BeginDate"][m].dt.ceil('D') - df["BeginDate"][m]).values,
               index=df["BeginDate"][m].dt.floor('D'))
s1 = pd.Series((df["EndDate"][m] - df["EndDate"][m].dt.floor('D')).values,
               index=df["EndDate"][m].dt.floor('D'))
# ...and the day shifts
s2 = pd.Series((df["EndDate"][~m] - df["BeginDate"][~m]).values,
               index=df["BeginDate"][~m].dt.floor('D'))
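
To see what the floor/ceil arithmetic does, here is a small worked example for a single night shift. It is only an illustration using the first row of the sample data; the variable names are made up for this sketch.

```python
import pandas as pd

# First shift from the sample data: starts 21:00, ends 07:00 the next day
begin = pd.Timestamp("2020-01-01 21:00:00")
end = pd.Timestamp("2020-01-02 07:00:00")

# Part of the shift before midnight: next midnight minus the start
before_midnight = begin.ceil("D") - begin   # Timedelta('0 days 03:00:00')

# Part of the shift after midnight: end minus the midnight that precedes it
after_midnight = end - end.floor("D")       # Timedelta('0 days 07:00:00')

print(before_midnight, after_midnight)
```

The two parts add up to the full 10-hour shift, with 3 hours credited to 2020-01-01 and 7 hours to 2020-01-02.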

Now concat the three Series column-wise and sum across the columns, which gives you a pd.Series with the working hours for each date:

working_hours = pd.concat([s0, s1, s2], axis=1).sum(axis=1)

# working_hours
# 2020-01-01   0 days 03:00:00
# 2020-01-02   0 days 10:00:00
# 2020-01-03   0 days 10:00:00
# ...
# Freq: D, dtype: timedelta64[ns]

To join this with your original df, you can reindex df on the dates and then add working_hours as a new column:

new_index = pd.concat([df["BeginDate"].dt.floor('D'),
                       df["EndDate"].dt.floor('D')]).drop_duplicates()
df_out = df.set_index(df["BeginDate"].dt.floor('D')).reindex(new_index, method='nearest')
df_out.index = df_out.index.set_names('date')
df_out = df_out.drop(['BeginDate', 'EndDate'], axis=1).sort_index()
df_out['HoursWorked'] = working_hours.dt.total_seconds()/3600

# df_out
#             EmployeeID ShiftType  HoursWorked
# date                                         
# 2020-01-01          10   Holiday          3.0
# 2020-01-02          10   Regular         10.0
# 2020-01-03          10   Regular         10.0
# 2020-01-04          10   Regular         10.0
# 2020-01-05          10   Regular          7.0
# 2020-02-01          20   Regular          8.0
# 2020-02-02          20   Regular          8.0
# 2020-02-03          20   Regular          8.0
# 2020-02-04          20   Regular          8.0
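
The approach above relies on each date appearing at most once per Series, which holds for the sample data. If several employees can work on the same date, one possible alternative is to split every shift at midnight and group by employee and date. The following is a minimal sketch of that idea, not part of the original answer; the helper name split_by_day and the shortened sample frame are assumptions made for illustration.

```python
import pandas as pd

# Shortened copy of the sample data (assumption: columns are already datetime64)
df = pd.DataFrame({
    "EmployeeID": [10, 10, 20],
    "ShiftType": ["Holiday", "Regular", "Regular"],
    "BeginDate": pd.to_datetime(["2020-01-01 21:00:00",
                                 "2020-01-02 21:00:00",
                                 "2020-02-01 09:00:00"]),
    "EndDate": pd.to_datetime(["2020-01-02 07:00:00",
                               "2020-01-03 07:00:00",
                               "2020-02-01 17:00:00"]),
})

def split_by_day(row):
    """Yield (EmployeeID, Date, hours) for each calendar day a shift touches."""
    start, end = row.BeginDate, row.EndDate
    while start < end:
        # The shift piece ends at the next midnight or at the shift end
        next_midnight = start.normalize() + pd.Timedelta(days=1)
        stop = min(end, next_midnight)
        hours = (stop - start).total_seconds() / 3600
        yield row.EmployeeID, start.normalize(), hours
        start = stop

pieces = [p for row in df.itertuples(index=False) for p in split_by_day(row)]

# Sum the pieces so hours from two shifts on the same day are combined
hours_per_day = (
    pd.DataFrame(pieces, columns=["EmployeeID", "Date", "HoursWorked"])
    .groupby(["EmployeeID", "Date"], as_index=False)["HoursWorked"]
    .sum()
)
print(hours_per_day)
```

This drops ShiftType, which the question says is not essential, and it loops over rows in Python, so it may be slower than the vectorized version on large frames.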
