
Finding difference in time between two data frames using R

I have two data frames: one holds the in times of employees and the other holds the out times. Both contain timestamps for about 4000 employees over the last year (weekends and public holidays excluded), so each data frame has 4000 rows and 250 columns. I would like to find the number of hours each employee spent at work each day. My approach was to take the difference between the two data frames using the difftime() function. I used the code below and expected a data frame of 4000 rows and 250 columns containing the time differences, but the result came back as one single column. How can I get the difference in time between the two data frames as a data frame with 4000 rows and 250 columns?

hours_spent <- as.data.frame(as.matrix(difftime(as.matrix(out_time_data_hrs),as.matrix(in_time_data_hrs),unit='hour')))

The input data looks like this:

In_time data frame

[image not available]

Out_time data frame

[image not available]

Expected output

[image not available]

Here's a small and simple example based on the data you posted and a possible solution:

# example data in_times
df1 = data.frame(`2018-08-01` = c("2018-08-01 10:30:00", "2018-08-01 10:25:00"),
                 `2018-08-02` = c("2018-08-02 10:20:00", "2018-08-02 10:45:00"))
# example data out_times
df2 = data.frame(`2018-08-01` = c("2018-08-01 17:33:00", "2018-08-01 18:06:00"),
                 `2018-08-02` = c("2018-08-02 17:11:00", "2018-08-02 17:45:00"))

library(tidyverse)

# reshape datasets
df1_resh = df1 %>%
  mutate(empl_id = row_number()) %>%   # add an employee id (using the row number)
  gather(day, in_time, -empl_id)       # reshape dataset

df2_resh = df2 %>%
  mutate(empl_id = row_number()) %>%
  gather(day, out_time, -empl_id)

# join datasets and calculate hours spent
left_join(df1_resh, df2_resh, by=c("empl_id","day")) %>%
  mutate(hours_spent = difftime(out_time, in_time, units = "hours"))  # fix the unit so short shifts are not reported in minutes

#   empl_id         day             in_time            out_time    hours_spent
# 1       1 X2018.08.01 2018-08-01 10:30:00 2018-08-01 17:33:00 7.050000 hours
# 2       2 X2018.08.01 2018-08-01 10:25:00 2018-08-01 18:06:00 7.683333 hours
# 3       1 X2018.08.02 2018-08-02 10:20:00 2018-08-02 17:11:00 6.850000 hours
# 4       2 X2018.08.02 2018-08-02 10:45:00 2018-08-02 17:45:00 7.000000 hours

You can use this as the final piece of code if you want to reshape back to your initial format:

left_join(df1_resh, df2_resh, by=c("empl_id","day")) %>%
  mutate(hours_spent = difftime(out_time, in_time, units = "hours")) %>%  # fix the unit to hours
  select(empl_id, day, hours_spent) %>%
  spread(day, hours_spent)

#   empl_id    X2018.08.01 X2018.08.02
# 1       1 7.050000 hours  6.85 hours
# 2       2 7.683333 hours  7.00 hours
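As a side note on why the original attempt collapsed to one column: difftime() most likely converts its inputs with as.POSIXct(), and that conversion does not keep the matrix dimensions, so as.matrix() then sees a plain vector. A minimal base R sketch that flattens the values on purpose and restores the shape afterwards is shown here. The data frames and the column names d1 and d2 are hypothetical stand-ins for the question's data, and the "UTC" time zone is an assumption.

```r
# Hypothetical stand-ins for in_time_data_hrs / out_time_data_hrs
in_time_data_hrs <- data.frame(
  d1 = c("2018-08-01 10:30:00", "2018-08-01 10:25:00"),
  d2 = c("2018-08-02 10:20:00", "2018-08-02 10:45:00"),
  stringsAsFactors = FALSE
)
out_time_data_hrs <- data.frame(
  d1 = c("2018-08-01 17:33:00", "2018-08-01 18:06:00"),
  d2 = c("2018-08-02 17:11:00", "2018-08-02 17:45:00"),
  stringsAsFactors = FALSE
)

in_mat  <- as.matrix(in_time_data_hrs)
out_mat <- as.matrix(out_time_data_hrs)

# Flatten column by column, parse, and take the difference in hours
hrs <- as.numeric(difftime(
  as.POSIXct(as.vector(out_mat), tz = "UTC"),  # time zone is an assumption
  as.POSIXct(as.vector(in_mat),  tz = "UTC"),
  units = "hours"
))

# Put the flat vector back into the original rows x columns layout
hours_spent <- as.data.frame(
  matrix(hrs, nrow = nrow(in_mat), dimnames = dimnames(in_mat))
)
print(hours_spent)
```

Because matrix() fills column by column, the same order in which as.vector() flattened the input, each value should land back in the cell it came from.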

My requirement can be met simply by doing the following:

employee_hrs_df <- out_time_data - in_time_data
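The subtraction above presumably relies on every column already being a date-time (POSIXct) value; with character or factor columns it would not work, and the unit of each result column is chosen automatically. A hedged sketch of a column-wise variant with the unit fixed to hours follows. The sample data, the column names d1 and d2, the helper to_posix() and the "UTC" time zone are all assumptions for illustration.

```r
# Hypothetical stand-ins for in_time_data / out_time_data
in_time_data <- data.frame(
  d1 = c("2018-08-01 10:30:00", "2018-08-01 10:25:00"),
  d2 = c("2018-08-02 10:20:00", "2018-08-02 10:45:00"),
  stringsAsFactors = FALSE
)
out_time_data <- data.frame(
  d1 = c("2018-08-01 17:33:00", "2018-08-01 18:06:00"),
  d2 = c("2018-08-02 17:11:00", "2018-08-02 17:45:00"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: parse every column to POSIXct, keeping the data frame shape
to_posix <- function(df) {
  df[] <- lapply(df, as.POSIXct, tz = "UTC")  # time zone is an assumption
  df
}
in_parsed  <- to_posix(in_time_data)
out_parsed <- to_posix(out_time_data)

# Pair up matching columns and compute the difference in hours as plain numbers
employee_hrs_df <- as.data.frame(Map(
  function(out_col, in_col) as.numeric(difftime(out_col, in_col, units = "hours")),
  out_parsed, in_parsed
))
print(employee_hrs_df)
```

Storing plain numbers rather than difftime values keeps every column in the same unit, which may matter if some days have differences shorter than an hour.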
