
Finding difference in time between two data frames using R

I have two data frames: one holds the in-times of employees and the other holds their out-times. Both contain timestamps for about 4000 employees over the last year (weekends and public holidays excluded), so each data frame has 4000 rows and 250 columns. I would like to find the number of hours each employee spent at work each day. My approach is to take the difference in time between the two data frames using the difftime() function. I used the code below and expected a data frame with 4000 rows and 250 columns of time differences, but the result came back as one single column. How can I get the time differences between the two data frames as a data frame with 4000 rows and 250 columns?

hours_spent <- as.data.frame(as.matrix(difftime(as.matrix(out_time_data_hrs),as.matrix(in_time_data_hrs),unit='hour')))
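For what it's worth, the single column seems to come from the date-time conversion inside difftime(): it hands back a plain vector, so the matrix shape passed in is not carried through. A minimal sketch, assuming the timestamps are stored as character strings (the two small data frames, their column names and the UTC time zone below are invented for illustration), restores the shape by hand:

```r
# Hypothetical two-employee, two-day data (values made up for illustration)
in_time  <- data.frame(d1 = c("2018-08-01 10:30:00", "2018-08-01 10:25:00"),
                       d2 = c("2018-08-02 10:20:00", "2018-08-02 10:45:00"),
                       stringsAsFactors = FALSE)
out_time <- data.frame(d1 = c("2018-08-01 17:33:00", "2018-08-01 18:06:00"),
                       d2 = c("2018-08-02 17:11:00", "2018-08-02 17:45:00"),
                       stringsAsFactors = FALSE)

# difftime() converts both inputs to date-times; the result is a flat
# vector of differences in column-major order, not a matrix
d <- difftime(as.matrix(out_time), as.matrix(in_time),
              tz = "UTC", units = "hours")

# Rebuild the rows x columns layout from the numeric values
hours_spent <- as.data.frame(
  matrix(as.numeric(d),
         nrow = nrow(in_time),
         dimnames = list(NULL, names(in_time)))
)
```

Because as.matrix() and matrix() both work column by column, each value should land back in the cell it came from.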

The input data looks like this:

In_time data frame

[image]

Out_time data frame

[image]

Expected output

[image]

Here's a small and simple example based on the data you posted, along with a possible solution:

# example data in_times
df1 = data.frame(`2018-08-01` = c("2018-08-01 10:30:00", "2018-08-01 10:25:00"),
                 `2018-08-02` = c("2018-08-02 10:20:00", "2018-08-02 10:45:00"))
# example data out_times
df2 = data.frame(`2018-08-01` = c("2018-08-01 17:33:00", "2018-08-01 18:06:00"),
                 `2018-08-02` = c("2018-08-02 17:11:00", "2018-08-02 17:45:00"))

library(tidyverse)

# reshape datasets
df1_resh = df1 %>%
  mutate(empl_id = row_number()) %>%   # add an employee id (using the row number)
  gather(day, in_time, -empl_id)       # reshape dataset

df2_resh = df2 %>%
  mutate(empl_id = row_number()) %>%
  gather(day, out_time, -empl_id)

# join datasets and calculate hours spent
# (units = "hours" fixes the unit instead of letting difftime() choose one)
left_join(df1_resh, df2_resh, by=c("empl_id","day")) %>%
  mutate(hours_spent = difftime(out_time, in_time, units = "hours"))

#   empl_id         day             in_time            out_time    hours_spent
# 1       1 X2018.08.01 2018-08-01 10:30:00 2018-08-01 17:33:00 7.050000 hours
# 2       2 X2018.08.01 2018-08-01 10:25:00 2018-08-01 18:06:00 7.683333 hours
# 3       1 X2018.08.02 2018-08-02 10:20:00 2018-08-02 17:11:00 6.850000 hours
# 4       2 X2018.08.02 2018-08-02 10:45:00 2018-08-02 17:45:00 7.000000 hours

You can use this as the final piece of code if you want to reshape back to your initial format:

left_join(df1_resh, df2_resh, by=c("empl_id","day")) %>%
  mutate(hours_spent = difftime(out_time, in_time, units = "hours")) %>%
  select(empl_id, day, hours_spent) %>%   # keep only what the wide table needs
  spread(day, hours_spent)                # one column per day again

#   empl_id    X2018.08.01 X2018.08.02
# 1       1 7.050000 hours  6.85 hours
# 2       2 7.683333 hours  7.00 hours
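If the wide layout is all that is needed, a column-by-column pass in base R avoids the reshape entirely. This is only a sketch, assuming both data frames have the same columns in the same order and hold character timestamps; df_in, df_out and hours_by_day are illustrative names, and the UTC time zone is an assumption made to sidestep daylight-saving shifts.

```r
# Hypothetical in/out data; check.names = FALSE keeps the date column names
df_in  <- data.frame(`2018-08-01` = c("2018-08-01 10:30:00", "2018-08-01 10:25:00"),
                     `2018-08-02` = c("2018-08-02 10:20:00", "2018-08-02 10:45:00"),
                     check.names = FALSE, stringsAsFactors = FALSE)
df_out <- data.frame(`2018-08-01` = c("2018-08-01 17:33:00", "2018-08-01 18:06:00"),
                     `2018-08-02` = c("2018-08-02 17:11:00", "2018-08-02 17:45:00"),
                     check.names = FALSE, stringsAsFactors = FALSE)

# Hours between one out-time column and the matching in-time column
hours_between <- function(out_col, in_col) {
  as.numeric(difftime(as.POSIXct(out_col, tz = "UTC"),
                      as.POSIXct(in_col,  tz = "UTC"),
                      units = "hours"))
}

# Start from a copy so row and column names are kept, then overwrite
# every column with the computed hours (Map pairs columns by position)
hours_by_day <- df_in
hours_by_day[] <- Map(hours_between, df_out, df_in)
```

The result has the same dimensions and column names as the inputs, with plain numeric hours in each cell.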

My requirement was met simply by doing the following:

employee_hrs_df <- out_time_data - in_time_data
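A hedged note on why this can work: subtracting two data frames is done column by column, so it gives time differences only when the columns are already date-time (POSIXct) values rather than strings. The sketch below assumes character input and converts first; the data, the column names day1 and day2, and the UTC time zone are made up for illustration.

```r
# Hypothetical character timestamps (values invented for illustration)
in_time_data  <- data.frame(day1 = c("2018-08-01 10:30:00", "2018-08-01 10:25:00"),
                            day2 = c("2018-08-02 10:20:00", "2018-08-02 10:45:00"),
                            stringsAsFactors = FALSE)
out_time_data <- data.frame(day1 = c("2018-08-01 17:33:00", "2018-08-01 18:06:00"),
                            day2 = c("2018-08-02 17:11:00", "2018-08-02 17:45:00"),
                            stringsAsFactors = FALSE)

# Convert every column to POSIXct while keeping the data frame shape
in_time_data[]  <- lapply(in_time_data,  as.POSIXct, tz = "UTC")
out_time_data[] <- lapply(out_time_data, as.POSIXct, tz = "UTC")

# Column-wise subtraction; each resulting column is a difftime
employee_hrs_df <- out_time_data - in_time_data

# The unit of each difftime column is chosen automatically, so convert
# explicitly to numeric hours to keep all columns comparable
employee_hrs_df[] <- lapply(employee_hrs_df, as.numeric, units = "hours")
```

Without the explicit conversion in the last step, a column whose differences are all short could end up in minutes while the others are in hours.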
