
How to calculate time differences between groups

I have a problem involving time differences that I am trying to solve with dplyr. My initial data frame looks like this:

Paper <- data.frame(
  Student = c("A", "A", "A", "A", "A", "A", "B", "B", "B", "B"), 
  Dates = c("2014-04-17", "2014-04-17", "2014-04-17", "2014-04-17", "2014-04-18", "2014-04-18", "2014-04-18", "2014-04-18", "2014-04-18","2014-04-18"),
  Time = c("10:35:00", "11:25:00", "19:15:00", "21:00:00", "22:00:00", "22:21:26", "10:25:00", "11:15:00", "16:05:00", "17:25:00"),
  Connection = c("Initial", "Final", "Initial", "Final", "Initial", "Final", "Initial", "Final", "Initial", "Final")
)

or, printed:

   Student      Dates     Time  Connection
       A    2014-04-17  10:35:00    Initial
       A    2014-04-17  11:25:00      Final
       A    2014-04-17  19:15:00    Initial
       A    2014-04-17  21:00:00      Final
       A    2014-04-18  22:00:00    Initial
       A    2014-04-18  22:21:26      Final
       B    2014-04-18  10:25:00    Initial
       B    2014-04-18  11:15:00      Final
       B    2014-04-18  16:05:00    Initial
       B    2014-04-18  17:25:00      Final

For each date, I want to know the time each Student spent, where only the time between an "Initial" Connection and the following "Final" Connection counts.

So my expected data frame would look like this:

  Student    Dates    Time (Minutes)
     A    14-04-17     155
     A    14-04-18   21.43
     B    14-04-18     130

I have tried the following and almost have a solution, but I don't know how to restrict the time difference to each "Initial"/"Final" connection pair:

library(dplyr)

Paper$Dates <- as.Date(Paper$Dates, "%Y-%m-%d")

Paper$Time <- as.numeric(as.POSIXct(as.character(Paper$Time),
                         format = "%H:%M:%S"))

FinalPaper <- 
  Paper %>% 
  group_by(Student, Dates) %>% 
  summarise(TimeSpent = sum(diff(Time))) %>% 
  mutate(TimeSpent = TimeSpent/60) %>% 
  mutate(TimeSpent = round(TimeSpent, digits = 2))

This results in:

  Student      Dates   TimeSpent
1       A   2014-04-17    625.00
2       A   2014-04-18     21.43
3       B   2014-04-18    420.00

As can be seen, TimeSpent is too high. This is because I am not taking Connection into account, so the gaps between sessions are counted as well. For example, for student A it calculates the time between 10:35:00 and 21:00:00, which is wrong.

Thank you very much!

You could add an id to each 'session' with cumsum(Connection == "Initial"). This requires the data to be sorted the way you have presented it here, with each "Initial" row directly followed by its "Final" row. We can then calculate the time difference for each session and aggregate again to get the total time spent per student per date:

library(dplyr)

Paper <- data.frame(
  Student = c("A", "A", "A", "A", "A", "A", "B", "B", "B", "B"), 
  Dates= c("2014-04-17", "2014-04-17", "2014-04-17", "2014-04-17", "2014-04-18", "2014-04-18", "2014-04-18", "2014-04-18", "2014-04-18","2014-04-18"),
  Time = c("10:35:00", "11:25:00", "19:15:00", "21:00:00", "22:00:00", "22:21:26", "10:25:00", "11:15:00", "16:05:00", "17:25:00"), 
  Connection = c("Initial", "Final", "Initial", "Final", "Initial", "Final", "Initial", "Final", "Initial", "Final")
)

Paper$Dates <- as.Date(Paper$Dates, "%Y-%m-%d")
Paper$Time <- as.numeric(as.POSIXct(as.character(Paper$Time),
                                    format = "%H:%M:%S"))

FinalPaper <- Paper %>% 
  mutate(seqid = cumsum(Connection == "Initial")) %>% 
  group_by(Student, Dates, seqid) %>% 
  summarise(TimeSpent = sum(diff(Time))) %>% 
  group_by(Student, Dates) %>% 
  summarise(TimeSpent = round(sum(TimeSpent)/60,2))

Output:

# A tibble: 3 x 3
# Groups:   Student [2]
  Student      Dates TimeSpent
   <fctr>     <date>     <dbl>
1       A 2014-04-17    155.00
2       A 2014-04-18     21.43
3       B 2014-04-18    130.00
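
To illustrate why cumsum(Connection == "Initial") works as a session id, here is a minimal base R sketch using only student A's rows for 2014-04-17. The vectors `connection` and `minutes` are illustrative names, not part of the original data frame, and the times are written as minutes since midnight by hand.

```r
# Student A on 2014-04-17, in chronological order.
# Times are 10:35, 11:25, 19:15, 21:00 expressed as minutes since midnight.
connection <- c("Initial", "Final", "Initial", "Final")
minutes <- c(635, 685, 1155, 1260)

# Every "Initial" row increments the counter, so the rows of one
# session share the same id.
seqid <- cumsum(connection == "Initial")
seqid
# 1 1 2 2

# Difference within each session, then the total over all sessions.
per_session <- tapply(minutes, seqid, function(m) sum(diff(m)))
sum(per_session)
# 155
```

Without the session id, sum(diff(minutes)) collapses to the last value minus the first (1260 - 635 = 625), which is the inflated figure from the question.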

Hope this helps!
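
If the rows cannot be assumed to be sorted, one possible variant is to sort explicitly before building the session id. The following is only a sketch under some assumptions: it needs dplyr 1.0 or later for the `.groups` argument, the `Stamp` column and the `Shuffled` data frame are names introduced here for illustration, and sessions are assumed not to cross midnight.

```r
library(dplyr)

Paper <- data.frame(
  Student = c("A", "A", "A", "A", "A", "A", "B", "B", "B", "B"),
  Dates = c("2014-04-17", "2014-04-17", "2014-04-17", "2014-04-17", "2014-04-18",
            "2014-04-18", "2014-04-18", "2014-04-18", "2014-04-18", "2014-04-18"),
  Time = c("10:35:00", "11:25:00", "19:15:00", "21:00:00", "22:00:00",
           "22:21:26", "10:25:00", "11:15:00", "16:05:00", "17:25:00"),
  Connection = c("Initial", "Final", "Initial", "Final", "Initial",
                 "Final", "Initial", "Final", "Initial", "Final"),
  stringsAsFactors = FALSE
)

# Shuffle the rows on purpose to show that the order no longer matters.
set.seed(1)
Shuffled <- Paper[sample(nrow(Paper)), ]

FinalPaper <- Shuffled %>%
  # One full timestamp per row; a fixed time zone avoids daylight-saving surprises.
  mutate(Stamp = as.POSIXct(paste(Dates, Time), tz = "UTC")) %>%
  # Restore chronological order per student before numbering the sessions.
  arrange(Student, Stamp) %>%
  mutate(seqid = cumsum(Connection == "Initial")) %>%
  group_by(Student, Dates, seqid) %>%
  summarise(
    TimeSpent = as.numeric(difftime(max(Stamp), min(Stamp), units = "mins")),
    .groups = "drop"
  ) %>%
  group_by(Student, Dates) %>%
  summarise(TimeSpent = round(sum(TimeSpent), 2), .groups = "drop")
```

This should give the same three rows as above (155, 21.43 and 130 minutes), regardless of the input row order.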

And here is a data.table-based solution:

library(data.table)
setDT(Paper)
Paper[order(Student, Time), .(
    TimeSpend = sum(c(0,diff(Time))[Connection == "Final"])/60
  ), by = .(Student, Dates)]

   Student      Dates TimeSpend
1:       A 2014-04-17 155.00000
2:       A 2014-04-18  21.43333
3:       B 2014-04-18 130.00000
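
The expression c(0, diff(Time))[Connection == "Final"] does the work in the data.table answer, so a small base R sketch of it may help. It uses student A's rows for 2014-04-17 only; `minutes`, `connection` and `gaps` are illustrative names, and the times are written by hand as minutes since midnight.

```r
# Student A on 2014-04-17: 10:35, 11:25, 19:15, 21:00 as minutes since midnight.
minutes <- c(635, 685, 1155, 1260)
connection <- c("Initial", "Final", "Initial", "Final")

# Gap between each row and the previous one; the leading 0 keeps the
# vector the same length as the data.
gaps <- c(0, diff(minutes))
gaps
# 0 50 470 105

# Keeping only the gaps that end on a "Final" row drops the idle time
# between sessions (the 470 minutes from 11:25 to 19:15).
sum(gaps[connection == "Final"])
# 155
```

Like the cumsum approach, this relies on the rows being in chronological order within each group, which is what the order(Student, Time) call in the data.table code provides.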
