
How to calculate elapsed times for the total duration of events?

I have collected a data frame that models the duration of events in a group problem-solving session in which the members communicate (Discourse Code) and construct models (Modeling Code). Each minute at which an event occurs is captured in the Time_Processed column. Technically these events occur simultaneously. I would like to know how long the students spend constructing each type of model, that is, the total duration of that model, or the time elapsed before the model changes.

I have the following dataset, which looks like this:

 `Modeling Code` `Discourse Code` Time_Processed
   <fct>           <fct>                     <dbl>
 1 OFF             OFF                        10.0
 2 MA              Q                          11.0
 3 MA              AG                         16.0
 4 V               S                          18.0
 5 V               Q                          20.0
 6 MA              C                          21.0
 7 MA              C                          23.0
 8 MA              C                          25.0
 9 V               J                          26.0
10 P               S                          28.0

# My explicit dataframe. 
df <- structure(list(`Modeling Code` = structure(c(3L, 2L, 2L, 6L, 
6L, 2L, 2L, 2L, 6L, 4L), .Label = c("A", "MA", "OFF", "P", "SM", 
"V"), class = "factor"), `Discourse Code` = structure(c(7L, 8L, 
1L, 9L, 8L, 2L, 2L, 2L, 6L, 9L), .Label = c("AG", "C", "D", "DA", 
"G", "J", "OFF", "Q", "S"), class = "factor"), Time_Processed = c(10, 
11, 16, 18, 20, 21, 23, 25, 26, 28)), row.names = c(NA, -10L), .Names = c("Modeling Code", 
"Discourse Code", "Time_Processed"), class = c("tbl_df", "tbl", 
"data.frame"))

For this data frame I can work out by hand how long the students spent constructing each type of model.

With respect to the Modeling Code and Time_Processed columns:

At 10 minutes they are using the OFF model method; at 11 minutes they change the model, so the duration of the OFF model is (11 - 10) minutes = 1 minute. There are no other occurrences of the "OFF" method, so the duration of OFF = 1 minute.

Likewise, Modeling Code method "MA" is used from 11 minutes to 16 minutes (duration = 5 minutes) and then from 16 minutes to 18 minutes (duration = 2 minutes) before the model changes to V. MA is then used again from 21 minutes until 26 minutes (duration = 5 minutes). So the total duration of "MA" is (5 + 2 + 5) minutes = 12 minutes.

Likewise, Modeling Code method "V" starts at 18 minutes and ends at 21 minutes (duration = 3 minutes), then resumes at 26 minutes and ends at 28 minutes (duration = 2 minutes). So the total duration of "V" is 3 + 2 = 5 minutes.

Finally, Modeling Code "P" starts at 28 minutes and there is no following row, so the total duration of P is 0 minutes.

So the table of total durations (in minutes) for the Modeling Codes is:

Modeling Code     Total_Duration
    OFF               1
    MA               12
    V                 5 
    P                 0 

This corresponds to a bar chart that looks like this:

[image: bar chart]

How can the total duration of each modeling method be computed?

It would also be nice to know the duration of each combination of codes. In this small subset the only combination that repeats across consecutive rows is Modeling Code "MA" paired with Discourse Code "C", which lasts for 26 - 21 = 5 minutes.

Thank you.

UPDATED SOLUTION

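library(dplyr)  # mutate(), lead(), group_by(), summarise()
library(tidyr)  # replace_na()

df %>% 
  # time until the next row; NA for the last row, which has no successor
  mutate(dur = lead(Time_Processed) - Time_Processed) %>% 
  replace_na(list(dur = 0)) %>% 
  group_by(`Modeling Code`) %>% 
  summarise(tot_time = sum(dur))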

(^ Thanks to Nick DiQuattro)
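
The question also asks about combinations of Modeling Code and Discourse Code. A minimal sketch, assuming the same per-row duration is wanted and simply grouping by both columns (the name combo_time is illustrative, and the data frame is rebuilt from the question's table with plain character columns so the snippet runs on its own):

```r
library(dplyr)
library(tidyr)

# Same ten rows as the question's data frame (characters instead of factors)
df <- tibble(
  `Modeling Code`  = c("OFF", "MA", "MA", "V", "V", "MA", "MA", "MA", "V", "P"),
  `Discourse Code` = c("OFF", "Q", "AG", "S", "Q", "C", "C", "C", "J", "S"),
  Time_Processed   = c(10, 11, 16, 18, 20, 21, 23, 25, 26, 28)
)

combo_time <- df %>%
  # time until the next row; the last row has no successor, so it gets 0
  mutate(dur = lead(Time_Processed) - Time_Processed) %>%
  replace_na(list(dur = 0)) %>%
  # sum the per-row durations for every pair of codes
  group_by(`Modeling Code`, `Discourse Code`) %>%
  summarise(tot_time = sum(dur), .groups = "drop")

combo_time
```

For the "MA" / "C" pair this sums the intervals 21 to 23, 23 to 25 and 25 to 26, which gives the 5 minutes worked out in the question. Note that it adds up every row with that pair, whether or not the rows are adjacent.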

PREVIOUS SOLUTION

Here's one solution that creates a new variable, mcode_grp, which keeps track of discrete groupings of the same Modeling Code. It's not particularly pretty (it requires looping over each row in df), but it works.

First, rename columns for ease of reference:

df <- df %>%
  rename(m_code = `Modeling Code`,
         d_code = `Discourse Code`)

We'll update df with a few extra variables:
- lead_time_proc gives us the Time_Processed value for the next row in df, which we'll need when computing the total amount of time for each m_code batch
- row_n keeps track of the row number in our iteration
- mcode_grp is the label for each m_code batch (unique within a given m_code)

df <- df %>%
  mutate(lead_time_proc = lead(Time_Processed),
         row_n = row_number(),
         mcode_grp = "") 
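
As a quick illustration of what lead() contributes here, this small sketch uses the first four Time_Processed values from the question (the vector names times, next_times and durations are only for illustration):

```r
library(dplyr)

times <- c(10, 11, 16, 18)

# lead() shifts the vector up by one position; the last element has no successor
next_times <- lead(times)        # 11 16 18 NA

# subtracting gives the time each row lasts before the next row starts
durations <- next_times - times  # 1 5 2 NA
```

The trailing NA is why the final summary needs replace_na(): the last row of df has no following timestamp to subtract from.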

Next, we need a way to keep track of when we've hit a new batch of a given m_code value. One way is to keep a counter for each m_code and increment it whenever a new batch is reached. Then we can label all the rows of that m_code batch as belonging to the same time window.

mcode_ct <- df %>% 
  group_by(m_code) %>% 
  summarise(ct = 0) %>%
  mutate(m_code = as.character(m_code))

This is the ugliest part. We loop over every row in df and check whether we've reached a new m_code. If so, we update the counter accordingly. Either way, we register a value of mcode_grp for each row.

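mc <- ""
for (i in seq_len(nrow(df))) {
  # compare as character so a factor is never compared against a plain string
  current_mc <- as.character(df$m_code[i])
  if (current_mc != mc) {
    # new batch: bump the counter for this m_code and remember its value
    mc <- current_mc
    mcode_ct <- mcode_ct %>% mutate(ct = ifelse(m_code == mc, ct + 1, ct))
    current_grp <- mcode_ct %>% filter(m_code == mc) %>% pull(ct)
  }
  # label row i with the current batch number
  df <- df %>% mutate(mcode_grp = ifelse(row_n == i, current_grp, mcode_grp))
}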

Finally, group by m_code and mcode_grp, compute the duration of each batch, and then sum over m_code values.

df %>%
  group_by(m_code, mcode_grp) %>%
  summarise(start_time = min(Time_Processed),
            end_time = max(lead_time_proc)) %>%
  mutate(total_time = end_time - start_time) %>%
  group_by(m_code) %>%
  summarise(total_time = sum(total_time)) %>%
  replace_na(list(total_time=0))

Output:

# A tibble: 4 x 2
  m_code total_time
  <fct>       <dbl>
1 MA            12.
2 OFF            1.
3 P              0.
4 V              5.

For any dplyr / tidyverse experts out there, I'd love tips on how to accomplish more of this without resorting to loops and counters!
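
One possible loop-free variant, offered as a sketch rather than a definitive answer: compare each m_code with the previous row using lag() and take a cumulative sum of the changes to get a run id. Here mcode_grp is a global run number rather than the per-code counter used above, and the data frame is rebuilt with character columns so the snippet runs on its own; the result name runs is illustrative.

```r
library(dplyr)

# The question's data after the rename step (characters instead of factors)
df <- tibble(
  m_code         = c("OFF", "MA", "MA", "V", "V", "MA", "MA", "MA", "V", "P"),
  Time_Processed = c(10, 11, 16, 18, 20, 21, 23, 25, 26, 28)
)

runs <- df %>%
  mutate(
    lead_time_proc = lead(Time_Processed),
    # TRUE whenever m_code differs from the previous row; cumsum() turns
    # those change points into an id shared by each consecutive batch
    mcode_grp = cumsum(m_code != lag(m_code, default = first(m_code)))
  ) %>%
  group_by(m_code, mcode_grp) %>%
  # duration of one batch: start of the next batch minus start of this one
  summarise(total_time = max(lead_time_proc) - min(Time_Processed),
            .groups = "drop") %>%
  group_by(m_code) %>%
  # the final batch has no following row, so its NA is dropped (counts as 0)
  summarise(total_time = sum(total_time, na.rm = TRUE))

runs
```

On the sample data this gives the same totals as the loop version: MA 12, OFF 1, P 0 and V 5.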
