How to calculate elapsed times for the total duration of events?
I have collected a dataframe that models the duration of events in a group problem-solving session in which the members communicate (`Discourse Code`) and construct models (`Modeling Code`). Each minute at which an event occurs is captured in the `Time_Processed` column. Technically these events occur simultaneously. I would like to know how long the students spend constructing each type of model, that is, the total duration of that model, counted as the time elapsed before the model changes.
I have the following dataset, which looks like this:
   `Modeling Code` `Discourse Code` Time_Processed
   <fct>           <fct>                     <dbl>
 1 OFF             OFF                        10.0
 2 MA              Q                          11.0
 3 MA              AG                         16.0
 4 V               S                          18.0
 5 V               Q                          20.0
 6 MA              C                          21.0
 7 MA              C                          23.0
 8 MA              C                          25.0
 9 V               J                          26.0
10 P               S                          28.0
# My explicit dataframe.
df <- structure(list(`Modeling Code` = structure(c(3L, 2L, 2L, 6L,
6L, 2L, 2L, 2L, 6L, 4L), .Label = c("A", "MA", "OFF", "P", "SM",
"V"), class = "factor"), `Discourse Code` = structure(c(7L, 8L,
1L, 9L, 8L, 2L, 2L, 2L, 6L, 9L), .Label = c("AG", "C", "D", "DA",
"G", "J", "OFF", "Q", "S"), class = "factor"), Time_Processed = c(10,
11, 16, 18, 20, 21, 23, 25, 26, 28)), row.names = c(NA, -10L), .Names = c("Modeling Code",
"Discourse Code", "Time_Processed"), class = c("tbl_df", "tbl",
"data.frame"))
For this dataframe I can work out by hand how long the students were constructing each type of model. With respect to the `Modeling Code` and `Time_Processed` columns:
At 10 minutes they are using the OFF model method, then at 11 minutes they change the model, so the duration of the OFF model is (11 - 10) minutes = 1 minute. There are no other occurrences of the "OFF" method, so the total duration of OFF = 1 minute.
Likewise, for Modeling Code method "MA", the model is used from 11 minutes to 16 minutes (duration = 5 minutes) and then from 16 minutes to 18 minutes (duration = 2 minutes) before the model changes to V. The model is then used again at 21 minutes and ends at 26 minutes (duration = 5 minutes). So the total duration of "MA" is (5 + 2 + 5) minutes = 12 minutes.
Likewise, Modeling Code method "V" starts at 18 minutes and ends at 21 minutes (duration = 3 minutes), then resumes at 26 minutes and ends at 28 minutes (duration = 2 minutes). So the total duration of "V" is 3 + 2 = 5 minutes.
Finally, Modeling Code "P" starts at 28 minutes and there is no later row to mark its end, so the total duration of P is 0 minutes.
So the table of total durations (in minutes) for the Modeling Codes is this:
Modeling Code    Total_Duration
OFF                           1
MA                           12
V                             5
P                             0
This corresponds to a bar chart like this:
[Figure: bar chart of Total_Duration for each Modeling Code]
How can the total duration of these modeling methods be computed?
It would also be nice to know the duration of the combinations of codes. In this small subset the only combination that visibly persists is Modeling Code "MA" paired with Discourse Code "C", which lasts for 26 - 21 = 5 minutes.
Thank you.
UPDATED SOLUTION
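library(dplyr)  # %>%, mutate(), lead(), group_by(), summarise()
library(tidyr)  # replace_na()

df %>%
  # time until the next row; the last row has no successor, so dur is NA there
  mutate(dur = lead(Time_Processed) - Time_Processed) %>%
  replace_na(list(dur = 0)) %>%
  group_by(`Modeling Code`) %>%
  summarise(tot_time = sum(dur))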
(^ Thanks to Nick DiQuattro)
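The question also asks about the duration of code combinations, which the pipeline above does not cover. The following is only a sketch of one way to extend it, assuming the same per-row `dur` idea carries over: group by both code columns instead of one. The data is re-created inline from the question's table (as plain character columns rather than factors) so the block runs on its own; `combo_time` is a name introduced here for illustration.

```r
library(dplyr)
library(tidyr)

# Example data copied from the question (character columns instead of factors).
df <- tibble(
  `Modeling Code`  = c("OFF", "MA", "MA", "V", "V", "MA", "MA", "MA", "V", "P"),
  `Discourse Code` = c("OFF", "Q", "AG", "S", "Q", "C", "C", "C", "J", "S"),
  Time_Processed   = c(10, 11, 16, 18, 20, 21, 23, 25, 26, 28)
)

combo_time <- df %>%
  # time until the next row; the last row gets 0
  mutate(dur = lead(Time_Processed) - Time_Processed) %>%
  replace_na(list(dur = 0)) %>%
  # one group per (Modeling Code, Discourse Code) pair
  group_by(`Modeling Code`, `Discourse Code`) %>%
  summarise(tot_time = sum(dur), .groups = "drop")

combo_time
```

For the pair "MA" / "C" this sums the row durations 2 + 2 + 1 = 5 minutes, which agrees with the 26 - 21 = 5 worked out in the question.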
PREVIOUS SOLUTION
Here's one solution that creates a new variable, `mcode_grp`, which keeps track of discrete groupings of the same `Modeling Code`. It's not particularly pretty (it requires looping over each row in `df`), but it works.
First, rename columns for ease of reference:
df <- df %>%
  rename(m_code = `Modeling Code`,
         d_code = `Discourse Code`)
We'll update `df` with a few extra variables:
- `lead_time_proc` gives us the `Time_Processed` value for the next row in `df`, which we'll need when computing the total amount of time for each `m_code` batch
- `row_n` keeps track of the row number in our iteration
- `mcode_grp` is the unique label for each `m_code` batch
df <- df %>%
  mutate(lead_time_proc = lead(Time_Processed),
         row_n = row_number(),
         mcode_grp = "")
Next, we need a way to keep track of when we've hit a new batch of a given `m_code` value. One way is to keep a counter for each `m_code`, and increment it whenever a new batch is reached. Then we can label all the rows for that `m_code` batch as belonging to the same time window.
mcode_ct <- df %>%
  group_by(m_code) %>%
  summarise(ct = 0) %>%
  mutate(m_code = as.character(m_code))
This is the ugliest part. We loop over every row in `df`, and check to see if we've reached a new `m_code`. If so, we update the counter accordingly, and register a value for `mcode_grp` for each row.
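mc <- ""
for (i in seq_len(nrow(df))) {
  # compare as character so the factor is never compared against ""
  current_mc <- as.character(df$m_code[i])
  if (current_mc != mc) {
    # a new batch starts here: bump this code's counter and read it back
    mc <- current_mc
    mcode_ct <- mcode_ct %>% mutate(ct = ifelse(m_code == mc, ct + 1, ct))
    current_grp <- mcode_ct %>% filter(m_code == mc) %>% pull(ct)
  }
  # label row i with the batch number of its code
  df <- df %>% mutate(mcode_grp = ifelse(row_n == i, current_grp, mcode_grp))
}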
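As a cross-check on the labels this loop should produce, here is a small base R sketch using `rle()` (run-length encoding). It is an illustration rather than part of the solution: the `m_code` vector is re-typed from the question's data, and `batch_no` and `row_labels` are names made up for this example.

```r
# Modeling codes in row order, copied from the question's example data.
m_code <- c("OFF", "MA", "MA", "V", "V", "MA", "MA", "MA", "V", "P")

# rle() collapses consecutive repeats into runs (one run per batch).
runs <- rle(m_code)
runs$values   # "OFF" "MA" "V" "MA" "V" "P"
runs$lengths  # 1 2 2 3 1 1

# Number the batches separately within each code (1st MA batch, 2nd MA batch, ...).
batch_no <- ave(seq_along(runs$values), runs$values, FUN = seq_along)

# Expand back to one label per row.
row_labels <- rep(batch_no, runs$lengths)
row_labels    # 1 1 1 1 1 2 2 2 2 1
```

Rows 6 to 8 (the second "MA" batch) and row 9 (the second "V" batch) get label 2, and every other row gets label 1, which is what the per-code counter in the loop is meant to record in `mcode_grp`.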
Finally, `group_by` `m_code` and `mcode_grp`, compute the duration for each batch, and then sum over `m_code` values. The last batch has no following row, so its duration is `NA` and is replaced with 0.
df %>%
  group_by(m_code, mcode_grp) %>%
  summarise(start_time = min(Time_Processed),
            end_time = max(lead_time_proc)) %>%
  mutate(total_time = end_time - start_time) %>%
  group_by(m_code) %>%
  summarise(total_time = sum(total_time)) %>%
  replace_na(list(total_time = 0))
Output:
# A tibble: 4 x 2
  m_code total_time
  <fct>       <dbl>
1 MA            12.
2 OFF            1.
3 P              0.
4 V              5.
For any `dplyr` / `tidyverse` experts out there, I'd love tips on how to accomplish more of this without resorting to loops and counters!
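One possible loop-free variant, offered as a sketch rather than a definitive answer: a batch starts wherever `m_code` differs from the previous row, so a running `cumsum()` over that comparison can stand in for the counter. Note that `mcode_grp` here is a run id across the whole table, not a per-code counter as above, but grouping by `m_code` and `mcode_grp` isolates the same batches. The data is rebuilt inline (already renamed, as character columns), and `batch_totals` is a name introduced for this example.

```r
library(dplyr)
library(tidyr)

# Example data from the question, with the column already renamed to m_code.
df <- tibble(
  m_code         = c("OFF", "MA", "MA", "V", "V", "MA", "MA", "MA", "V", "P"),
  Time_Processed = c(10, 11, 16, 18, 20, 21, 23, 25, 26, 28)
)

batch_totals <- df %>%
  mutate(
    lead_time_proc = lead(Time_Processed),
    # TRUE whenever the code differs from the previous row; cumsum() turns
    # those change points into a run id shared by every row of a batch
    mcode_grp = cumsum(m_code != lag(m_code, default = first(m_code)))
  ) %>%
  group_by(m_code, mcode_grp) %>%
  summarise(start_time = min(Time_Processed),
            end_time = max(lead_time_proc),
            .groups = "drop") %>%
  mutate(total_time = end_time - start_time) %>%
  group_by(m_code) %>%
  summarise(total_time = sum(total_time)) %>%
  # the final batch has no next row, so its duration is NA; treat it as 0
  replace_na(list(total_time = 0))

batch_totals
```

On the example data this gives the same totals as the loop-based version: MA 12, OFF 1, P 0, V 5.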