In R, how to calculate the difference between two date columns by group (id) while keeping the first available date as the reference
How can I compute the time between the dates in two columns, by group, while keeping the first (earliest) date of each group as the reference? For example, for id N04, the reference date_1 should remain 2009-07-10 until the next id. I think I am close, but I can't find the right solution.
Please find below a minimal working example:
id <- c("N02", "N02", "N03", "N03", "N04", "N04", "N04", "N04", "N04", "N04")
date_1 <- c ("2008-03-15", "2008-04-15", "2008-06-15", "2008-07-15", "2009-07-10", "2009-07-13", "2009-07-15", "2009-07-16", "2009-07-17", "2009-07-20")
date_2 <- c ("2008-03-15", "2008-04-15", "2008-06-15", "2008-07-15", "2009-07-10", "2009-07-13", "2009-07-15", "2009-07-16", "2009-07-17", "2009-07-20")
df1 <- data.frame (id, date_1, date_2)
> df1
id date_1 date_2
1 N02 2008-03-15 2008-03-15
2 N02 2008-04-15 2008-04-15
3 N03 2008-06-15 2008-06-15
4 N03 2008-07-15 2008-07-15
5 N04 2009-07-10 2009-07-10
6 N04 2009-07-13 2009-07-13
7 N04 2009-07-15 2009-07-15
8 N04 2009-07-16 2009-07-16
9 N04 2009-07-17 2009-07-17
10 N04 2009-07-20 2009-07-20
My failed attempt:
df2 <- df1 %>% group_by (id) %>% mutate (diff = difftime (date_2, lag (date_1, default = date_1[1]), unit = "day"))
> df2
# A tibble: 10 × 4
# Groups: id [3]
id date_1 date_2 diff
<chr> <chr> <chr> <drtn>
1 N02 2008-03-15 2008-03-15 0.00000 days
2 N02 2008-04-15 2008-04-15 30.95833 days
3 N03 2008-06-15 2008-06-15 0.00000 days
4 N03 2008-07-15 2008-07-15 30.00000 days
5 N04 2009-07-10 2009-07-10 0.00000 days
6 N04 2009-07-13 2009-07-13 3.00000 days
7 N04 2009-07-15 2009-07-15 2.00000 days
8 N04 2009-07-16 2009-07-16 1.00000 days
9 N04 2009-07-17 2009-07-17 1.00000 days
10 N04 2009-07-20 2009-07-20 3.00000 days
However, I would like something like this:
id <- c("N02", "N02", "N03", "N03", "N04", "N04", "N04", "N04", "N04", "N04")
date_1 <- c ("2008-03-15", "2008-04-15", "2008-06-15", "2008-07-15", "2009-07-10", "2009-07-13", "2009-07-15", "2009-07-16", "2009-07-17", "2009-07-20")
date_2 <- c ("2008-03-15", "2008-04-15", "2008-06-15", "2008-07-15", "2009-07-10", "2009-07-13", "2009-07-15", "2009-07-16", "2009-07-17", "2009-07-20")
diff <- c("0.00000 days", "30.95833 days", "0.00000 days", "30.00000 days", "0.00000 days", "3.00000 days", "5.00000 days", "6.00000 days", "7.00000 days", "10.0000 days")
df2 <- data.frame (id, date_1, date_2, diff)
> df2
id date_1 date_2 diff
1 N02 2008-03-15 2008-03-15 0.00000 days
2 N02 2008-04-15 2008-04-15 30.95833 days
3 N03 2008-06-15 2008-06-15 0.00000 days
4 N03 2008-07-15 2008-07-15 30.00000 days
5 N04 2009-07-10 2009-07-10 0.00000 days
6 N04 2009-07-13 2009-07-13 3.00000 days
7 N04 2009-07-15 2009-07-15 5.00000 days
8 N04 2009-07-16 2009-07-16 6.00000 days
9 N04 2009-07-17 2009-07-17 7.00000 days
10 N04 2009-07-20 2009-07-20 10.0000 days
Thank you in advance for your help. Charles
You were almost there: just use [[1]] (or dplyr::first()) instead of lag():
library(dplyr)
df1 %>%
group_by(id) %>%
mutate(diff = difftime(date_2, date_1[[1]], unit = "day")) %>%
ungroup()
# A tibble: 10 × 4
id date_1 date_2 diff
<chr> <chr> <chr> <drtn>
1 N02 2008-03-15 2008-03-15 0 days
2 N02 2008-04-15 2008-04-15 31 days
3 N03 2008-06-15 2008-06-15 0 days
4 N03 2008-07-15 2008-07-15 30 days
5 N04 2009-07-10 2009-07-10 0 days
6 N04 2009-07-13 2009-07-13 3 days
7 N04 2009-07-15 2009-07-15 5 days
8 N04 2009-07-16 2009-07-16 6 days
9 N04 2009-07-17 2009-07-17 7 days
10 N04 2009-07-20 2009-07-20 10 days
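As a variation on the answer above, here is a minimal sketch that converts the character columns to Date before taking the difference. The fractional value 30.95833 in the question is most likely a daylight-saving effect: difftime() parses character input as date-times in the local time zone, so a span that crosses a clock change can come out one hour short. That explanation is an assumption about the asker's locale, and the name res is only illustrative.

```r
library(dplyr)

# Same example data as in the question
id <- c("N02", "N02", "N03", "N03", "N04", "N04", "N04", "N04", "N04", "N04")
date_1 <- c("2008-03-15", "2008-04-15", "2008-06-15", "2008-07-15", "2009-07-10",
            "2009-07-13", "2009-07-15", "2009-07-16", "2009-07-17", "2009-07-20")
date_2 <- date_1
df1 <- data.frame(id, date_1, date_2)

res <- df1 %>%
  # Date values carry no time zone, so differences are whole days
  mutate(date_1 = as.Date(date_1), date_2 = as.Date(date_2)) %>%
  group_by(id) %>%
  # first() takes the first row of each group as the reference;
  # use min(date_1) instead if rows may not be sorted by date
  mutate(diff = difftime(date_2, first(date_1), units = "days")) %>%
  ungroup()

as.numeric(res$diff)
# 0 31 0 30 0 3 5 6 7 10
```

If a plain number of days is preferred over a difftime column, wrapping the result in as.numeric() as in the last line does the conversion.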