How to delete consecutive occurrences of an event in an R data frame?
I have an R data frame containing date information about generic events: id;start_date;end_date.
Sometimes the same event restarts on the same day it ended (1) or one day later (2), for example:
(1)
1001;2016-05-07;2016-05-11
1001;2016-05-11;2016-05-14

(2)
1001;2016-05-07;2016-05-11
1001;2016-05-12;2016-05-14
In the first case, event 1001 ends and restarts on the same day; in the second case, it ends on 2016-05-11 and starts again the next day. In both cases I'd like to delete the second occurrence. If the second occurrence starts two or more days after the first one ends, it should be kept. How can I do this in R?
Thank you in advance.
A partial solution, based on my guess of what the data look like:
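library(data.table)
# Both examples from the question stacked under one id:
# rows 1-2 are case (1), rows 3-4 are case (2)
dat <- data.table(id = c(1001,1001,1001,1001),
                  start_date = as.Date(c("2016-05-07", "2016-05-11", "2016-05-07", "2016-05-12")),
                  end_date = as.Date(c("2016-05-11", "2016-05-14", "2016-05-11", "2016-05-14")))
# Shift end_date down one row (padding with NA) so that each row pairs an
# occurrence's start_date with the previous row's end_date
dat2 <- data.table(id = c(dat$id, NA),
                   start_date = c(dat$start_date, NA),
                   end_date = c(as.Date(NA), dat$end_date))
# dif = previous end - current start, in days:
# 0 = restart on the same day, -1 = restart on the next day
dat2[, dif := end_date - start_date]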
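Then you can remove the rows where dif is 0 or -1 (the event restarts on the same day or on the next day) and keep the others, since a restart two or more days later gives dif <= -2. Row i of dat2 holds the start date of row i of dat, so the same row indices apply to dat; the last row of dat2 is only padding.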
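The padded copy above pairs each row with the row directly above it regardless of id, so it only behaves as intended when the data contain a single event. A minimal sketch of the same rule applied per id, using data.table's `shift()` to lag `end_date` within each group; the table `events` and its values are hypothetical test data (one id per case from the question, plus an id with a two-day gap):

```r
library(data.table)

# Hypothetical test data: 1001 = same-day restart, 1002 = next-day restart,
# 1003 = restart two days later (should be kept)
events <- data.table(
  id         = c(1001, 1001, 1002, 1002, 1003, 1003),
  start_date = as.Date(c("2016-05-07", "2016-05-11", "2016-05-07",
                         "2016-05-12", "2016-05-07", "2016-05-13")),
  end_date   = as.Date(c("2016-05-11", "2016-05-14", "2016-05-11",
                         "2016-05-14", "2016-05-11", "2016-05-14"))
)

# Order occurrences chronologically within each id
setorder(events, id, start_date)

# gap = days from the previous occurrence's end to this occurrence's start;
# shift() lags within each id, so the first occurrence of an id gets NA
events[, gap := as.numeric(start_date) - shift(as.numeric(end_date)), by = id]

# Keep first occurrences and restarts at least two days later
result <- events[is.na(gap) | gap >= 2]
result[, gap := NULL]
print(result)
```

Each occurrence is compared with the row directly before it, so a chain of back-to-back restarts drops every link after the first; comparing against the last kept occurrence instead would need a different, cumulative approach.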
I've used the data.table package, but in base R you can just do dat2$dif <- dat2$end_date - dat2$start_date (after building dat and dat2 with data.frame() instead of data.table()).
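For a version with no packages at all, here is a minimal base R sketch of the same per-id rule. It uses `ave()` to look up the previous occurrence's end date within each id; the data frame `events` is the same hypothetical test data as in a data.table version, and rows are assumed to be sortable by id and start date:

```r
# Hypothetical test data: 1001 = same-day restart, 1002 = next-day restart,
# 1003 = restart two days later (should be kept)
events <- data.frame(
  id         = c(1001, 1001, 1002, 1002, 1003, 1003),
  start_date = as.Date(c("2016-05-07", "2016-05-11", "2016-05-07",
                         "2016-05-12", "2016-05-07", "2016-05-13")),
  end_date   = as.Date(c("2016-05-11", "2016-05-14", "2016-05-11",
                         "2016-05-14", "2016-05-11", "2016-05-14"))
)

# Order occurrences chronologically within each id
events <- events[order(events$id, events$start_date), ]

# End date (as a day count) of the previous occurrence of the same id;
# the first occurrence of each id gets NA
prev_end <- ave(as.numeric(events$end_date), events$id,
                FUN = function(x) c(NA, head(x, -1)))

# Days from the previous end to this start
gap <- as.numeric(events$start_date) - prev_end

# Keep first occurrences and restarts at least two days later
result <- events[is.na(gap) | gap >= 2, ]
rownames(result) <- NULL
print(result)
```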