
How to delete consecutive occurrences of an event in an R data frame?

I have an R data frame containing date information about generic events: id;start_date;end_date.

Sometimes the same event may occur again on the same day (1) or one day later (2), for example:

(1)
1001;2016-05-07;2016-05-11
1001;2016-05-11;2016-05-14

(2)
1001;2016-05-07;2016-05-11
1001;2016-05-12;2016-05-14

In the first case the event "1001" ends and restarts on the same day, while in the second case it ends on 2016-05-11 and starts again the day after. I'd like to delete the second occurrence of the event in both cases. If the second occurrence starts two or more days after the first one ends, it should be kept. How can I do this in R?

Thank you in advance.

Partial solution, based on my guess of what the data look like:

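library(data.table)

# Guessed sample data: rows 1-2 are case (1), rows 3-4 are case (2)
dat <- data.table(id = c(1001,1001,1001,1001),
                  start_date = as.Date(c("2016-05-07", "2016-05-11", "2016-05-07", "2016-05-12")),
                  end_date = as.Date(c("2016-05-11", "2016-05-14", "2016-05-11", "2016-05-14")))

# Shift the columns by one row so that each row pairs its own start_date
# with the end_date of the previous row
dat2 <- data.table(id = c(dat$id, NA),
                   start_date = c(dat$start_date, NA),
                   end_date = c(as.Date(NA), dat$end_date))

# dif = previous row's end_date minus this row's start_date, in days
dat2[, dif := end_date - start_date]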

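Here dif is the previous row's end_date minus the current row's start_date, so a restart on the same day gives dif = 0 and a restart on the next day gives dif = -1. Then you can just remove the rows where dif is 0 or -1, I guess. Filtering on dif <= 0 alone would also drop occurrences that start two or more days later, and the comparison only makes sense between consecutive rows of the same id.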

I've used the data.table package, but in base R you can just do dat2$dif <- dat2$end_date - dat2$start_date.
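To make the idea above concrete, here is a minimal base R sketch of one way to drop the repeated occurrences. It is an illustration under assumptions, not the answerer's code: the data frame name events, the helper vectors prev_end, prev_id, gap_days and is_repeat, and the extra ids 1002 and 1003 (used only to keep the three cases apart) are all hypothetical. It also assumes each row should be compared with the row immediately before it for the same id, after sorting by id and start_date.

```r
# Hypothetical sample data: id 1001 restarts the same day, id 1002 restarts
# the next day, id 1003 restarts two days later (and should be kept).
events <- data.frame(
  id = c(1001, 1001, 1002, 1002, 1003, 1003),
  start_date = as.Date(c("2016-05-07", "2016-05-11",
                         "2016-05-07", "2016-05-12",
                         "2016-05-07", "2016-05-13")),
  end_date = as.Date(c("2016-05-11", "2016-05-14",
                       "2016-05-11", "2016-05-14",
                       "2016-05-11", "2016-05-14"))
)

# Sort so that consecutive occurrences of the same id are adjacent
events <- events[order(events$id, events$start_date), ]

# End date and id of the previous row (NA for the first row)
prev_end <- c(as.Date(NA), head(events$end_date, -1))
prev_id  <- c(NA, head(events$id, -1))

# Days between the previous row's end and this row's start
gap_days <- as.integer(events$start_date - prev_end)

# A row is a repeat if it has the same id as the previous row and starts
# at most one day after that row ended (overlapping rows are caught too)
is_repeat <- !is.na(gap_days) & events$id == prev_id & gap_days <= 1

result <- events[!is_repeat, ]
print(result)
```

On this sample data the result keeps four rows: the first occurrence of each id, plus the second occurrence of 1003, which starts two days after the first one ends. Note that in a chain of three or more back-to-back occurrences this rule removes every occurrence after the first, which may or may not be what is wanted.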
