Remove rows based on condition and date difference between different events in R with dplyr

I want to remove rows based on several conditions and on the date differences between different events, per id.

My data look like this:

#dat
id  name                    date
10  "BREEDING"              2019-05-17
10  "OTHER"                 2020-01-01
11  "BREEDING"              2020-07-01
11  "GESTATION POSITIF"     2020-09-01
12  "BREEDING"              2020-06-26
12  "GESTATION NEGATIF"     2020-08-01
21  "OTHER"                 2018-06-20
21  "GESTATION POSITIF"     2018-10-15
22  "BREEDING"              2020-08-07
22  "GESTATION POSITIF"     2020-09-11

What I want is, per id, for the ids that have both a "BREEDING" and a "GESTATION" event (it does not matter whether the gestation is positive or negative), to compute the date difference between the "GESTATION" event and the "BREEDING" event.

Then, if the date difference is 34, 35, or 36 days, remove the BREEDING row that matches this condition from the object dat.

I have a loop solution that does the job, but I would like to improve on it. I made several attempts without success. Basically, I assume the right code should be something like this:
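As a quick check of this rule on the sample data above, the differences for the three ids that have both events can be computed in base R. This is only a sketch; the vector names `breeding`, `gestation`, and `days` are introduced here for illustration and are not part of dat.

```r
# Breeding and gestation dates for ids 11, 12 and 22, copied from the sample data
ids       <- c(11, 12, 22)
breeding  <- as.Date(c("2020-07-01", "2020-06-26", "2020-08-07"))
gestation <- as.Date(c("2020-09-01", "2020-08-01", "2020-09-11"))

# Subtracting two Date vectors gives a difftime in days; convert to plain numbers
days <- as.numeric(gestation - breeding)   # 62, 36, 35

# TRUE marks the ids whose BREEDING row should be removed
data.frame(id = ids, days = days, remove_breeding = days %in% 34:36)
```

Only ids 12 and 22 fall inside the 34 to 36 day window, so only their BREEDING rows should be dropped.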

dat %>% 
group_by(id) %>% 
grepl("GESTATION",name) %>% #keep ids with a gestation event recorded
grepl("BREEDING",name) %>% #keep ids with a breeding event recorded
mutate(new_var = "date GESTATION"-"date BREEDING") %>% #per id, compute the date difference between the gestation event and the breeding event
filter(!(new_var %in% c(34:36))) %>% #if a breeding event happened 35 days (+/- 1 day) before the gestation event, remove it
ungroup()

Finally, my result for this example should be:

#dat
id  name                    date
10  "BREEDING"              2019-05-17
10  "OTHER"                 2020-01-01
11  "BREEDING"              2020-07-01
11  "GESTATION POSITIF"     2020-09-01
12  "GESTATION NEGATIF"     2020-08-01
21  "OTHER"                 2018-06-20
21  "GESTATION POSITIF"     2018-10-15
22  "GESTATION POSITIF"     2020-09-11

This is probably not the cleanest solution, but pivoting to wide format and then back to long works:

library(tidyverse)
library(lubridate)

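dat %>%
  # split "GESTATION POSITIF" into name = "GESTATION" and gest = "POSITIF"
  separate(name, into = c("name", "gest"), fill = "right") %>%
  # one row per id, with one date_* and one gest_* column per event name
  pivot_wider(names_from = name, values_from = c(date, gest)) %>%
  # blank out the breeding date when gestation follows 34 to 36 days later
  mutate(date_BREEDING = if_else(as.numeric(date_GESTATION - date_BREEDING) %in% c(34, 35, 36), NA_Date_, date_BREEDING)) %>%
  # back to long format; rows whose date is NA are dropped
  pivot_longer(cols = c(date_BREEDING, date_OTHER, date_GESTATION), values_to = "date", values_drop_na = TRUE) %>%
  select(-gest_BREEDING, -gest_OTHER) %>%
  # strip the "date_" prefix from the event name
  mutate(name = str_sub(name, 6))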

The output is:

     id gest_GESTATION name      date      
  <dbl> <chr>          <chr>     <date>    
1    10 NA             BREEDING  2019-05-17
2    10 NA             OTHER     2020-01-01
3    11 POSITIF        BREEDING  2020-07-01
4    11 POSITIF        GESTATION 2020-09-01
5    12 NEGATIF        GESTATION 2020-08-01
6    21 POSITIF        OTHER     2018-06-20
7    21 POSITIF        GESTATION 2018-10-15
8    22 POSITIF        GESTATION 2020-09-11

This has the additional advantage of storing whether "GESTATION" is positive or negative in a separate variable. If you do not need that and want exactly the output specified in your question, you can add:

%>%
  mutate(name = if_else(is.na(gest_GESTATION), name, str_c(name, gest_GESTATION, sep = " "))) %>%
  select(-gest_GESTATION)
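
For comparison, a grouped filter closer to the pseudocode in the question might look like the sketch below. It is an alternative to the pivoting approach, not a replacement for it, and it assumes at most one GESTATION event per id (if there are several, only the first is used). The helper columns `gest_date` and `days_to_gest` are names made up for this example.

```r
library(dplyr)

# Sample data from the question, rebuilt as a data frame
dat <- data.frame(
  id   = c(10, 10, 11, 11, 12, 12, 21, 21, 22, 22),
  name = c("BREEDING", "OTHER", "BREEDING", "GESTATION POSITIF",
           "BREEDING", "GESTATION NEGATIF", "OTHER", "GESTATION POSITIF",
           "BREEDING", "GESTATION POSITIF"),
  date = as.Date(c("2019-05-17", "2020-01-01", "2020-07-01", "2020-09-01",
                   "2020-06-26", "2020-08-01", "2018-06-20", "2018-10-15",
                   "2020-08-07", "2020-09-11"))
)

result <- dat %>%
  group_by(id) %>%
  mutate(
    # date of the first gestation event in this id (NA if there is none)
    gest_date    = date[grepl("GESTATION", name)][1],
    # days from each row's date to that gestation event
    days_to_gest = as.numeric(gest_date - date)
  ) %>%
  # drop only BREEDING rows that fall 34 to 36 days before the gestation event
  filter(!(name == "BREEDING" & days_to_gest %in% 34:36)) %>%
  ungroup() %>%
  select(-gest_date, -days_to_gest)

result
```

On the sample data this keeps 8 rows and drops the BREEDING rows of ids 12 and 22. Ids without a gestation event get `NA` for the difference, and `NA %in% 34:36` is `FALSE`, so their rows are left untouched.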
