
Selecting rows that meet a condition depending on other rows in R

I am working in R to identify incident cases of a disease. Each patient has multiple visits over the years (each row of the dataframe is one visit), and to be labeled "incident", a visit has to meet the following criteria:

  • The infection test must be positive (Infection == "yes")
  • That patient has not already been "positive" for an infection in the last two years

My data looks like this:

[image]

I want to create a new variable indicating whether each visit is an incident infection case or not. For example, the output should look like this:

[image]

As seen, a patient can be incident more than once. Any time they have a positive infection test and haven't had another positive infection test in the past two years, they are considered incident.

I can't find an efficient way to get this output in R. Can it be done using dplyr? I would appreciate any help on this.

One method is to compute, for each patient, the time difference between consecutive tests with the same result (event_diff). A positive test is then incident when this difference is greater than 2 years (approximated as 365*2 days), or when the difference is 0, which marks the patient's first positive test (assuming multiple tests are not done on the same date). Looking at this now, I suspect there are better alternative solutions to this.

df <- data.frame(
  patient_id = c(1,1,1,1,1,1,2,2,2,2),
  infection = c("no", "yes", "yes", "no", "yes", "yes", "yes", "no", "no", "yes"),
  date = c("2005-02-22", "2005-04-26", "2005-05-06", "2006-05-22", "2007-08-19", "2007-12-15", "2005-10-24", "2005-11-11", "2006-07-12", "2007-12-01")
)

df$date <- as.Date(df$date, "%Y-%m-%d")

library(dplyr)

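df %>%
  # lag() relies on row order, so make sure visits are sorted by date within patient
  arrange(patient_id, date) %>%
  # group by result as well, so each positive test is compared with the previous positive test
  group_by(patient_id, infection) %>%
  # time since the previous test with the same result; 0 for the first one in each group
  mutate(event_diff = coalesce(date - lag(date), 0)) %>%
  # incident = positive test that is either the first positive or more than 2 years after the last one
  mutate(incident = ifelse(infection == "yes" & (event_diff == 0 | event_diff > (365*2)), "yes", "no"))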

   patient_id infection date       event_diff incident
        <dbl> <fct>     <date>     <drtn>     <chr>   
 1          1 no        2005-02-22   0 days   no      
 2          1 yes       2005-04-26   0 days   yes     
 3          1 yes       2005-05-06  10 days   no      
 4          1 no        2006-05-22 454 days   no      
 5          1 yes       2007-08-19 835 days   yes     
 6          1 yes       2007-12-15 118 days   no      
 7          2 yes       2005-10-24   0 days   yes     
 8          2 no        2005-11-11   0 days   no      
 9          2 no        2006-07-12 243 days   no      
10          2 yes       2007-12-01 768 days   yes     
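
As one possible alternative, here is a minimal sketch that groups by patient only and tracks the date of the most recent earlier positive test, so the "difference of 0" special case (and the same-date assumption that goes with it) is not needed. The names window_days, last_pos, prev_pos and days_since_pos are my own, and the two-year window is still approximated as 365*2 days, as above. It assumes visits can be sorted by date within each patient.

```r
library(dplyr)

df <- data.frame(
  patient_id = c(1,1,1,1,1,1,2,2,2,2),
  infection = c("no", "yes", "yes", "no", "yes", "yes", "yes", "no", "no", "yes"),
  date = as.Date(c("2005-02-22", "2005-04-26", "2005-05-06", "2006-05-22", "2007-08-19",
                   "2007-12-15", "2005-10-24", "2005-11-11", "2006-07-12", "2007-12-01"))
)

# Assumption: "two years" is approximated as 730 days (leap years ignored)
window_days <- 365 * 2

result <- df %>%
  arrange(patient_id, date) %>%
  group_by(patient_id) %>%
  mutate(
    # numeric date of the latest positive test up to and including this visit
    # (-Inf while the patient has had no positive test yet)
    last_pos = cummax(if_else(infection == "yes", as.numeric(date), -Inf)),
    # shift by one visit: latest positive test strictly before this visit
    prev_pos = lag(last_pos, default = -Inf),
    # days since that earlier positive test (Inf if there was none)
    days_since_pos = as.numeric(date) - prev_pos,
    incident = if_else(infection == "yes" & days_since_pos > window_days, "yes", "no")
  ) %>%
  ungroup() %>%
  select(patient_id, infection, date, incident)

print(result)
```

On the example data this should give the same incident column as the approach above. With this version, a second positive test on the same date as an earlier one is not flagged as incident, since its days_since_pos is 0.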
