Dplyr - choosing value in column based on lowest value in other column in R

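I am currently working with a dataset in which each patient ID has multiple biopsies. I need to find, for each patient individually, the biopsy result closest to a specific date. A dummy dataset is shown below.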


df <- data.frame(m1 = c("1","1","1","2","2","2"), 
                 patodate=c("2013-06-03","2014-01-06","2018-11-23","2004-03-03","2018-06-25","2018-12-19"), 
                 baselinedate=c("2018-11-09","2018-11-09","2018-11-09","2018-07-24","2018-07-24","2018-07-24"),
                 biopsy=c("1","2","3","1","2","3"))

I then calculated the time difference between patodate and baselinedate:

library(dplyr)

df$patodate <- as.Date(df$patodate)
df$baselinedate <- as.Date(df$baselinedate)

# Days from each biopsy date to the patient's baseline date
df <- df %>%
  group_by(m1) %>%
  mutate(diff = baselinedate - patodate)

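My problem now is that I want to add a new column called "status" that shows, per group m1, the "biopsy" result whose time difference is closest to 0. The final result would be: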

df <- data.frame(m1 = c("1","1","1","2","2","2"), 
                 patodate=c("2013-06-03","2014-01-06","2018-11-23","2004-03-03","2018-06-25","2018-12-19"), 
                 baselinedate=c("2018-11-09","2018-11-09","2018-11-09","2018-07-24","2018-07-24","2018-07-24"),
                 biopsy=c("1","2","3","1","2","3"),
                 status=c("3","3","3","2","2","2"))

I hope someone understands the problem and can help. Many thanks.

Kind regards,

Tobias Berg

Within each group, you can get the index of the row with the smallest absolute difference between the two dates.

library(dplyr)

df %>%
  group_by(m1) %>%
  mutate(status = which.min(abs(patodate - baselinedate))) %>%
  ungroup

#  m1    patodate   baselinedate biopsy status
#  <chr> <date>     <date>       <chr>   <int>
#1 1     2013-06-03 2018-11-09   1           3
#2 1     2014-01-06 2018-11-09   2           3
#3 1     2018-11-23 2018-11-09   3           3
#4 2     2004-03-03 2018-07-24   1           2
#5 2     2018-06-25 2018-07-24   2           2
#6 2     2018-12-19 2018-07-24   3           2
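
One caveat worth sketching: `which.min()` returns the row position within the group, not the biopsy value itself. It matches the expected `status` here only because the biopsies happen to be numbered 1, 2, 3 in row order. A minimal variant, assuming you want the actual `biopsy` entry (returned as character, as in the question's expected output), indexes `biopsy` with that position:

```r
library(dplyr)

# Dummy data from the question, with the date columns already converted
df <- data.frame(
  m1 = c("1", "1", "1", "2", "2", "2"),
  patodate = as.Date(c("2013-06-03", "2014-01-06", "2018-11-23",
                       "2004-03-03", "2018-06-25", "2018-12-19")),
  baselinedate = as.Date(c("2018-11-09", "2018-11-09", "2018-11-09",
                           "2018-07-24", "2018-07-24", "2018-07-24")),
  biopsy = c("1", "2", "3", "1", "2", "3")
)

result <- df %>%
  group_by(m1) %>%
  # which.min() gives the position of the closest date within the group;
  # indexing biopsy with it returns the value stored in that row
  mutate(status = biopsy[which.min(abs(patodate - baselinedate))]) %>%
  ungroup()

result$status
# "3" "3" "3" "2" "2" "2"
```

This keeps working if the rows are reordered or if `biopsy` holds labels other than consecutive numbers.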

Here is another approach:

library(dplyr)
library(lubridate)
df %>% 
  group_by(m1) %>% 
  mutate(across(contains("date"), ymd),
         helper = abs(difftime(baselinedate,patodate))) %>% 
  mutate(status = biopsy[helper==min(helper)]) %>% 
  select(-helper)

  m1    patodate   baselinedate biopsy status
  <chr> <date>     <date>       <chr>  <chr> 
1 1     2013-06-03 2018-11-09   1      3     
2 1     2014-01-06 2018-11-09   2      3     
3 1     2018-11-23 2018-11-09   3      3     
4 2     2004-03-03 2018-07-24   1      2     
5 2     2018-06-25 2018-07-24   2      2     
6 2     2018-12-19 2018-07-24   3      2  
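
A note on ties, as a hedged sketch: subsetting with `helper == min(helper)` returns every row that attains the minimum, so if two biopsies are equally close to the baseline the subset has more than one element. `which.min()` instead returns only the first such position. The dates below are hypothetical, chosen so that two of them lie exactly 8 days from the baseline:

```r
# Hypothetical dates: the first two are both 8 days from the baseline
d <- as.Date(c("2018-11-01", "2018-11-17", "2013-06-03"))
baseline <- as.Date("2018-11-09")

gap <- abs(d - baseline)

which(gap == min(gap))  # all positions that attain the minimum
# 1 2
which.min(gap)          # only the first position
# 1
```

If ties are possible in the real data, it may be worth deciding explicitly which biopsy should win, for example the earlier or the later one.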

We could do

library(dplyr)
df %>%
     group_by(m1) %>%
     mutate(status =  abs(patodate - baselinedate),
          status = which(status == min(status))[1]) %>% 
     ungroup

-output

# A tibble: 6 × 5
  m1    patodate   baselinedate biopsy status
  <chr> <date>     <date>       <chr>   <int>
1 1     2013-06-03 2018-11-09   1           3
2 1     2014-01-06 2018-11-09   2           3
3 1     2018-11-23 2018-11-09   3           3
4 2     2004-03-03 2018-07-24   1           2
5 2     2018-06-25 2018-07-24   2           2
6 2     2018-12-19 2018-07-24   3           2
