dplyr - choosing a value in one column based on the lowest value in another column in R

I am currently working on a dataset with multiple biopsies per patient ID. I need to find the biopsy result closest to a specific date, which is different for each patient. A dummy dataset is shown below:


df <- data.frame(m1 = c("1","1","1","2","2","2"), 
                 patodate=c("2013-06-03","2014-01-06","2018-11-23","2004-03-03","2018-06-25","2018-12-19"), 
                 baselinedate=c("2018-11-09","2018-11-09","2018-11-09","2018-07-24","2018-07-24","2018-07-24"),
                 biopsy=c("1","2","3","1","2","3"))

I then calculated the time difference between patodate and baselinedate:

library(dplyr)

# convert the character columns to Date so they can be subtracted
df$patodate <- as.Date(df$patodate)
df$baselinedate <- as.Date(df$baselinedate)

df <- df %>%
  group_by(m1) %>%
  mutate(diff = baselinedate - patodate)

My question is this: I want to add a new column called 'status' that shows, within each m1 group, the 'biopsy' result whose time difference is closest to 0. The end result would be:

df <- data.frame(m1 = c("1","1","1","2","2","2"), 
                 patodate=c("2013-06-03","2014-01-06","2018-11-23","2004-03-03","2018-06-25","2018-12-19"), 
                 baselinedate=c("2018-11-09","2018-11-09","2018-11-09","2018-07-24","2018-07-24","2018-07-24"),
                 biopsy=c("1","2","3","1","2","3"),
                 status=c("3","3","3","2","2","2"))

I hope someone understands the issue and is able to help. Many thanks.

Kind regards,

Tobias Berg
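Why the answers below take the absolute difference: `diff` is signed, and a biopsy taken after the baseline date gets a negative value. A minimal base-R sketch using patient 2 from the dummy data (the names `gap`, `smallest` and `closest` are illustrative only) shows that the plain minimum picks the wrong biopsy:

```r
# Patient 2 from the dummy data above
patodate     <- as.Date(c("2004-03-03", "2018-06-25", "2018-12-19"))
baselinedate <- as.Date("2018-07-24")

# Signed gap in days, same direction as diff in the question
gap <- as.numeric(baselinedate - patodate, units = "days")
# gap[2:3] is 29 and -148: biopsy 3 was taken after the baseline date

smallest <- which.min(gap)       # 3: the most negative gap, not the closest biopsy
closest  <- which.min(abs(gap))  # 2: the smallest absolute gap, matching status "2"
```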

You can get the index of the minimum absolute difference between the dates within each group.

library(dplyr)

df %>%
  group_by(m1) %>%
  mutate(status = which.min(abs(patodate - baselinedate))) %>%
  ungroup

#  m1    patodate   baselinedate biopsy status
#  <chr> <date>     <date>       <chr>   <int>
#1 1     2013-06-03 2018-11-09   1           3
#2 1     2014-01-06 2018-11-09   2           3
#3 1     2018-11-23 2018-11-09   3           3
#4 2     2004-03-03 2018-07-24   1           2
#5 2     2018-06-25 2018-07-24   2           2
#6 2     2018-12-19 2018-07-24   3           2
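`which.min()` returns the row position of the closest biopsy within each group, which equals the biopsy number here only because `biopsy` runs 1, 2, 3 in row order. A sketch that returns the biopsy value itself by indexing `biopsy` with that position; the letter labels `a`, `b`, `c` are hypothetical, chosen so that position and label differ:

```r
library(dplyr)

# Dummy data from the question, with hypothetical letter labels for biopsy
df <- data.frame(
  m1           = c("1", "1", "1", "2", "2", "2"),
  patodate     = as.Date(c("2013-06-03", "2014-01-06", "2018-11-23",
                           "2004-03-03", "2018-06-25", "2018-12-19")),
  baselinedate = as.Date(c("2018-11-09", "2018-11-09", "2018-11-09",
                           "2018-07-24", "2018-07-24", "2018-07-24")),
  biopsy       = c("a", "b", "c", "a", "b", "c")
)

out <- df %>%
  group_by(m1) %>%
  # position of the smallest absolute gap, used to pick that row's biopsy label
  mutate(status = biopsy[which.min(abs(patodate - baselinedate))]) %>%
  ungroup()

out$status  # "c" "c" "c" "b" "b" "b"
```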

Here is an alternative way:

library(dplyr)
library(lubridate)
df %>% 
  group_by(m1) %>% 
  mutate(across(contains("date"), ymd),
         helper = abs(difftime(baselinedate,patodate))) %>% 
  mutate(status = biopsy[helper==min(helper)]) %>% 
  select(-helper)

#   m1    patodate   baselinedate biopsy status
#   <chr> <date>     <date>       <chr>  <chr>
# 1 1     2013-06-03 2018-11-09   1      3
# 2 1     2014-01-06 2018-11-09   2      3
# 3 1     2018-11-23 2018-11-09   3      3
# 4 2     2004-03-03 2018-07-24   1      2
# 5 2     2018-06-25 2018-07-24   2      2
# 6 2     2018-12-19 2018-07-24   3      2
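One caveat with `biopsy[helper == min(helper)]`: if two biopsies are equally close to the baseline date, it returns more than one value for that group and `mutate()` can fail. A sketch of a tie-safe variant, assuming dplyr 1.0.0 or later for `slice_min()`: keep one closest row per patient with `with_ties = FALSE`, then join its biopsy back onto every row. The object names `closest` and `result` are illustrative.

```r
library(dplyr)

df <- data.frame(
  m1           = c("1", "1", "1", "2", "2", "2"),
  patodate     = as.Date(c("2013-06-03", "2014-01-06", "2018-11-23",
                           "2004-03-03", "2018-06-25", "2018-12-19")),
  baselinedate = as.Date(c("2018-11-09", "2018-11-09", "2018-11-09",
                           "2018-07-24", "2018-07-24", "2018-07-24")),
  biopsy       = c("1", "2", "3", "1", "2", "3")
)

# One row per patient: the biopsy with the smallest absolute gap to baseline.
# with_ties = FALSE keeps exactly one row even if two biopsies are equally close.
closest <- df %>%
  group_by(m1) %>%
  slice_min(abs(as.numeric(patodate - baselinedate)), n = 1, with_ties = FALSE) %>%
  ungroup() %>%
  select(m1, status = biopsy)

# Attach the chosen biopsy to every row of the original data
result <- left_join(df, closest, by = "m1")
```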

We may also do:

library(dplyr)
df %>%
  group_by(m1) %>%
  mutate(status = abs(patodate - baselinedate),
         status = which(status == min(status))[1]) %>%
  ungroup()

-output

# A tibble: 6 × 5
  m1    patodate   baselinedate biopsy status
  <chr> <date>     <date>       <chr>   <int>
1 1     2013-06-03 2018-11-09   1           3
2 1     2014-01-06 2018-11-09   2           3
3 1     2018-11-23 2018-11-09   3           3
4 2     2004-03-03 2018-07-24   1           2
5 2     2018-06-25 2018-07-24   2           2
6 2     2018-12-19 2018-07-24   3           2

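Statement: the technical posts on this site are licensed under CC BY-SA 4.0. If you republish them, please credit this site's URL or the original source. For any questions, contact yoyou2525@163.com.

Guangdong ICP No. 18138465  © 2020-2024 STACKOOM.COM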