Dplyr - Select the value in a column based on the lowest value in another column in R

Question

I am currently working on a dataset with multiple biopsies per patient ID. I need to find the biopsy result closest to a specific date (which differs per patient). A dummy dataset is shown below:


df <- data.frame(m1 = c("1","1","1","2","2","2"), 
                 patodate=c("2013-06-03","2014-01-06","2018-11-23","2004-03-03","2018-06-25","2018-12-19"), 
                 baselinedate=c("2018-11-09","2018-11-09","2018-11-09","2018-07-24","2018-07-24","2018-07-24"),
                 biopsy=c("1","2","3","1","2","3"))

I have then calculated the time difference between patodate and baselinedate:

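library(dplyr)

# Convert the character columns to Date so they can be subtracted
df$patodate <- as.Date(df$patodate)
df$baselinedate <- as.Date(df$baselinedate)

# Difference (in days) between the baseline date and each biopsy date
df <- df %>%
  group_by(m1) %>%
  mutate(diff = baselinedate - patodate)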

My question now is: I want to add a new column called 'status' that shows (by group m1) the 'biopsy' result with the time difference closest to 0. The end result would be:

df <- data.frame(m1 = c("1","1","1","2","2","2"), 
                 patodate=c("2013-06-03","2014-01-06","2018-11-23","2004-03-03","2018-06-25","2018-12-19"), 
                 baselinedate=c("2018-11-09","2018-11-09","2018-11-09","2018-07-24","2018-07-24","2018-07-24"),
                 biopsy=c("1","2","3","1","2","3"),
                 status=c("3","3","3","2","2","2"))

I hope someone understands the issue and is able to help. Many thanks

Kind regards,

Tobias Berg

Answer 1

You can get the index of the minimum absolute difference between the dates within each group.

library(dplyr)

df %>%
  group_by(m1) %>%
  mutate(status = which.min(abs(patodate - baselinedate))) %>%
  ungroup

#  m1    patodate   baselinedate biopsy status
#  <chr> <date>     <date>       <chr>   <int>
#1 1     2013-06-03 2018-11-09   1           3
#2 1     2014-01-06 2018-11-09   2           3
#3 1     2018-11-23 2018-11-09   3           3
#4 2     2004-03-03 2018-07-24   1           2
#5 2     2018-06-25 2018-07-24   2           2
#6 2     2018-12-19 2018-07-24   3           2
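
Note that which.min() returns the position of the closest row within each group, not the biopsy value itself. It matches the expected output here only because biopsy happens to be numbered 1, 2, 3 in row order. A minimal sketch, assuming the goal is the biopsy value rather than the row position, indexes biopsy with that position (the name result is only illustrative):

```r
library(dplyr)

# Dummy data from the question, with the date columns already converted
df <- data.frame(
  m1 = c("1", "1", "1", "2", "2", "2"),
  patodate = as.Date(c("2013-06-03", "2014-01-06", "2018-11-23",
                       "2004-03-03", "2018-06-25", "2018-12-19")),
  baselinedate = as.Date(c("2018-11-09", "2018-11-09", "2018-11-09",
                           "2018-07-24", "2018-07-24", "2018-07-24")),
  biopsy = c("1", "2", "3", "1", "2", "3")
)

result <- df %>%
  group_by(m1) %>%
  # which.min() gives the within-group position of the smallest absolute
  # difference; indexing biopsy with it returns the biopsy value itself
  mutate(status = biopsy[which.min(abs(as.numeric(patodate - baselinedate)))]) %>%
  ungroup()
```

With this variant, status keeps the type of biopsy (character here) instead of becoming an integer index.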

Answer 2

Here is an alternative way:

library(dplyr)
library(lubridate)
df %>% 
  group_by(m1) %>% 
  mutate(across(contains("date"), ymd),
         helper = abs(difftime(baselinedate,patodate))) %>% 
  mutate(status = biopsy[helper==min(helper)]) %>% 
  select(-helper)

  m1    patodate   baselinedate biopsy status
  <chr> <date>     <date>       <chr>  <chr> 
1 1     2013-06-03 2018-11-09   1      3     
2 1     2014-01-06 2018-11-09   2      3     
3 1     2018-11-23 2018-11-09   3      3     
4 2     2004-03-03 2018-07-24   1      2     
5 2     2018-06-25 2018-07-24   2      2     
6 2     2018-12-19 2018-07-24   3      2
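
One caveat with this approach: if two biopsies in a group are equally close to the baseline date, biopsy[helper == min(helper)] returns more than one value, and mutate() cannot recycle it to the group size. A hedged sketch of a tie-safe variant follows. The data in tie_df is made up for illustration, and keeping the first match with dplyr::first() is an assumption about how ties should be resolved:

```r
library(dplyr)

# Hypothetical data: biopsies "1" and "2" are both 7 days from the baseline
tie_df <- data.frame(
  m1 = c("1", "1", "1"),
  patodate = as.Date(c("2018-11-02", "2018-11-16", "2013-06-03")),
  baselinedate = as.Date(rep("2018-11-09", 3)),
  biopsy = c("1", "2", "3")
)

tie_result <- tie_df %>%
  group_by(m1) %>%
  mutate(
    # absolute distance in days between baseline and biopsy date
    helper = abs(as.numeric(baselinedate - patodate)),
    # first() keeps a single value even when several rows share the minimum
    status = first(biopsy[helper == min(helper)])
  ) %>%
  ungroup() %>%
  select(-helper)
```

Whether the first, the last, or the most recent tied biopsy should win depends on the study design, so the tie-breaking rule above is only one possible choice.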

Answer 3

We may do:

library(dplyr)
df %>%
     group_by(m1) %>%
     mutate(status =  abs(patodate - baselinedate),
          status = which(status == min(status))[1]) %>% 
     ungroup

-output

# A tibble: 6 × 5
  m1    patodate   baselinedate biopsy status
  <chr> <date>     <date>       <chr>   <int>
1 1     2013-06-03 2018-11-09   1           3
2 1     2014-01-06 2018-11-09   2           3
3 1     2018-11-23 2018-11-09   3           3
4 2     2004-03-03 2018-07-24   1           2
5 2     2018-06-25 2018-07-24   2           2
6 2     2018-12-19 2018-07-24   3           2


Answer 1
1 2021-10-08 11:27:48

Answer 2
1 2021-10-08 12:10:51

Answer 3
1 accepted 2021-10-08 19:59:03
