简体   繁体   中英

Dplyr - choosing value in column based on lowest value in other column in R

I am currently working on a dataset with multiple biopsies per patient ID. I need to find the biopsy result closest to a specific date (individual per patient). A dummy dataset can be seen below


df <- data.frame(m1 = c("1","1","1","2","2","2"), 
                 patodate=c("2013-06-03","2014-01-06","2018-11-23","2004-03-03","2018-06-25","2018-12-19"), 
                 baselinedate=c("2018-11-09","2018-11-09","2018-11-09","2018-07-24","2018-07-24","2018-07-24"),
                 biopsy=c("1","2","3","1","2","3"))

I have then calculated the time difference between patodate and baselinedate

df$patodate <- as.Date(df$patodate)
df$baselinedate <- as.Date(df$baselinedate)

df <- df%>%
  group_by(m1) %>%
  mutate(diff = baselinedate-recdate)

My question is now - I want to add a new column called 'status' which shows (by group m1) the 'biopsy' result with the time difference closest to 0. The end result would be

df <- data.frame(m1 = c("1","1","1","2","2","2"), 
                 patodate=c("2013-06-03","2014-01-06","2018-11-23","2004-03-03","2018-06-25","2018-12-19"), 
                 baselinedate=c("2018-11-09","2018-11-09","2018-11-09","2018-07-24","2018-07-24","2018-07-24"),
                 biopsy=c("1","2","3","1","2","3"),
                 status=c("3","3","3","2","2","2"))

I hope someone understands the issue and is able to help. Many thanks

Kind regards,

Tobias Berg

You can get index of minimum absolute value of difference between the dates for each group.

library(dplyr)

df %>%
  group_by(m1) %>%
  mutate(status = which.min(abs(patodate - baselinedate))) %>%
  ungroup

#  m1    patodate   baselinedate biopsy status
#  <chr> <date>     <date>       <chr>   <int>
#1 1     2013-06-03 2018-11-09   1           3
#2 1     2014-01-06 2018-11-09   2           3
#3 1     2018-11-23 2018-11-09   3           3
#4 2     2004-03-03 2018-07-24   1           2
#5 2     2018-06-25 2018-07-24   2           2
#6 2     2018-12-19 2018-07-24   3           2

Here is an alternative way:

library(dplyr)
library(lubridate)
df %>% 
  group_by(m1) %>% 
  mutate(across(contains("date"), ymd),
         helper = abs(difftime(baselinedate,patodate))) %>% 
  mutate(status = biopsy[helper==min(helper)]) %>% 
  select(-helper)
  m1    patodate   baselinedate biopsy status
  <chr> <date>     <date>       <chr>  <chr> 
1 1     2013-06-03 2018-11-09   1      3     
2 1     2014-01-06 2018-11-09   2      3     
3 1     2018-11-23 2018-11-09   3      3     
4 2     2004-03-03 2018-07-24   1      2     
5 2     2018-06-25 2018-07-24   2      2     
6 2     2018-12-19 2018-07-24   3      2  

We may do

library(dplyr)
df %>%
     group_by(m1) %>%
     mutate(status =  abs(patodate - baselinedate),
          status = which(status == min(status))[1]) %>% 
     ungroup

-output

# A tibble: 6 × 5
  m1    patodate   baselinedate biopsy status
  <chr> <date>     <date>       <chr>   <int>
1 1     2013-06-03 2018-11-09   1           3
2 1     2014-01-06 2018-11-09   2           3
3 1     2018-11-23 2018-11-09   3           3
4 2     2004-03-03 2018-07-24   1           2
5 2     2018-06-25 2018-07-24   2           2
6 2     2018-12-19 2018-07-24   3           2

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM