dplyr - choosing a value in one column based on the lowest value in another column in R
I am currently working on a dataset with multiple biopsies per patient ID. I need to find the biopsy result closest to a specific date (the date differs per patient).
A dummy dataset can be seen below:
df <- data.frame(m1 = c("1","1","1","2","2","2"),
                 patodate = c("2013-06-03","2014-01-06","2018-11-23","2004-03-03","2018-06-25","2018-12-19"),
                 baselinedate = c("2018-11-09","2018-11-09","2018-11-09","2018-07-24","2018-07-24","2018-07-24"),
                 biopsy = c("1","2","3","1","2","3"))
I then calculated the time difference between patodate and baselinedate:
library(dplyr)
df$patodate <- as.Date(df$patodate)
df$baselinedate <- as.Date(df$baselinedate)
df <- df %>%
  group_by(m1) %>%
  mutate(diff = baselinedate - patodate)  # difftime, in days
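Because baselinedate - patodate is negative whenever the biopsy was taken after the baseline date, "closest to 0" has to be judged on the absolute difference, not the signed one. A small base R illustration using the three dates of patient "2" from the dummy data:

```r
# Patient "2" from the dummy data
patodate <- as.Date(c("2004-03-03", "2018-06-25", "2018-12-19"))
baselinedate <- as.Date("2018-07-24")

# Date minus Date gives a difftime in days; as.numeric() drops the class
diff_days <- as.numeric(baselinedate - patodate)

# The signed minimum is the biopsy taken furthest AFTER baseline (-148 days)
which.min(diff_days)        # 3

# The absolute minimum is the biopsy closest to baseline (29 days before it)
which.min(abs(diff_days))   # 2
```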
My question is now: I want to add a new column called 'status' that shows, for each group of m1, the 'biopsy' result whose time difference is closest to 0. The end result would be:
df <- data.frame(m1 = c("1","1","1","2","2","2"),
                 patodate = c("2013-06-03","2014-01-06","2018-11-23","2004-03-03","2018-06-25","2018-12-19"),
                 baselinedate = c("2018-11-09","2018-11-09","2018-11-09","2018-07-24","2018-07-24","2018-07-24"),
                 biopsy = c("1","2","3","1","2","3"),
                 status = c("3","3","3","2","2","2"))
I hope someone understands the issue and is able to help. Many thanks.
Kind regards,
Tobias Berg
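For comparison, the expected status column can also be produced without dplyr. This is a minimal base R sketch (split() by patient, then sapply() to pick the biopsy with the smallest absolute gap), not one of the posted answers; stringsAsFactors = FALSE is added so it behaves the same on R versions before 4.0:

```r
df <- data.frame(m1 = c("1","1","1","2","2","2"),
                 patodate = as.Date(c("2013-06-03","2014-01-06","2018-11-23",
                                      "2004-03-03","2018-06-25","2018-12-19")),
                 baselinedate = as.Date(c("2018-11-09","2018-11-09","2018-11-09",
                                          "2018-07-24","2018-07-24","2018-07-24")),
                 biopsy = c("1","2","3","1","2","3"),
                 stringsAsFactors = FALSE)

# For each patient, pick the biopsy whose date is closest to the baseline date
closest <- sapply(split(df, df$m1), function(g) {
  g$biopsy[which.min(abs(as.numeric(g$baselinedate - g$patodate)))]
})

# closest is named by patient ID; index by m1 to copy it onto every row
df$status <- unname(closest[df$m1])
df$status  # "3" "3" "3" "2" "2" "2"
```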
You can get the index of the minimum absolute difference between the dates for each group:
library(dplyr)
df %>%
  group_by(m1) %>%
  mutate(status = which.min(abs(patodate - baselinedate))) %>%
  ungroup()
# m1 patodate baselinedate biopsy status
# <chr> <date> <date> <chr> <int>
#1 1 2013-06-03 2018-11-09 1 3
#2 1 2014-01-06 2018-11-09 2 3
#3 1 2018-11-23 2018-11-09 3 3
#4 2 2004-03-03 2018-07-24 1 2
#5 2 2018-06-25 2018-07-24 2 2
#6 2 2018-12-19 2018-07-24 3 2
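Note that which.min() returns the position of the closest biopsy within each group, not the biopsy value itself (hence the <int> status column); the two coincide here only because the biopsies are numbered 1, 2, 3 in row order. A minimal sketch with hypothetical biopsy labels "A", "B", "C" (not from the question) shows the difference and how indexing biopsy with that position returns the value instead:

```r
library(dplyr)

# Hypothetical labels so that row position and biopsy value differ
df <- data.frame(m1 = c("1","1","1","2","2","2"),
                 patodate = as.Date(c("2013-06-03","2014-01-06","2018-11-23",
                                      "2004-03-03","2018-06-25","2018-12-19")),
                 baselinedate = as.Date(c("2018-11-09","2018-11-09","2018-11-09",
                                          "2018-07-24","2018-07-24","2018-07-24")),
                 biopsy = c("A","B","C","A","B","C"),
                 stringsAsFactors = FALSE)

res <- df %>%
  group_by(m1) %>%
  mutate(pos    = which.min(abs(patodate - baselinedate)),          # row position in group
         status = biopsy[which.min(abs(patodate - baselinedate))]) %>% # biopsy value
  ungroup()

res$pos     # 3 3 3 2 2 2
res$status  # "C" "C" "C" "B" "B" "B"
```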
Here is an alternative that first parses the date columns with lubridate's ymd(), so it also works on the original character columns:
library(dplyr)
library(lubridate)
df %>%
  group_by(m1) %>%
  mutate(across(contains("date"), ymd),
         helper = abs(difftime(baselinedate, patodate))) %>%
  mutate(status = biopsy[helper == min(helper)]) %>%
  select(-helper)
m1 patodate baselinedate biopsy status
<chr> <date> <date> <chr> <chr>
1 1 2013-06-03 2018-11-09 1 3
2 1 2014-01-06 2018-11-09 2 3
3 1 2018-11-23 2018-11-09 3 3
4 2 2004-03-03 2018-07-24 1 2
5 2 2018-06-25 2018-07-24 2 2
6 2 2018-12-19 2018-07-24 3 2
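One caveat with biopsy[helper == min(helper)]: if two biopsies of the same patient are equally close to baseline, it returns more than one value, which either makes mutate() fail with a size error or silently puts different statuses on rows of the same patient. A tie-safe sketch using dplyr's slice_min() with with_ties = FALSE (a swapped-in technique, not part of the answer above) keeps exactly one row per patient and joins it back:

```r
library(dplyr)

df <- data.frame(m1 = c("1","1","1","2","2","2"),
                 patodate = as.Date(c("2013-06-03","2014-01-06","2018-11-23",
                                      "2004-03-03","2018-06-25","2018-12-19")),
                 baselinedate = as.Date(c("2018-11-09","2018-11-09","2018-11-09",
                                          "2018-07-24","2018-07-24","2018-07-24")),
                 biopsy = c("1","2","3","1","2","3"),
                 stringsAsFactors = FALSE)

# One row per patient: the biopsy with the smallest absolute gap to baseline.
# with_ties = FALSE returns only the first row if several are equally close.
closest <- df %>%
  group_by(m1) %>%
  slice_min(abs(as.numeric(patodate - baselinedate)), n = 1, with_ties = FALSE) %>%
  ungroup() %>%
  select(m1, status = biopsy)

# Attach the per-patient status to every biopsy row, keeping df's row order
res <- left_join(df, closest, by = "m1")
res$status  # "3" "3" "3" "2" "2" "2"
```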
Another option, which keeps only the first match if two biopsies are equally close:
library(dplyr)
df %>%
  group_by(m1) %>%
  mutate(status = abs(patodate - baselinedate),
         status = which(status == min(status))[1]) %>%
  ungroup()
-output
# A tibble: 6 × 5
m1 patodate baselinedate biopsy status
<chr> <date> <date> <chr> <int>
1 1 2013-06-03 2018-11-09 1 3
2 1 2014-01-06 2018-11-09 2 3
3 1 2018-11-23 2018-11-09 3 3
4 2 2004-03-03 2018-07-24 1 2
5 2 2018-06-25 2018-07-24 2 2
6 2 2018-12-19 2018-07-24 3 2
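The [1] in which(status == min(status))[1] is what guards against ties: which() returns every position equal to the minimum, and [1] keeps the first one, which is also what which.min() does on its own. A small base R check with made-up gaps (in days) containing a tie:

```r
# Made-up absolute differences with a tie for the minimum
gaps <- c(40, 12, 12, 90)

which(gaps == min(gaps))       # 2 3  (all tied positions)
which(gaps == min(gaps))[1]    # 2    (first one only)
which.min(gaps)                # 2    (which.min also returns the first minimum)
```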
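Statement: The technical posts on this site are licensed under CC BY-SA 4.0. If you repost, please credit this site's URL or the original address. For any questions, contact yoyou2525@163.com.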