How to reconcile two different IDs as one, then apply to a df with both IDs but count the subject only once in R?
I have two different IDs for the same subject (patient). A separate vector of IDs contains both kinds of ID, so the same patient can appear under either one. How do I count each patient only once (by ID1), rather than as two different patients with different IDs?
ID1 ID2
11 12
13 14
15 16
vector
11,12,13,13,14,16
I want to count only the unique patients by ID1, so that I would get
x=11,13,15
Thank you!
Create a unique ID number for each patient, reshape the data to long format so that both IDs are in the same column, join it with the vector, and keep one vector value for each distinct ID.
library(dplyr)
df %>%
mutate(ID = row_number()) %>%
tidyr::pivot_longer(cols = c(ID1, ID2)) %>%
inner_join(tibble::enframe(vector), by = 'value') %>%
distinct(ID, .keep_all = TRUE) %>%
select(value)
# value
# <dbl>
#1 11
#2 13
#3 16
data
df <- structure(list(ID1 = c(11L, 13L, 15L), ID2 = c(12L, 14L, 16L)),
class = "data.frame", row.names = c(NA, -3L))
vector <- c(11, 12, 13, 13, 14, 16)
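Note that the pipeline above returns the matched vector value for each patient (16 for the third patient), whereas the question asks for ID1 (15). As a minimal base R sketch of one way to get ID1 directly, assuming every ID belongs to exactly one patient, each vector element can be mapped to its ID1 through a lookup table. The names lookup, any_id, canonical and x are hypothetical and chosen only for illustration.

```r
df <- data.frame(ID1 = c(11L, 13L, 15L), ID2 = c(12L, 14L, 16L))
vector <- c(11, 12, 13, 13, 14, 16)

# Hypothetical helper table: every known ID (from either column)
# paired with the ID1 of the patient it belongs to
lookup <- data.frame(any_id = c(df$ID1, df$ID2),
                     ID1    = rep(df$ID1, times = 2))

# Map each vector element to its ID1; unknown IDs become NA
canonical <- lookup$ID1[match(vector, lookup$any_id)]

# Drop unknown IDs and duplicates so each patient appears once
x <- unique(canonical[!is.na(canonical)])
x
```

With this layout, length(x) is the number of distinct patients, and IDs in the vector that match no patient are silently dropped rather than counted.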
I think you probably need this:
df %>% filter((ID1 %in% vector) | (ID2 %in% vector)) %>%
select(ID1)
ID1
1 11
2 13
3 15
Check it on a larger sample:
df <- structure(list(ID1 = c(11L, 13L, 15L, 17L, 19L, 21L), ID2 = c(12L,
14L, 16L, 18L, 20L, 22L)), class = "data.frame", row.names = c(NA,
-6L))
> df
ID1 ID2
1 11 12
2 13 14
3 15 16
4 17 18
5 19 20
6 21 22
vector <- c(11, 12, 13, 13, 14, 16, 18, 18)
> df %>% filter((ID1 %in% vector) | (ID2 %in% vector)) %>% select(ID1)
ID1
1 11
2 13
3 15
4 17
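Since the question asks for a count, the same row-wise test can also be written in base R without dplyr. This is only a sketch under the same assumption that each row of df is one patient; present, n_patients and patients are hypothetical names.

```r
df <- data.frame(ID1 = c(11L, 13L, 15L, 17L, 19L, 21L),
                 ID2 = c(12L, 14L, 16L, 18L, 20L, 22L))
vector <- c(11, 12, 13, 13, 14, 16, 18, 18)

# TRUE for each row (patient) that has either of its IDs in the vector
present <- df$ID1 %in% vector | df$ID2 %in% vector

n_patients <- sum(present)    # number of distinct patients seen
patients   <- df$ID1[present] # their ID1 values
```

Because the test is done per row of df rather than per element of the vector, repeated IDs in the vector (such as 13 or 18) cannot inflate the count.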
By slightly modifying Ronak's code, you can get the same results:
df %>%
mutate(ID = row_number()) %>%
tidyr::pivot_longer(cols = c(ID1, ID2)) %>%
inner_join(tibble::enframe(vector), by = 'value') %>%
distinct(ID, .keep_all = TRUE) %>%
select(ID, value) %>%
inner_join(df %>% mutate(ID = row_number()), by = 'ID') %>%
select(ID1)
You can use any with %in%, applied to each row with apply, to subset ID1.
ID$ID1[apply(ID, 1, function(z) any(v %in% z))]
#[1] 11 13 15
or use rowSums.
ID$ID1[rowSums(sapply(ID, "%in%", v)) > 0]
#[1] 11 13 15
Data:
ID <- read.table(header=TRUE, text="ID1 ID2
11 12
13 14
15 16")
v <- c(11,12,13,13,14,16)
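To see why the rowSums version works, it may help to look at the intermediate logical matrix. The sketch below spells out the sapply call with an explicit function; m and hits are hypothetical names used only here.

```r
ID <- read.table(header = TRUE, text = "ID1 ID2
11 12
13 14
15 16")
v <- c(11, 12, 13, 13, 14, 16)

# One logical column per ID column: TRUE where that ID occurs in v
m <- sapply(ID, function(col) col %in% v)
m

# A row with at least one TRUE is a patient that appears in v
hits <- rowSums(m) > 0
ID$ID1[hits]
```

The third patient is found only through ID2 (16), which is why the row-wise "at least one TRUE" test is needed rather than a check on ID1 alone.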