How to reconcile two different IDs as one, then apply to a df with both IDs but count the subject only once in R?

I have two different IDs for the same subject (patient). A separate vector of IDs contains both kinds of ID, so the same patient can appear under either one. How do I count each patient only once (by ID1), instead of counting the two IDs as two different patients?

ID1 ID2 
 11 12
 13 14
 15 16

vector

11,12,13,13,14,16

I want to count only the unique patients, identified by ID1, so that I would get

x=11,13,15

Thank you!

Create a unique ID number for each patient, reshape the data to long format so that both IDs are in the same column, join it with the vector, and keep one row per distinct patient ID. Note that this returns the first matching vector value for each patient, which is ID2 when only ID2 appears in the vector (16 rather than 15 for the third patient).

library(dplyr)

df %>%
  mutate(ID = row_number()) %>%
  tidyr::pivot_longer(cols = c(ID1, ID2)) %>%
  inner_join(tibble::enframe(vector), by = 'value') %>%
  distinct(ID, .keep_all = TRUE) %>%
  select(value)

#  value
#  <dbl>
#1    11
#2    13
#3    16

data

df <- structure(list(ID1 = c(11L, 13L, 15L), ID2 = c(12L, 14L, 16L)), 
class = "data.frame", row.names = c(NA, -3L))
vector <- c(11, 12, 13, 13, 14, 16)

I think you probably need this:

library(dplyr)

df %>% filter((ID1 %in% vector) | (ID2 %in% vector)) %>%
   select(ID1)

  ID1
1  11
2  13
3  15

Check it on a better sample:

df <- structure(list(ID1 = c(11L, 13L, 15L, 17L, 19L, 21L), ID2 = c(12L, 
14L, 16L, 18L, 20L, 22L)), class = "data.frame", row.names = c(NA, 
-6L))

> df
  ID1 ID2
1  11  12
2  13  14
3  15  16
4  17  18
5  19  20
6  21  22


vector <- c(11, 12, 13, 13, 14, 16, 18, 18)

> df %>% filter((ID1 %in% vector) | (ID2 %in% vector)) %>% select(ID1)
  
   ID1
1  11
2  13
3  15
4  17
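
The same filter can also be written in base R, which makes the patient count explicit. This is only a minimal sketch, not part of the answer above: the names `seen`, `patients`, and `n_patients` are made up for illustration, and the data are the larger sample shown above.

```r
# Larger sample from the answer above, rebuilt with data.frame().
df <- data.frame(
  ID1 = c(11L, 13L, 15L, 17L, 19L, 21L),
  ID2 = c(12L, 14L, 16L, 18L, 20L, 22L)
)
vector <- c(11, 12, 13, 13, 14, 16, 18, 18)

# TRUE for every row (patient) whose ID1 or ID2 occurs in the vector.
seen <- df$ID1 %in% vector | df$ID2 %in% vector

# Each patient is one row of df, so it is reported once, by ID1.
patients <- df$ID1[seen]
n_patients <- sum(seen)

print(patients)
print(n_patients)
```

Because each patient occupies a single row of `df`, repeated values in the vector (13 and 18 here) do not change the result.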

By slightly modifying Ronak's code, you can get the same results:

df %>%
  mutate(ID = row_number()) %>%
  tidyr::pivot_longer(cols = c(ID1, ID2)) %>%
  inner_join(tibble::enframe(vector), by = 'value') %>%
  distinct(ID, .keep_all = TRUE) %>%
  select(ID, value) %>%
  inner_join(df %>% mutate(ID = row_number()), by = 'ID') %>%
  select(ID1)

You can use any() with %in%, applied row by row with apply(), to pick the rows and subset ID1.

ID$ID1[apply(ID, 1, function(z) any(v %in% z))]
#[1] 11 13 15

Or use rowSums:

ID$ID1[rowSums(sapply(ID, "%in%", v)) > 0]
#[1] 11 13 15

Data:

ID <- read.table(header=TRUE, text="ID1 ID2 
 11 12
 13 14
 15 16")
v <- c(11,12,13,13,14,16)
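
If the goal is to translate every element of `v` to its patient's ID1 (for example, to count how often each patient appears), one option is a lookup built with `match()`. The following is a sketch under that assumption and is not taken from the answers above; `lookup`, `canonical`, and `v_as_id1` are illustrative names.

```r
# Same data as above, built with data.frame() to keep the block self-contained.
ID <- data.frame(ID1 = c(11, 13, 15), ID2 = c(12, 14, 16))
v <- c(11, 12, 13, 13, 14, 16)

# Lookup table: every known ID (from either column) and the ID1 it belongs to.
lookup <- c(ID$ID1, ID$ID2)
canonical <- rep(ID$ID1, times = 2)

# Translate each element of v to its ID1; values not in the table become NA.
v_as_id1 <- canonical[match(v, lookup)]

# Unique patients by ID1, in order of first appearance in v.
x <- unique(v_as_id1[!is.na(v_as_id1)])

print(v_as_id1)
print(x)
```

Unlike the row-subsetting approaches, this keeps one entry per element of `v`, so `table(v_as_id1)` would give per-patient counts. The order of `x` follows `v`, not the row order of `ID`.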
