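Extract observations that match all possible permutations that match two columns
I have a data frame with 10,000 observations and two columns holding paired values. In another dataset, I have a selection of the values that make up those pairs.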
test <- data.frame (iso1=c("A", "B", "C"))
data <- data.frame(hosp1=c("A", "B", "C", "D", "E", "C", "A", "B"),
hosp2=c("B", "c", "F", "C", "G", "A", "H", "A"),
dist= c(12,32,23,12,12,45,13))
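Now I want to use test to extract from data every observation whose pair is a permutation of the selected values, e.g. "A" & "B", "B" & "A", "A" & "C", "C" & "A", "B" & "C", "C" & "B".
I am hoping to get something like this: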
hosp1 hosp2 dist
A B 12
B C 23
C A 12
B A 13
You can simply test whether the values in test$iso1 appear in both hosp1 and hosp2:
data[data$hosp1 %in% test$iso1 & data$hosp2 %in% test$iso1, ]
hosp1 hosp2 dist
1 A B 12
2 B C 32
6 C A 45
8 B A 13
Note that I fixed an uncapitalized letter and a missing dist value in your sample data.
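One thing the two-sided %in% test does not rule out is a row that pairs a value with itself (for example "A" and "A"), which is not a permutation of two distinct values. The sketch below is only an illustration under assumptions: the sample data is rebuilt with the two fixes mentioned above, and data2 with its extra "A"/"A" row is hypothetical and not part of the original question.

```r
# Sample data with the two fixes applied (capital "C", eighth dist value)
data <- data.frame(hosp1 = c("A", "B", "C", "D", "E", "C", "A", "B"),
                   hosp2 = c("B", "C", "F", "C", "G", "A", "H", "A"),
                   dist  = c(12, 32, 23, 12, 12, 45, 13, 13))
test <- data.frame(iso1 = c("A", "B", "C"))

# Hypothetical extra row in which a value is paired with itself
data2 <- rbind(data, data.frame(hosp1 = "A", hosp2 = "A", dist = 0))

# Both columns must hold one of the selected values
in_both <- data2$hosp1 %in% test$iso1 & data2$hosp2 %in% test$iso1

res_all  <- data2[in_both, ]                               # keeps the A/A row
res_perm <- data2[in_both & data2$hosp1 != data2$hosp2, ]  # drops the A/A row
res_perm
```

If self-pairs cannot occur in the real data, the extra condition is harmless and can be left out.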
Perhaps we can use if_all inside filter to keep the rows where every 'hosp' column holds a value found in the 'iso1' column:
library(dplyr)
data %>%
filter(if_all(starts_with('hosp'), ~ .x %in% test$iso1))
Output:
hosp1 hosp2 dist
1 A B 12
2 B C 23
3 C A 12
4 B A 13
Or in base R:
subset(data, Reduce(`&`, lapply(data[1:2], `%in%`, test$iso1)))
hosp1 hosp2 dist
1 A B 12
2 B C 23
6 C A 12
8 B A 13
Data
data <- structure(list(hosp1 = c("A", "B", "C", "D", "E", "C", "A", "B"
), hosp2 = c("B", "C", "F", "C", "G", "A", "H", "A"), dist = c(12,
23, 32, 12, 45, 12, 10, 13)), class = "data.frame", row.names = c(NA,
-8L))
Using RcppAlgos::permuteGeneral.
To show permutations that have no matching row as NA:
RcppAlgos::permuteGeneral(test$iso1, 2, FUN=\(x) {
  d <- data[with(data, hosp1 == x[1] & hosp2 == x[2]), 'dist']
  # d is a zero-length vector (not NULL) when no row matches this permutation
  data.frame(hosp=t(x), dist=if (length(d) == 0) NA_real_ else d[1])
}) |> do.call(what=rbind)
# hosp.1 hosp.2 dist
# 1 A B 12
# 2 A C NA
# 3 B A 13
# 4 B C 32
# 5 C A 45
# 6 C B NA
Or drop them:
RcppAlgos::permuteGeneral(test$iso1, 2, FUN=\(x) {
  d <- data[with(data, hosp1 == x[1] & hosp2 == x[2]), 'dist']
  # return NULL for permutations without a match so rbind skips them
  if (length(d) > 0) data.frame(hosp=t(x), dist=d) else NULL
}) |> do.call(what=rbind)
# hosp.1 hosp.2 dist
# 1 A B 12
# 2 B A 13
# 3 B C 32
# 4 C A 45
Data
data <- structure(list(hosp1 = c("A", "B", "C", "D", "E", "C", "A", "B"
), hosp2 = c("B", "C", "F", "C", "G", "A", "H", "A"), dist = c(12,
32, 23, 12, 12, 45, 13, 13)), class = "data.frame", row.names = c(NA,
-8L))
test <- structure(list(iso1 = c("A", "B", "C")), class = "data.frame", row.names = c(NA,
-3L))
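If installing RcppAlgos is not an option, a similar table can be sketched in base R by building the ordered pairs with expand.grid and left-joining them onto the data with merge. This is a swapped-in technique rather than the approach of the answer above, and the names perms and res are made up for the illustration.

```r
data <- data.frame(hosp1 = c("A", "B", "C", "D", "E", "C", "A", "B"),
                   hosp2 = c("B", "C", "F", "C", "G", "A", "H", "A"),
                   dist  = c(12, 32, 23, 12, 12, 45, 13, 13))
test <- data.frame(iso1 = c("A", "B", "C"))

# All ordered pairs of the selected values, without self-pairs such as A/A
perms <- expand.grid(hosp1 = test$iso1, hosp2 = test$iso1,
                     stringsAsFactors = FALSE)
perms <- perms[perms$hosp1 != perms$hosp2, ]

# Left join: pairs with no matching row in data get NA in dist
res <- merge(perms, data, by = c("hosp1", "hosp2"), all.x = TRUE)
res

# Keep only the pairs that actually occur in data
res_found <- res[!is.na(res$dist), ]
```

Unlike the permuteGeneral version, merge returns every matching row when a pair occurs more than once in data.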
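Compare the pasted-together combinations with the pasted-together hosp rows, then ask whether each row's key (for example "A B") is in the list of combinations.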
data <- data.frame(hosp1=c("A", "B", "C", "D", "E", "C", "A", "B"),
hosp2=c("B", "C", "F", "C", "G", "A", "H", "A"),
dist= c(12, 32, 23, 12, 12, 45, 13, 13)) # I added a missing value here
# every ordered pair of the selected values (this also includes self-pairs such as A/A)
combo <- expand.grid(test$iso1, test$iso1)
data[paste(data$hosp1, data$hosp2) %in% paste(combo$Var1, combo$Var2),]
hosp1 hosp2 dist
1 A B 12
2 B C 32
6 C A 45
8 B A 13
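A caveat with pasted keys: paste joins the two values with a single space by default, so two different pairs can collapse into the same key when the values themselves contain spaces. The values below are hypothetical and only illustrate the collision. The "|" separator assumes that character never appears in the data.

```r
# Hypothetical values that contain spaces
hosp1 <- c("A B", "A")
hosp2 <- c("C", "B C")

# Default separator: both pairs become the same key "A B C"
key_default <- paste(hosp1, hosp2)

# A separator assumed to be absent from the data keeps the keys distinct
key_safe <- paste(hosp1, hosp2, sep = "|")

key_default[1] == key_default[2]
key_safe[1] == key_safe[2]
```

With single-letter codes like those in the question the default separator is safe.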