
rbind data based on matching values in a column

I have several data frames I would like to combine, but I need to drop any rows whose value in a given column has no match in the other data frames. For example, I want to combine data frames a, b, and c based on the values in column x.

a <- data.frame(1:5, 5:9)
colnames(a) <- c("x", "y")
b <- data.frame(1:4, 7:10)
colnames(b) <- c("x", "y")
c <- data.frame(1:3, 6:8)
colnames(c) <- c("x", "y")

and have the result be

1   5
2   6
3   7
1   7
2   8
3   9
1   6
2   7
3   8

where the first three rows are from data frame a, the next three are from data frame b, and the last three are from data frame c; rows whose value in column x does not appear in all three data frames are excluded.

We create an index from the intersecting elements of 'x', i.e. the values present in every data frame, and use it to subset each data frame before calling rbind

v1 <- Reduce(intersect, list(a$x, b$x, c$x))
rbind(a[a$x %in% v1,], b[b$x %in% v1,], c[c$x %in% v1, ])
#  x y
#1 1 5
#2 2 6
#3 3 7
#4 1 7
#5 2 8
#6 3 9
#7 1 6
#8 2 7
#9 3 8

If there are many dataset objects, it is better to keep them in a list. The example here uses unrelated object names, but if the names follow a pattern, e.g. df1, df2, ..., df100, they are easy to collect into a list

lst1 <- mget(ls(pattern = "^df\\d+$"))
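As a small illustration of the pattern-based lookup, the sketch below builds a throwaway environment (the names e, df1, df2 and other are made up for this example) so that nothing in the global environment is touched. It only aims to show that the regular expression picks up df1 and df2 and skips anything else.

```r
# Hypothetical scratch environment, used only for this illustration
e <- new.env()
assign("df1", data.frame(x = 1:3, y = 4:6), envir = e)
assign("df2", data.frame(x = 2:4, y = 7:9), envir = e)
assign("other", 42, envir = e)  # does not match the pattern

# Collect every object whose name is "df" followed by digits
lst1 <- mget(ls(envir = e, pattern = "^df\\d+$"), envir = e)
names(lst1)
```

Because mget() returns a named list, the object names are carried along and can be used later to tell the pieces apart.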

If the object names are all different (xyz, abc, fq12, etc.), but these are the only data.frame objects loaded in the global environment, collect every object that is a data.frame

# keep only the objects in the global environment that are data frames
lst1 <- Filter(is.data.frame, mget(ls(.GlobalEnv), envir = .GlobalEnv))

Then, get the intersecting elements of the column 'x'

v1 <- Reduce(intersect, lapply(lst1, `[[`, "x"))

Use the intersecting vector to subset the rows of each list element, then bind the pieces together

do.call(rbind, lapply(lst1, function(dat) dat[dat$x %in% v1, ]))

Here, we assume the column names are the same across all the datasets.
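The list-based steps can be wrapped into one function. The following is only a sketch: rbind_common and its key argument are hypothetical names, not part of base R, and the column-name check simply makes the assumption above explicit.

```r
# Hypothetical helper: keep only rows whose key value appears in every
# data frame of the list, then stack the remaining rows.
rbind_common <- function(dfs, key = "x") {
  # rbind() needs identical column names, so fail early if they differ
  same_names <- vapply(dfs, function(d) identical(names(d), names(dfs[[1]])),
                       logical(1))
  stopifnot(all(same_names))

  # Values of the key column that are present in all data frames
  common <- Reduce(intersect, lapply(dfs, `[[`, key))

  out <- do.call(rbind, lapply(dfs, function(d) {
    d[d[[key]] %in% common, , drop = FALSE]
  }))
  rownames(out) <- NULL  # drop the list-name prefixes added by rbind()
  out
}

a <- data.frame(x = 1:5, y = 5:9)
b <- data.frame(x = 1:4, y = 7:10)
c <- data.frame(x = 1:3, y = 6:8)
res <- rbind_common(list(a = a, b = b, c = c))
res
```

With the question's data this should give the same nine rows as the explicit rbind call, in the order a, b, c.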


Another option is to merge the data frames on 'x' and then unlist the resulting y columns

out <- Reduce(function(...) merge(..., by = 'x'), list(a, b, c))
data.frame(x = out$x, y = unlist(out[-1], use.names = FALSE))
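To see why unlisting works here, it may help to look at the intermediate result. The sketch below repeats the question's data and inspects the merged columns; the name long is introduced only for this illustration. It assumes the values in 'x' are unique within each data frame, since duplicated keys would make merge() multiply rows.

```r
a <- data.frame(x = 1:5, y = 5:9)
b <- data.frame(x = 1:4, y = 7:10)
c <- data.frame(x = 1:3, y = 6:8)

# Successive inner joins on 'x'; the y columns end up side by side
out <- Reduce(function(...) merge(..., by = "x"), list(a, b, c))
names(out)  # "x" "y.x" "y.y" "y"

# Stack the y columns one after another; x is recycled to the same length
long <- data.frame(x = out$x, y = unlist(out[-1], use.names = FALSE))
long
```

The filtering approach keeps each data frame's rows untouched, whereas this one reorders rows by 'x' within each block, so the two agree only when the inputs are already sorted by 'x', as they are here.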
