How do I create a table with rows from a first table that also match 3 columns in a row of a second table in R?
I need to create a third table (df3) containing the rows of a first table (df1) whose values in three of the six columns match a row of a second table (df2). The two starting tables, df1 and df2, do not have the same number of rows.
Example:
df1
chain freq color  length type1 type2
AC    24   red    100    C     V2
BD    57   green  87     C     G5
OP    83   yellow 68     R     Q9
TP    28   blue   74     Y     P2
HP    23   yellow 39     A     U9

df2
chain freq color  length type1 type2
BD    45   blue   73     C     G5
YJ    57   green  78     N     Y6
TP    8    orange 98     Y     P2
HP    50   white  87     A     U9
ZS    87   red    98     P     N8
XC    8    green  98     T     N8
The resulting table should contain the rows of df1 whose chain, type1, and type2 values match those of a row in df2. In this example it would look like this:
df3
chain freq color  length type1 type2
BD    57   green  87     C     G5
TP    28   blue   74     Y     P2
HP    23   yellow 39     A     U9
I'm trying to do this while avoiding loops as much as possible. I've been looking through the dplyr functions, but I'm not quite familiar with the package yet. Any thoughts are appreciated.
We could use semi_join, which returns the rows of df1 that have a match in df2 on the columns given in by:
library(dplyr)
semi_join(df1, df2, by = c('chain', 'type1', 'type2'))
# chain freq color length type1 type2
#1 BD 57 green 87 C G5
#2 TP 28 blue 74 Y P2
#3 HP 23 yellow 39 A U9
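To illustrate why a filtering join fits here, the following sketch compares semi_join with inner_join on the question's example tables, rebuilt with data.frame() so the block runs on its own. The names keys, filtered, and joined are hypothetical and used only for illustration.

```r
library(dplyr)

# Example tables from the question, rebuilt with data.frame()
df1 <- data.frame(
  chain  = c("AC", "BD", "OP", "TP", "HP"),
  freq   = c(24L, 57L, 83L, 28L, 23L),
  color  = c("red", "green", "yellow", "blue", "yellow"),
  length = c(100L, 87L, 68L, 74L, 39L),
  type1  = c("C", "C", "R", "Y", "A"),
  type2  = c("V2", "G5", "Q9", "P2", "U9"),
  stringsAsFactors = FALSE
)
df2 <- data.frame(
  chain  = c("BD", "YJ", "TP", "HP", "ZS", "XC"),
  freq   = c(45L, 57L, 8L, 50L, 87L, 8L),
  color  = c("blue", "green", "orange", "white", "red", "green"),
  length = c(73L, 78L, 98L, 87L, 98L, 98L),
  type1  = c("C", "N", "Y", "A", "P", "T"),
  type2  = c("G5", "Y6", "P2", "U9", "N8", "N8"),
  stringsAsFactors = FALSE
)

# Hypothetical helper name: the three matching columns
keys <- c("chain", "type1", "type2")

# semi_join only filters df1; no columns from df2 are added
filtered <- semi_join(df1, df2, by = keys)

# inner_join keeps the same rows here, but also attaches df2's
# non-key columns, adding .x/.y suffixes to the clashing names
joined <- inner_join(df1, df2, by = keys)

names(filtered)
names(joined)
```

If df2 contained the same chain/type1/type2 combination more than once, inner_join would repeat the matching df1 row for each occurrence, whereas semi_join would still return it once.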
Data used in the example:
df1 <- structure(list(chain = c("AC", "BD", "OP", "TP", "HP"), freq = c(24L,
57L, 83L, 28L, 23L), color = c("red", "green", "yellow", "blue",
"yellow"), length = c(100L, 87L, 68L, 74L, 39L), type1 = c("C",
"C", "R", "Y", "A"), type2 = c("V2", "G5", "Q9", "P2", "U9")), class = "data.frame", row.names = c(NA,
-5L))
df2 <- structure(list(chain = c("BD", "YJ", "TP", "HP", "ZS", "XC"),
freq = c(45L, 57L, 8L, 50L, 87L, 8L), color = c("blue", "green",
"orange", "white", "red", "green"), length = c(73L, 78L,
98L, 87L, 98L, 98L), type1 = c("C", "N", "Y", "A", "P", "T"
), type2 = c("G5", "Y6", "P2", "U9", "N8", "N8")),
class = "data.frame", row.names = c(NA,
-6L))
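If adding a dplyr dependency is not desirable, the same filtering can be sketched in base R, still without loops. This is one possible approach, not the answer's method: it pastes the three key columns into a single string per row and keeps the df1 rows whose string also occurs in df2. The names keys and make_key are hypothetical, only df2's key columns are rebuilt because nothing else is needed, and the "\r" separator assumes that character never appears in the data.

```r
# df1 from the question, rebuilt with data.frame()
df1 <- data.frame(
  chain  = c("AC", "BD", "OP", "TP", "HP"),
  freq   = c(24L, 57L, 83L, 28L, 23L),
  color  = c("red", "green", "yellow", "blue", "yellow"),
  length = c(100L, 87L, 68L, 74L, 39L),
  type1  = c("C", "C", "R", "Y", "A"),
  type2  = c("V2", "G5", "Q9", "P2", "U9"),
  stringsAsFactors = FALSE
)
# Only the key columns of df2 matter for the match
df2 <- data.frame(
  chain = c("BD", "YJ", "TP", "HP", "ZS", "XC"),
  type1 = c("C", "N", "Y", "A", "P", "T"),
  type2 = c("G5", "Y6", "P2", "U9", "N8", "N8"),
  stringsAsFactors = FALSE
)

keys <- c("chain", "type1", "type2")

# Hypothetical helper: one composite key string per row.
# "\r" is assumed not to occur in the data, so distinct
# key combinations cannot collapse into the same string.
make_key <- function(d) do.call(paste, c(d[keys], sep = "\r"))

# Keep the df1 rows whose composite key also appears in df2
df3 <- df1[make_key(df1) %in% make_key(df2), ]
df3
```

Unlike semi_join, this keeps df1's original row names (2, 4, and 5 here); calling rownames(df3) <- NULL resets them if that matters.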