Match values between R dataframes without using a for-loop
I have the following dataframes:
db1 = data.frame(name = c('a', 'b', 'c', 'd'), age = c('10', '20', '30', '40'), tier = NA)
db2 = data.frame(name = c('a', 'a', 'c', 'b'), age = c('10', '10', '30', '20'), tier = c('1', '3', '4', '2'))
I want to enter the tier values from db2 into the same column in db1 if the name and age variables match.
I can do this with a for-loop, but when we're dealing with thousands of rows this takes far too long. Is there a faster way to do this?
for (i in seq_len(nrow(db1))) {
  for (j in seq_len(nrow(db2))) {
    if (db1$name[i] == db2$name[j] && db1$age[i] == db2$age[j]) {
      db1$tier[i] = db2$tier[j]
    }
  }
}
If taking the first match is also acceptable when a row matches multiple times (your code takes the last), you can use match, combined with interaction to handle multiple columns.
db1$tier <- db2$tier[match(interaction(db1[c("name","age")]),
interaction(db2[c("name","age")]))]
db1
# name age tier
#1 a 10 1
#2 b 20 2
#3 c 30 4
#4 d 40 <NA>
Or, to take the last match (as your code does), additionally use rev.
db1$tier <- rev(db2$tier)[match(interaction(db1[c("name","age")]),
rev(interaction(db2[c("name","age")])))]
db1
# name age tier
#1 a 10 3
#2 b 20 2
#3 c 30 4
#4 d 40 <NA>
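To see why the match approach works, it can help to look at the intermediate index vector. The sketch below rebuilds the first-match lookup with paste in place of interaction. That is a swapped-in way of forming the composite key, and it assumes the separator "\r" never occurs in name or age. The names key1, key2 and idx are illustrative only.

```r
db1 <- data.frame(name = c('a', 'b', 'c', 'd'), age = c('10', '20', '30', '40'), tier = NA)
db2 <- data.frame(name = c('a', 'a', 'c', 'b'), age = c('10', '10', '30', '20'), tier = c('1', '3', '4', '2'))

# One composite key per row (assumption: "\r" does not appear in the data)
key1 <- paste(db1$name, db1$age, sep = "\r")
key2 <- paste(db2$name, db2$age, sep = "\r")

# For each key in db1, the position of its first occurrence in db2 (NA if absent)
idx <- match(key1, key2)
idx
# [1]  1  4  3 NA

# Indexing with NA yields NA, so unmatched rows stay missing
db1$tier <- db2$tier[idx]
```

Because match is vectorised and performs a single lookup pass, this avoids the nested loop over every pair of rows.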
Drop the tier column and use merge:
db1$tier <- NULL
merge(db1, db2)
# name age tier
#1 a 10 1
#2 a 10 3
#3 b 20 2
#4 c 30 4
If you want d in the final output, use all.x = TRUE:
merge(db1, db2, all.x = TRUE)
# name age tier
#1 a 10 1
#2 a 10 3
#3 b 20 2
#4 c 30 4
#5 d 40 <NA>
We can use merge + duplicated as shown below:
subset(
merge(db1, db2, by = c("name", "age"), all.x = TRUE),
!duplicated(cbind(name, age)),
select = -tier.x
)
which gives you
name age tier.y
1 a 10 1
3 b 20 2
4 c 30 4
5 d 40 <NA>
This is a simple join.
library(dplyr)
db3<-full_join(db2,db1, by = c("name" = "name", "age" = "age"), suffix = c("", ".x"))
name age tier tier.x
1 a 10 1 NA
2 a 10 3 NA
3 c 30 4 NA
4 b 20 2 NA
5 d 40 <NA> NA
# I am assuming you want the tier from db2 shown; if db1's tier values are all NA, you can just drop that column before the join.
db3$tier.x = NULL
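If only one tier per name/age pair is wanted in db1 (as in the original loop), one option is a left join against a de-duplicated db2. This is a minimal sketch, assuming dplyr is installed and that keeping the first match is acceptable. The names db2_first and result are illustrative only.

```r
library(dplyr)

db1 <- data.frame(name = c('a', 'b', 'c', 'd'), age = c('10', '20', '30', '40'), tier = NA)
db2 <- data.frame(name = c('a', 'a', 'c', 'b'), age = c('10', '10', '30', '20'), tier = c('1', '3', '4', '2'))

# Keep only the first row of db2 for each name/age combination
db2_first <- distinct(db2, name, age, .keep_all = TRUE)

# Drop the empty tier column from db1, then bring in tier from db2;
# left_join keeps every row of db1, leaving NA where there is no match
result <- left_join(select(db1, -tier), db2_first, by = c("name", "age"))
result
```

Unlike the full_join above, this returns exactly one row per row of db1, so d is kept with a missing tier and the duplicated a/10 key does not add a row.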