Matching values between R data frames without using a for loop

Question

I have the following data frames:

db1 = data.frame(name = c('a', 'b', 'c', 'd'), age = c('10', '20', '30', '40'), tier = NA)
db2 = data.frame(name = c('a', 'a', 'c', 'b'), age = c('10', '10', '30', '20'), tier = c('1', '3', '4', '2'))

I want to copy the tier values from db2 into the tier column of db1 wherever the name and age variables match.

I can do this with a for loop, but with thousands of rows it takes far too long. Is there a faster way to do this?

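# Compare every row of db1 with every row of db2;
# a later match in db2 overwrites an earlier one
for (i in seq_len(nrow(db1))){
  for (j in seq_len(nrow(db2))){
    if (db1$name[i] == db2$name[j] && db1$age[i] == db2$age[j]){
      db1$tier[i] = db2$tier[j]
    }
  }
}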

Answer 1

If taking the first match is acceptable when a row matches multiple times (your code takes the last), you can use match, combined with interaction to match on multiple columns.

db1$tier <- db2$tier[match(interaction(db1[c("name","age")]),
                           interaction(db2[c("name","age")]))]
db1
#  name age tier
#1    a  10    1
#2    b  20    2
#3    c  30    4
#4    d  40 <NA>

Or take the last match (as your code does) by additionally using rev:

db1$tier <- rev(db2$tier)[match(interaction(db1[c("name","age")]),
                    rev(interaction(db2[c("name","age")])))]
db1
#  name age tier
#1    a  10    3
#2    b  20    2
#3    c  30    4
#4    d  40 <NA>
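
A possible variation, not part of the original answer: by default interaction builds a factor whose levels are every combination of the input levels, which may get large when name and age have many distinct values. A rough sketch of the same lookup with a pasted character key is shown below. The separator "\r" and the names key1, key2, first_match and last_match are assumptions for illustration; the separator only needs to be a string that never occurs in the data.

```r
db1 <- data.frame(name = c("a", "b", "c", "d"), age = c("10", "20", "30", "40"),
                  tier = NA, stringsAsFactors = FALSE)
db2 <- data.frame(name = c("a", "a", "c", "b"), age = c("10", "10", "30", "20"),
                  tier = c("1", "3", "4", "2"), stringsAsFactors = FALSE)

# One character key per row; "\r" is an assumed separator not present in the data
key1 <- paste(db1$name, db1$age, sep = "\r")
key2 <- paste(db2$name, db2$age, sep = "\r")

# match() returns the position of the first hit in key2 (NA if there is none)
first_match <- db2$tier[match(key1, key2)]

# Reversing both vectors makes the first hit correspond to the last row in db2
last_match <- rev(db2$tier)[match(key1, rev(key2))]
```

Assigning either vector to db1$tier should give the same tables as above.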

Answer 2

Drop the tier column and use merge:

db1$tier <- NULL
merge(db1, db2)

#  name age tier
#1    a  10    1
#2    a  10    3
#3    b  20    2
#4    c  30    4

If you want d in the final output, use all.x = TRUE:

merge(db1, db2, all.x = TRUE)

#  name age tier
#1    a  10    1
#2    a  10    3
#3    b  20    2
#4    c  30    4
#5    d  40 <NA>
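
Note that merge returns one row per match, so a appears twice. If one row per db1 entry is wanted, one option (a sketch, not from the original answer) is to deduplicate db2 on the key columns before merging. Here fromLast = TRUE keeps the last row per key, mirroring the loop in the question; the names db2_last and res are made up for this example.

```r
db1 <- data.frame(name = c("a", "b", "c", "d"), age = c("10", "20", "30", "40"),
                  tier = NA, stringsAsFactors = FALSE)
db2 <- data.frame(name = c("a", "a", "c", "b"), age = c("10", "10", "30", "20"),
                  tier = c("1", "3", "4", "2"), stringsAsFactors = FALSE)

# Keep only the last db2 row for each (name, age) pair
db2_last <- db2[!duplicated(db2[c("name", "age")], fromLast = TRUE), ]

# Merge on the key columns only, so no tier.x / tier.y columns are created
res <- merge(db1[c("name", "age")], db2_last, by = c("name", "age"), all.x = TRUE)
```

Dropping fromLast = TRUE would keep the first row per key instead.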

Answer 3

We can use merge + duplicated, as below:

subset(
  merge(db1, db2, by = c("name", "age"), all.x = TRUE),
  !duplicated(cbind(name, age)),
  select = -tier.x
)

which gives you

  name age tier.y
1    a  10      1
3    b  20      2
4    c  30      4
5    d  40   <NA>

Answer 4

This is a simple join.

library(dplyr)
db3<-full_join(db2,db1, by = c("name" = "name", "age" = "age"), suffix = c("", ".x"))

  name age tier tier.x
1    a  10    1     NA
2    a  10    3     NA
3    c  30    4     NA
4    b  20    2     NA
5    d  40 <NA>     NA

# I am assuming you want to keep the tier from db2. If the tier column in db1
# is all NA, you can simply drop it before the join instead.

db3$tier.x = NULL
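
The full join above still returns two rows for a. A hedged sketch of one way to get exactly one row per db1 entry with dplyr follows: deduplicate db2 with distinct, then use left_join from db1. This is an assumed variation rather than the answerer's code, and db2_first and result are illustrative names. It keeps the first match per key.

```r
library(dplyr)

db1 <- data.frame(name = c("a", "b", "c", "d"), age = c("10", "20", "30", "40"),
                  tier = NA, stringsAsFactors = FALSE)
db2 <- data.frame(name = c("a", "a", "c", "b"), age = c("10", "10", "30", "20"),
                  tier = c("1", "3", "4", "2"), stringsAsFactors = FALSE)

# Keep only the first db2 row for each (name, age) pair
db2_first <- distinct(db2, name, age, .keep_all = TRUE)

# Drop the empty tier column from db1, then look tier up in db2;
# left_join keeps every db1 row, in db1's order
result <- db1 %>%
  select(-tier) %>%
  left_join(db2_first, by = c("name", "age"))
```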
