Faster comparison of columns across different data frames

Question

Let's assume two data frames, A and B, containing data like the following:

Dataframe: A              Dataframe: B
   ColA                     ColB1      ColB2
| Dog   |                 | Lion     | yes
| Lion  |                 | Cat      | 
| Zebra |                 | Elephant | 
| Bat   |                 | Dog      | yes

I want to compare the values of ColA to the values of ColB1 and insert "yes" into ColB2 whenever there is a match. This is what I'm running:

for (i in 1:nrow(B)){
    for (j in 1:nrow(A)){
         if (B[i,1] == A[j,1]){
             B[i,2] <- "yes"
         }  
    }   
}

In reality we're talking about 20000 rows. How could this be made faster?

Answer 1

You can use the %in% operator to test for membership:

B$ColB2 <- B$ColB1 %in% A$ColA

ColB2 will contain TRUE or FALSE depending on whether the value in ColB1 of data frame B was found in ColA of data frame A. Because %in% is vectorised, it compares the whole column in a single call and replaces both loops.
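
The question asked for "yes" rather than a logical value. A minimal sketch of one way to get that, assuming the sample data from the question and assuming that non-matching rows should hold an empty string (the question does not say what they should hold):

```r
# Sample data reconstructed from the question
A <- data.frame(ColA = c("Dog", "Lion", "Zebra", "Bat"),
                stringsAsFactors = FALSE)
B <- data.frame(ColB1 = c("Lion", "Cat", "Elephant", "Dog"),
                stringsAsFactors = FALSE)

# Vectorised membership test, then map TRUE to "yes" and FALSE to ""
# (the empty string for non-matches is an assumption)
B$ColB2 <- ifelse(B$ColB1 %in% A$ColA, "yes", "")

print(B)
```

If NA is preferred for non-matches, replacing "" with NA_character_ should work the same way.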

For more info see:

https://stat.ethz.ch/R-manual/R-devel/library/base/html/match.html
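
The linked help page covers both match() and %in%. As a rough illustration of how they relate, the sketch below uses match() directly on the sample data from the question; the variable names idx and found are only illustrative.

```r
# Sample data reconstructed from the question
A <- data.frame(ColA = c("Dog", "Lion", "Zebra", "Bat"),
                stringsAsFactors = FALSE)
B <- data.frame(ColB1 = c("Lion", "Cat", "Elephant", "Dog"),
                stringsAsFactors = FALSE)

# match() returns the position of each ColB1 value in ColA, or NA if absent
idx <- match(B$ColB1, A$ColA)

# A value was found exactly when its position is not NA
found <- !is.na(idx)

# This gives the same logical vector as the %in% operator
stopifnot(identical(found, B$ColB1 %in% A$ColA))
```

match() is useful when the position of the match is needed, for example to pull another column from A; %in% is enough when only a yes/no answer is required.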

Solution 1
2 · Accepted · 2017-08-31 15:16:57
