
Comparing columns of different dataframes faster

Let's assume two dataframes, A and B, containing data like the following:

Dataframe: A              Dataframe: B
   ColA                     ColB1      ColB2
| Dog   |                 | Lion     | yes
| Lion  |                 | Cat      | 
| Zebra |                 | Elephant | 
| Bat   |                 | Dog      | yes

I want to compare the values of ColA with the values of ColB1 and insert "yes" in ColB2 wherever there is a match. What I'm running is this:

for (i in 1:nrow(B)){
    for (j in 1:nrow(A)){
         if (B[i,1] == A[j,1]){
             B[i,2] <- "yes"
         }  
    }   
}

In reality we're talking about 20000 rows. How could this be made faster?

You can use the %in% operator to test membership for the whole column at once:

B$ColB2 <- B$ColB1 %in% A$ColA

ColB2 will contain TRUE or FALSE depending on whether the value in ColB1 of data frame B was found in ColA of data frame A.
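If you want the literal "yes" from the question rather than TRUE/FALSE, the logical vector can be mapped to strings. The following is a minimal sketch: the sample data is rebuilt from the tables in the question, the intermediate name `found` is only illustrative, and using an empty string for non-matches is an assumption about what the blank cells should hold.

```r
# Sample data reconstructed from the question's tables
A <- data.frame(ColA = c("Dog", "Lion", "Zebra", "Bat"), stringsAsFactors = FALSE)
B <- data.frame(ColB1 = c("Lion", "Cat", "Elephant", "Dog"), stringsAsFactors = FALSE)

# Vectorized membership test: one logical value per row of B
found <- B$ColB1 %in% A$ColA

# Map TRUE to "yes" and FALSE to an empty string (assumed placeholder for "no match")
B$ColB2 <- ifelse(found, "yes", "")

print(B)
```

This replaces the nested loop with a single vectorized lookup, so no row-by-row comparison is written in R code.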

For more info see:

https://stat.ethz.ch/R-manual/R-devel/library/base/html/match.html
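The same help page documents match(), which returns the position of the first match instead of a logical value. As a rough sketch of how it could be used here (the sample data is rebuilt from the question, and `pos` is just an illustrative name):

```r
# Sample data reconstructed from the question's tables
A <- data.frame(ColA = c("Dog", "Lion", "Zebra", "Bat"), stringsAsFactors = FALSE)
B <- data.frame(ColB1 = c("Lion", "Cat", "Elephant", "Dog"), stringsAsFactors = FALSE)

# Position of each ColB1 value in ColA; NA when the value is absent
pos <- match(B$ColB1, A$ColA)

# A non-NA position means the value was found
B$ColB2 <- ifelse(is.na(pos), "", "yes")
```

match() is the better fit when you also need to pull other columns from the matching row of A; for a plain yes/no flag, %in% is shorter.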
