Compare columns of different data frames faster

Question

Let's assume two data frames, A and B, containing data like the following:

Dataframe: A              Dataframe: B
   ColA                     ColB1      ColB2
| Dog   |                 | Lion     | yes
| Lion  |                 | Cat      | 
| Zebra |                 | Elephant | 
| Bat   |                 | Dog      | yes

I want to compare the values of ColA with the values of ColB1 and insert "yes" in column ColB2 wherever there is a match. This is what I'm running:

for (i in 1:nrow(B)) {
    for (j in 1:nrow(A)) {
        if (B[i, 1] == A[j, 1]) {
            B[i, 2] <- "yes"
        }
    }
}

In reality we're talking about 20000 rows. How can this be made faster?

Answer 1

You can use the %in% operator to determine membership:

B$ColB2 <- B$ColB1 %in% A$ColA

ColB2 will contain TRUE or FALSE depending on whether each value in ColB1 of data frame B was found in ColA of data frame A.
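
If you need the literal "yes" labels from the question rather than logical values, the result of %in% can be mapped with ifelse(). The following is only a sketch: the sample data is reconstructed from the tables in the question, and stringsAsFactors = FALSE and the helper name `found` are my own assumptions.

```r
# Sample data rebuilt from the question (assumed to be character columns)
A <- data.frame(ColA = c("Dog", "Lion", "Zebra", "Bat"),
                stringsAsFactors = FALSE)
B <- data.frame(ColB1 = c("Lion", "Cat", "Elephant", "Dog"),
                stringsAsFactors = FALSE)

# Vectorised membership test: one logical value per row of B
found <- B$ColB1 %in% A$ColA

# Map TRUE to "yes" and FALSE to an empty string, as in the question's table
B$ColB2 <- ifelse(found, "yes", "")
```

Because %in% works on whole vectors, this replaces both loops with a single call.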

For more info see:

https://stat.ethz.ch/R-manual/R-devel/library/base/html/match.html
