
matrix index subsetting with another matrix

What is a fast way to match the rows of two matrices (one and two) and extract the row indices of matrix two where the matches occur? Matrix two is large (hundreds to thousands of rows).

one
     [,1] [,2]
[1,]    9   11
[2,]   13    2

head(two)
     [,1] [,2]
[1,]    9   11
[2,]   11    9
[3,]    2    3
[4,]   13    2
[5,]    2    4
[6,]    3    3

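The output should be the following (note that index 2 is not an output value: row 2 of two holds the same numbers in reversed order, so it does not count as a match)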

1 4  

One way of doing this:

a = apply(one, 1, paste0, collapse = "-")
b = apply(two, 1, paste0, collapse = "-")
match(a, b)

#[1] 1 4

We paste the columns together row-wise for both matrices and then match the resulting strings to find the rows that are the same.

Just for reference,

a
#[1] "9-11" "13-2"
b
#[1] "9-11" "11-9" "2-3"  "13-2" "2-4"  "3-3" 
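For a self-contained check, the example matrices can be rebuilt and run through the same paste-and-match steps. This is only a minimal sketch: the idx name and the third row added to one are illustrative assumptions, included to show that match() returns NA for a row of one that does not occur in two.

```r
# Rebuild the example data (only the six rows of `two` shown in the question)
one <- matrix(c( 9, 11,
                13,  2,
                 7,  7),   # extra row that is absent from `two` (assumption)
              ncol = 2, byrow = TRUE)
two <- matrix(c( 9, 11,
                11,  9,
                 2,  3,
                13,  2,
                 2,  4,
                 3,  3),
              ncol = 2, byrow = TRUE)

# One string key per row, then look up each key of `one` among the keys of `two`
a <- apply(one, 1, paste0, collapse = "-")
b <- apply(two, 1, paste0, collapse = "-")
idx <- match(a, b)   # first matching row of `two` per row of `one`, NA if none
idx
#[1]  1  4 NA

# Drop the rows of `one` that have no counterpart in `two`
idx[!is.na(idx)]
#[1] 1 4
```

Note that match() reports only the first matching row of two for each row of one, so repeated rows in two would not all be listed.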

You could write a C++ loop to do it fairly quickly:

library(Rcpp)

cppFunction('NumericVector matrixIndex(NumericMatrix m1, NumericMatrix m2){

int m1Rows = m1.nrow();
int m2Rows = m2.nrow();
NumericVector out;  

for (int i = 0; i < m1Rows; i++){
  for (int j = 0; j < m2Rows; j++){

    if(m1(i, 0) == m2(j, 0) && m1(i, 1) == m2(j, 1)){
        //out[j] = (j+1);
        out.push_back(j + 1);
    }
  }
}

return out;

}')

matrixIndex(one, two)
#[1] 1 4

I suspect it would be faster, though, to pre-allocate the result vector first, with something like:

cppFunction('NumericVector matrixIndex(NumericMatrix m1, NumericMatrix m2){

int m1Rows = m1.nrow();
int m2Rows = m2.nrow();
NumericVector out(m2Rows);  

for (int i = 0; i < m1Rows; i++){
  for (int j = 0; j < m2Rows; j++){

    if(m1(i, 0) == m2(j, 0) && m1(i, 1) == m2(j, 1)){
        out[j] = (j+1);
        //out.push_back(j + 1);
    }
  }
}

return out;

}')

matrixIndex(one, two)
#[1] 1 0 0 4 0 0
## 0 == no match
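The pre-allocated version returns one slot per row of two, with 0 marking "no match", so the indices still have to be pulled out on the R side. A small sketch of that step, using the output shown above as a literal vector (the res name is hypothetical) so that it runs without compiling anything:

```r
# Output of the pre-allocated version for the example data, as printed above
res <- c(1, 0, 0, 4, 0, 0)

# Keep only the non-zero entries, i.e. the rows of `two` that matched
res[res != 0]
#[1] 1 4

# Equivalent: ask for the positions of the non-zero entries
which(res != 0)
#[1] 1 4
```

Because the slots are filled by row of two, the indices come back in the row order of two rather than in the row order of one.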

You don't say whether by "fast" you mean compute time or person time. If it only needs doing once, the overall time is probably shortest if you optimize person time, and Ronak's answer is going to be hard to beat: it is clear and robust.

If the numbers are all non-negative integers below a known bound (say 100, as in your example data), you can do a similar thing but use arithmetic to combine the two columns into a single number and then match. I suspect (but haven't tested) that this would be faster than converting to character vectors. There are of course other arithmetic options too, depending on your circumstances.

a <- one[,1]*100 + one[,2]
b <- two[,1]*100 + two[,2]
match(a, b)
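If the bound is not known in advance, one possible variation is to derive the multiplier from the data instead of hard-coding 100. The sketch below assumes non-negative whole numbers; the base name is hypothetical.

```r
one <- matrix(c(9, 11, 13, 2), ncol = 2, byrow = TRUE)
two <- matrix(c(9, 11, 11, 9, 2, 3, 13, 2, 2, 4, 3, 3), ncol = 2, byrow = TRUE)

# Multiplier strictly larger than any value in the second column, so that
# distinct (col1, col2) pairs always map to distinct keys.
# Assumes non-negative whole numbers.
base <- max(one[, 2], two[, 2]) + 1

a <- one[, 1] * base + one[, 2]
b <- two[, 1] * base + two[, 2]
match(a, b)
#[1] 1 4
```

With very large values the product could exceed the range in which doubles represent integers exactly, in which case the string keys are the safer choice.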

We can also use %in%:

which(do.call(paste, as.data.frame(two)) %in% do.call(paste, as.data.frame(one)))
#[1] 1 4
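The %in% form and match() behave differently when two contains repeated rows, which may matter for a large matrix. A short sketch, where the repeated last row of two and the key_one / key_two names are illustrative assumptions:

```r
one <- matrix(c(9, 11, 13, 2), ncol = 2, byrow = TRUE)
# `two` from the question with its first row repeated at the end (assumption)
two <- matrix(c(9, 11, 11, 9, 2, 3, 13, 2, 2, 4, 3, 3, 9, 11),
              ncol = 2, byrow = TRUE)

# One space-separated string key per row
key_one <- do.call(paste, as.data.frame(one))
key_two <- do.call(paste, as.data.frame(two))

# Every row of `two` that occurs in `one`, in the row order of `two`
which(key_two %in% key_one)
#[1] 1 4 7

# Only the first matching row of `two`, one entry per row of `one`
match(key_one, key_two)
#[1] 1 4
```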
