Matrix index subsetting with another matrix

What's a fast way to match the rows of two matrices (one and two) and extract the row indices of matrix two where they match? Matrix two is large (hundreds to thousands of rows).

one
     [,1] [,2]
[1,]    9   11
[2,]   13    2

head(two)
     [,1] [,2]
[1,]    9   11
[2,]   11    9
[3,]    2    3
[4,]   13    2
[5,]    2    4
[6,]    3    3

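The output should be the following. Note that index 2 is not an output value: row 2 of two is (11, 9), which has the same values as a row of one but in the opposite order, so it must not count as a match.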

1 4  

One way of doing this:

a = apply(one, 1, paste0, collapse = "-")
b = apply(two, 1, paste0, collapse = "-")
match(a, b)

#[1] 1 4

We paste all the columns together row-wise for both matrices, then match the resulting strings to find the rows of two that are the same as the rows of one.

Just for reference,

a
#[1] "9-11" "13-2"
b
#[1] "9-11" "11-9" "2-3"  "13-2" "2-4"  "3-3" 

You could write a C++ loop to do it fairly quickly

library(Rcpp)

cppFunction('NumericVector matrixIndex(NumericMatrix m1, NumericMatrix m2){

  int m1Rows = m1.nrow();
  int m2Rows = m2.nrow();
  NumericVector out;  // starts empty and grows as matches are found

  // compare every row of m1 against every row of m2 (two columns assumed)
  for (int i = 0; i < m1Rows; i++){
    for (int j = 0; j < m2Rows; j++){
      if (m1(i, 0) == m2(j, 0) && m1(i, 1) == m2(j, 1)){
        out.push_back(j + 1);  // record the 1-based row index of m2
      }
    }
  }

  return out;
}')

matrixIndex(one, two)
[1] 1 4

Although I suspect it would be faster to pre-allocate the result vector first, since push_back has to grow the vector each time a match is found. Something like

cppFunction('NumericVector matrixIndex(NumericMatrix m1, NumericMatrix m2){

  int m1Rows = m1.nrow();
  int m2Rows = m2.nrow();
  NumericVector out(m2Rows);  // one slot per row of m2, initialised to 0

  // compare every row of m1 against every row of m2 (two columns assumed)
  for (int i = 0; i < m1Rows; i++){
    for (int j = 0; j < m2Rows; j++){
      if (m1(i, 0) == m2(j, 0) && m1(i, 1) == m2(j, 1)){
        out[j] = (j + 1);  // mark row j of m2 with its 1-based index
      }
    }
  }

  return out;
}')

matrixIndex(one, two)
[1] 1 0 0 4 0 0
## 0 means no match.
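
The pre-allocated version returns one element per row of two, with zeros as placeholders, rather than the compact 1 4 asked for in the question. A small sketch of the clean-up step in R, using the output above as a literal vector (res is just an illustrative name):

```r
res <- c(1, 0, 0, 4, 0, 0)  # output of the pre-allocated matrixIndex(), copied from above
res[res != 0]               # drop the placeholder zeros, keeping only matched row indices
#[1] 1 4
```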

You don't say whether by "fast" you mean compute time or person time. If it only needs doing once, the overall time is probably shortest if you optimize person time, and Ronak's answer is going to be hard to beat: it's clear and robust.

If the numbers are all non-negative integers less than a certain number (say, 100, as in your example data), you can do a similar thing but use arithmetic to combine the two columns into a single value per row and then match. The combined value is unique per row only under that condition, because the second column must never spill over into the part contributed by the first. I suspect (but haven't tested) that this would be faster than converting to character vectors. There are of course other arithmetic options too, depending on your circumstances.

a <- one[,1]*100 + one[,2]
b <- two[,1]*100 + two[,2]
match(a, b)
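
If the upper bound is not known in advance, the multiplier can be derived from the data instead of being hard-coded as 100. The following is only a sketch under the same assumption (non-negative integer entries, two columns); match_rows is a hypothetical helper name, not something from the original answers.

```r
# Hypothetical helper: encode each row of a two-column matrix of
# non-negative integers as a single number, then match the keys.
match_rows <- function(one, two) {
  base <- max(one, two) + 1              # strictly larger than every entry
  key_one <- one[, 1] * base + one[, 2]  # one numeric key per row of `one`
  key_two <- two[, 1] * base + two[, 2]  # one numeric key per row of `two`
  match(key_one, key_two)                # NA where a row of `one` is not in `two`
}

# Example data from the question
one <- matrix(c(9, 13, 11, 2), ncol = 2)
two <- matrix(c(9, 11, 2, 13, 2, 3,
                11, 9, 3, 2, 4, 3), ncol = 2)
match_rows(one, two)
#[1] 1 4
```

For very large entries the keys could exceed the range in which doubles represent integers exactly, so this is best kept to modest value ranges.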

We can use %in%

which(do.call(paste, as.data.frame(two)) %in% do.call(paste, as.data.frame(one)))
#[1] 1 4
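
The two approaches agree on the example data but are not interchangeable. match returns, for each row of one, the position of its first match in two, in the order of one. which with %in% returns every matching row of two, in the order of two. A small illustration with made-up data in which a row is duplicated in two:

```r
# Illustrative data (not from the question): row (9, 11) appears twice in `two`
one <- matrix(c(13, 9, 2, 11), ncol = 2)         # rows: (13, 2), (9, 11)
two <- matrix(c(9, 13, 9, 11, 2, 11), ncol = 2)  # rows: (9, 11), (13, 2), (9, 11)

a <- do.call(paste, as.data.frame(one))  # one string key per row of `one`
b <- do.call(paste, as.data.frame(two))  # one string key per row of `two`

match(a, b)      # first match only, ordered by rows of `one`
#[1] 2 1
which(b %in% a)  # all matching rows of `two`, in ascending order
#[1] 1 2 3
```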
