Efficient matrix subsetting with Rcpp

Question

I am trying to find an efficient way to subset a matrix with Rcpp for a non-contiguous set of rows and columns:

m <- matrix(1:20000000, nrow=5000)

rows <- sample(1:5000, 100)
cols <- sample(1:4000, 100)

In R, the matrix can be subsetted directly using the rows and cols vectors:

matrix_subsetting <- function(m, rows, cols){
  return(m[rows, cols])
}

m[rows, cols]
# or
matrix_subsetting(m, rows, cols)

The fastest Rcpp way I have been able to find so far is:

Rcpp::cppFunction("

  NumericMatrix cpp_matrix_subsetting(NumericMatrix m, NumericVector rows, NumericVector cols){
    
    int rl = rows.length();
    int cl = cols.length();
    NumericMatrix out(rl, cl);
    
    for (int i=0; i<cl; i++){
      NumericMatrix::Column org_c = m(_, cols[i]-1);
      NumericMatrix::Column new_c = out(_, i);
      for (int j=0; j<rl; j++){
        new_c[j] = org_c[rows[j]-1];
      }
    }
    return(out);
  }

")

But in comparison, the Rcpp version is significantly slower:

> microbenchmark::microbenchmark(matrix_subsetting(m, rows, cols), cpp_matrix_subsetting(m, rows, cols), times=500)
Unit: microseconds
                                 expr       min        lq       mean    median         uq        max neval
     matrix_subsetting(m, rows, cols)    23.269    90.127   107.8273   130.347   135.3285    605.235   500
 cpp_matrix_subsetting(m, rows, cols) 69191.784 75254.277 88484.9328 90477.448 95611.9090 178903.973   500

Any ideas on how to get at least comparable speed with Rcpp?

I already tried the RcppArmadillo arma::mat::submat function, but it is slower than my version.

Solution:

Implement the cpp_matrix_subsetting function with IntegerMatrix instead of NumericMatrix, so that the integer matrix m is passed without a type conversion.

New benchmark:
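The gather loop behind this solution can be sketched in plain C++ without Rcpp. This is only an illustration under stated assumptions: subset_column_major is a hypothetical helper (not part of Rcpp), the matrix is assumed to be stored column-major in a std::vector<int>, as R stores matrices, and the indices are 1-based as in R. Because the element type stays int throughout, no conversion of the input is needed.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper, not part of Rcpp: gathers m[rows, cols] from a
// column-major integer matrix with `nrow` rows, using 1-based indices as in R.
// The result is a column-major matrix with rows.size() rows and cols.size() columns.
std::vector<int> subset_column_major(const std::vector<int>& m,
                                     std::size_t nrow,
                                     const std::vector<int>& rows,
                                     const std::vector<int>& cols) {
    assert(nrow > 0 && m.size() % nrow == 0);
    const std::size_t ncol = m.size() / nrow;
    const std::size_t rl = rows.size();
    const std::size_t cl = cols.size();
    std::vector<int> out(rl * cl);

    for (std::size_t i = 0; i < cl; ++i) {
        assert(cols[i] >= 1 && static_cast<std::size_t>(cols[i]) <= ncol);
        // Start of the selected source column and of the i-th output column.
        const int* src = m.data() + (static_cast<std::size_t>(cols[i]) - 1) * nrow;
        int* dst = out.data() + i * rl;
        for (std::size_t j = 0; j < rl; ++j) {
            assert(rows[j] >= 1 && static_cast<std::size_t>(rows[j]) <= nrow);
            dst[j] = src[rows[j] - 1];  // only rl * cl elements are ever touched
        }
    }
    return out;
}
```

The loop structure mirrors the Rcpp function from the question: the outer loop walks the selected columns, the inner loop picks the selected rows, and the work done is proportional to the size of the result rather than the size of m.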

> microbenchmark::microbenchmark(matrix_subsetting(m, rows, cols), cpp_matrix_subsetting(m, rows, cols), times=1e4)
Unit: microseconds
                                 expr    min     lq     mean median      uq      max neval
     matrix_subsetting(m, rows, cols) 41.110 60.261 66.88845 61.730 63.8900 14723.52 10000
 cpp_matrix_subsetting(m, rows, cols) 43.703 61.936 71.56733 63.362 65.8445 27314.11 10000

Answer 1

This is because your matrix m is of type integer (not double, as NumericMatrix expects), so passing it to the function makes a copy of the entire matrix, which takes a lot of time.

For example, try with m <- matrix(1:20000000 + 0, nrow=5000) instead, which creates a matrix of type double.
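To get a rough feel for why the copy dominates, the following plain C++ sketch compares the number of elements an integer-to-double conversion has to write with the number of elements the subset itself writes. The names coerce_to_double, WorkEstimate and estimate_work are hypothetical and only serve the illustration; they are not what Rcpp calls internally, and element counts are only a proxy for run time.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Illustration only: an int -> double conversion reads and writes every
// element once, so its cost grows with the size of the full matrix.
std::vector<double> coerce_to_double(const std::vector<int>& m) {
    return std::vector<double>(m.begin(), m.end());
}

// Element counts for the two pieces of work being compared.
struct WorkEstimate {
    std::size_t coerced;   // elements written by the int -> double conversion
    std::size_t gathered;  // elements written by the rows x cols subset
};

// Hypothetical helper: counts elements for a matrix of nrow x ncol from
// which n_rows_picked rows and n_cols_picked columns are selected.
WorkEstimate estimate_work(std::size_t nrow, std::size_t ncol,
                           std::size_t n_rows_picked,
                           std::size_t n_cols_picked) {
    return {nrow * ncol, n_rows_picked * n_cols_picked};
}
```

For the matrix in the question, estimate_work(5000, 4000, 100, 100) gives 20,000,000 converted elements against 10,000 gathered elements, which is consistent with the conversion, not the subsetting loop, accounting for most of the measured time.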

Answer 1 (accepted), 2019-12-11 12:07:53
