
Eigen: how to prevent extra copies of a large object; assign to result without realizing full matrix on RHS

I apologize in advance if some of this is basic C++ that I'm failing to understand.

Before showing my code, let me explain what I'm trying to accomplish. I have a sparse matrix U and a vector r, and I want to compute (U - r)(U - r)', where the subtraction is applied to each column of U.

However, I can't do this all at once, because U - r is dense and explodes memory usage (~7 million columns versus ~20,000 rows).

Taking advantage of the fact that the outer product XX' can be computed one column at a time (XX' == sum(XcXc'), where sum is matrix addition), my strategy is to take a few columns, do the subtraction and the outer product, and accumulate the result. Using only a few columns at a time keeps memory usage down to a very reasonable level (a few hundred MB).

On the face of it, this would require two copies of a 20,000 x 20,000 matrix (3.5 GB each): one for the accumulated result and one for the temporary right-hand side. However, for reasons I don't understand, observed memory usage shows 3 copies.

Because I want to parallelize this operation as much as I can (it is quite expensive) reducing memory usage is of paramount importance.

So, Step 1 is to get me from 3 copies to 2 copies.

Step 2, if possible, is to recognize that the result never needs to be realized on the RHS at all. That is, there is no reason not to add each element of the result to the accumulator matrix as it is computed, instead of creating a temporary matrix on the RHS and then executing the addition.

Step 3 is to reduce computation time by taking advantage of the fact that a symmetric matrix is being produced. I think this is done with .selfadjointView<Eigen::Lower>(), but I couldn't work out how to use it consistently.

Finally, the code. I'm doing the parallelization in R, and this code represents a single worker process. I pass in a list of contiguous vectors of column indices to compute.

// [[Rcpp::depends(RcppEigen)]] 
#include <iostream>
#include "Rcpp.h"
#include "RcppEigen.h"
#include "Eigen/Dense"
#include "Eigen/Sparse"

using Eigen::MatrixXd;

typedef Eigen::MappedSparseMatrix<double> MSpMat;
typedef Eigen::Map<Eigen::VectorXd> MVec;
typedef Eigen::Map<MatrixXd> MMat;



/*
 * tcrossprod_cpp computes X * X', where X is a matrix, * is matrix
 * multiplication, and ' is transpose, in an efficient manner
 * (although it appears that R's tcrossprod is actually faster).
 * Pulled from the RcppEigen book.
 */


MatrixXd tcrossprod_cpp(const MatrixXd &U) {
    const long m(U.rows());
    MatrixXd UUt(MatrixXd(m, m).setZero().
            selfadjointView<Eigen::Lower>().rankUpdate(U));
    return UUt;
}

// [[Rcpp::export]]
MatrixXd gen_Sigma_cpp_block_sp(const Rcpp::List &col_list, const MSpMat &U,
                                const MVec &r, int index1 = 1) {
    long nrow = U.rows();
    MatrixXd out = MatrixXd::Zero(nrow, nrow);
    long ncol;
    Rcpp::IntegerVector y;
    for (long i = 0; i < col_list.size(); i++) {
        if (i % 10 == 0) {
            Rcpp::checkUserInterrupt();
        }
        y = col_list[i];
        ncol = y[y.size() - 1] - y[0] + 1;
        out.noalias() += tcrossprod_cpp((MatrixXd (U.block(0, y[0] - index1,
                                         nrow, ncol))).colwise() - r);
    }
    return out;
}

You should rewrite your expression. Mathematically, subtracting r from every column of U is the same as U - r*ones (where ones is a row vector with the same number of columns as U). Expanding gives you:

(U-r*ones)*(U-r*ones)^T = U*U^T - (U*ones^T)*r^T - r*(ones*U^T) + r*(ones*ones^T)*r^T

ones*ones^T is equal to U.cols(), and U*ones^T can be calculated as U*VectorXd::Ones(U.cols()) and stored into a dense vector. The remaining operations are one sparse product U*U.transpose() (which you can directly store into a dense matrix, since your end result will be dense), followed by two rank updates:

Eigen::VectorXd Usum = U * Eigen::VectorXd::Ones(U.cols()); // sum of the columns of U
Eigen::MatrixXd result = U * U.transpose();
result.selfadjointView<Eigen::Lower>().rankUpdate(Usum, r, -1.0); // -= Usum*r' + r*Usum'
result.selfadjointView<Eigen::Lower>().rankUpdate(r, U.cols());   // += U.cols() * r*r'

To answer the question about the extra temporaries: inside tcrossprod_cpp you create a temporary MatrixXd(m, m), and you store the result into MatrixXd UUt. You can avoid this method entirely and directly write:

out.selfadjointView<Eigen::Lower>().rankUpdate(
        MatrixXd(U.block(0, y[0] - index1, nrow, ncol)).colwise() - r);

Edit: Directly assigning a sparse product to a dense matrix apparently is not possible before Eigen 3.3 (I was testing with 3.3rc1). If it is possible for you, I suggest switching to version 3.3 (there are many other improvements).

I couldn't get chtz's code to compile. I would have liked to give them credit for the answer, but the user Michael Albers decided that editing the response to include the correct code was not acceptable, so I have to create a new post with the correct answer.

I had to create an intermediate sparse matrix for the outer product of U before converting to a dense matrix. This seems less than ideal, and I've seen others hit this issue without finding a way around it. In any case, the following compiles:

// [[Rcpp::export]]
MatrixXd gen_Sigma_cpp_sp(const MSpMat &U, const MVec &r) {
    Eigen::VectorXd UcolSum = U * Eigen::VectorXd::Ones(U.cols());
    MatrixXd S = MatrixXd(Eigen::SparseMatrix<double>(U * U.transpose())).
                    selfadjointView<Eigen::Lower>().rankUpdate(UcolSum, r, -1.0).
                                                    rankUpdate(r, U.cols());
    return S;
}

For anyone using this from R: I had to wrap the result in forceSymmetric before I could coerce it to class 'dpoMatrix', which is what a plain tcrossprod(U - r) would have given, and which helps the most with computations down the line:

SigmaS0 = as(forceSymmetric(gen_Sigma_cpp_sp(U, r), 'L'), 'dpoMatrix')
