Armadillo matrix transpose

Question

I have a huge m*n matrix A (where the number of rows m is much larger than the number of columns n), which is stored in my C++ program as an Armadillo mat. I have a vector w that I need to update as w = w - A*A^T*w, where A^T denotes the transpose of the matrix A.

As the matrix A is very large and consumes a lot of memory, the usual fast approach with Armadillo, w = w - A*A.t()*w, does not work, because Armadillo needs a lot of additional memory in this case (cf. github ). This was resolved by introducing the function inplace_trans( A, method ), which accepts the method "lowmem"; it consumes less memory but takes more time.

My problem now is that inplace_trans( A, method ) is a void function, so I have to create a copy of my matrix before I can calculate the new w:

mat Q = A;                    // full copy of A (this is the memory problem)
inplace_trans(Q, "lowmem");   // Q now holds A^T
w = w - A*Q*w;

This, however, is not the desired result, since it needs a full copy of my matrix, which is what I wanted to avoid in the first place (RAM problem!). So how can I get the transpose of my matrix in an efficient (= fast and low-memory) way in order to calculate the new w?

If I do it element-wise, as in

mat A(m,n); //huge matrix, initialized before
vec temp(m);
temp.fill(0.0);
for (unsigned long int ii=0; ii<m; ii++){

    for (unsigned long int ll=0; ll<m; ll++){
        temp(ii)+=dot(A.row(ii),A.row(ll))*w(ll);
    }
}
w=w-temp;

I have to loop over the m rows twice (nested), i.e. compute m*m dot products, which is very costly.

Edit: Up to now the fastest method is the following:

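vec temp(m);
inplace_trans(A, "lowmem");   // A now holds A^T (n x m)
temp = A * w;                 // temp = A^T * w (length n)
inplace_trans(A, "lowmem");   // restore the original A (m x n)
temp = A * temp;              // temp = A * A^T * w (length m)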

I have to transpose the matrix twice because I need it back in its original state afterwards. I cannot believe that this is the fastest way, since it takes way too many operations, IMHO.

Answer 1

In your edit you already correctly imply that, from a complexity point of view, it is preferable to perform two matrix-vector multiplications rather than computing A*A.t() first and then applying the result to w. Your problem, however, seems to be that you have to transpose the matrix twice.
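To make the complexity argument concrete, here is a rough operation count, assuming dense storage and m much larger than n as in the question. The symbol t is only a label for the intermediate vector and does not appear in the original post.

```latex
\begin{align*}
w - (A A^{T})\, w &: \quad O(m^{2} n) \text{ operations and an } m \times m \text{ intermediate matrix}, \\
w - A\, (A^{T} w) &: \quad O(m n) \text{ operations and an intermediate vector } t = A^{T} w \in \mathbb{R}^{n}.
\end{align*}
```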

If you don't need the matrix back in its untransposed form afterwards, a simple solution is to transpose the entire equation: w = w - A A^T w <==> w^T = w^T - w^T A A^T. In that case you can first apply A and then A.t(). If you can define w as a row vector altogether, this simply amounts to

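rowvec temp = w * A;          // w is a rowvec (1 x m), so temp is 1 x n
inplace_trans(A, "lowmem");   // A now holds A^T (n x m)
temp = temp * A;              // (1 x n) * (n x m) gives 1 x m
w -= temp;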

Conceptually, there should be no difference in storage between a row and a column vector; the elements should be contiguous in memory either way. You would have to check what explicit distinction Armadillo makes between row and column vectors, but AFAIK vectors are just matrices with one dimension set to one. In any case, such considerations matter much less for vectors than for matrices.
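The transposed form used in this answer can be checked in one line, since A A^T is symmetric. The size annotations assume A is m x n and w is a column vector of length m, as in the question.

```latex
\begin{align*}
(w - A A^{T} w)^{T} &= w^{T} - w^{T} (A A^{T})^{T} = w^{T} - (w^{T} A)\, A^{T}, \\
w^{T} A &\in \mathbb{R}^{1 \times n}, \qquad (w^{T} A)\, A^{T} \in \mathbb{R}^{1 \times m}.
\end{align*}
```

So the row vector is first multiplied by A, and only the second product needs the transposed matrix.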

Answer 2

You can compute A*A.t()*w directly with less work, far fewer cache misses, and only one copy of A if you do it element by element. I don't know what functions Armadillo gives you to help make that fast, but simple access to the rows of the matrix should be good enough to make it practical without using excess memory.
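A minimal sketch of such an element-by-element computation, written in plain C++ so that it does not depend on any particular Armadillo call. It assumes A is stored column-major in a flat buffer (which is how Armadillo lays out a mat) and therefore walks columns rather than rows; the function name subtract_a_at_w is hypothetical and only used for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Computes w <- w - A * (A^T * w) for an m x n matrix A stored column-major,
// i.e. element (i, j) lives at a[i + j * m].
// Only n extra doubles are allocated; A is read twice and never copied or transposed.
void subtract_a_at_w(const std::vector<double>& a, std::size_t m, std::size_t n,
                     std::vector<double>& w)
{
    assert(a.size() == m * n);
    assert(w.size() == m);

    // Pass 1: t = A^T * w, so t[j] is the dot product of column j with w.
    // All of t must be computed from the old w before w is modified.
    std::vector<double> t(n, 0.0);
    for (std::size_t j = 0; j < n; ++j) {
        const double* col = a.data() + j * m;
        double sum = 0.0;
        for (std::size_t i = 0; i < m; ++i) {
            sum += col[i] * w[i];
        }
        t[j] = sum;
    }

    // Pass 2: w -= A * t, accumulated one column at a time.
    for (std::size_t j = 0; j < n; ++j) {
        const double* col = a.data() + j * m;
        for (std::size_t i = 0; i < m; ++i) {
            w[i] -= col[i] * t[j];
        }
    }
}
```

Both passes read each column contiguously, and the total work is about 2*m*n multiply-adds per pass pair instead of the m*m dot products of the nested row loop. With an Armadillo mat, the same loops could presumably be driven through A.colptr(j) or dot(A.col(j), w) instead of the raw buffer.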

2 answers

solution1
1 2017-01-13 10:42:36

solution2
0 ACCEPTED 2015-11-05 13:54:47
