
Armadillo matrix transpose

I have a huge m*n matrix A (where the number of rows m is much larger than the number of columns n) which is stored in my C++ program as an Armadillo mat type. Now I have a vector w for which I have to calculate w = w - A*A^T*w, where A^T means the transpose of the matrix A.

As the matrix A is very large and consumes a lot of memory, the usual fast approach with Armadillo, w = w - A*A.t()*w, does not work, since Armadillo consumes a lot of memory in this case (cf. github). The way they resolved this was by introducing the function inplace_trans(A, method), which can use the method "lowmem"; it consumes less memory but needs more time.

My problem is now that inplace_trans(A, method) is a void function, so I have to create a copy of my matrix first before I can calculate the new w:

mat Q = A;
inplace_trans(Q, "lowmem");
w = w - A*Q*w;

This however is of course not the desired result, since I need a full copy of my matrix, which I wanted to avoid in the first place (RAM problem!). So, how can I get the transpose of my matrix in an efficient (= fast and low memory demanding) way in order to calculate the new w?

If I do it element-wise, like in

mat A(m, n); // huge matrix, initialized before
vec temp(m);
temp.fill(0.0);
for (unsigned long int ii = 0; ii < m; ii++){
    for (unsigned long int ll = 0; ll < m; ll++){
        temp(ii) += dot(A.row(ii), A.row(ll)) * w(ll);
    }
}
w = w - temp;

then I have to iterate twice over the number of rows m (two nested loops, i.e. m^2 row dot products), which is very costly.

Edit: Up to now the fastest method is the following:

vec temp(m);
inplace_trans(A, "lowmem");
temp = A * w;
inplace_trans(A, "lowmem");
temp = A * temp;
w = w - temp;

I have to transpose the matrix twice, because I need it back in its original state afterwards. I cannot believe that this should be the fastest way, since it takes far too many operations, imho.

In your edit you already correctly imply that, from a complexity point of view, it is of course preferable to perform two matrix-vector multiplications rather than computing A*A.t() first and then applying the result to w. Your problem, however, seems to be that you have to transpose the matrix twice.
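For illustration only, here is a minimal sketch of that evaluation order: A^T*w is computed first, so the intermediate is only an n-vector instead of an m x m matrix. The helper name is made up, and whether A.t() inside the product still allocates a transposed copy of A depends on your Armadillo version, which is exactly the memory concern from the question.

#include <armadillo>
using namespace arma;

// Sketch: w <- w - A*(A^T*w) as two matrix-vector products.
void update_w(const mat& A, vec& w)
{
    vec u = A.t() * w;  // n-vector; may or may not create an explicit transpose internally
    w -= A * u;         // m-vector; no m x m intermediate is ever formed
}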

If you don't need the matrix back in its untransposed form afterwards, a simple solution to that problem is to just transpose the entire equation: w = w - A A^T w <==> w^T = w^T - w^T A A^T. In that case you can first apply A and then A.t(). If you can somehow define w to be a row vector altogether, this would then simply amount to

rowvec temp = w * A;
inplace_trans(A, "lowmem");
temp = temp * A;
w -= temp;

Conceptually, there should be no difference in storage between a row and a column vector; the elements should all be contiguous in memory. You would have to take a look at what explicit difference Armadillo makes between row and column vectors, but afaik vectors are just matrices with one dimension set to one. Anyway, such considerations are much less stringent on the level of vectors than on the level of matrices.
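For example, switching w itself between column and row form is cheap (just a copy of m contiguous elements). Below is a sketch of the full update in this row-vector formulation, with a made-up helper name and under the assumption that A may be left in transposed form afterwards.

#include <armadillo>
using namespace arma;

// Sketch: w <- w - A*A^T*w via the transposed equation w^T <- w^T - w^T*A*A^T.
// A is left holding A^T when the function returns.
void update_w_rowform(mat& A, vec& w)
{
    rowvec wr = w.t();           // 1 x m, cheap conversion
    rowvec temp = wr * A;        // 1 x n  ==  w^T * A
    inplace_trans(A, "lowmem");  // A now holds A^T (n x m)
    temp = temp * A;             // 1 x m  ==  w^T * A * A^T
    wr -= temp;
    w = wr.t();                  // back to a column vector if needed
}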

You can directly compute A*A.t()*w with less work, far fewer cache misses and only one copy of A if you do it element by element. I don't know what functions Armadillo gives you to help make that fast, but simple access to the rows of the matrix should be good enough to make it practical without using excess memory.
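Along those lines, one possible element-by-element sketch (hypothetical helper; it only relies on A.row() and dot()): accumulate v = A^T*w in one pass over the rows, then compute A*v in a second pass, so A is traversed only twice instead of once per row as in the nested loop from the question.

#include <armadillo>
using namespace arma;

// Sketch: w <- w - A*(A^T*w) with two row-wise passes over A and
// only an n-vector and an m-vector as intermediates.
void update_w_rowwise(const mat& A, vec& w)
{
    const uword m = A.n_rows;
    const uword n = A.n_cols;

    vec v(n, fill::zeros);
    for (uword i = 0; i < m; ++i)   // first pass: v = A^T * w
        v += w(i) * A.row(i).t();

    vec temp(m);
    for (uword i = 0; i < m; ++i)   // second pass: temp(i) = (A*v)(i)
        temp(i) = dot(A.row(i), v);

    w -= temp;
}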
