Reduction (sum) along arbitrary axes of a multidimensional array
I want to perform a sum reduction along an arbitrary axis of a multidimensional array with an arbitrary number of dimensions (e.g. axis 5 of a 10-dimensional array). The array is stored in row-major format, i.e. as a vector together with the strides along each axis.
I know how to perform this reduction using nested loops (see the example below), but doing so hard-codes both the axis (the reduction is along axis 1 below) and the number of dimensions (4 below). How can I generalize this without using the nested loops?
#include <iostream>
#include <vector>

int main()
{
    // shape, stride & data of the matrix
    size_t shape  [] = { 2, 3, 4, 5};
    size_t strides[] = {60,20, 5, 1};
    std::vector<double> data(2*3*4*5);

    for ( size_t i = 0 ; i < data.size() ; ++i ) data[i] = 1.;

    // shape, stride & data (zero-initialized) of the reduced matrix
    size_t rshape  [] = { 2, 4, 5};
    size_t rstrides[] = {20, 5, 1};
    std::vector<double> rdata(2*4*5, 0.0);

    // compute reduction
    for ( size_t a = 0 ; a < shape[0] ; ++a )
        for ( size_t c = 0 ; c < shape[2] ; ++c )
            for ( size_t d = 0 ; d < shape[3] ; ++d )
                for ( size_t b = 0 ; b < shape[1] ; ++b )
                    rdata[ a*rstrides[0] + c*rstrides[1] + d*rstrides[2] ] +=
                        data[ a*strides[0] + b*strides[1] + c*strides[2] + d*strides[3] ];

    // print resulting reduced matrix
    for ( size_t a = 0 ; a < rshape[0] ; ++a )
        for ( size_t b = 0 ; b < rshape[1] ; ++b )
            for ( size_t c = 0 ; c < rshape[2] ; ++c )
                std::cout << "(" << a << "," << b << "," << c << ") "
                          << rdata[ a*rstrides[0] + b*rstrides[1] + c*rstrides[2] ] << std::endl;

    return 0;
}
Note: I want to avoid 'decompressing' and 'compressing' a counter. By this I mean that I could, in pseudo-code, do:
for ( size_t i = 0 ; i < data.size() ; ++i )
{
i -> {a,b,c,d}
discard "b" (axis 1) -> {a,c,d}
rdata(a,c,d) += data(a,b,c,d)
}
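For reference, the decompress/compress idea in the pseudo-code above could be written out roughly as follows. This is only a sketch: the function name reduce_sum_unravel and its signature are made up for illustration, and it assumes a contiguous row-major array whose strides can be derived from the shape.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not from the original post): sums `data`
// (row-major, shape `shape`) along `axis` by decompressing every flat
// index into per-axis indices and recompressing it without that axis.
std::vector<double> reduce_sum_unravel(const std::vector<double>& data,
                                       const std::vector<std::size_t>& shape,
                                       std::size_t axis)
{
    assert(axis < shape.size());
    const std::size_t ndim = shape.size();

    // Row-major strides of the input array.
    std::vector<std::size_t> strides(ndim, 1);
    for (std::size_t d = ndim - 1; d > 0; --d)
        strides[d - 1] = strides[d] * shape[d];
    assert(data.size() == strides[0] * shape[0]);

    // Row-major strides of the output; the reduced axis keeps stride 0,
    // so its index drops out of the output offset.
    std::vector<std::size_t> rstrides(ndim, 0);
    std::size_t running = 1;
    for (std::size_t d = ndim; d-- > 0; ) {
        if (d == axis) continue;
        rstrides[d] = running;
        running *= shape[d];
    }

    std::vector<double> rdata(running, 0.0);
    for (std::size_t i = 0; i < data.size(); ++i) {
        std::size_t rest = i, out = 0;
        for (std::size_t d = 0; d < ndim; ++d) {
            const std::size_t index = rest / strides[d];  // decompress
            rest %= strides[d];
            out += index * rstrides[d];                   // recompress
        }
        rdata[out] += data[i];
    }
    return rdata;
}
```

Calling reduce_sum_unravel(data, {2, 3, 4, 5}, 1) on the all-ones array from the example should give 40 entries, each equal to 3. The cost is one division and one modulo per axis for every element, which is the overhead the question hopes to avoid.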
I don't know how efficient this code is, but I believe it is correct.
A little on adjusted_strides: for axis_count = 4, adjusted_strides has size 5, where:
adjusted_strides[0] = shape[0]*shape[1]*shape[2]*shape[3];
adjusted_strides[1] = shape[1]*shape[2]*shape[3];
adjusted_strides[2] = shape[2]*shape[3];
adjusted_strides[3] = shape[3];
adjusted_strides[4] = 1;
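For an arbitrary number of axes, the same table can be filled from the back with a running product. A minimal sketch, assuming the shape is held in a std::vector (the helper name make_adjusted_strides is invented here):

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: adjusted[k] = shape[k] * shape[k+1] * ... * shape[n-1],
// with adjusted[n] = 1, so the result has one more entry than `shape`.
std::vector<std::size_t> make_adjusted_strides(const std::vector<std::size_t>& shape)
{
    assert(!shape.empty());
    std::vector<std::size_t> adjusted(shape.size() + 1, 1);
    for (std::size_t k = shape.size(); k-- > 0; )
        adjusted[k] = adjusted[k + 1] * shape[k];
    return adjusted;
}
```

For shape {2, 3, 4, 5} this yields {120, 60, 20, 5, 1}, i.e. the total element count followed by the usual row-major strides.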
Let's take the example where the number of dimensions is 4 and the shape of the multidimensional array (A) is n0, n1, n2, n3.
When we need to transform this array into another multidimensional array (B) of shape n0, n2, n3 (compressing axis = 1, 0-based), we proceed as follows:
For each index of A we find its position in B. Let A[i][j][k][l] be any element of A. Its position in the flat array flat_A is flat_A[i*n1*n2*n3 + j*n2*n3 + k*n3 + l], i.e.:
idx = i*n1*n2*n3 + j*n2*n3 + k*n3 + l;
In the compressed array B, this element will be a part of (i.e. added to) B[i][k][l]. In flat_B its index is new_idx = i*n2*n3 + k*n3 + l.
How do we form new_idx from idx?
All the axes before the compressed axis have the shape of the compressed axis as a part of their product. In our example we had to remove axis 1, so all the axes before axis 1 (only one here: axis 0, represented by i) have n1 as a part of their product (i*n1*n2*n3).
All the axes after the compressed axis remain unaffected.
Finally, we need to do two things:
i. Isolate the indices of the axes before the axis to be compressed and remove the shape of this axis:
Integer division: idx / (n1*n2*n3) (== idx / adjusted_strides[1]).
We are left with just i, which can be readjusted according to the new shape (by multiplying by n2*n3): we get i*n2*n3 (== i * adjusted_strides[2]).
ii. Isolate the axes after the compressed axis, which are unaffected by its shape:
idx % (n2*n3) (== idx % adjusted_strides[2])
which gives us k*n3 + l.
Adding the results of steps i. and ii. gives:
computed_idx = i*n2*n3 + k*n3 + l;
This is the same as new_idx, so our transformation is correct :).
Note: in the code below, ni refers to new_idx.
size_t cmp_axis = 1, axis_count = sizeof shape / sizeof *shape;
std::vector<size_t> adjusted_strides;
// adjusted_strides is strides with one extra leading element:
// the total number of elements in the n-dim array.
// The only reason to introduce this array was
// so that I don't have to write any if-elses.
adjusted_strides.push_back(shape[0]*strides[0]);
adjusted_strides.insert(adjusted_strides.end(), strides, strides + axis_count);

for (size_t i = 0; i < data.size(); ++i) {
    size_t ni = i/adjusted_strides[cmp_axis]*adjusted_strides[cmp_axis+1] + i%adjusted_strides[cmp_axis+1];
    rdata[ni] += data[i];
}
Output:
(0,0,0) 3
(0,0,1) 3
(0,0,2) 3
(0,0,3) 3
(0,0,4) 3
(0,1,0) 3
(0,1,1) 3
(0,1,2) 3
(0,1,3) 3
(0,1,4) 3
(0,2,0) 3
(0,2,1) 3
(0,2,2) 3
(0,2,3) 3
(0,2,4) 3
(0,3,0) 3
(0,3,1) 3
(0,3,2) 3
...
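The snippet above depends on shape, strides, data and rdata from the question's program. Wrapped into a standalone function it might look like the following sketch; the name reduce_sum_axis and the choice to pass the shape as a std::vector are assumptions made for illustration, and the array is again taken to be contiguous and row-major.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical wrapper around the index transformation described above:
// ni = i / adjusted[axis] * adjusted[axis + 1] + i % adjusted[axis + 1]
std::vector<double> reduce_sum_axis(const std::vector<double>& data,
                                    const std::vector<std::size_t>& shape,
                                    std::size_t axis)
{
    assert(axis < shape.size());

    // adjusted[k] = product of shape[k..n-1]; adjusted[n] = 1.
    std::vector<std::size_t> adjusted(shape.size() + 1, 1);
    for (std::size_t k = shape.size(); k-- > 0; )
        adjusted[k] = adjusted[k + 1] * shape[k];
    assert(adjusted[0] > 0 && data.size() == adjusted[0]);

    const std::size_t outer = adjusted[axis];      // block size including the axis
    const std::size_t inner = adjusted[axis + 1];  // block size after the axis

    // The reduced array has every extent except shape[axis].
    std::vector<double> rdata(adjusted[0] / shape[axis], 0.0);
    for (std::size_t i = 0; i < data.size(); ++i) {
        const std::size_t ni = i / outer * inner + i % inner;
        rdata[ni] += data[i];
    }
    return rdata;
}
```

With this, reduce_sum_axis(data, {2, 3, 4, 5}, 1) should reproduce the 40 values of 3 listed above, and the same call with axis 0 or 3 needs no special casing because of the extra leading and trailing entries in the adjusted table.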
I think this should work:
#include <iostream>
#include <vector>

int main()
{
    // shape, stride & data of the matrix
    size_t shape  [] = { 2,  3, 4, 5};
    size_t strides[] = {60, 20, 5, 1};
    std::vector<double> data(2 * 3 * 4 * 5);

    // shape, stride & data (zero-initialized) of the reduced matrix
    size_t rshape  [] = { 2, 4, 5};
    size_t rstrides[] = {20, 5, 1};
    std::vector<double> rdata(2 * 4 * 5, 0.0);

    const unsigned int NDIM = 4;
    unsigned int axis = 1;

    for (size_t i = 0 ; i < data.size() ; ++i) data[i] = 1;

    // How many elements to advance after each reduction
    size_t step_axis = strides[NDIM - 1];
    if (axis == NDIM - 1)
    {
        step_axis = strides[NDIM - 2];
    }

    // Position of the first element of the current reduction
    size_t offset_base = 0;
    size_t offset = 0;
    size_t s = 0;

    for (auto &v : rdata)
    {
        // Current reduced element
        size_t offset_i = offset;
        for (unsigned int i = 0; i < shape[axis]; i++)
        {
            // Reduce
            v += *(data.data() + offset_i);
            // Advance to next element
            offset_i += strides[axis];
        }
        s = (s + 1) % strides[axis];
        if (s == 0)
        {
            // Jump to the next block along the preceding axis
            // (guarded so that axis == 0 never reads strides[axis - 1])
            offset_base += (axis > 0) ? strides[axis - 1] : 0;
            offset = offset_base;
        }
        else
        {
            offset += step_axis;
        }
    }

    // Print
    for ( size_t a = 0 ; a < rshape[0] ; ++a )
        for ( size_t b = 0 ; b < rshape[1] ; ++b )
            for ( size_t c = 0 ; c < rshape[2] ; ++c )
                std::cout << "(" << a << "," << b << "," << c << ") "
                          << rdata[ a*rstrides[0] + b*rstrides[1] + c*rstrides[2] ] << std::endl;

    return 0;
}
Output:
(0,0,0) 3
(0,0,1) 3
(0,0,2) 3
(0,0,3) 3
(0,0,4) 3
(0,1,0) 3
(0,1,1) 3
(0,1,2) 3
(0,1,3) 3
(0,1,4) 3
(0,2,0) 3
(0,2,1) 3
(0,2,2) 3
// ...
Setting axis = 3 yields:
(0,0,0) 5
(0,0,1) 5
(0,0,2) 5
(0,0,3) 5
(0,0,4) 5
(0,1,0) 5
(0,1,1) 5
(0,1,2) 5
(0,1,3) 5
(0,1,4) 5
(0,2,0) 5
(0,2,1) 5
(0,2,2) 5
(0,2,3) 5
// ...
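A related formulation, shown here only as a sketch and not as the method of either answer above, views the contiguous row-major array as a 3-D block of shape (outer, shape[axis], inner), where outer is the product of the extents before the axis and inner the product of those after it. Three loops then suffice regardless of the number of dimensions, and the size of the reduced array follows from the axis instead of being hard-coded. The function name reduce_sum_blocks is invented for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: treat the array as (outer, n, inner) and sum over n.
std::vector<double> reduce_sum_blocks(const std::vector<double>& data,
                                      const std::vector<std::size_t>& shape,
                                      std::size_t axis)
{
    assert(axis < shape.size());

    std::size_t outer = 1, inner = 1;
    for (std::size_t d = 0; d < axis; ++d) outer *= shape[d];
    for (std::size_t d = axis + 1; d < shape.size(); ++d) inner *= shape[d];
    const std::size_t n = shape[axis];
    assert(data.size() == outer * n * inner);

    std::vector<double> rdata(outer * inner, 0.0);
    for (std::size_t o = 0; o < outer; ++o)
        for (std::size_t k = 0; k < n; ++k)
            for (std::size_t in = 0; in < inner; ++in)
                rdata[o * inner + in] += data[(o * n + k) * inner + in];
    return rdata;
}
```

For the example array, axis = 1 gives outer = 2, n = 3 and inner = 20, while axis = 3 gives outer = 24, n = 5 and inner = 1, so the reduced array has 24 entries in that case. The innermost loop walks contiguous memory in both the input and the output.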