Efficiently accessing a 3D array stored as a 1D array

Question

I have a 3D array which is stored as a 1D array in a columnwise fashion. For example:

for( int k = 0; k < nk; k++ ) // Loop through the height.
    for( int j = 0; j < nj; j++ ) // Loop through the rows.
        for( int i = 0; i < ni; i++ ) // Loop through the columns.
        {
            ijk = i + ni * j + ni * nj * k;
            my3Darray[ ijk ] = 1.0;
        }

For my application, I need to access all the rows, columns, and heights of my3Darray. By height, I mean the vectors along the third dimension of the array. I need this because I want to compute the FFT of each vector and store the resulting vector. I would be grateful for suggestions on how to access these vectors efficiently. One trivial possibility, shown here for the height vectors, is:

for( int i = 0; i < ni; i++ ) // Loop through the columns.
    for( int j = 0; j < nj; j++ ) // Loop through the rows.
    {
        for( int k = 0; k < nk; k++ ) // Loop through the heights.
        {
            ijk = i + ni * j + ni * nj * k;
            myvec[ k ] = my3Darray[ ijk ]; // Gather one height vector.
        }
        fft( myvec, myvec_processed ); // Transform once the whole vector is gathered.

        // Store myvec_processed in the output array my3Darray_fft_values.
        for( int k = 0; k < nk; k++ ) // Loop through the heights.
        {
            ijk = i + ni * j + ni * nj * k;
            my3Darray_fft_values[ ijk ] = myvec_processed[ k ];
        }
    }

Am I computing this efficiently? Is it possible to pass my3Darray directly to the function that computes the FFT of the vectors, instead of copying each vector to myvec?

Answer 1

You can cut down on the multiplications by precomputing a stride like this:

...
for( int j = 0; j < nj; j++ ) // Loop through the rows.
{
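    int stride = ni * nj; // Distance between consecutive elements of a height vector.
    ijk = i + ni * j;
    for( int k = 0; k < nk; k++ ) // Loop through the heights.
    {
        myvec[ k ] = my3Darray[ ijk ];
        ijk += stride; // Step to the same (i, j) in the next layer.
    }
    fft( myvec, myvec_processed ); // Transform once the whole vector is gathered.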
}

But this will only speed things up a little. You will still have cache problems, because consecutive elements of a height vector are ni * nj elements apart, so my3Darray is accessed in a nonsequential fashion.
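On the question of avoiding the copy into myvec: one option is to give the processing routine a pointer to the first element of the vector together with the stride between its elements. The sketch below assumes real-valued data and uses a trivial placeholder (subtracting the mean) where the FFT would go; process_strided and process_all_heights are hypothetical names, not part of any library.

```c
#include <assert.h>
#include <stddef.h>

/* Placeholder for the real FFT (hypothetical name and signature): reads n
   samples spaced `stride` elements apart, subtracts their mean, and writes
   the results with the same spacing. No temporary vector is needed. */
void process_strided(const double *in, double *out, int n, ptrdiff_t stride)
{
    assert(n > 0 && stride > 0);
    double mean = 0.0;
    for (int k = 0; k < n; k++)
        mean += in[k * stride];
    mean /= n;
    for (int k = 0; k < n; k++)
        out[k * stride] = in[k * stride] - mean;
}

/* Process every height vector of an ni x nj x nk array whose element
   (i, j, k) lives at index i + ni * j + ni * nj * k. */
void process_all_heights(const double *src, double *dst, int ni, int nj, int nk)
{
    const ptrdiff_t stride = (ptrdiff_t)ni * nj;
    for (int j = 0; j < nj; j++)
        for (int i = 0; i < ni; i++) {
            const ptrdiff_t base = i + (ptrdiff_t)ni * j;
            process_strided(src + base, dst + base, nk, stride);
        }
}
```

This removes the copy but not the strided memory traffic. Some FFT libraries accept a stride directly (FFTW's advanced interface, fftw_plan_many_dft, takes stride and distance arguments), so it is worth checking whether yours does before writing a wrapper.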

Answer 2

When everything gets reduced to its innermost bits and bytes, your 3-dimensional array gets stored, of course, in one-dimensional memory. So, given an array element's three indices, the compiler produces pretty much the same code to calculate the location of the element as you are writing yourself. Surprise!

In other words, it's pretty much the same thing.
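A small check of that claim, assuming a built-in array declared as a[NK][NJ][NI] so that i is the fastest-varying index, as in the question. The sizes and the function names offset_builtin and offset_manual are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>

enum { NI = 4, NJ = 3, NK = 2 }; /* Example sizes, chosen arbitrarily. */

/* Element offset that the compiler computes for a[k][j][i]. */
ptrdiff_t offset_builtin(int i, int j, int k)
{
    static double a[NK][NJ][NI];
    const char *first = (const char *)a;
    const char *elem = (const char *)&a[k][j][i];
    return (elem - first) / (ptrdiff_t)sizeof(double);
}

/* Element offset computed by hand, as in the question. */
ptrdiff_t offset_manual(int i, int j, int k)
{
    return i + (ptrdiff_t)NI * j + (ptrdiff_t)NI * NJ * k;
}
```

For every valid (i, j, k) the two functions return the same value, which is why switching between the two styles is unlikely to change performance on its own.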

The only thing that might work in the compiler's favor, with explicit 3-dimensional arrays, is that the compiler knows the sizes of all the inner dimensions. If the size of the innermost slice happens to be something convenient, like a power of 2, the compiler might replace some of the multiplications with equivalent left shifts, which would be slightly faster, I suppose, than a full-blown multiply instruction. But I'd be surprised if that turned out to make a large difference in performance.

It's probably more important to pick the relative order of your dimensions so that the typical access patterns of your transformations are more CPU cache-friendly.
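As a rough sketch of what reordering the dimensions could look like: if the height vectors are the ones transformed most often, the array can be stored with k as the fastest-varying index, so each height vector is contiguous and can be passed to the FFT routine without a copy. All function names below are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Layout from the question: i varies fastest. */
size_t index_i_fastest(int i, int j, int k, int ni, int nj)
{
    return (size_t)i + (size_t)ni * ((size_t)j + (size_t)nj * (size_t)k);
}

/* Alternative layout: k varies fastest, so a height vector is contiguous. */
size_t index_k_fastest(int i, int j, int k, int ni, int nk)
{
    return (size_t)k + (size_t)nk * ((size_t)i + (size_t)ni * (size_t)j);
}

/* One-time reordering from the i-fastest layout to the k-fastest layout. */
void reorder_to_k_fastest(const double *src, double *dst,
                          int ni, int nj, int nk)
{
    for (int k = 0; k < nk; k++)
        for (int j = 0; j < nj; j++)
            for (int i = 0; i < ni; i++)
                dst[index_k_fastest(i, j, k, ni, nk)] =
                    src[index_i_fastest(i, j, k, ni, nj)];
}

/* Pointer to the nk contiguous elements of the height vector at (i, j). */
const double *height_vector(const double *kfast, int i, int j, int ni, int nk)
{
    return kfast + index_k_fastest(i, j, 0, ni, nk);
}
```

The trade-off is that rows and columns become strided in this layout. No single ordering makes all three directions contiguous, so the choice depends on which direction is processed most.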

Answer 1
1 2014-11-27 02:29:16

Answer 2
0 2014-11-27 02:08:19