
Precompiled Multi-Dimensional Array Access

Imagine calculating a three-dimensional array like this:

for (int i = 0; i < I; i++)
{
    for (int j = 0; j < J; j++)
    {
        for (int k = 0; k < K; k++)
        {
            array[k + j * K + i * K * J] = someValue(i, j, k);
        }
    }
}

But the k + j * K + i * K * J part is kinda expensive. Is it possible to tell the compiler to convert the loops into something like this?

array[0] = someValue(0, 0, 0);
array[1] = someValue(0, 0, 1);
array[2] = someValue(0, 0, 2);
array[3] = someValue(0, 1, 0);
...

This would of course make the binaries larger, but it would also improve performance if this code is executed a lot. Is it possible to do this? Or would I have to generate the code myself and paste it into the source file?

I believe in your particular case, we can rewrite the loop as:

auto* scan = array;
for (int i = 0; i < I; i++)
{
    for (int j = 0; j < J; j++)
    {
        for (int k = 0; k < K; k++)
        {
            *scan++ = someValue(i, j, k);
        }
    }
}
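To check that walking a pointer really visits the same slots as the explicit index, here is a minimal sketch that fills one array each way so the results can be compared. The names fillIndexed and fillSequential, the body of someValue, and the use of std::vector<int> are assumptions made for illustration; the question does not say what someValue computes or how array is declared.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the question's someValue(i, j, k).
int someValue(int i, int j, int k) { return 100 * i + 10 * j + k; }

// Fills the array using the explicit index from the question.
std::vector<int> fillIndexed(int I, int J, int K)
{
    std::vector<int> array(static_cast<std::size_t>(I) * J * K);
    for (int i = 0; i < I; i++)
        for (int j = 0; j < J; j++)
            for (int k = 0; k < K; k++)
                array[k + j * K + i * K * J] = someValue(i, j, k);
    return array;
}

// Fills the array by advancing a pointer, as in the rewritten loop.
std::vector<int> fillSequential(int I, int J, int K)
{
    std::vector<int> array(static_cast<std::size_t>(I) * J * K);
    int* scan = array.data();
    for (int i = 0; i < I; i++)
        for (int j = 0; j < J; j++)
            for (int k = 0; k < K; k++)
                *scan++ = someValue(i, j, k);
    // The pointer must land exactly one past the last element.
    assert(scan == array.data() + array.size());
    return array;
}
```

Because k is the innermost loop and has the smallest stride, the index k + j * K + i * K * J increases by exactly one on every iteration, which is why the two functions should produce identical arrays.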

This is a micro-optimization, and not something you usually have to worry about. Here's why.

Reason 1: Integer multiplication is incredibly cheap. Calculating k + j * K + i * K * J is cheaper than retrieving a value from the computer's RAM, and it'll be about as cheap as (if not cheaper than) retrieving it from the CPU's fastest cache.

Reason 2: Compilers are incredibly smart. They can recognize which values change and which values stay the same, and optimize common sub-expressions out of loops (so that they're not performing the same computation multiple times).
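As a rough illustration of that kind of hoisting, the sketch below does by hand what an optimizer may do on its own: the parts of the index that depend only on i or only on j are computed once per outer iteration instead of once per element. The function name fillHoisted, the body of someValue, and the std::vector<int> storage are assumptions for the example. Whether a given compiler actually performs this transformation depends on the compiler and the optimization level, so inspect the generated assembly if it matters.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the question's someValue(i, j, k).
int someValue(int i, int j, int k) { return 100 * i + 10 * j + k; }

// Same loop as in the question, with the loop-invariant parts of the
// index pulled out of the inner loops by hand.
std::vector<int> fillHoisted(int I, int J, int K)
{
    assert(I >= 0 && J >= 0 && K >= 0);
    std::vector<int> array(static_cast<std::size_t>(I) * J * K);
    for (int i = 0; i < I; i++)
    {
        const int planeOffset = i * K * J;              // changes only with i
        for (int j = 0; j < J; j++)
        {
            const int rowOffset = planeOffset + j * K;  // changes only with j
            for (int k = 0; k < K; k++)
            {
                // Only one addition remains per element.
                array[rowOffset + k] = someValue(i, j, k);
            }
        }
    }
    return array;
}
```

Writing the loop this way by hand is rarely necessary; it is shown only to make the optimization concrete.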

Reason 3: Compilers are capable of taking advantage of vectorization instructions. Depending on what someValue does, the compiler may be able to compute multiple values in parallel on the same core by using them. This is true for either method of indexing into array.

C++ code isn't strictly imperative. Compilers can and do make major and complex optimizations to make code more efficient, and code like the example in your question is easy for them to optimize.
