
Fast copy 1D array to 3D array in cpp

Is it possible to copy from a 1D array to a 3D array with a function such as memcpy?

Right now I am using a slow method:

for(int loop1 = 0; loop1 < numberAgents; loop1++)
    for(int loop2 = 0; loop2 < fieldWidth; loop2++)
        for(int loop3 = 0; loop3 < fieldWidth; loop3++)
            potentialField[loop1][loop2][loop3] = cpuPotentialField[loop1 * fieldWidth * fieldWidth + loop2 * fieldWidth + loop3];

This doesn't work:

memPotentialField = numberAgents * fieldWidth * fieldWidth * sizeof(float);
memcpy(potentialField, cpuPotentialField, memPotentialField);

Multi-dimensional arrays are stored row-wise (§ 8.3.4/9), so essentially your approach with memcpy is fine (because floats are PODs).

// The third argument is a byte count, so pass the size of the whole array.
memcpy(&potentialField[0][0][0], cpuPotentialField,
       sizeof(potentialField));
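
As a minimal sketch of the memcpy approach, assuming potentialField is a genuine fixed-size 3D array (one contiguous block) rather than a float*** built from separate allocations. The dimensions and the helper name fillFromFlat are hypothetical and not taken from the question:

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical compile-time dimensions; the question's real sizes are not given.
constexpr std::size_t kAgents = 2;
constexpr std::size_t kWidth = 3;

// Copies a flat buffer of kAgents * kWidth * kWidth floats into a true 3D array.
// Taking the array by reference keeps its full type, so sizeof(dst) is the
// size of the whole array in bytes, not the size of a pointer.
void fillFromFlat(float (&dst)[kAgents][kWidth][kWidth], const float* src)
{
    std::memcpy(&dst[0][0][0], src, sizeof(dst));
}
```

If potentialField is instead allocated as a float*** with a separate allocation per row, the rows are not contiguous and a single memcpy over the whole structure cannot work. That may be why the attempt in the question failed, though the question does not show the declaration.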

Using std::copy is better, since it works for non-PODs too. So I would write

std::copy(cpuPotentialField,
          cpuPotentialField + sizeof(potentialField)/sizeof(potentialField[0][0][0]),
          &potentialField[0][0][0]);
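
A self-contained sketch of the std::copy variant, again assuming the destination is a real fixed-size 3D array. The template and its name copyFlatTo3D are illustrative only; the point is that std::copy counts elements rather than bytes and copies from the flat source into the 3D destination:

```cpp
#include <algorithm>
#include <cstddef>

// Copies a flat buffer into a fixed-size 3D array of any copy-assignable type.
// The array dimensions A, H and W are deduced from the destination's type.
template <typename T, std::size_t A, std::size_t H, std::size_t W>
void copyFlatTo3D(const T* src, T (&dst)[A][H][W])
{
    // Element count, not byte count: std::copy works in elements.
    const std::size_t count = A * H * W;
    std::copy(src, src + count, &dst[0][0][0]);
}
```

Because the element type is a template parameter, the same helper would also work for class types, which is the case where memcpy is not safe.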

Unless you have a particularly bad compiler or you've forgotten to turn on optimisation (e.g. -O3), the first method should be fine performance-wise. However, you may be able to optimise it a little by hoisting some of the multiplies:

for (int loop1 = 0; loop1 < numberAgents; loop1++)
{
    const int index1 = loop1 * fieldWidth * fieldWidth;

    for (int loop2 = 0; loop2 < fieldWidth; loop2++)
    {
        const int index2 = index1 + loop2 * fieldWidth;

        for (int loop3 = 0; loop3 < fieldWidth; loop3++)
        {
            potentialField[loop1][loop2][loop3] = cpuPotentialField[index2 + loop3];
        }
    }
}

You may be able to gain some performance by unrolling the loop. On some processors, branch or jump instructions cause the instruction pipeline to be reloaded, wasting time.

//...
unsigned int items_remaining = fieldWidth;
unsigned int loop3 = 0;
while (items_remaining > 0)
{
    // Copy the odd remainder (fieldWidth % 4) on the first pass, then 4 items per pass.
    unsigned int copy_count = items_remaining % 4;
    if (copy_count == 0)
        copy_count = 4;
    switch (copy_count)
    {
        // The fall-through of these cases is intentional.
        case 4:
          potentialField[loop1][loop2][loop3] = cpuPotentialField[loop1 * fieldWidth * fieldWidth + loop2 * fieldWidth + loop3];
          ++loop3;
        case 3:
          potentialField[loop1][loop2][loop3] = cpuPotentialField[loop1 * fieldWidth * fieldWidth + loop2 * fieldWidth + loop3];
          ++loop3;
        case 2:
          potentialField[loop1][loop2][loop3] = cpuPotentialField[loop1 * fieldWidth * fieldWidth + loop2 * fieldWidth + loop3];
          ++loop3;
        case 1:
          potentialField[loop1][loop2][loop3] = cpuPotentialField[loop1 * fieldWidth * fieldWidth + loop2 * fieldWidth + loop3];
          ++loop3;
    } // End: switch
    items_remaining -= copy_count;
} // End: while

This is only unrolled for 4 items. The more items copied per pass, the fewer loop branches are executed, which is where any gain comes from. As Paul R said, precomputing some of the indices would also help.

Some processors may have specialized copy instructions that the compiler can take advantage of, depending on the compiler.
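
One way to give the compiler and standard library a chance to use such instructions is to replace the innermost loop with a bulk copy of one row. The sketch below assumes potentialField is a pointer-based float*** whose rows each hold fieldWidth floats, which the question does not state. The function name copyRows and its parameters are hypothetical:

```cpp
#include <algorithm>
#include <cstddef>

// Copies a flat buffer into a pointer-based 3D array (float***), one row at a time.
// Assumes each dst[a][r] points to at least `width` floats.
void copyRows(float*** dst, const float* src, std::size_t agents, std::size_t width)
{
    for (std::size_t a = 0; a < agents; ++a)
    {
        for (std::size_t r = 0; r < width; ++r)
        {
            // Start of row (a, r) in the flat source buffer.
            const float* row = src + (a * width + r) * width;
            std::copy(row, row + width, dst[a][r]);
        }
    }
}
```

Whether this beats the plain triple loop depends on the compiler, the optimisation level and the row length, so it is worth measuring rather than assuming.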
