
Loop optimization

I have a loop with an inner loop inside it. How can I optimize it to reduce execution time, for example by avoiding repeated memory accesses to the same element and by keeping the number of additions and multiplications as low as possible?

int n,m,x1,y1,x2,y2,cnst; // cnst: index offset, assumed to be set before the loops
int N = 9600;
int M = 1800;
int temp11,temp12,temp13,temp14;
int temp21,temp22,temp23,temp24;
int *arr1 = new int [32000]; // suppose it's already filled
int *arr2 = new int [32000];// suppose it's already filled

int sumFirst = 0;
int maxFirst = 0;
int indexFirst = 0;
int sumSecond = 0;
int maxSecond = 0;
int indexSecond = 0;
int jump = 2400;
for( n = 0; n < N; n++)
{
    temp14 = 0;
    temp24 = 0;
    for( m = 0; m < M; m++)
    {
        x1 = m + cnst;
        y1 = m + n + cnst;
        temp11 = arr1[x1];
        temp12 = arr2[y1];
        temp13 = temp11 * temp12;
        temp14+= temp13;
        
        x2 = m + cnst + jump;
        y2 = m + n + cnst + jump;
        temp21 = arr1[x2];
        temp22 = arr2[y2];
        temp23 = temp21 * temp22;
        temp24+= temp23;
    }

    sumFirst += temp14;
    if (temp14 > maxFirst)
    {
        maxFirst = temp14;
        indexFirst = n; // outer index at which the first sum peaks
    }
    
    sumSecond += temp24;
    if (temp24 > maxSecond)
    {
        maxSecond = temp24;
        indexSecond = n;
    }
}

// At the end we use the sum, index and max of both first and second.

You are multiplying array elements and accumulating the result. This can be optimized by:

  • SIMD (doing multiple operations in a single CPU step)
  • Parallel execution (using multiple physical/logical CPUs at once)

Look for a CPU-specific SIMD way of doing this. For example, _mm_mul_epi32 from SSE4.1 can possibly be used on x86-64. Note that _mm_mul_epi32 multiplies only the low 32 bits of each 64-bit lane and produces two 64-bit products; _mm_mullo_epi32 (also SSE4.1) multiplies all four 32-bit lanes and keeps the low 32 bits of each product, which matches the int arithmetic in the question. Before trying to write your own SIMD version with compiler intrinsics, make sure the compiler doesn't already do it for you.
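As a rough illustration of giving the compiler a loop shape it can vectorize on its own, here is a minimal sketch. The names `dot`, `scan` and `Result` are hypothetical helpers, not part of the question's code, and the sketch assumes the two accumulations (with and without `jump`) can be handled as two separate calls. Whether the inner loop actually gets vectorized depends on the compiler and flags, so check the generated assembly or the compiler's vectorization report.

```cpp
#include <cstddef>

// Hypothetical helper: dot product of a[0..len) and b[0..len).
// A plain counted loop over two pointers with a local accumulator is a
// shape that optimizing compilers can often auto-vectorize.
inline int dot(const int* a, const int* b, std::size_t len) {
    int acc = 0;
    for (std::size_t i = 0; i < len; ++i)
        acc += a[i] * b[i];
    return acc;
}

// Hypothetical result type mirroring sum/max/index from the question.
struct Result {
    int sum = 0;
    int max = 0;    // starts at 0, like maxFirst/maxSecond in the question
    int index = 0;
};

// Slides a window of length M over arr2 while the window in arr1 stays
// fixed at `offset`; the arr1 base pointer is computed once, outside both loops.
Result scan(const int* arr1, const int* arr2, int N, int M, int offset) {
    Result r;
    const int* a = arr1 + offset;
    for (int n = 0; n < N; ++n) {
        const int s = dot(a, arr2 + offset + n, static_cast<std::size_t>(M));
        r.sum += s;
        if (s > r.max) {
            r.max = s;
            r.index = n;
        }
    }
    return r;
}
```

With the question's variables this would be called as `scan(arr1, arr2, N, M, cnst)` for the first set of results and `scan(arr1, arr2, N, M, cnst + jump)` for the second. Splitting the fused loop into two passes is a design choice for readability; keeping both accumulators in one loop is equally valid and worth measuring.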

As for parallel execution, look into OpenMP, or the C++17 parallel algorithms (for example std::transform_reduce or std::reduce with an execution policy).
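A minimal OpenMP sketch of the parallel approach, under a few assumptions: `scan_parallel` and `Stats` are hypothetical names, each outer iteration is treated as independent (which holds for the loop in the question), and the code is built with OpenMP enabled (for example `-fopenmp` on GCC/Clang). Without that flag the pragma is ignored and the loop simply runs serially with the same result.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical result type mirroring sum/max/index from the question.
struct Stats {
    int sum = 0;
    int max = 0;
    int index = 0;
};

Stats scan_parallel(const int* arr1, const int* arr2, int N, int M, int offset) {
    // One slot per outer iteration, so threads never write to the same element.
    std::vector<int> partial(static_cast<std::size_t>(N));

    // Each n only reads the input arrays and writes partial[n],
    // so the outer loop can be split across threads.
    #pragma omp parallel for
    for (int n = 0; n < N; ++n) {
        int acc = 0;
        for (int m = 0; m < M; ++m)
            acc += arr1[offset + m] * arr2[offset + n + m];
        partial[static_cast<std::size_t>(n)] = acc;
    }

    // Cheap serial pass for sum, max and the index of the first maximum,
    // which keeps the result identical to the sequential loop.
    Stats r;
    for (int n = 0; n < N; ++n) {
        const int s = partial[static_cast<std::size_t>(n)];
        r.sum += s;
        if (s > r.max) {
            r.max = s;
            r.index = n;
        }
    }
    return r;
}
```

Storing the per-iteration sums and reducing them serially avoids needing a custom OpenMP reduction for the max/index pair; with N = 9600 the extra pass is small next to the N × M multiplications.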
