Why is this sequential array loop slower than a loop that uses a "lookup" array?

Question

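I have been studying cache locality lately, trying to understand how the CPU accesses memory. I wrote an experiment to see whether there is a performance difference between looping over an array sequentially and indexing into the data array through some kind of lookup table. I was surprised to find that the lookup method is slightly faster. My code is below. I compiled it with GCC on Windows (MinGW).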

#include <stdlib.h>
#include <stdio.h>
#include <windows.h>

int main()
{
    DWORD dwElapsed, dwStartTime;

    //random arrangement of keys to lookup
    int lookup_arr[] = {0, 3, 8, 7, 2, 1, 4, 5, 6, 9};

    //data for both loops
    int data_arr1[] = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10};
    int data_arr2[] = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10};

    //first loop, sequential access
    dwStartTime = GetTickCount();
    for (int n = 0; n < 9000000; n++) {
        for (int i = 0; i < 10; i++)
            data_arr1[i]++;
    }
    dwElapsed = GetTickCount() - dwStartTime;
    printf("Normal loop completed: %lu\n", (unsigned long)dwElapsed);

    //second loop, indexes into data_arr2 using the lookup array
    dwStartTime = GetTickCount();
    for (int n = 0; n < 9000000; n++) {
        for (int i = 0; i < 10; i++)
            data_arr2[lookup_arr[i]]++;
    }
    dwElapsed = GetTickCount() - dwStartTime;
    printf("Lookup loop completed: %lu\n", (unsigned long)dwElapsed);

    return 0;
}

Running this, I get:

Normal loop completed: 375
Lookup loop completed: 297

Answer 1

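Following up on my earlier comments, here is how you would do this kind of measurement:

Repeat the measurements
Estimate the error
Use a large block of memory
Compare randomized indices against linear indices (so there is an indirection either way)

The result is a significant difference in speed once the indices are randomized.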

#include <stdio.h>
#include <time.h>
#include <stdlib.h>
#include <math.h>

#define N 1000000

int main(void) {
  int *rArr;
  int *rInd; // randomized indices
  int *lInd; // linear indices
  int ii;

  rArr = malloc(N * sizeof(int) );
  rInd = malloc(N * sizeof(int) );
  lInd = malloc(N * sizeof(int) );

  for(ii = 0; ii < N; ii++) {
    lInd[ii] = ii;
    rArr[ii] = rand();
    rInd[ii] = rand()%N;
  }

  int loopCount;
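  int sum = 0; // must start at zero; reading an uninitialized int is undefined behavior
  clock_t startT, stopT; // clock() returns clock_t, not time_t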
  double dt, totalT=0, tt2=0;

  startT = clock();
  for(loopCount = 0; loopCount < 100; loopCount++) {
    for(ii = 0; ii < N; ii++) {
      sum += rArr[lInd[ii]];
    }
    stopT = clock();
    dt = stopT - startT;
    totalT += dt;
    tt2 += dt * dt;
    startT = stopT;
  }
  printf("sum is %d\n", sum);
  printf("total time: %lf += %lf\n", totalT/(double)(CLOCKS_PER_SEC), sqrt((tt2 - totalT * totalT / 100.0)/100.0) / (double)(CLOCKS_PER_SEC));

  totalT = 0; tt2 = 0;
  startT = clock();
  for(loopCount = 0; loopCount < 100; loopCount++) {
    for(ii = 0; ii < N; ii++) {
      sum += rArr[rInd[ii]];
    }
    stopT = clock();
    dt = stopT - startT;
    totalT += dt;
    tt2 += dt * dt;
    startT = stopT;
  }
  printf("sum is %d\n", sum);
  printf("total time: %lf += %lf\n", totalT/(double)(CLOCKS_PER_SEC), sqrt((tt2 - totalT * totalT / 100.0)/100.0) / (double)(CLOCKS_PER_SEC));
}

Result: sequential access is about 2x faster (on my machine):

sum is -1444272372
total time: 0.396539 += 0.000219
sum is 546230204
total time: 0.756407 += 0.001165

With -O3 optimization the difference is even more pronounced, about 3x faster:

sum is -318372465
total time: 0.142444 += 0.013230
sum is 1672130111
total time: 0.455804 += 0.000402

Answer 2

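I believe you are compiling without optimizations turned on. With -O2, g++ optimizes everything away, so the run time is 0; without the flag I get results similar to yours.

After modifying the program so that the values in data_arr1 and data_arr2 are actually used for something, I get 78 ms.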
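
One way to make the values "actually used" is to return a checksum from each loop and print it after the timing. The following is a rough sketch of that idea, not the exact modification I made; run_sequential, run_lookup and ELEMS are illustrative names, and the data and lookup tables are copied from the question.

```c
#define ELEMS 10

/* Increment every element in order, reps times, then return the sum of the
   array so the caller holds a value that depends on the loop. */
long run_sequential(int reps)
{
    int data[ELEMS] = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10};
    for (int n = 0; n < reps; n++)
        for (int i = 0; i < ELEMS; i++)
            data[i]++;

    long checksum = 0;
    for (int i = 0; i < ELEMS; i++)
        checksum += data[i];
    return checksum;
}

/* Same work, but each access goes through the lookup table. */
long run_lookup(int reps)
{
    static const int lookup[ELEMS] = {0, 3, 8, 7, 2, 1, 4, 5, 6, 9};
    int data[ELEMS] = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10};
    for (int n = 0; n < reps; n++)
        for (int i = 0; i < ELEMS; i++)
            data[lookup[i]]++;

    long checksum = 0;
    for (int i = 0; i < ELEMS; i++)
        checksum += data[i];
    return checksum;
}
```

Time each call and then printf the returned checksum. Because the lookup table is a permutation of 0..9, both functions return the same value, which doubles as a sanity check. Note that an optimizing compiler may still simplify such small fixed-size loops, so consuming the result only keeps the work from being discarded outright.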

Solution 1
2 Accepted 2013-12-10 14:32:53

Solution 2
1 2013-12-10 13:57:37