Measuring time with clock() gives a value of zero - Linux

Question

I have CUDA code that performs a computation on the GPU. I am using clock() to measure how long it takes.

My code is structured like this:

__global__ static void sum(){

// calculates sum 
}

extern "C"
int run_kernel(int array[],int nelements){
    clock_t start, end;
    start = clock();
    //perform operation on gpu - call sum
    end = clock();
    double elapsed_time = ((double) (end - start)) / CLOCKS_PER_SEC;
    printf("time required : %lf", elapsed_time);
}

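But the time is always 0.0000. I checked by printing the start and end values: start has some value, but end is always zero.

Any idea what the cause might be? Is there another way to measure the time?

Any help would be appreciated.

Thanks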

Answer 1

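There are two problems here:

1. The clock() function has too low a resolution to measure the duration of the event you are trying to time.
2. A CUDA kernel launch is an asynchronous operation, so it consumes almost no time on the host (typically 10 to 20 microseconds on a sane platform). Unless you use a synchronous CUDA API call to force the host CPU to block until the kernel has finished running, you will not be measuring the execution time.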
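
As a host-only illustration of why clock() is a poor fit here, the sketch below compares it with a wall-clock timer. clock() reports processor time used by the process rather than wall-clock time, so time the host spends waiting is not counted. This is an analogy, not a GPU measurement: it assumes a POSIX system, nanosleep stands in for a wait, and measure_idle_wait is a hypothetical helper name.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <time.h>

/* Sleep for `ms` milliseconds and report how much time each clock saw.
 * cpu_sec:  processor time according to clock()
 * wall_sec: wall-clock time according to CLOCK_MONOTONIC
 * (hypothetical helper, for illustration only) */
void measure_idle_wait(long ms, double *cpu_sec, double *wall_sec)
{
    assert(ms >= 0);

    struct timespec req = { ms / 1000, (ms % 1000) * 1000000L };
    struct timespec w0, w1;

    clock_t c0 = clock();
    clock_gettime(CLOCK_MONOTONIC, &w0);

    nanosleep(&req, NULL);  /* the process uses almost no CPU while asleep */

    clock_gettime(CLOCK_MONOTONIC, &w1);
    clock_t c1 = clock();

    *cpu_sec = (double)(c1 - c0) / CLOCKS_PER_SEC;
    *wall_sec = (double)(w1.tv_sec - w0.tv_sec)
              + (double)(w1.tv_nsec - w0.tv_nsec) / 1e9;
}
```

Calling measure_idle_wait(50, &cpu, &wall) should leave wall at roughly 0.05 s or more while cpu stays close to zero.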

CUDA has its own high-precision timing API, and it is the recommended way to time operations that run on the GPU. Code using it looks like this:

int run_kernel(int array[],int nelements){

    cudaEvent_t start,stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    cudaEventRecord(start, 0);

    //
    //perform operation on gpu - call sum
    //

    cudaEventRecord(stop, 0); 
    cudaEventSynchronize(stop); 
    float elapsedTime; 
    cudaEventElapsedTime(&elapsedTime, start, stop); 
    printf("time required : %f ms\n", elapsedTime);

    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    return 0;
}

Answer 2

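Don't use clock to time CUDA kernel launches. Use cudaEventElapsedTime. Even if clock had high enough resolution to time the kernel (it doesn't), kernel launches are asynchronous, which means control returns to your calling function before the kernel has finished.

Here's how: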

void run_kernel(...)
{
  // create "events" which record the start & finish of the kernel of interest
  cudaEvent_t start, end;
  cudaEventCreate(&start);
  cudaEventCreate(&end);

  // record the start of the kernel
  cudaEventRecord(start);

  // perform operation on gpu - call sum
  sum<<<...>>>(...);

  // record the end of the kernel
  cudaEventRecord(end);

  // wait for the end event (and therefore the kernel) to complete,
  // then get the elapsed time in milliseconds
  float ms;
  cudaEventSynchronize(end);
  cudaEventElapsedTime(&ms, start, end);

  printf("time required : %f milliseconds", ms);

  cudaEventDestroy(start);
  cudaEventDestroy(end);
}

Answer 3

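I believe you should use clock_gettime() with CLOCK_MONOTONIC nowadays to measure elapsed time at high resolution. On my computer the resolution is 1 ns, which is good enough.

You can use it like this: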

#include <time.h>
...

struct timespec start, end, res;

clock_getres(CLOCK_MONOTONIC, &res);
/* exact format string depends on your system, on mine time_t is long */
printf("Resolution is %ld s, %ld ns\n", (long) res.tv_sec, res.tv_nsec);

clock_gettime(CLOCK_MONOTONIC, &start);
/* whatever */
clock_gettime(CLOCK_MONOTONIC, &end);

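Compile with -lrt (needed with glibc older than 2.17; newer versions provide these functions in libc itself).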
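
The snippet above records start and end but stops short of turning them into a duration. A minimal sketch of that step follows; elapsed_ns is a hypothetical helper name, not part of any library, and it assumes end was read after start on the same clock.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Nanoseconds from `start` to `end` (hypothetical helper).
 * Assumes both readings come from the same clock and that
 * `end` is not earlier than `start`. */
int64_t elapsed_ns(struct timespec start, struct timespec end)
{
    int64_t sec  = (int64_t)end.tv_sec  - (int64_t)start.tv_sec;
    int64_t nsec = (int64_t)end.tv_nsec - (int64_t)start.tv_nsec;

    /* nsec may be negative when the nanosecond field wrapped around;
     * adding it to the scaled seconds handles the borrow. */
    int64_t total = sec * 1000000000LL + nsec;
    assert(total >= 0);
    return total;
}
```

With the start and end values from the snippet above, elapsed_ns(start, end) / 1e6 gives the duration in milliseconds. For GPU work, a host timer like this only covers the kernel if the host synchronizes with the device before reading end.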

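Edit: I see I took the wrong approach to this. If that is what you need, you should obviously use CUDA timing. I followed the lead of your question and timed with the system clock.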

Answer 4

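CUDA kernel launches are asynchronous, so you have to add cudaThreadSynchronize() after the kernel. (Later CUDA releases deprecate cudaThreadSynchronize() in favor of cudaDeviceSynchronize().)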

4 answers

Answer 1: score 7, accepted, 2012-04-30 06:19:39

Answer 2: score 5, 2012-04-30 06:17:02

Answer 3: score 0, 2012-04-30 06:11:41

Answer 4: score 0, 2012-04-30 12:56:32