
Measuring Elapsed Time for an OpenCL Application

I know this question has been asked several times, but in my application it's critical to get the timing right, so I'd like to try again:

I calculate the time for a kernel method like this, first measuring CPU clock time with clock_t:

#include <ctime>
clock_t start = clock(); // Or std::chrono::system_clock::now() for WALL CLOCK TIME
openCLFunction();
clock_t end = clock(); // Or std::chrono::system_clock::now() for WALL CLOCK TIME
double time_elapsed = double(end - start) / CLOCKS_PER_SEC; // CPU time in seconds
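The two measurements in the question (clock() and the commented-out chrono alternative) do not measure the same thing: std::clock() returns processor time consumed by the program, so on POSIX systems time the host thread spends blocked in clFinish() may barely register, while a wall clock counts it. Among the chrono clocks, steady_clock is monotonic, whereas system_clock can be adjusted while the program runs. The following is a minimal standard-library sketch that times one call with both clocks; time_both and fake_blocking_call are hypothetical names, and sleeping is only an assumed stand-in for blocking inside clFinish().

```cpp
#include <cassert>
#include <chrono>
#include <ctime>
#include <thread>

// Elapsed time of one call, measured with two different clocks.
struct Timings {
    double wall_ms; // real time from std::chrono::steady_clock (monotonic)
    double cpu_ms;  // processor time from std::clock()
};

// Times any callable with both clocks; the wall-clock interval encloses the CPU one.
template <class F>
Timings time_both(F&& f) {
    const auto wall_start = std::chrono::steady_clock::now();
    const std::clock_t cpu_start = std::clock();
    f();
    const std::clock_t cpu_end = std::clock();
    const auto wall_end = std::chrono::steady_clock::now();

    Timings t{};
    t.wall_ms = std::chrono::duration<double, std::milli>(wall_end - wall_start).count();
    t.cpu_ms = 1000.0 * static_cast<double>(cpu_end - cpu_start) / CLOCKS_PER_SEC;
    assert(t.wall_ms >= 0.0);
    return t;
}

// Hypothetical stand-in for openCLFunction(): the host thread blocks (here by
// sleeping) the way it might inside clFinish() while the device is busy.
void fake_blocking_call() {
    std::this_thread::sleep_for(std::chrono::milliseconds(50));
}
```

Calling time_both(fake_blocking_call) on a typical Linux system should give a wall_ms of at least 50 and a cpu_ms close to zero. Microsoft's C runtime documents clock() as returning elapsed wall-clock time, so the two values may agree there.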

And my openCLFunction():

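void openCLFunction()
{
    // ... enqueue some OpenCL kernel on `queue` ...
    clFlush(queue);  // submit the queued commands to the device
    clFinish(queue); // block until every command in the queue has completed
}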

There is a big difference in the results between the two methods, and to be honest I don't know which is right, because they are in milliseconds. Can I trust the CPU clock time here? Is there a definitive way to measure this, so that I don't have to second-guess the results? (Note that I call two functions to finish my kernel function.)

You should probably be using kernel profiling:

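// Profiling must be enabled when the queue is created (clCreateCommandQueueWithProperties is OpenCL 2.0+;
// on OpenCL 1.2 use clCreateCommandQueue(context, device, CL_QUEUE_PROFILING_ENABLE, &err) instead)
cl_queue_properties properties[] {CL_QUEUE_PROPERTIES, CL_QUEUE_PROFILING_ENABLE, 0};
cl_command_queue queue = clCreateCommandQueueWithProperties(context, device, properties, &err);

/*Later...*/
cl_event event;
clEnqueueNDRangeKernel(queue, kernel, /*...*/, &event); // request an event for this kernel launch
clWaitForEvents(1, &event); // profiling info is only available once the command has completed
cl_ulong start, end;
clGetEventProfilingInfo(event, CL_PROFILING_COMMAND_START, sizeof(cl_ulong), &start, nullptr);
clGetEventProfilingInfo(event, CL_PROFILING_COMMAND_END, sizeof(cl_ulong), &end, nullptr);
clReleaseEvent(event); // the event is no longer needed

std::chrono::nanoseconds duration{end - start}; // device timer values are in nanoseconds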

At the end of that code, duration contains the number of nanoseconds that passed between the beginning and the end of the kernel's execution (reported as precisely as the device is capable of; note that many devices don't have sub-microsecond precision).
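The event also exposes CL_PROFILING_COMMAND_QUEUED and CL_PROFILING_COMMAND_SUBMIT, which separate time spent waiting from time spent executing, and the device reports its timer granularity through CL_DEVICE_PROFILING_TIMER_RESOLUTION. A minimal sketch of the arithmetic on those values follows. To keep it compilable without OpenCL headers, std::uint64_t stands in for cl_ulong (an assumption), the four stamps would in practice be filled by clGetEventProfilingInfo, and the helper names and the 100-tick threshold are hypothetical.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstdint>

// Stand-in for cl_ulong so this sketch compiles without OpenCL headers (assumption).
using device_ns = std::uint64_t;

// The four timestamps clGetEventProfilingInfo can return for a completed command
// (CL_PROFILING_COMMAND_QUEUED, _SUBMIT, _START, _END), in device-timer nanoseconds.
struct ProfilingStamps {
    device_ns queued, submit, start, end;
};

struct KernelBreakdown {
    std::chrono::nanoseconds waiting_in_queue;  // QUEUED -> SUBMIT
    std::chrono::nanoseconds waiting_on_device; // SUBMIT -> START
    std::chrono::nanoseconds execution;         // START -> END, the interval used above
};

KernelBreakdown breakdown(const ProfilingStamps& s) {
    // For a single command the stamps should be non-decreasing.
    assert(s.queued <= s.submit && s.submit <= s.start && s.start <= s.end);
    return {std::chrono::nanoseconds(s.submit - s.queued),
            std::chrono::nanoseconds(s.start - s.submit),
            std::chrono::nanoseconds(s.end - s.start)};
}

// Fractional milliseconds, convenient for printing next to CPU-side timings.
double to_milliseconds(std::chrono::nanoseconds d) {
    return std::chrono::duration<double, std::milli>(d).count();
}

// Heuristic check that an interval spans many ticks of the device timer.
// resolution_ns would come from clGetDeviceInfo with CL_DEVICE_PROFILING_TIMER_RESOLUTION;
// the 100-tick threshold is an arbitrary assumption, not an OpenCL rule.
bool spans_enough_ticks(std::chrono::nanoseconds d, std::size_t resolution_ns,
                        long long min_ticks = 100) {
    assert(resolution_ns > 0);
    return d.count() >= static_cast<long long>(resolution_ns) * min_ticks;
}
```

If execution is short compared with the timer resolution, the reported number says little. That is one more reason to time larger workloads, as the answers below suggest.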

There are (at least) 3 ways to time OpenCL/CUDA execution:

  1. Use of CPU timers + queue flushing
  2. Use of OpenCL / CUDA events
  3. Use of an external profiler tool (e.g. whatever AMD offers, or nvprof for NVIDIA cards)

Your first example falls into the first category, but you don't seem to be flushing the queues which the OpenCL function uses (I'm assuming that's a function enqueueing a kernel). So, unless the execution is somehow forced to be synchronous, what you would be measuring is the time it takes to enqueue the kernel and do whatever CPU-side work you do before or after that. That could explain the discrepancy with the clFlush/clFinish method.
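The effect of asynchronous enqueueing can be modeled without OpenCL at all. In the minimal sketch below, std::async plays the role of clEnqueueNDRangeKernel (it returns before the work is done) and std::future::wait() plays the role of clFinish(). This is an analogy chosen for illustration, not how an OpenCL runtime is implemented, and time_async_work is a hypothetical name.

```cpp
#include <cassert>
#include <chrono>
#include <future>
#include <thread>

struct LaunchTimings {
    double launch_only_ms;    // time until the asynchronous call returned
    double until_finished_ms; // time until the work had actually completed
};

// std::async stands in for an enqueue, future::wait() for clFinish().
LaunchTimings time_async_work(std::chrono::milliseconds work) {
    using steady = std::chrono::steady_clock;
    const auto t0 = steady::now();
    std::future<void> pending =
        std::async(std::launch::async, [work] { std::this_thread::sleep_for(work); });
    const auto t1 = steady::now(); // stopping the timer here only measures the "enqueue"
    pending.wait();                // block until the background work is done
    const auto t2 = steady::now();
    assert(t0 <= t1 && t1 <= t2);

    const auto ms = [](steady::duration d) {
        return std::chrono::duration<double, std::milli>(d).count();
    };
    return {ms(t1 - t0), ms(t2 - t0)};
}
```

With 50 ms of work, launch_only_ms is typically far smaller than until_finished_ms. That mirrors stopping a CPU timer before the queue has been drained.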

Another reason for the discrepancy could be setup/tear-down work (e.g. memory allocation or run-time internal overhead) which your second method times and your first does not.
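One common way to keep one-off setup out of a comparison is to run the workload a few times untimed and then report the median of several timed runs. The answer itself does not prescribe this technique. The following is a minimal sketch of it; median_ms is a hypothetical helper, and in real use f would enqueue the kernel and then call clFinish(queue).

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <vector>

// Runs f a few times untimed (to absorb one-off setup such as allocation or
// lazy initialization), then returns the median of the timed runs in milliseconds.
template <class F>
double median_ms(F&& f, int warmup_runs, int timed_runs) {
    assert(warmup_runs >= 0 && timed_runs > 0);
    for (int i = 0; i < warmup_runs; ++i) f();

    std::vector<double> samples;
    samples.reserve(static_cast<std::size_t>(timed_runs));
    for (int i = 0; i < timed_runs; ++i) {
        const auto start = std::chrono::steady_clock::now();
        f();
        const auto end = std::chrono::steady_clock::now();
        samples.push_back(std::chrono::duration<double, std::milli>(end - start).count());
    }
    // nth_element places the middle sample (the upper median for even counts)
    // without fully sorting the vector.
    const auto mid = samples.begin() + static_cast<std::ptrdiff_t>(samples.size() / 2);
    std::nth_element(samples.begin(), mid, samples.end());
    return *mid;
}
```

The median is less sensitive than the mean to an occasional slow run caused by something outside the kernel.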

A final note: all three methods will produce slightly different results, due either to measurement inaccuracy or to differences in the overheads required to use them. These differences may not be so slight if your kernels are small, though. In my experience, with CUDA on NVIDIA Maxwell and Pascal cards, profiler-provided kernel execution times and event-measured times can differ by dozens of microseconds. The lessons from that are:

  1. When relevant and possible, measure on more data, and normalize by the amount of data.
  2. Be consistent in how you measure execution times when making comparisons.
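Normalizing by the amount of data, as in the first lesson above, turns a raw duration into a rate that can be compared across problem sizes. The following is a minimal sketch, assuming the elapsed time has already been obtained (for example from the event-profiling code above) and that each element occupies a fixed number of bytes. The normalize helper is hypothetical, and it reports decimal gigabytes (10^9 bytes) per second.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>

// A measured duration normalized by the amount of data processed.
struct Normalized {
    double ns_per_element;
    double gb_per_second; // decimal gigabytes (1e9 bytes) per second
};

Normalized normalize(std::chrono::nanoseconds elapsed, std::uint64_t elements,
                     std::uint64_t bytes_per_element) {
    assert(elapsed.count() > 0 && elements > 0);
    const double ns = static_cast<double>(elapsed.count());
    const double bytes = static_cast<double>(elements) * static_cast<double>(bytes_per_element);
    // bytes per nanosecond is numerically equal to GB/s
    return {ns / static_cast<double>(elements), bytes / ns};
}
```

For example, 250,000,000 four-byte elements processed in 0.5 s works out to 2 ns per element and 2 GB/s.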
