Timing CUDA operations

Question

I need to time a CUDA kernel execution. The Best Practices Guide says that we can use either events or standard timing functions like clock() on Windows. My problem is that these two methods give me totally different results. In fact, the time reported by events seems huge compared with the speed observed in practice.

What I actually need all this for is to be able to predict the running time of a computation by first running a reduced version of it on a smaller data set. Unfortunately, the results of this benchmark are totally unrealistic, being either too optimistic (clock()) or way too pessimistic (events).
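The extrapolation described in the question can be sketched as a least-squares fit of a straight line, t(n) = a + b * n, to timings measured at a few small problem sizes. This is a minimal sketch under a strong assumption: that run time grows linearly with problem size, which often does not hold on a GPU until the problem is large enough to keep the device busy. `LinearModel` and `fit_linear` are hypothetical names, not part of any CUDA API.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical model of run time: t(n) = a + b * n, in milliseconds.
struct LinearModel {
    double a;  // fixed cost (ms), e.g. launch and constant-size overheads
    double b;  // cost per element (ms)

    double predict(double n) const { return a + b * n; }
};

// Least-squares fit of LinearModel to (problem size, measured time) pairs.
// Assumes the times were measured correctly (the host waited for the GPU)
// and that run time really grows linearly with n over the range of interest.
LinearModel fit_linear(const std::vector<double>& n, const std::vector<double>& t)
{
    // Needs matching sizes and at least two distinct problem sizes.
    assert(n.size() == t.size() && n.size() >= 2);
    const double m = static_cast<double>(n.size());
    double sn = 0.0, st = 0.0, snn = 0.0, snt = 0.0;
    for (std::size_t i = 0; i < n.size(); ++i) {
        sn  += n[i];
        st  += t[i];
        snn += n[i] * n[i];
        snt += n[i] * t[i];
    }
    const double b = (m * snt - sn * st) / (m * snn - sn * sn);  // slope
    const double a = (st - b * sn) / m;                          // intercept
    return {a, b};
}
```

For example, illustrative timings of 2, 3 and 5 ms at sizes 1000, 2000 and 4000 give a = 1 ms and b = 0.001 ms per element, so `predict(10000)` returns 11 ms. If the fitted intercept dominates, the small runs are mostly measuring overhead and the extrapolation should not be trusted.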

Answer 1

You could do something along the lines of:

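#include <stdio.h>
#include <sys/time.h>

// HANDLE_ERROR is an error-checking macro (e.g. the one from "CUDA by Example"); it is assumed to be defined elsewhere.
struct timeval t1, t2;

gettimeofday(&t1, 0);

kernel_call<<<dimGrid, dimBlock, 0>>>();

// Kernel launches are asynchronous: wait for the kernel to finish before reading the clock again.
HANDLE_ERROR( cudaDeviceSynchronize() );

gettimeofday(&t2, 0);

double time = (1000000.0*(t2.tv_sec-t1.tv_sec) + t2.tv_usec-t1.tv_usec)/1000.0;

printf("Time to generate:  %3.1f ms \n", time);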

or:

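float time;
cudaEvent_t start, stop;

HANDLE_ERROR( cudaEventCreate(&start) );
HANDLE_ERROR( cudaEventCreate(&stop) );
HANDLE_ERROR( cudaEventRecord(start, 0) );

kernel_call<<<dimGrid, dimBlock, 0>>>();

HANDLE_ERROR( cudaEventRecord(stop, 0) );
// Block the host until the stop event (and therefore the kernel before it) has completed
HANDLE_ERROR( cudaEventSynchronize(stop) );
HANDLE_ERROR( cudaEventElapsedTime(&time, start, stop) );

printf("Time to generate:  %3.1f ms \n", time);

// Release the events once they are no longer needed
HANDLE_ERROR( cudaEventDestroy(start) );
HANDLE_ERROR( cudaEventDestroy(stop) );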
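The elapsed-time expression in the `gettimeofday` snippet scales the difference in whole seconds to microseconds, adds the (possibly negative) difference of the microsecond fields, and divides by 1000 to obtain milliseconds. Factored into a small helper (the name `elapsed_ms` is hypothetical), the arithmetic can be checked by hand:

```cpp
#include <cassert>
#include <sys/time.h>

// Hypothetical helper: wall-clock time between two gettimeofday() readings,
// in milliseconds, using the same arithmetic as the gettimeofday snippet.
inline double elapsed_ms(const timeval& t1, const timeval& t2)
{
    // gettimeofday() keeps tv_usec in [0, 1000000)
    assert(t1.tv_usec >= 0 && t1.tv_usec < 1000000);
    assert(t2.tv_usec >= 0 && t2.tv_usec < 1000000);
    // Seconds -> microseconds, plus the (possibly negative) microsecond
    // difference, then microseconds -> milliseconds.
    return (1000000.0 * (t2.tv_sec - t1.tv_sec) + (t2.tv_usec - t1.tv_usec)) / 1000.0;
}
```

For t1 = {1 s, 900000 us} and t2 = {2 s, 100000 us} this gives (1000000 - 800000) / 1000 = 200 ms: the negative microsecond difference is absorbed by the seconds term, so no explicit borrow is needed.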

Answer 2

A satisfactory answer has already been given to your question.

I have written classes for timing C/C++ as well as CUDA operations and want to share them with others in the hope that they will be helpful to future users. You will just need to add the 4 files reported below to your project and #include the two header files as follows:

// --- Timing includes
#include "TimingCPU.h"
#include "TimingGPU.cuh"

The two classes can be used as follows.

Timing CPU section

TimingCPU timer_CPU;

timer_CPU.StartCounter();
// ... CPU operations to be timed ...
std::cout << "CPU Timing = " << timer_CPU.GetCounter() << " ms" << std::endl;

Timing GPU section

TimingGPU timer_GPU;
timer_GPU.StartCounter();
// ... GPU operations to be timed ...
std::cout << "GPU Timing = " << timer_GPU.GetCounter() << " ms" << std::endl;

In both cases, the timing is in milliseconds. Also, the two classes can be used under Linux or Windows.
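With a compiler that supports C++11, the CPU timer can also be written portably with `std::chrono::steady_clock`, avoiding the platform-specific branches in TimingCPU.cpp. A minimal sketch with the same `StartCounter()`/`GetCounter()` interface; `ChronoTimer` is a hypothetical name, and like `TimingCPU` it only measures host time, so GPU work must be synchronized before `GetCounter()` is called.

```cpp
#include <chrono>

// Hypothetical std::chrono-based alternative to TimingCPU with the same
// StartCounter()/GetCounter() interface. steady_clock is monotonic, so the
// result is not affected by adjustments of the system clock.
class ChronoTimer {
public:
    ChronoTimer() { StartCounter(); }

    void StartCounter() { start_ = std::chrono::steady_clock::now(); }

    // Milliseconds elapsed since the last StartCounter() call.
    double GetCounter() const
    {
        return std::chrono::duration<double, std::milli>(
                   std::chrono::steady_clock::now() - start_).count();
    }

private:
    std::chrono::steady_clock::time_point start_;
};
```

`GetCounter()` here does not restart the counter, which matches the Windows version of `TimingCPU`.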

Here are the 4 files:

TimingCPU.cpp

/**************/
/* TIMING CPU */
/**************/

#include "TimingCPU.h"

#ifdef __linux__

    #include <sys/time.h>
    #include <stdio.h>

    TimingCPU::TimingCPU(): cur_time_(0) { StartCounter(); }

    TimingCPU::~TimingCPU() { }

    void TimingCPU::StartCounter()
    {
        struct timeval time;
        if(gettimeofday( &time, 0 )) return;
        // --- 64-bit arithmetic: microseconds since the epoch overflow a 32-bit long
        cur_time_ = 1000000LL * time.tv_sec + time.tv_usec;
    }

    // --- Returns the ms elapsed since the last StartCounter() or GetCounter() call
    //     (unlike the Windows version, each call restarts the counter)
    double TimingCPU::GetCounter()
    {
        struct timeval time;
        if(gettimeofday( &time, 0 )) return -1;

        long long cur_time = 1000000LL * time.tv_sec + time.tv_usec;
        double sec = (cur_time - cur_time_) / 1000000.0;
        if(sec < 0) sec += 86400;
        cur_time_ = cur_time;

        return 1000.*sec;
    }

#elif _WIN32 || _WIN64
    #include <windows.h>
    #include <iostream>

    struct PrivateTimingCPU {
        double  PCFreq;
        __int64 CounterStart;
    };

    // --- Default constructor
    TimingCPU::TimingCPU() { privateTimingCPU = new PrivateTimingCPU; (*privateTimingCPU).PCFreq = 0.0; (*privateTimingCPU).CounterStart = 0; }

    // --- Destructor: releases the private timing data
    TimingCPU::~TimingCPU() { delete privateTimingCPU; }

    // --- Starts the timing
    void TimingCPU::StartCounter()
    {
        LARGE_INTEGER li;
        if(!QueryPerformanceFrequency(&li)) std::cout << "QueryPerformanceFrequency failed!\n";

        (*privateTimingCPU).PCFreq = double(li.QuadPart)/1000.0;

        QueryPerformanceCounter(&li);
        (*privateTimingCPU).CounterStart = li.QuadPart;
    }

    // --- Gets the timing counter in ms
    double TimingCPU::GetCounter()
    {
        LARGE_INTEGER li;
        QueryPerformanceCounter(&li);
        return double(li.QuadPart-(*privateTimingCPU).CounterStart)/(*privateTimingCPU).PCFreq;
    }
#endif
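The Linux branch stores microseconds since the Unix epoch, that is, one million times `tv_sec`. A quick magnitude check shows why that product needs a 64-bit integer on platforms where `long` is 32 bits (for example, 32-bit Linux); the timestamp below is only an illustrative value, not something the original code uses.

```cpp
#include <cstdint>

// Microseconds since the Unix epoch for an example present-day timestamp,
// computed in 64-bit arithmetic. 1700000000 s corresponds to November 2023.
constexpr std::int64_t kExampleSeconds = 1700000000;
constexpr std::int64_t kExampleMicros  = 1000000LL * kExampleSeconds;

// About 1.7e15, far beyond the largest 32-bit signed value (2147483647),
// so the counter must be held in a 64-bit type such as long long.
static_assert(kExampleMicros > INT32_MAX, "epoch microseconds overflow 32 bits");
```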

TimingCPU.h

// Approximately 1 microsecond resolution
// GetCounter() returns the elapsed time in milliseconds

#ifndef __TIMINGCPU_H__
#define __TIMINGCPU_H__

#ifdef __linux__

    class TimingCPU {

        private:
            long long cur_time_;  // microseconds since the epoch (64-bit to avoid overflow)

        public:

            TimingCPU();

            ~TimingCPU();

            void StartCounter();

            double GetCounter();
    };

#elif _WIN32 || _WIN64

    struct PrivateTimingCPU;

    class TimingCPU
    {
        private:
            PrivateTimingCPU *privateTimingCPU;

        public:

            TimingCPU();

            ~TimingCPU();

            void StartCounter();

            double GetCounter();

    }; // TimingCPU class

#endif

#endif

TimingGPU.cu

/**************/
/* TIMING GPU */
/**************/

#include "TimingGPU.cuh"

#include <cuda.h>
#include <cuda_runtime.h>

struct PrivateTimingGPU {
    cudaEvent_t     start;
    cudaEvent_t     stop;
};

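// --- Destroys the events created by a previous StartCounter()/StartCounterFlags() call, if any
static void ReleaseEvents(PrivateTimingGPU *p)
{
    if (p->start) cudaEventDestroy(p->start);
    if (p->stop)  cudaEventDestroy(p->stop);
    p->start = p->stop = nullptr;
}

// default constructor (value-initialization sets both event handles to null)
TimingGPU::TimingGPU() { privateTimingGPU = new PrivateTimingGPU(); }

// default destructor: releases the events and the private data
TimingGPU::~TimingGPU() { ReleaseEvents(privateTimingGPU); delete privateTimingGPU; }

void TimingGPU::StartCounter()
{
    ReleaseEvents(privateTimingGPU);
    cudaEventCreate(&((*privateTimingGPU).start));
    cudaEventCreate(&((*privateTimingGPU).stop));
    cudaEventRecord((*privateTimingGPU).start,0);
}

void TimingGPU::StartCounterFlags()
{
    // --- cudaEventBlockingSync: cudaEventSynchronize() blocks the host thread instead of busy-waiting
    int eventflags = cudaEventBlockingSync;

    ReleaseEvents(privateTimingGPU);
    cudaEventCreateWithFlags(&((*privateTimingGPU).start),eventflags);
    cudaEventCreateWithFlags(&((*privateTimingGPU).stop),eventflags);
    cudaEventRecord((*privateTimingGPU).start,0);
}

// Gets the counter in ms (waits until the work issued before the stop event has completed)
float TimingGPU::GetCounter()
{
    float   time;
    cudaEventRecord((*privateTimingGPU).stop, 0);
    cudaEventSynchronize((*privateTimingGPU).stop);
    cudaEventElapsedTime(&time,(*privateTimingGPU).start,(*privateTimingGPU).stop);
    return time;
}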

TimingGPU.cuh

#ifndef __TIMING_CUH__
#define __TIMING_CUH__

/**************/
/* TIMING GPU */
/**************/

// Events are part of the CUDA API and provide a system-independent way to measure execution times on CUDA devices with approximately 0.5
// microsecond resolution.

struct PrivateTimingGPU;

class TimingGPU
{
    private:
        PrivateTimingGPU *privateTimingGPU;

    public:

        TimingGPU();

        ~TimingGPU();

        void StartCounter();
        void StartCounterFlags();

        float GetCounter();

}; // TimingGPU class

#endif

Answer 3

There is a ready-made GpuTimer struct that can be used:

#ifndef __GPU_TIMER_H__
#define __GPU_TIMER_H__

#include <cuda_runtime.h>

struct GpuTimer
{
      cudaEvent_t start;
      cudaEvent_t stop;

      GpuTimer()
      {
            cudaEventCreate(&start);
            cudaEventCreate(&stop);
      }

      ~GpuTimer()
      {
            cudaEventDestroy(start);
            cudaEventDestroy(stop);
      }

      void Start()
      {
            cudaEventRecord(start, 0);
      }

      void Stop()
      {
            cudaEventRecord(stop, 0);
      }

      // --- Call after Stop(): waits for the stop event, then returns the elapsed time in ms
      float Elapsed()
      {
            float elapsed;
            cudaEventSynchronize(stop);
            cudaEventElapsedTime(&elapsed, start, stop);
            return elapsed;
      }
};

#endif  /* __GPU_TIMER_H__ */

Answer 4

If you want to measure GPU time, you pretty much have to use events. There's a great discussion thread on the dos and don'ts of timing your application over on the NVIDIA forums here.
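Two common precautions when timing GPU work can be combined in a small host-side helper. The host must wait for the work to finish (as the synchronization calls in the first answer do), and the first CUDA calls in a program can include one-time costs such as context creation, which may explain part of the gap described in the question. A minimal sketch, assuming C++11 and a callable that performs the work and waits for it: it runs a few untimed warm-up iterations and reports the median of the timed ones. `median_ms` is a hypothetical helper, not a CUDA API.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <vector>

// Hypothetical helper: runs `work` `warmup` times untimed, then times `reps`
// further runs with std::chrono::steady_clock and returns the median in ms
// (the upper median when `reps` is even). For GPU work, `work` must itself
// wait for completion, otherwise only the launch is measured.
template <typename F>
double median_ms(F&& work, int warmup = 2, int reps = 5)
{
    assert(reps > 0);
    for (int i = 0; i < warmup; ++i) work();

    std::vector<double> samples;
    samples.reserve(reps);
    for (int i = 0; i < reps; ++i) {
        auto t0 = std::chrono::steady_clock::now();
        work();
        auto t1 = std::chrono::steady_clock::now();
        samples.push_back(std::chrono::duration<double, std::milli>(t1 - t0).count());
    }
    // nth_element places the median sample at the middle position.
    auto mid = samples.begin() + samples.size() / 2;
    std::nth_element(samples.begin(), mid, samples.end());
    return *mid;
}
```

In a CUDA program the callable could be a lambda that launches the kernel and then calls `cudaDeviceSynchronize()`; the median is less sensitive than the mean to an occasional slow run.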

Answer 5

You can use the Compute Visual Profiler, which will be great for your purpose. It measures the time of every CUDA function and tells you how many times you called it.

Answer scores and posting dates (5 answers):

Answer 1: score 23, accepted, 2011-10-24 20:07:41

Answer 2: score 6, 2014-11-19 16:20:21

Answer 3: score 6, 2016-12-15 01:05:33

Answer 4: score 2, 2011-10-24 13:57:12

Answer 5: score 0, 2011-10-24 17:01:35