
Getting total execution time of all kernels on a CUDA stream

Original post: 2022-06-12 18:35:58. Tags: cuda, cuda-streams, cub

I know how to time the execution of a single CUDA kernel using CUDA events, which works well for simple cases. In practice, though, an algorithm is often made up of a series of kernels (the cub::DeviceRadixSort algorithms, for example, launch several kernels to get the job done). If the algorithm runs on a system with many other streams and kernels in flight, the gaps between individual kernel launches can vary widely depending on what other work gets scheduled between launches on my stream. When I'm trying to make my algorithm faster, I don't care much about how long it spends waiting for resources; I care about the time it spends actually executing.
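For reference, here is a minimal sketch of the event approach applied to a whole sequence of kernels on one stream. `step_kernel` and `time_sequence_ms` are hypothetical names chosen for illustration. The point is that `cudaEventElapsedTime` between an event recorded before the first launch and one recorded after the last launch reports end-to-end GPU time, so any idle gaps between the launches are included.

```cuda
#include <cassert>
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Abort with a message if a CUDA runtime call fails.
#define CUDA_CHECK(call)                                                   \
    do {                                                                   \
        cudaError_t err_ = (call);                                         \
        if (err_ != cudaSuccess) {                                         \
            std::fprintf(stderr, "%s failed: %s\n", #call,                 \
                         cudaGetErrorString(err_));                        \
            std::exit(EXIT_FAILURE);                                       \
        }                                                                  \
    } while (0)

// Hypothetical stand-in for one step of a multi-kernel algorithm.
__global__ void step_kernel(float *data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] = data[i] * 0.5f + 1.0f;
}

// Launches `launches` kernels on one stream and returns the elapsed time (ms)
// between an event recorded before the first launch and one after the last.
float time_sequence_ms(int launches) {
    const int n = 1 << 20;
    float *d_data = nullptr;
    CUDA_CHECK(cudaMalloc(&d_data, n * sizeof(float)));
    CUDA_CHECK(cudaMemset(d_data, 0, n * sizeof(float)));

    cudaStream_t stream;
    CUDA_CHECK(cudaStreamCreate(&stream));
    cudaEvent_t start, stop;
    CUDA_CHECK(cudaEventCreate(&start));
    CUDA_CHECK(cudaEventCreate(&stop));

    CUDA_CHECK(cudaEventRecord(start, stream));
    for (int k = 0; k < launches; ++k) {
        step_kernel<<<(n + 255) / 256, 256, 0, stream>>>(d_data, n);
    }
    CUDA_CHECK(cudaGetLastError());
    CUDA_CHECK(cudaEventRecord(stop, stream));
    CUDA_CHECK(cudaEventSynchronize(stop));

    // Elapsed GPU time between the two events. This includes idle gaps between
    // launches (e.g. while other streams occupy the device), not only kernel execution.
    float ms = 0.0f;
    CUDA_CHECK(cudaEventElapsedTime(&ms, start, stop));

    CUDA_CHECK(cudaEventDestroy(start));
    CUDA_CHECK(cudaEventDestroy(stop));
    CUDA_CHECK(cudaStreamDestroy(stream));
    CUDA_CHECK(cudaFree(d_data));
    return ms;
}

int main() {
    const float ms = time_sequence_ms(8);
    assert(ms >= 0.0f);
    std::printf("start->stop elapsed for 8 launches: %.3f ms\n", ms);
    return 0;
}
```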

So the question is: is there a way to do something like the event API, inserting a marker in the stream before the first kernel launches and reading it back after the last kernel finishes, that reports the actual time spent executing on the stream rather than the total end-to-end wall-clock time? Maybe something in CUPTI can do this?
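CUPTI does not offer a single "stream busy time" marker as far as I know, but its Activity API reports start and end timestamps for every kernel, which can be summed per stream. The sketch below is one possible approach, not taken from the answer: it enables `CUPTI_ACTIVITY_KIND_CONCURRENT_KERNEL`, and in the buffer-completed callback adds `end - start` of each kernel record to a total keyed by `streamId`. Because kernels issued to one stream normally run one after another, that per-stream sum approximates the time the stream spent executing, excluding gaps between launches. The names `step_kernel`, `stream_kernel_ns` and `enable_kernel_tracing`, the 1 MiB buffer size, and the cast to `CUpti_ActivityKernel4` are assumptions; check which kernel record struct versions your toolkit's `cupti_activity.h` defines. It also assumes a C++17 toolchain for `std::aligned_alloc`.

```cuda
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <cstdlib>
#include <map>
#include <mutex>
#include <cuda.h>
#include <cuda_runtime.h>
#include <cupti.h>

// Abort with a message if a CUPTI call fails.
#define CUPTI_CHECK(call)                                                  \
    do {                                                                   \
        CUptiResult res_ = (call);                                         \
        if (res_ != CUPTI_SUCCESS) {                                       \
            const char *msg_ = nullptr;                                    \
            cuptiGetResultString(res_, &msg_);                             \
            std::fprintf(stderr, "%s failed: %s\n", #call, msg_);          \
            std::exit(EXIT_FAILURE);                                       \
        }                                                                  \
    } while (0)

static std::mutex g_mutex;
static std::map<uint32_t, uint64_t> g_kernel_ns;  // CUPTI stream ID -> summed kernel ns

// CUPTI asks for a buffer to fill with activity records (must be 8-byte aligned).
static void CUPTIAPI buffer_requested(uint8_t **buffer, size_t *size, size_t *max_records) {
    const size_t kBytes = 1 << 20;  // assumed size; a multiple of the alignment
    *buffer = static_cast<uint8_t *>(std::aligned_alloc(8, kBytes));
    *size = kBytes;
    *max_records = 0;  // 0: fit as many records as the buffer holds
}

// CUPTI hands back a filled buffer; accumulate kernel durations per stream.
static void CUPTIAPI buffer_completed(CUcontext, uint32_t, uint8_t *buffer, size_t, size_t valid_size) {
    std::lock_guard<std::mutex> lock(g_mutex);
    CUpti_Activity *record = nullptr;
    while (cuptiActivityGetNextRecord(buffer, valid_size, &record) == CUPTI_SUCCESS) {
        if (record->kind == CUPTI_ACTIVITY_KIND_CONCURRENT_KERNEL) {
            // Assumption: start/end/streamId sit at the same offsets in the record
            // version your CUPTI emits as in CUpti_ActivityKernel4.
            const auto *k = reinterpret_cast<const CUpti_ActivityKernel4 *>(record);
            g_kernel_ns[k->streamId] += k->end - k->start;
        }
    }
    std::free(buffer);
}

// Registers the callbacks and enables kernel tracing once per process.
static void enable_kernel_tracing() {
    static bool enabled = false;
    if (enabled) return;
    CUPTI_CHECK(cuptiActivityRegisterCallbacks(buffer_requested, buffer_completed));
    // Unlike CUPTI_ACTIVITY_KIND_KERNEL, this kind does not serialize kernels.
    CUPTI_CHECK(cuptiActivityEnable(CUPTI_ACTIVITY_KIND_CONCURRENT_KERNEL));
    enabled = true;
}

// Hypothetical stand-in for one step of a multi-kernel algorithm.
__global__ void step_kernel(float *data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] = data[i] * 0.5f + 1.0f;
}

// Runs `launches` kernels on a new stream; returns the summed kernel execution time (ns).
uint64_t stream_kernel_ns(int launches) {
    enable_kernel_tracing();  // before any CUDA work, so no kernel records are missed
    const int n = 1 << 20;
    float *d_data = nullptr;
    cudaMalloc(&d_data, n * sizeof(float));  // also creates the primary context
    cudaStream_t stream;
    cudaStreamCreate(&stream);

    // Map the runtime stream handle to the ID CUPTI writes into activity records.
    CUcontext ctx = nullptr;
    cuCtxGetCurrent(&ctx);
    uint32_t stream_id = 0;
    CUPTI_CHECK(cuptiGetStreamId(ctx, stream, &stream_id));

    for (int k = 0; k < launches; ++k) {
        step_kernel<<<(n + 255) / 256, 256, 0, stream>>>(d_data, n);
    }
    cudaStreamSynchronize(stream);
    CUPTI_CHECK(cuptiActivityFlushAll(0));  // blocks until completed records are delivered

    cudaStreamDestroy(stream);
    cudaFree(d_data);
    std::lock_guard<std::mutex> lock(g_mutex);
    return g_kernel_ns[stream_id];
}

int main() {
    const uint64_t ns = stream_kernel_ns(8);
    assert(ns > 0);
    std::printf("kernel execution time on the stream: %.3f ms\n", ns / 1e6);
    return 0;
}
```

Build with something like `nvcc trace.cu -lcupti -lcuda`, adding the include and library paths of your toolkit's CUPTI installation if they are not on the default search path. Tracing itself adds some overhead, so treat the totals as approximate.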

1 answer

You can use Nsight Systems or Nsight Compute (https://developer.nvidia.com/tools-overview).

Nsight Systems lets you profile the timeline of each stream, and Nsight Compute lets you profile each CUDA kernel in detail. I'd guess Nsight Compute is the better choice, because it lets you inspect a wide range of GPU performance metrics and gives hints for optimizing the kernel.
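Once you have per-kernel start and end timestamps, for example read off a profiler's timeline or trace, the time actually spent executing is the total length of the union of those intervals, while the end-to-end time is the span from the first start to the last end. A minimal host-side sketch (plain C++ that also compiles with nvcc); the `Interval` type and the nanosecond values are hypothetical.

```cuda
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <utility>
#include <vector>

// Hypothetical input format: [start, end) of one kernel, in nanoseconds.
using Interval = std::pair<uint64_t, uint64_t>;

// Total time during which at least one kernel was running (union of intervals).
uint64_t busy_time_ns(std::vector<Interval> ivs) {
    if (ivs.empty()) return 0;
    std::sort(ivs.begin(), ivs.end());
    uint64_t total = 0;
    uint64_t cur_start = ivs[0].first, cur_end = ivs[0].second;
    for (size_t i = 1; i < ivs.size(); ++i) {
        if (ivs[i].first <= cur_end) {
            // Overlapping or touching: extend the current merged interval.
            cur_end = std::max(cur_end, ivs[i].second);
        } else {
            // Gap found: close the current interval and start a new one.
            total += cur_end - cur_start;
            cur_start = ivs[i].first;
            cur_end = ivs[i].second;
        }
    }
    return total + (cur_end - cur_start);
}

// End-to-end span from the earliest kernel start to the latest kernel end.
uint64_t wall_time_ns(const std::vector<Interval> &ivs) {
    if (ivs.empty()) return 0;
    uint64_t first = ivs[0].first, last = ivs[0].second;
    for (const auto &iv : ivs) {
        first = std::min(first, iv.first);
        last = std::max(last, iv.second);
    }
    return last - first;
}

int main() {
    // Three kernels on one stream with idle gaps between them (made-up timestamps).
    const std::vector<Interval> kernels = {{0, 100}, {250, 400}, {900, 1000}};
    const uint64_t busy = busy_time_ns(kernels);
    const uint64_t wall = wall_time_ns(kernels);
    assert(busy == 350 && wall == 1000);
    std::printf("busy %llu ns of %llu ns wall\n",
                (unsigned long long)busy, (unsigned long long)wall);
    return 0;
}
```

Using the union rather than a plain sum means overlapping kernels (for example, from several streams) are not double-counted; for a single stream, where kernels normally do not overlap, both give the same result.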

Related questions

parallel execution of CUDA kernels

parallel execution of kernels in cuda

CUDA concurrent kernel execution with multiple kernels per stream

CUDA and Graphics Kernels Order of Execution

Time measuring of multiple CUDA kernels

Cuda Stream Processing for multiple kernels Disambiguation

Cuda profiler says that my two kernels are expensive, however their execution time seems to be small

What are the factors that affect CUDA kernels launch time

Trouble measuring the elapsed time of a CUDA program and CUDA kernels

Order of execution in CUDA or OpenCL kernels - for memory access optimisation





Answered 2022-06-12 23:15:36
