
Concurrent GPU kernel execution from multiple processes

Original post 2012-10-01 19:29:29 6 2 cuda/ opencl/ gpu/ nvidia/ amd-processor

I have an application in which I would like to share a single GPU between multiple processes. That is, each of these processes would create its own CUDA or OpenCL context, targeting the same GPU. According to the Fermi white paper [1], application-level context switching takes less than 25 microseconds, but the launches are effectively serialized when they reach the GPU, so Fermi wouldn't work well for this. According to the Kepler white paper [2], a feature called Hyper-Q allows up to 32 simultaneous connections from multiple CUDA streams, MPI processes, or threads within a process.

My questions: Has anyone tried this on a Kepler GPU and verified that its kernels are run concurrently when scheduled from distinct processes? Is this just a CUDA feature, or can it also be used with OpenCL on Nvidia GPUs? Do AMD's GPUs support something similar?

[1] http://www.nvidia.com/content/PDF/fermi_white_papers/NVIDIA_Fermi_Compute_Architecture_Whitepaper.pdf

[2] http://www.nvidia.com/content/PDF/kepler/NVIDIA-Kepler-GK110-Architecture-Whitepaper.pdf

2 answers

In response to the first question, NVIDIA has published some Hyper-Q results in a blog post. The post points out that the developers porting CP2K were able to get accelerated results more quickly because Hyper-Q allowed them to use the application's MPI structure more or less as-is, run multiple ranks on a single GPU, and get higher effective GPU utilization that way. As mentioned in the comments, the Hyper-Q feature is currently available only on K20 processors, as it depends on the GK110 GPU.
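One way to check this on a given machine is a small probe that starts several processes, each with its own CUDA context, and compares wall-clock times. The following is a minimal sketch, not a definitive benchmark. Everything in it is an assumption for illustration: the file name `probe.cu`, the `spin` kernel, the cycle count, and the default of four processes. It is also an assumption, not something stated in the answer above, that overlap across processes on NVIDIA GPUs typically requires the CUDA Multi-Process Service (MPS) to be running; without it, contexts from different processes are generally time-sliced.

```cuda
// probe.cu -- assumed build line: nvcc -o probe probe.cu
#include <chrono>
#include <cstdio>
#include <cstdlib>
#include <sys/wait.h>
#include <unistd.h>
#include <cuda_runtime.h>

// Busy-wait on a single thread so each kernel uses almost none of the GPU.
__global__ void spin(long long cycles) {
    long long start = clock64();
    while (clock64() - start < cycles) { }
}

static double now_s() {
    using namespace std::chrono;
    return duration<double>(steady_clock::now().time_since_epoch()).count();
}

// Runs in a child process: creates its own context, then times one kernel.
static int child(long long cycles) {
    if (cudaFree(0) != cudaSuccess) return 1;      // force context creation
    double t0 = now_s();
    spin<<<1, 1>>>(cycles);
    if (cudaDeviceSynchronize() != cudaSuccess) return 1;
    printf("pid %d: kernel took %.3f s\n", (int)getpid(), now_s() - t0);
    fflush(stdout);                                // _exit() does not flush
    return 0;
}

int main(int argc, char** argv) {
    int nproc = argc > 1 ? atoi(argv[1]) : 4;      // hypothetical default
    if (nproc < 1) nproc = 1;
    const long long cycles = 1000000000LL;         // roughly 1 s near 1 GHz

    // Fork before any CUDA call so every child gets a fresh context.
    double t0 = now_s();
    for (int i = 0; i < nproc; ++i) {
        pid_t pid = fork();
        if (pid < 0) { perror("fork"); return 1; }
        if (pid == 0) _exit(child(cycles));
    }

    int failures = 0;
    for (int i = 0; i < nproc; ++i) {
        int status = 0;
        wait(&status);
        if (!WIFEXITED(status) || WEXITSTATUS(status) != 0) ++failures;
    }
    printf("%d processes, %d failed, total wall time %.3f s\n",
           nproc, failures, now_s() - t0);
    return failures ? 1 : 0;
}
```

If the per-process kernel times stay close to the single-process time as the process count grows, the kernels are probably overlapping; if they grow roughly in proportion to the process count, the launches are being serialized. The total wall time also includes context creation, so the per-process figures are the more useful ones.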

I've run kernels concurrently on the Fermi architecture. It works wonderfully and, in fact, is often the only way to get high occupancy from your hardware. I used OpenCL; to do this, you need to submit work to a separate command queue from a separate CPU thread. Note that dispatching new data-parallel kernels from within another kernel is Dynamic Parallelism, not Hyper-Q. It is available only on Kepler (GK110).
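The answer above uses separate OpenCL command queues; the closest CUDA analog within one process is a set of separate streams. The sketch below, under stated assumptions, times the same small kernels launched first into one stream and then into one stream each. The `spin` kernel, the `run_ms` and `check` helpers, the stream count, and the cycle count are all hypothetical names and values chosen for illustration.

```cuda
// streams.cu -- assumed build line: nvcc -o streams streams.cu
#include <chrono>
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Busy-wait on a single thread so several kernels can fit on the GPU at once.
__global__ void spin(long long cycles) {
    long long start = clock64();
    while (clock64() - start < cycles) { }
}

static void check(cudaError_t err) {
    if (err != cudaSuccess) {
        fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));
        exit(1);
    }
}

// Launch n kernels, either all into streams[0] or one per stream,
// and return the host-measured elapsed time in milliseconds.
static double run_ms(cudaStream_t* streams, int n, bool separate,
                     long long cycles) {
    check(cudaDeviceSynchronize());
    auto t0 = std::chrono::steady_clock::now();
    for (int i = 0; i < n; ++i)
        spin<<<1, 1, 0, streams[separate ? i : 0]>>>(cycles);
    check(cudaGetLastError());
    check(cudaDeviceSynchronize());
    auto t1 = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(t1 - t0).count();
}

int main() {
    const int n = 4;                       // hypothetical stream count
    const long long cycles = 100000000LL;  // hypothetical kernel length

    cudaDeviceProp prop;
    check(cudaGetDeviceProperties(&prop, 0));
    printf("%s: concurrentKernels = %d\n", prop.name, prop.concurrentKernels);

    cudaStream_t streams[n];
    for (int i = 0; i < n; ++i) check(cudaStreamCreate(&streams[i]));

    run_ms(streams, 1, false, 1000);       // warm-up launch
    double serial = run_ms(streams, n, false, cycles);
    double concurrent = run_ms(streams, n, true, cycles);
    printf("one stream:  %.1f ms\n", serial);
    printf("%d streams:   %.1f ms\n", n, concurrent);

    for (int i = 0; i < n; ++i) check(cudaStreamDestroy(streams[i]));
    return 0;
}
```

Kernels in a single stream always run in order, so the first figure is a serialized baseline. If the device reports `concurrentKernels = 1` and the second figure is close to the time of a single kernel, the kernels from the separate streams overlapped.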

concurrent kernel execution

CUDA concurrent kernel execution with multiple kernels per stream

cuda understanding concurrent kernel execution

Priority of concurrent CUDA kernel execution

Cuda optimization, multiprocessors, concurrent kernel execution

CUDA concurrent kernel execution behavior and efficency

multiple processes of video processing to GPU cores

Concurrent execution of two processes sharing a Tesla K20

Overlap kernel execution on multiple streams

Replicate in space an object with position and orientation from GPU kernel


The technical posts on this site are licensed under CC BY-SA 4.0. If you reprint them, please indicate the site URL or the original address. For any questions, please contact: yoyou2525@163.com.

solution1
7 ACCEPTED 2012-10-05 13:51:40

solution2
-2 2013-05-29 08:24:18
