Techniques for optimising GPU utilisation when processing discrete images
I have a server that applies filters (implemented as OpenGL shaders) to images. They are mostly direct colour mappings, with occasional blurs and other convolutions.
The source images are PNGs and JPGs in a variety of sizes, from e.g. 100x100 pixels up to 16,384x16,384 (the maximum texture size for my GPU).
The pipeline is:
Decode image to RGBA (CPU)
|
V
Load texture to GPU
|
V
Apply shader (GPU)
|
V
Unload to CPU memory
|
V
Encode to PNG (CPU)
The mean GPU timings are approximately 0.75 ms to load, 1.5 ms to unload and 1.5 ms to process a texture.
I have multiple CPU threads decoding PNGs and JPGs to provide a continuous stream of work to the GPU.
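As a rough sanity check on those numbers, the image rate the decoder threads would have to sustain for a given GPU load can be estimated from the mean timings alone. The sketch below is only arithmetic: the function name `required_rate` and the 50% example target are chosen for illustration, and it assumes utilisation is simply busy time divided by wall-clock time, which may not match what a monitoring tool reports.

```python
# Mean per-image GPU timings quoted above, in milliseconds.
LOAD_MS = 0.75
PROCESS_MS = 1.5
UNLOAD_MS = 1.5


def required_rate(target_utilisation: float, busy_ms_per_image: float) -> float:
    """Images per second needed so the GPU is busy for the given fraction of time."""
    return target_utilisation * 1000.0 / busy_ms_per_image


# Assumption 1: only shader execution counts as "busy".
shader_only = required_rate(0.5, PROCESS_MS)
# Assumption 2: load + process + unload all count as "busy".
all_stages = required_rate(0.5, LOAD_MS + PROCESS_MS + UNLOAD_MS)

print(f"shader only: {shader_only:.0f} images/s")
print(f"all stages:  {all_stages:.0f} images/s")
```

With these timings, a 50% load needs roughly 333 images per second if only shader time counts as busy, or roughly 133 if the transfers count too. If the decoder threads deliver far fewer images than that, the GPU will spend most of its time waiting on the CPU.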
The challenge is that watch -n 0.1 nvidia-smi reports that GPU utilisation is mostly around 0% - 1%, spiking to 18% periodically.
I really want to get more value out of the GPU, i.e. I'd like to see its load at least around 50%. My questions:
Is nvidia-smi giving a reasonable representation of how busy the GPU is? Does it, for example, include the time to load and unload textures? If not, is there a better metric I could be using?
Assuming that it is, and the GPU is sitting idle, are there any well-understood architectures for increasing throughput? I've considered tiling multiple images into a large texture, but this feels like it will blow out CPU usage rather than GPU usage.
Is there some way I could load the next image into GPU texture memory while the GPU is processing the previous image?
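To put a number on what overlapping could buy, the last question can be modelled as a three-stage pipeline (load, process, unload) in which each stage handles one image at a time but different stages work on different images concurrently. In OpenGL, the usual mechanism for asynchronous transfers is pixel buffer objects (buffers bound to `GL_PIXEL_UNPACK_BUFFER` for uploads and `GL_PIXEL_PACK_BUFFER` for readbacks). Whether the driver and hardware actually overlap transfers with rendering is an assumption here, not something the timings confirm. The function names below are hypothetical, and the model ignores CPU decode and encode time entirely.

```python
# Mean per-image stage costs from the question, in milliseconds:
# load to GPU, apply shader, unload to CPU memory.
STAGE_MS = [0.75, 1.5, 1.5]


def serial_makespan(n_images, stage_ms):
    """Total time when each image finishes all stages before the next one starts."""
    return n_images * sum(stage_ms)


def pipelined_makespan(n_images, stage_ms):
    """Total time when stages overlap, each stage processing one image at a time."""
    # finish[s] is the time at which stage s completed its most recent image.
    finish = [0.0] * len(stage_ms)
    for _ in range(n_images):
        ready = 0.0  # time at which this image left the previous stage
        for s, cost in enumerate(stage_ms):
            start = max(ready, finish[s])  # wait for both the image and the stage
            finish[s] = start + cost
            ready = finish[s]
    return finish[-1]


n = 1000
print(f"serial:    {serial_makespan(n, STAGE_MS):.2f} ms")
print(f"pipelined: {pipelined_makespan(n, STAGE_MS):.2f} ms")
```

Under this idealised model the steady-state cost per image drops from the sum of the stages (3.75 ms) to the slowest single stage (1.5 ms), so the best case is a speed-up of about 2.5x on the GPU side. It does nothing for a pipeline that is limited by CPU decoding or encoding.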
Sampling nvidia-smi is a really poor way of figuring out utilization. Use Nvidia Visual Profiler (I find this easiest to work with) or Nvidia Nsight to get a true picture of what your performance and bottlenecks are.
It's hard to say how to improve performance without seeing your code and without a better understanding of what the bottleneck is.
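Short of a full profiler run, a first guess at the bottleneck can come from accumulating wall-clock time per pipeline stage. The following is a minimal sketch, not the asker's code: the `timed` helper, the stage names and the sleep-based stand-in costs are all hypothetical. Note also that OpenGL calls generally return before the GPU has finished the work, so wall-clock timing around them is only meaningful after a `glFinish`, or when replaced with GPU timer queries.

```python
import time
from collections import defaultdict
from contextlib import contextmanager

# Accumulated wall-clock seconds per pipeline stage.
totals = defaultdict(float)


@contextmanager
def timed(stage):
    """Add the wall-clock time spent inside the with-block to totals[stage]."""
    start = time.perf_counter()
    try:
        yield
    finally:
        totals[stage] += time.perf_counter() - start


# Hypothetical stand-in costs in seconds; replace each sleep with the real call.
FAKE_COST = {"decode": 0.02, "upload": 0.001, "shader": 0.001,
             "download": 0.001, "encode": 0.001}


def process_image():
    for stage, cost in FAKE_COST.items():
        with timed(stage):
            time.sleep(cost)  # placeholder for the real work of this stage


for _ in range(3):
    process_image()

bottleneck = max(totals, key=totals.get)
for stage, seconds in totals.items():
    print(f"{stage:8s} {seconds * 1000:7.1f} ms")
print("slowest stage:", bottleneck)
```

If the CPU stages dominate the totals, adding decoder and encoder threads is likely to help more than any change on the GPU side.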