How to programmatically determine the correct launch parameters for a persistent kernel?
What is the correct way to programmatically determine the launch parameters of a persistent kernel? All the examples I have found use hard-coded values.
Is the following correct?
cudaDeviceProp props;
cudaGetDeviceProperties(&props, 0);
int blockCount = props.maxBlocksPerMultiProcessor * props.multiProcessorCount;
int blockThreadCount = props.maxThreadsPerMultiProcessor / props.maxBlocksPerMultiProcessor;
// Gives <<<1312, 96>>> on a RTX 3090
PersistentKernel<<<blockCount, blockThreadCount>>>(...);
Is the following correct?
No.
Use cudaOccupancyMaxPotentialBlockSize. For the current device, it returns a block size and the minimum grid size that together achieve the maximum potential occupancy for a given kernel, that is, maximum occupancy with the fewest blocks. Those are the optimal launch parameters for a given persistent kernel.
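A minimal sketch of that call, assuming a hypothetical grid-stride PersistentKernel that doubles every element of an array (the kernel body, the array size n, and the buffer name data are illustrative and not from the question). It also assumes no dynamic shared memory and no block size limit, which is what the two trailing zeros mean:

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical persistent kernel: a fixed-size grid walks over all n
// elements with a grid-stride loop instead of launching one thread per element.
__global__ void PersistentKernel(float* data, int n)
{
    for (int i = blockIdx.x * blockDim.x + threadIdx.x; i < n;
         i += gridDim.x * blockDim.x) {
        data[i] *= 2.0f;
    }
}

int main()
{
    int minGridSize = 0;  // fewest blocks that reach the best potential occupancy
    int blockSize = 0;    // threads per block that reach it

    // Arguments 4 and 5: dynamic shared memory per block (bytes) and
    // an upper bound on the block size (0 means no limit).
    cudaError_t err = cudaOccupancyMaxPotentialBlockSize(
        &minGridSize, &blockSize, PersistentKernel, 0, 0);
    if (err != cudaSuccess) {
        fprintf(stderr, "occupancy query failed: %s\n", cudaGetErrorString(err));
        return 1;
    }
    printf("launching <<<%d, %d>>>\n", minGridSize, blockSize);

    const int n = 1 << 20;  // illustrative problem size
    float* data = nullptr;
    if (cudaMalloc(&data, n * sizeof(float)) != cudaSuccess) return 1;
    cudaMemset(data, 0, n * sizeof(float));

    PersistentKernel<<<minGridSize, blockSize>>>(data, n);
    err = cudaDeviceSynchronize();
    if (err != cudaSuccess) {
        fprintf(stderr, "kernel failed: %s\n", cudaGetErrorString(err));
    }

    cudaFree(data);
    return err == cudaSuccess ? 0 : 1;
}
```

Because the query takes the kernel itself as an argument, the result reflects that kernel's register and shared memory usage, which the device-property arithmetic in the question cannot see.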
Note that the returned block and grid dimensions are scalars. You are free to reshape them into multidimensional dim3 block and/or grid dimensions, as long as you preserve the number of threads per block and the total number of blocks returned by the API.
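One way such a reshape might look, as a sketch only: the kernel name PersistentKernel2D, the 32-wide block shape, and the array size are assumptions made for illustration. The block is reshaped to two dimensions only when the returned size divides evenly, so the thread count per block is unchanged; the grid is left one-dimensional.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel that tolerates 2D blocks and grids by flattening
// its indices before running the usual grid-stride loop.
__global__ void PersistentKernel2D(float* data, int n)
{
    int threadsPerBlock = blockDim.x * blockDim.y;
    int tid = threadIdx.y * blockDim.x + threadIdx.x;
    int bid = blockIdx.y * gridDim.x + blockIdx.x;
    int totalBlocks = gridDim.x * gridDim.y;
    for (int i = bid * threadsPerBlock + tid; i < n;
         i += totalBlocks * threadsPerBlock) {
        data[i] += 1.0f;
    }
}

int main()
{
    int minGridSize = 0, blockSize = 0;
    if (cudaOccupancyMaxPotentialBlockSize(&minGridSize, &blockSize,
                                           PersistentKernel2D, 0, 0) != cudaSuccess) {
        return 1;
    }

    // Reshape the scalar block size into 32 x (blockSize / 32) when it divides
    // evenly; otherwise keep it 1D. Either way block.x * block.y == blockSize.
    dim3 block = (blockSize % 32 == 0) ? dim3(32, blockSize / 32) : dim3(blockSize);
    dim3 grid(minGridSize);  // total block count is unchanged

    const int n = 1 << 20;  // illustrative problem size
    float* data = nullptr;
    if (cudaMalloc(&data, n * sizeof(float)) != cudaSuccess) return 1;
    cudaMemset(data, 0, n * sizeof(float));

    PersistentKernel2D<<<grid, block>>>(data, n);
    cudaError_t err = cudaDeviceSynchronize();
    printf("grid=(%u,%u) block=(%u,%u): %s\n", grid.x, grid.y, block.x, block.y,
           cudaGetErrorString(err));

    cudaFree(data);
    return err == cudaSuccess ? 0 : 1;
}
```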