
How to programmatically determine the correct launch parameters for a persistent kernel?

What is the correct way to programmatically determine the launch parameters of a persistent kernel? All examples I have found use hard-coded values.

Is the following correct?

cudaDeviceProp props;

cudaGetDeviceProperties(&props, 0);

int blockCount = props.maxBlocksPerMultiProcessor * props.multiProcessorCount;
int blockThreadCount = props.maxThreadsPerMultiProcessor / props.maxBlocksPerMultiProcessor;

//  Gives <<<1312, 96>>> on an RTX 3090
PersistentKernel<<<blockCount, blockThreadCount>>>(...);

Is the following correct?

No.

Use cudaOccupancyMaxPotentialBlockSize. It returns both a grid size and a block size for the current device: the block size that maximizes the occupancy of the given kernel, and the minimum number of blocks needed to reach that occupancy. Those are the optimal launch parameters for a given persistent kernel.

Note that the returned block and grid dimensions are scalars. You are free to reshape them into multidimensional dim3 block and/or grid dimensions, as long as the reshaped dimensions preserve the total number of threads per block and the total number of blocks returned by the API.
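As a rough illustration of the call described above, here is a minimal sketch. The kernel body, the helper name chooseLaunchConfig, and the buffer size are assumptions made up for this example and are not taken from the question. Only cudaOccupancyMaxPotentialBlockSize and the other CUDA runtime calls are real API.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical stand-in for the question's PersistentKernel. The grid-stride
// loop lets a fixed-size grid cover all n elements.
__global__ void PersistentKernel(float* data, int n)
{
    const int stride = gridDim.x * blockDim.x;
    for (int i = blockIdx.x * blockDim.x + threadIdx.x; i < n; i += stride) {
        data[i] *= 2.0f;
    }
}

// Hypothetical helper: asks the occupancy API for a launch configuration.
cudaError_t chooseLaunchConfig(int* gridSize, int* blockSize)
{
    // Last two arguments: dynamic shared memory per block (none here) and
    // an upper limit on the block size (0 means no limit).
    return cudaOccupancyMaxPotentialBlockSize(gridSize, blockSize,
                                              PersistentKernel, 0, 0);
}

int main()
{
    int gridSize = 0;
    int blockSize = 0;
    cudaError_t err = chooseLaunchConfig(&gridSize, &blockSize);
    if (err != cudaSuccess) {
        fprintf(stderr, "occupancy query failed: %s\n", cudaGetErrorString(err));
        return 1;
    }
    printf("launching <<<%d, %d>>>\n", gridSize, blockSize);

    const int n = 1 << 20;  // arbitrary example size
    float* data = nullptr;
    err = cudaMalloc(&data, n * sizeof(float));
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaMalloc failed: %s\n", cudaGetErrorString(err));
        return 1;
    }
    cudaMemset(data, 0, n * sizeof(float));

    PersistentKernel<<<gridSize, blockSize>>>(data, n);
    err = cudaGetLastError();           // catches launch-configuration errors
    if (err == cudaSuccess) {
        err = cudaDeviceSynchronize();  // catches errors raised while running
    }
    cudaFree(data);

    if (err != cudaSuccess) {
        fprintf(stderr, "kernel failed: %s\n", cudaGetErrorString(err));
        return 1;
    }
    return 0;
}
```

The values printed depend on the device and on the kernel's register and shared memory usage, so they will generally differ from the <<<1312, 96>>> computed in the question. If the kernel uses dynamic shared memory, pass the per-block byte count as the fourth argument so the occupancy estimate accounts for it.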
