How to programmatically determine the correct launch parameters for a persistent kernel?

Question

What is the correct way to programmatically determine the launch parameters of a persistent kernel? All examples I have found use hard-coded values.

Is the following correct?

cudaDeviceProp props;

cudaGetDeviceProperties(&props, 0);

int blockCount = props.maxBlocksPerMultiProcessor * props.multiProcessorCount;
int blockThreadCount = props.maxThreadsPerMultiProcessor / props.maxBlocksPerMultiProcessor;

// Gives <<<1312, 96>>> on an RTX 3090
PersistentKernel<<<blockCount, blockThreadCount>>>(...);

Answer 1

Is the following correct?

No.

Use cudaOccupancyMaxPotentialBlockSize. For the current device, it returns both the block size that maximizes the occupancy of a given kernel and the minimum grid size needed to reach that occupancy. Those are the optimal launch parameters for a given persistent kernel.
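As a minimal sketch of that call, assuming the kernel uses no dynamic shared memory and that no upper limit on block size is wanted: the kernel body, the LaunchConfig struct, and the queryPersistentLaunchConfig helper are hypothetical names introduced only for illustration, not part of the question.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical stand-in for the persistent kernel from the question.
// Each thread keeps claiming work items from a global counter until none remain.
__global__ void PersistentKernel(int *workCounter, int totalItems)
{
    while (true) {
        int item = atomicAdd(workCounter, 1);
        if (item >= totalItems) break;
        // ... process `item` here ...
    }
}

// Hypothetical container for the two scalars returned by the occupancy API.
struct LaunchConfig {
    int gridSize;   // minimum number of blocks needed for maximum occupancy
    int blockSize;  // threads per block that maximize occupancy
};

// Queries the occupancy API for PersistentKernel on the current device.
// Returns {0, 0} if the query fails.
LaunchConfig queryPersistentLaunchConfig()
{
    LaunchConfig cfg{0, 0};
    cudaError_t err = cudaOccupancyMaxPotentialBlockSize(
        &cfg.gridSize, &cfg.blockSize, PersistentKernel,
        0,   // assumption: no dynamic shared memory per block
        0);  // assumption: 0 means no block size limit
    if (err != cudaSuccess) {
        std::fprintf(stderr, "occupancy query failed: %s\n",
                     cudaGetErrorString(err));
        cfg = LaunchConfig{0, 0};
    }
    return cfg;
}
```

The launch would then look like PersistentKernel<<<cfg.gridSize, cfg.blockSize>>>(dCounter, n), where dCounter and n are placeholders for a zero-initialized device counter and the number of work items. If the kernel does use dynamic shared memory, that byte count has to be passed in place of the first 0, since it affects how many blocks fit on each multiprocessor.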

Note that the returned block and grid dimensions are scalars. You are free to reshape them into multidimensional dim3 block and/or grid dimensions, as long as they preserve the total number of threads per block and the total number of blocks returned by the API.
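One possible way to do that reshaping for the block dimension is sketched below. The reshapeBlock helper is a hypothetical name, and the sketch assumes a 2D block of a caller-chosen width is wanted; it falls back to the original 1D shape when the width does not divide the scalar block size evenly, so the thread count is never changed.

```cuda
#include <cuda_runtime.h>

// Hypothetical helper: folds a scalar block size into a 2D block of width `x`,
// keeping x * y equal to the original number of threads per block.
// Falls back to a 1D block if `x` is not positive or does not divide blockSize.
dim3 reshapeBlock(int blockSize, int x)
{
    if (x <= 0 || blockSize % x != 0) {
        return dim3(blockSize, 1, 1);
    }
    return dim3(x, blockSize / x, 1);
}
```

For example, reshapeBlock(768, 32) yields a 32 x 24 block, which still contains 768 threads. The same idea applies to the grid size, provided the product of the grid dimensions equals the block count returned by the API.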
