How can I get the number of cores in a CUDA device?

Question

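I am looking for a function that returns the number of cores in my CUDA device. I know that each multiprocessor has a specific number of cores, and my CUDA device has 2 multiprocessors.

I searched a lot for a device property that gives the number of cores per multiprocessor, but I could not find one. I use the code below, but I still need the number of cores.

CUDA 7.0
Programming language: C
Visual Studio 2013

Code: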

void printDevProp(cudaDeviceProp devProp)
{
    printf("%s\n", devProp.name);
    printf("Major revision number:         %d\n", devProp.major);
    printf("Minor revision number:         %d\n", devProp.minor);
    // The size_t fields are cast so the format specifier matches on 64-bit builds.
    printf("Total global memory:           %llu bytes\n", (unsigned long long)devProp.totalGlobalMem);
    printf("Number of multiprocessors:     %d\n", devProp.multiProcessorCount);
    printf("Total amount of shared memory per block: %llu\n", (unsigned long long)devProp.sharedMemPerBlock);
    printf("Total registers per block:     %d\n", devProp.regsPerBlock);
    printf("Warp size:                     %d\n", devProp.warpSize);
    printf("Maximum memory pitch:          %llu\n", (unsigned long long)devProp.memPitch);
    printf("Total amount of constant memory:         %llu\n", (unsigned long long)devProp.totalConstMem);
}

Answer 1

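The number of cores per multiprocessor is the only "missing" piece of data. It is not provided directly in the cudaDeviceProp structure, but it can be inferred from published data together with the devProp.major and devProp.minor entries, which make up the CUDA compute capability of the device.

Something like this should work: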

#include <stdio.h>
#include "cuda_runtime_api.h"
// you must first call the cudaGetDeviceProperties() function, then pass
// the devProp structure returned to this function:
int getSPcores(cudaDeviceProp devProp)
{
    int cores = 0;
    int mp = devProp.multiProcessorCount;
    switch (devProp.major) {
    case 2: // Fermi
        if (devProp.minor == 1) cores = mp * 48;
        else cores = mp * 32;
        break;
    case 3: // Kepler
        cores = mp * 192;
        break;
    case 5: // Maxwell
        cores = mp * 128;
        break;
    case 6: // Pascal
        if ((devProp.minor == 1) || (devProp.minor == 2)) cores = mp * 128;
        else if (devProp.minor == 0) cores = mp * 64;
        else printf("Unknown device type\n");
        break;
    case 7: // Volta and Turing
        if ((devProp.minor == 0) || (devProp.minor == 5)) cores = mp * 64;
        else printf("Unknown device type\n");
        break;
    case 8: // Ampere
        if (devProp.minor == 0) cores = mp * 64;
        else if (devProp.minor == 6) cores = mp * 128;
        else printf("Unknown device type\n");
        break;
    default:
        printf("Unknown device type\n");
        break;
    }
    return cores; // 0 means the compute capability was not recognized
}

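(coded in browser)

"Cores" is a bit of a marketing term. The most common connotation, in my opinion, is to equate it with the SP units in the SM. That is the meaning I have used here. I have also omitted cc 1.x devices, because those device types are no longer supported in CUDA 7.0 and CUDA 7.5.

A Pythonic version is here.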

Answer 2

On Linux, you can run the following command to get the number of CUDA cores:

nvidia-settings -q CUDACores -t

To get the output of this command in C, use the popen function.
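A minimal sketch of the popen approach, assuming a POSIX system and that the command prints a single integer on its first line (which is what the -t terse flag is meant to give). The helper names read_int_from_command and query_cuda_cores are hypothetical, chosen for this illustration only.

```c
/* Must precede every #include so popen/pclose are declared under strict -std=c99/c11. */
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: runs `cmd` through the shell and parses the first
 * line of its output as a non-negative integer.
 * Returns the parsed value, or -1 if the command cannot be started,
 * exits with a non-zero status, or prints something that is not a number. */
int read_int_from_command(const char *cmd)
{
    assert(cmd != NULL);

    FILE *pipe = popen(cmd, "r");
    if (pipe == NULL)
        return -1;

    char line[64];
    int value = -1;
    if (fgets(line, sizeof line, pipe) != NULL) {
        char *end = NULL;
        long parsed = strtol(line, &end, 10);
        /* end == line means strtol consumed no digits at all */
        if (end != line && parsed >= 0 && parsed <= INT_MAX)
            value = (int)parsed;
    }

    /* pclose returns the command's wait status; non-zero means failure */
    if (pclose(pipe) != 0)
        value = -1;
    return value;
}

/* Hypothetical wrapper around the command quoted above. */
int query_cuda_cores(void)
{
    return read_int_from_command("nvidia-settings -q CUDACores -t");
}
```

Calling query_cuda_cores() returns -1 when nvidia-settings is missing or fails, so the caller can fall back to a compute-capability lookup instead.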

Answer 3

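As Vraj Pandya already said, there is a function (_ConvertSMVer2Cores) in the Common/helper_cuda.h file of NVIDIA's cuda-samples GitHub repository that provides this functionality. You just need to multiply its result by the multiprocessor count of the GPU.

Just wanted to provide a current link.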

#include <stdio.h>
#include <cuda.h>
#include <cuda_runtime.h>
#include <helper_cuda.h> // This header must be on the compiler's include path.
                         // It also seems to require the `helper_string.h`
                         // file (in the same folder as `helper_cuda.h`).

int main(void)
{
    int deviceID;
    cudaDeviceProp props;

    cudaGetDevice(&deviceID);
    cudaGetDeviceProperties(&props, deviceID);

    // cores per SM for this compute capability, times the number of SMs
    int CUDACores = _ConvertSMVer2Cores(props.major, props.minor) * props.multiProcessorCount;
    printf("CUDA cores: %d\n", CUDACores);
    return 0;
}

Answer 4

Maybe this will help.

https://devtalk.nvidia.com/default/topic/470848/cuda-programming-and-performance/what-39-s-the-proper-way-to-detect-sp-cuda-cores-count-per-sm-/post/4414371/#4414371

"There is a library, helper_cuda.h, which contains a routine _ConvertSMVer2Cores(int major, int minor) that takes the compute capability level of the GPU and returns the number of cores (stream processors) in each SM or SMX" (from the post).
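To show how a routine of this kind can work, here is a table-driven sketch in plain C. It is not the NVIDIA source: the names SmCoresEntry, cores_per_sm and total_cores are hypothetical, packing the version as (major << 4) + minor is an assumption of this sketch, and the per-SM values are the ones given in Answer 1.

```c
#include <stddef.h>

/* One table row: packed compute capability -> SP cores per multiprocessor. */
typedef struct {
    int sm_version;   /* (major << 4) + minor, e.g. 0x61 for cc 6.1 */
    int cores_per_sm;
} SmCoresEntry;

/* Values taken from Answer 1; cc 1.x is omitted there as well. */
static const SmCoresEntry sm_cores_table[] = {
    {0x20, 32},  {0x21, 48},                            /* Fermi   */
    {0x30, 192}, {0x32, 192}, {0x35, 192}, {0x37, 192}, /* Kepler  */
    {0x50, 128}, {0x52, 128}, {0x53, 128},              /* Maxwell */
    {0x60, 64},  {0x61, 128}, {0x62, 128},              /* Pascal  */
    {0x70, 64},  {0x75, 64},                            /* Volta, Turing */
    {0x80, 64},  {0x86, 128},                           /* Ampere  */
};

/* Returns the cores per multiprocessor, or -1 for an unknown capability. */
int cores_per_sm(int major, int minor)
{
    int key = (major << 4) + minor;
    size_t count = sizeof sm_cores_table / sizeof sm_cores_table[0];
    for (size_t i = 0; i < count; ++i) {
        if (sm_cores_table[i].sm_version == key)
            return sm_cores_table[i].cores_per_sm;
    }
    return -1;
}

/* Multiplies the per-SM figure by the multiprocessor count; -1 if unknown. */
int total_cores(int major, int minor, int multiprocessor_count)
{
    int per_sm = cores_per_sm(major, minor);
    return per_sm < 0 ? -1 : per_sm * multiprocessor_count;
}
```

With a real device, the three arguments would come from the major, minor and multiProcessorCount fields of cudaDeviceProp. Returning -1 for an unlisted capability keeps an outdated table from silently reporting a wrong count.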

Answer 1: score 24, accepted, 2015-09-11 20:54:22
Answer 2: score 15, 2018-12-09 13:11:24
Answer 3: score 1, 2021-02-25 11:04:49
Answer 4: score 0, 2017-10-26 22:57:51
