
Is there a way to load a vector equal in size to the GPU's global memory in OpenCL?

Original: 2022-07-05 17:25:29 · Tags: opencl, boost-compute

My GPU has 12 GB of global memory (CL_DEVICE_GLOBAL_MEM_SIZE), but it can allocate only 3 GB in a single allocation (CL_DEVICE_MAX_MEM_ALLOC_SIZE). When I try to load a vector larger than 3 GB, the program crashes. Is it possible to load a bigger vector into GPU memory so that the whole memory is used, and if so, how?

1 answer

By default, CL_DEVICE_MAX_MEM_ALLOC_SIZE reports 1/4 of CL_DEVICE_GLOBAL_MEM_SIZE. On a 12 GB GPU this means a single buffer may be at most 3 GB, so the full memory can only be used by allocating four 3 GB buffers.
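The 1/4 figure matches the minimum that, as I read it, the OpenCL 1.2 specification requires for CL_DEVICE_MAX_MEM_ALLOC_SIZE on non-custom devices: max(1/4 of CL_DEVICE_GLOBAL_MEM_SIZE, 128 MiB). Later spec versions word this differently, so treat the formula as an assumption. A minimal sketch of that bound plus the "how many buffers do I need" arithmetic; the function names are hypothetical helpers, not OpenCL API:

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

constexpr std::uint64_t kMiB = 1024ull * 1024ull;
constexpr std::uint64_t kGiB = 1024ull * kMiB;

// Assumed OpenCL 1.2 lower bound for CL_DEVICE_MAX_MEM_ALLOC_SIZE:
// max(1/4 of CL_DEVICE_GLOBAL_MEM_SIZE, 128 MiB).
std::uint64_t min_guaranteed_max_alloc(std::uint64_t global_mem_bytes) {
    return std::max(global_mem_bytes / 4, 128 * kMiB);
}

// Number of buffers of at most max_alloc_bytes needed to hold total_bytes
// (ceiling division).
std::uint64_t buffers_needed(std::uint64_t total_bytes, std::uint64_t max_alloc_bytes) {
    assert(max_alloc_bytes > 0);
    return (total_bytes + max_alloc_bytes - 1) / max_alloc_bytes;
}
```

For the 12 GB card in the question this gives a 3 GiB per-buffer limit and four buffers to cover all of memory.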

However, Nvidia GPUs let you allocate their full memory capacity in a single buffer, even though they also report the 1/4 limit.

Some AMD GPUs set the limit higher; for example, the Radeon VII lets you use 14 GB of its 16 GB for a single buffer.

The only devices I have ever seen that actually enforce the 1/4 limit are the Intel HD 4600 and 5500, i.e. older Intel integrated GPUs. If you go above 1/4 in buffer size there, the cl::Buffer constructor throws error -61 (CL_INVALID_BUFFER_SIZE).
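If you want to catch this before calling the cl::Buffer constructor, you can compare the requested size against the device limit yourself (in the C++ bindings, `device.getInfo<CL_DEVICE_MAX_MEM_ALLOC_SIZE>()` returns it). My reading of the clCreateBuffer reference is that size 0 is always rejected, while sizes above CL_DEVICE_MAX_MEM_ALLOC_SIZE *may* be rejected, which fits the vendor differences described above. A minimal sketch; `strict_buffer_size_check` is a hypothetical helper, and -61 is hard-coded so it compiles without OpenCL headers:

```cpp
#include <cstdint>

// Value of CL_INVALID_BUFFER_SIZE in the Khronos cl.h header, copied here so
// the sketch builds without OpenCL installed.
constexpr int kClInvalidBufferSize = -61;
constexpr int kClSuccess = 0;

// Predicts what a *strict* implementation (e.g. older Intel iGPUs) returns:
// size 0 or size above the per-allocation limit fails with -61.
// Lenient drivers (e.g. Nvidia) may accept the larger sizes anyway.
int strict_buffer_size_check(std::uint64_t size_bytes, std::uint64_t max_alloc_bytes) {
    if (size_bytes == 0 || size_bytes > max_alloc_bytes) {
        return kClInvalidBufferSize;
    }
    return kClSuccess;
}
```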

If you are stuck with the 1/4 memory limit on your device, split your large 12 GB buffer into four smaller 3 GB buffers (for example, one buffer each for the x, y, z and w components of the vector). If you use Windows, note that you may only be able to use about 11.5 GB in total, because some VRAM is reserved for the operating system.
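One way to plan such a split is to cut the host array into contiguous slices whose byte size never exceeds the per-allocation limit, then create one buffer per slice. A minimal sketch of the slicing only (no OpenCL calls); the `Chunk` struct and `split_into_chunks` name are hypothetical, and your kernel would then need one buffer argument per slice or one launch per slice:

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// One contiguous slice of a large host array, measured in elements.
struct Chunk {
    std::uint64_t offset;  // index of the first element in the slice
    std::uint64_t count;   // number of elements in the slice
};

// Split total_elems elements of elem_size bytes each into slices whose byte
// size never exceeds max_alloc_bytes, so each slice fits in its own buffer.
std::vector<Chunk> split_into_chunks(std::uint64_t total_elems,
                                     std::uint64_t elem_size,
                                     std::uint64_t max_alloc_bytes) {
    assert(elem_size > 0 && max_alloc_bytes >= elem_size);
    const std::uint64_t per_chunk = max_alloc_bytes / elem_size;  // whole elements only
    std::vector<Chunk> chunks;
    for (std::uint64_t off = 0; off < total_elems; off += per_chunk) {
        const std::uint64_t remaining = total_elems - off;
        chunks.push_back({off, remaining < per_chunk ? remaining : per_chunk});
    }
    return chunks;
}
```

For 12 GiB of floats with a 3 GiB limit this yields four slices of 805306368 elements each; the last slice is simply shorter when the total does not divide evenly.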

That said, I suspect your issue is not CL_DEVICE_MAX_MEM_ALLOC_SIZE but a 32-bit integer overflow for array sizes above 4 GB. Use the uint64_t data type for the array size instead.
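To illustrate the overflow: computing a byte count in 32-bit unsigned arithmetic wraps modulo 2^32, so 2^30 floats (exactly 4 GiB) comes out as 0 bytes. With a signed `int` it is worse, because signed overflow is undefined behaviour. A minimal sketch assuming `sizeof(float) == 4` and a 32-bit `int`, which holds on common desktop platforms:

```cpp
#include <cstdint>

// Byte count for n floats computed in 32-bit unsigned arithmetic:
// the product wraps modulo 2^32, so results at or above 4 GiB are wrong.
std::uint32_t bytes_for_floats_32(std::uint32_t n) {
    return n * static_cast<std::uint32_t>(sizeof(float));
}

// Same computation with a 64-bit length: no wraparound for realistic VRAM sizes.
std::uint64_t bytes_for_floats_64(std::uint64_t n) {
    return n * static_cast<std::uint64_t>(sizeof(float));
}
```

A 3 GiB vector still fits in 32 bits, but a 4 GiB one silently becomes a 0-byte request in the 32-bit version, while the 64-bit version returns 4294967296.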

You might also be interested in this lightweight OpenCL-Wrapper for C++. There, vector lengths are always 64-bit integers, and the wrapper automatically keeps track of how much memory you use in total on each device, telling you if you allocate too much. It also catches the -61 error on Intel iGPUs and then tells you the maximum allowed buffer size.
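The bookkeeping described above can be approximated with a small per-device tracker. This is a hypothetical sketch of the idea, NOT the OpenCL-Wrapper's actual API: it records bytes in use and classifies each request as fine, too large for a single buffer, or too large for the remaining device memory. Whether to treat "too large for a single buffer" as fatal depends on the vendor behaviour described earlier.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical result of a pre-allocation check.
enum class AllocStatus { Ok, ExceedsMaxAlloc, ExceedsGlobalMem };

// Hypothetical per-device tracker (not the OpenCL-Wrapper's real interface).
class DeviceMemoryTracker {
public:
    DeviceMemoryTracker(std::uint64_t global_mem_bytes, std::uint64_t max_alloc_bytes)
        : global_(global_mem_bytes), max_alloc_(max_alloc_bytes) {}

    // Record an allocation of `bytes` if it fits; otherwise report why not.
    AllocStatus allocate(std::uint64_t bytes) {
        if (bytes > max_alloc_) return AllocStatus::ExceedsMaxAlloc;
        if (bytes > global_ - used_) return AllocStatus::ExceedsGlobalMem;  // used_ <= global_
        used_ += bytes;
        return AllocStatus::Ok;
    }

    // Caller must not release more than it previously allocated.
    void release(std::uint64_t bytes) {
        assert(bytes <= used_);
        used_ -= bytes;
    }

    std::uint64_t used() const { return used_; }

private:
    std::uint64_t global_;
    std::uint64_t max_alloc_;
    std::uint64_t used_ = 0;
};
```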