Understanding the CL_DEVICE_MAX_WORK_GROUP_SIZE limit in OpenCL?

I am having some difficulty understanding the maximum work-group limit reported by OpenCL and how it affects my program.

My program reports the following:

   CL_DEVICE_MAX_WORK_ITEM_SIZES  : 1024, 1024, 1024
   CL_DEVICE_MAX_WORK_GROUP_SIZE  : 256
   CL_DEVICE_MAX_WORK_ITEM_DIMENSIONS : 3

Now I am writing a program that adds two vectors with 1 million entries each, so globalSize and localSize for the NDRange are computed as follows:

   size_t localSize = 64;  // size_t, because clEnqueueNDRangeKernel takes const size_t *
   // Total number of work-items (localSize must divide it evenly)
   globalSize = ceil(n/(float)localSize)*localSize;

 .......

    // Execute the kernel over the entire range of the data set 
    err = clEnqueueNDRangeKernel(queue, kernel, 1, NULL, &globalSize, &localSize,
                                                              0, NULL, NULL);

As I understand it, OpenCL implicitly calculates the number of work-groups it will launch. For the example above:

globalSize = 15625 * 64 = 1,000,000 -> the total number of work-items (threads) that will be launched
localSize  = 64 -> each work-group has 64 work-items

Hence:

Total work-groups launched = globalSize / localSize = 15625

This is where my confusion starts. OpenCL reports CL_DEVICE_MAX_WORK_GROUP_SIZE: 256, so I assumed this means my device can launch at most 256 work-groups in one dimension, but the calculation above shows that I am launching 15625 work-groups.

So how does this work?

I hope someone can clear up my confusion; I am sure I am misunderstanding something.

Thanks in advance.

According to the specification of clEnqueueNDRangeKernel (https://www.khronos.org/registry/OpenCL/sdk/2.2/docs/man/html/clEnqueueNDRangeKernel.html), CL_DEVICE_MAX_WORK_ITEM_SIZES and CL_DEVICE_MAX_WORK_GROUP_SIZE limit the local size, that is, the number of work-items in a single work-group, not the number of work-groups. (The 2.2 page states the limit on the total local size in terms of CL_KERNEL_WORK_GROUP_SIZE; in OpenCL 1.2 it is CL_DEVICE_MAX_WORK_GROUP_SIZE.) So a local size of 64 is within the limit of 256, and launching 15625 such work-groups is fine.

const cl_uint dimension = n;            // number of dimensions, at most CL_DEVICE_MAX_WORK_ITEM_DIMENSIONS
const size_t localSizeDim[n] = { ... }; // each localSizeDim[i] must be <= CL_DEVICE_MAX_WORK_ITEM_SIZES[i]
const size_t localSize = localSizeDim[0] * localSizeDim[1] * ... * localSizeDim[n-1]; // must be <= CL_DEVICE_MAX_WORK_GROUP_SIZE

I couldn't find a device limit on the number of global work-items. The only limit I found is in the description of the error CL_INVALID_GLOBAL_WORK_SIZE: each global work size must not exceed the maximum value representable by size_t on the device.
