Understanding the CL_DEVICE_MAX_WORK_GROUP_SIZE limit in OpenCL
I am having some difficulty understanding the maximum work-group limit reported by OpenCL and how it affects my program.
My program reports the following:
CL_DEVICE_MAX_WORK_ITEM_SIZES : 1024, 1024, 1024
CL_DEVICE_MAX_WORK_GROUP_SIZE : 256
CL_DEVICE_MAX_WORK_ITEM_DIMENSIONS : 3
Now I am writing a program to add two vectors with 1 million entries each. The globalSize and localSize for the NDRange are calculated as follows:
size_t localSize = 64; // size_t, because clEnqueueNDRangeKernel takes const size_t *
// Total number of work-items; localSize must be a divisor of globalSize
globalSize = ceil(n/(float)localSize)*localSize;
.......
// Execute the kernel over the entire range of the data set
err = clEnqueueNDRangeKernel(queue, kernel, 1, NULL, &globalSize, &localSize,
                             0, NULL, NULL);
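The rounding step above can also be done with integer arithmetic only. The sketch below is a hypothetical helper (round_up_global_size is not part of the OpenCL API); it avoids the float cast, which may lose precision once n exceeds what a 24-bit float significand can represent exactly.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not an OpenCL call: round n up to the next
   multiple of local_size without going through floating point. */
static size_t round_up_global_size(size_t n, size_t local_size)
{
    assert(local_size > 0);
    /* Count full groups, plus one more if there is a remainder. */
    size_t groups = n / local_size + (n % local_size != 0);
    return groups * local_size;
}
```

With n = 1,000,000 and a local size of 64 this returns 1,000,000 unchanged, since 64 divides it exactly. Whenever the result is padded past n, the kernel needs a bounds check on its global id so the extra work-items do nothing.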
As I understand it, OpenCL indirectly calculates the number of work-groups it will launch. For the example above:
globalSize = 15625 * 64 -> 1,000,000 -> so this is the total number of work-items (threads) that will be launched
localSize = 64 -> so each work-group will have 64 work-items
Hence, from the above we get:
Total work-groups launched = globalSize / localSize -> 15625 work-groups
This is where my confusion starts. OpenCL reports CL_DEVICE_MAX_WORK_GROUP_SIZE: 256, so I assumed this means my device can launch at most 256 work-groups in one dimension, but the calculation above shows that I am launching 15625 work-groups.
So how does this work?
I hope someone can clarify my confusion. I am sure I am misunderstanding something.
Thanks in advance.
According to the specification of clEnqueueNDRangeKernel (https://www.khronos.org/registry/OpenCL/sdk/2.2/docs/man/html/clEnqueueNDRangeKernel.html), CL_DEVICE_MAX_WORK_ITEM_SIZES and CL_DEVICE_MAX_WORK_GROUP_SIZE indicate the limits on the local size, that is, the number of work-items in one work-group, not the number of work-groups (the limit that page calls CL_KERNEL_WORK_GROUP_SIZE is CL_DEVICE_MAX_WORK_GROUP_SIZE in OpenCL 1.2):
const int dimension = n; // work_dim; must not exceed CL_DEVICE_MAX_WORK_ITEM_DIMENSIONS
const size_t localSizeDim[n] = { ... }; // Each element must be less than or equal to CL_DEVICE_MAX_WORK_ITEM_SIZES[i]
const size_t localSize = localSizeDim[0] * localSizeDim[1] * ... * localSizeDim[n-1]; // The product must be less than or equal to CL_DEVICE_MAX_WORK_GROUP_SIZE
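As a concrete version of the two conditions above, here is a minimal sketch in plain C. The function local_size_is_valid and its parameters are hypothetical names; max_item_sizes and max_group_size stand in for the values a real program would query with clGetDeviceInfo.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical check, not an OpenCL call. Returns true when every
   local dimension fits CL_DEVICE_MAX_WORK_ITEM_SIZES[i] and the
   product of all dimensions fits CL_DEVICE_MAX_WORK_GROUP_SIZE. */
static bool local_size_is_valid(size_t work_dim,
                                const size_t *local,
                                const size_t *max_item_sizes,
                                size_t max_group_size)
{
    assert(work_dim >= 1 && local != NULL && max_item_sizes != NULL);
    size_t total = 1;
    for (size_t i = 0; i < work_dim; ++i) {
        /* Per-dimension limit. */
        if (local[i] == 0 || local[i] > max_item_sizes[i])
            return false;
        /* Product limit, tested by division so it cannot overflow. */
        if (local[i] > max_group_size / total)
            return false;
        total *= local[i];
    }
    return true;
}
```

With the limits from the question (1024 per dimension, 256 per work-group), a local size of {64} or {16, 16} passes, while {32, 16} fails: each dimension is below 1024, but the product is 512.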
I couldn't find a device limit on the number of global work-items, but the description of the error CL_INVALID_GLOBAL_WORK_SIZE gives the maximum value representable by size_t as the limit for each global work size.
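To tie this back to the numbers in the question, the sketch below computes the work-group count in one dimension. The helper work_group_count is a hypothetical name, and it assumes the global size is an exact multiple of the local size, as in the question.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not an OpenCL call: number of work-groups
   produced in one dimension, assuming local_size divides global_size. */
static size_t work_group_count(size_t global_size, size_t local_size)
{
    assert(local_size > 0 && global_size % local_size == 0);
    return global_size / local_size;
}
```

For a global size of 1,000,000 and a local size of 64 this gives 15625 work-groups. The value 256 reported for CL_DEVICE_MAX_WORK_GROUP_SIZE is compared against the 64 work-items in each group, not against the 15625 groups, so the launch is within the limit.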