OpenCL equivalent of finding consecutive indices in CUDA
In CUDA, to cover multiple blocks and thus increase the range of indices for arrays, we do something like this:
Host-side code:
dim3 dimgrid(9, 1);   // 9 blocks will be launched in total
dim3 dimBlock(16, 1); // each block has 16 threads, so the total number of threads in
                      // the grid is 16 x 9 = 144
Device-side code:
...
...
idx=blockIdx.x*blockDim.x+threadIdx.x;// idx will range from 0 to 143
a[idx]=a[idx]*a[idx];
...
...
What is the equivalent in OpenCL for achieving the above case?
On the host, when you enqueue your kernel using clEnqueueNDRangeKernel, you have to specify the global and local work size. For instance:
size_t global_work_size[1] = { 144 }; // 16 * 9 == 144
size_t local_work_size[1] = { 16 };
clEnqueueNDRangeKernel(cmd_queue, kernel, 1, NULL,
global_work_size, local_work_size,
0, NULL, NULL);
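The relation between the two launch configurations can be written down as a small host-side helper. This is only a sketch for the 1-D case: the struct `launch_1d` and the function `launch_from_cuda` are hypothetical names introduced here for illustration and are not part of CUDA or OpenCL.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper type (not part of OpenCL): the two sizes that
   clEnqueueNDRangeKernel expects for a 1-D launch. */
typedef struct {
    size_t global_work_size; /* total number of work-items      */
    size_t local_work_size;  /* work-items per work-group       */
} launch_1d;

/* Translate a 1-D CUDA launch (number of blocks, threads per block)
   into OpenCL sizes. CUDA counts blocks; OpenCL counts all work-items. */
launch_1d launch_from_cuda(size_t grid_dim_x, size_t block_dim_x)
{
    launch_1d cfg;
    assert(grid_dim_x > 0 && block_dim_x > 0);
    cfg.local_work_size = block_dim_x;
    cfg.global_work_size = grid_dim_x * block_dim_x;
    return cfg;
}
```

With 9 blocks of 16 threads, this yields a global work size of 144 and a local work size of 16, matching the values passed to clEnqueueNDRangeKernel above.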
In your kernel, use:
size_t get_global_size(uint dim);
size_t get_global_id(uint dim);
size_t get_local_size(uint dim);
size_t get_local_id(uint dim);
to retrieve the global and local work sizes and indices respectively, where dim is 0 for x, 1 for y and 2 for z.
The equivalent of your idx will thus be simply size_t idx = get_global_id(0);
See the OpenCL Reference Pages.
Equivalences between CUDA and OpenCL are:
blockIdx.x*blockDim.x+threadIdx.x = get_global_id(0)
LocalSize = blockDim.x (returned by get_local_size(0))
GlobalSize = blockDim.x * gridDim.x (returned by get_global_size(0))
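To see why the first equivalence holds, the index arithmetic can be modelled in plain C on the host. This is a sketch, not device code: `work_item_1d`, `model_global_id` and `ids_are_consecutive` are hypothetical names, and the model assumes no global work offset is used (the NULL fourth argument to clEnqueueNDRangeKernel).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical host-side model of one work-item's 1-D coordinates;
   each field stands in for the OpenCL built-in named in its comment. */
typedef struct {
    size_t group_id;   /* get_group_id(0),   CUDA blockIdx.x  */
    size_t local_id;   /* get_local_id(0),   CUDA threadIdx.x */
    size_t local_size; /* get_local_size(0), CUDA blockDim.x  */
} work_item_1d;

/* Value that get_global_id(0) has when the global offset is zero. */
size_t model_global_id(work_item_1d wi)
{
    assert(wi.local_id < wi.local_size);
    return wi.group_id * wi.local_size + wi.local_id;
}

/* Returns 1 if walking the groups in order, and the local ids within
   each group in order, yields the indices 0, 1, ..., n-1 without gaps. */
int ids_are_consecutive(size_t num_groups, size_t local_size)
{
    size_t expected = 0;
    for (size_t g = 0; g < num_groups; ++g) {
        for (size_t l = 0; l < local_size; ++l) {
            work_item_1d wi = { g, l, local_size };
            if (model_global_id(wi) != expected) {
                return 0;
            }
            ++expected;
        }
    }
    return expected == num_groups * local_size;
}
```

For the configuration in the question (9 groups of 16 work-items), the last work-item has group id 8 and local id 15, which gives 8 * 16 + 15 = 143, the same upper bound as idx in the CUDA kernel.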