
Determine max global work group size based on device memory in OpenCL?

I am able to list the following parameters which help in restricting the work items for a device based on the device memory:

  • CL_DEVICE_GLOBAL_MEM_SIZE
  • CL_DEVICE_LOCAL_MEM_SIZE
  • CL_DEVICE_MAX_CONSTANT_BUFFER_SIZE
  • CL_DEVICE_MAX_MEM_ALLOC_SIZE
  • CL_DEVICE_MAX_WORK_GROUP_SIZE
  • CL_DEVICE_MAX_WORK_ITEM_SIZES
  • CL_KERNEL_WORK_GROUP_SIZE

I find the documentation for these parameters insufficient, so I am not able to use them properly. Can somebody please explain what these parameters mean and how they are used? Is it necessary to check all of them?

PS: I have a rough understanding of some of the parameters, but I am not sure whether it is correct.

CL_DEVICE_GLOBAL_MEM_SIZE:

  • The amount of global memory on the device, in bytes. You typically don't need to care unless you work with large amounts of data. In any case, the OpenCL runtime will report an error such as CL_OUT_OF_RESOURCES or CL_MEM_OBJECT_ALLOCATION_FAILURE if you use more than is available.
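As a rough illustration of that budget check, here is a minimal sketch in plain C, written so that it runs without an OpenCL runtime. The helper name total_fits_global_mem is hypothetical and not part of the OpenCL API; in a real program global_mem_size would be the value returned by clGetDeviceInfo for CL_DEVICE_GLOBAL_MEM_SIZE.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not an OpenCL call).
 * Returns 1 if the sum of all requested buffer sizes (in bytes) fits in
 * global_mem_size, 0 otherwise. global_mem_size stands in for the value
 * reported for CL_DEVICE_GLOBAL_MEM_SIZE. */
int total_fits_global_mem(uint64_t global_mem_size,
                          const uint64_t *buffer_sizes, size_t count)
{
    uint64_t remaining = global_mem_size;
    assert(buffer_sizes != NULL || count == 0);
    for (size_t i = 0; i < count; ++i) {
        if (buffer_sizes[i] > remaining)
            return 0;                 /* this buffer would exceed the budget */
        remaining -= buffer_sizes[i]; /* cannot underflow: checked above */
    }
    return 1;
}
```

Passing this check does not guarantee that the allocations succeed, since the runtime may need some device memory for itself; it only catches requests that are certainly too large.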

CL_DEVICE_LOCAL_MEM_SIZE:

  • The amount of local memory available to each work group, in bytes. However, this limit only holds under ideal conditions: if your kernel uses a large number of work items (WI) per work group (WG), some of the private WI data may be spilled out to local memory. So treat it as the maximum amount available per WG.

CL_DEVICE_MAX_CONSTANT_BUFFER_SIZE:

  • The maximum size, in bytes, of a constant buffer allocation; in practice, treat it as the constant memory budget for a single kernel. If your constant buffers together exceed this amount, the kernel will either fail or use regular global memory instead (and may therefore be slower).

CL_DEVICE_MAX_MEM_ALLOC_SIZE:

  • The maximum size, in bytes, of a single memory object that you can allocate on the device.
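When a data set is larger than this limit, it has to be split across several buffers. The arithmetic can be sketched as follows; both function names are hypothetical helpers, and max_alloc_size stands in for the value queried with CL_DEVICE_MAX_MEM_ALLOC_SIZE.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: number of buffers needed to hold total_bytes when
 * no single buffer may exceed max_alloc_size bytes. */
uint64_t buffers_needed(uint64_t total_bytes, uint64_t max_alloc_size)
{
    assert(max_alloc_size > 0);
    if (total_bytes == 0)
        return 0;
    /* Ceiling division written so that it cannot overflow. */
    return (total_bytes - 1) / max_alloc_size + 1;
}

/* Hypothetical helper: largest number of elements of elem_size bytes
 * that fit into one buffer of at most max_alloc_size bytes. */
uint64_t max_elements_per_buffer(uint64_t max_alloc_size, uint64_t elem_size)
{
    assert(elem_size > 0);
    return max_alloc_size / elem_size;
}
```

For instance, with a (made-up) limit of 256 bytes, 1000 bytes of data would need four buffers.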

CL_DEVICE_MAX_WORK_GROUP_SIZE:

  • The maximum work group size the device supports. This is the ideal maximum; depending on the kernel code, the limit for a given kernel may be lower.

CL_DEVICE_MAX_WORK_ITEM_SIZES:

  • The maximum number of work items per dimension of a work group. For example, the device may allow a maximum work group size of 1024 WI and up to 3 dimensions, but you may still not be able to use (1024,1,1) as the size, because the dimensions may be limited to (64,64,64). In that case you could use (64,2,8), for example.
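The two constraints in that example (a per-dimension limit and a limit on the total) can be checked together. This is a minimal sketch in plain C; local_size_is_valid is a hypothetical helper, max_item_sizes stands in for the CL_DEVICE_MAX_WORK_ITEM_SIZES array, and max_wg_size for CL_DEVICE_MAX_WORK_GROUP_SIZE.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: returns 1 if the work group shape local[0..dims-1]
 * respects both the per-dimension limits and the total work group limit,
 * 0 otherwise. */
int local_size_is_valid(size_t dims, const size_t *local,
                        const size_t *max_item_sizes, size_t max_wg_size)
{
    size_t total = 1;
    assert(local != NULL && max_item_sizes != NULL);
    for (size_t d = 0; d < dims; ++d) {
        /* Per-dimension limit. */
        if (local[d] == 0 || local[d] > max_item_sizes[d])
            return 0;
        /* Total limit: same as total * local[d] > max_wg_size,
         * but written as a division to avoid overflow. */
        if (local[d] > max_wg_size / total)
            return 0;
        total *= local[d];
    }
    return 1;
}
```

With limits of (64,64,64) and 1024, this rejects (1024,1,1) and (64,64,1) but accepts (64,2,8).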

CL_KERNEL_WORK_GROUP_SIZE:

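  • The maximum work group size that can be used for this particular kernel on a given device, queried with clGetKernelWorkGroupInfo. The implementation derives it from the kernel's resource requirements (register usage, etc.), so it may be lower than CL_DEVICE_MAX_WORK_GROUP_SIZE. You can use a smaller size, but not a larger one; whether the maximum is also the fastest choice depends on the kernel (GPU occupancy, memory spilling, etc.).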
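One way to turn these limits into launch parameters for a 1-D kernel is sketched below. Both functions are hypothetical helpers, and the arguments stand in for the values queried for CL_KERNEL_WORK_GROUP_SIZE, CL_DEVICE_MAX_WORK_GROUP_SIZE and the first entry of CL_DEVICE_MAX_WORK_ITEM_SIZES. The rounding step assumes OpenCL 1.x rules, where an explicitly given local size must evenly divide the global size.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: largest 1-D work group size allowed by the
 * kernel limit, the device limit and the limit of dimension 0. */
size_t pick_local_size(size_t kernel_wg_size, size_t device_max_wg_size,
                       size_t max_item_size_dim0)
{
    size_t n = kernel_wg_size;
    if (device_max_wg_size < n)
        n = device_max_wg_size;
    if (max_item_size_dim0 < n)
        n = max_item_size_dim0;
    return n;
}

/* Hypothetical helper: rounds the number of work items up to the next
 * multiple of local_size, giving a global size the local size divides. */
size_t round_up_global_size(size_t work_items, size_t local_size)
{
    size_t rem;
    assert(local_size > 0);
    rem = work_items % local_size;
    return rem == 0 ? work_items : work_items + (local_size - rem);
}
```

If the global size is padded this way, the kernel has to ignore the extra work items itself, for example by comparing get_global_id(0) against the real element count.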

NOTE: All of these values are theoretical limits. If your kernel uses one resource more heavily than the others, e.g. local memory that grows with the work group size, you may not be able to reach the maximum number of work items per work group, because you may hit the local memory limit first.
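The local memory case from the note can be worked through with a small calculation. This is a sketch under stated assumptions: the kernel is assumed to need a fixed amount of local memory plus a constant number of bytes per work item, and max_wg_size_for_local_mem is a hypothetical helper. local_mem_size stands in for CL_DEVICE_LOCAL_MEM_SIZE and kernel_wg_size for CL_KERNEL_WORK_GROUP_SIZE.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: largest work group size that stays within both
 * the local memory budget and the kernel's own work group limit.
 * Assumes local memory use = fixed_bytes + bytes_per_item * group size. */
size_t max_wg_size_for_local_mem(uint64_t local_mem_size,
                                 uint64_t fixed_bytes,
                                 uint64_t bytes_per_item,
                                 size_t kernel_wg_size)
{
    uint64_t by_memory;
    assert(bytes_per_item > 0);
    if (fixed_bytes > local_mem_size)
        return 0;                     /* kernel cannot run at any size */
    by_memory = (local_mem_size - fixed_bytes) / bytes_per_item;
    /* The smaller of the two limits wins. */
    return by_memory < kernel_wg_size ? (size_t)by_memory : kernel_wg_size;
}
```

For example, with 32768 bytes of local memory and 64 bytes per work item, a kernel whose reported limit is 1024 can only run with 512 work items per group: the memory limit is reached first.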
