How to get the maximum global work size with OpenCL C++ bindings?

I want to get the maximum global work size. I don't want OpenCL to choose a size for me, because the size it picks may or may not be the maximum.

To do this I want to specify the size when calling clEnqueueNDRangeKernel, e.g.:

clEnqueueNDRangeKernel(queue, kernel, 1, NULL, &global_size, NULL, 0, NULL, NULL);

The clGetKernelWorkGroupInfo documentation indicates:

CL_KERNEL_GLOBAL_WORK_SIZE: This provides a mechanism for the application to query the maximum global size that can be used to execute a kernel (i.e. global_work_size argument to clEnqueueNDRangeKernel) on a custom device given by device or a built-in kernel on an OpenCL device given by device.

How can I get CL_KERNEL_GLOBAL_WORK_SIZE with the OpenCL C++ bindings?

I tried this:

cl::array<size_t, 3> kernel_global_work_size = my_kernel.getWorkGroupInfo<CL_KERNEL_GLOBAL_WORK_SIZE>(my_device);

But I get this error:

cl2.hpp:5771:12: note: candidate: template<class T> cl_int cl::Kernel::getWorkGroupInfo(const cl::Device&, cl_kernel_work_group_info, T*) const
     cl_int getWorkGroupInfo(
            ^~~~~~~~~~~~~~~~
cl2.hpp:5771:12: note:   template argument deduction/substitution failed:
cl2.hpp:5782:9: note: candidate: template<int name> typename cl::detail::param_traits<cl::detail::cl_kernel_work_group_info, name>::param_type cl::Kernel::getWorkGroupInfo(const cl::Device&, cl_int*) const
         getWorkGroupInfo(const Device& device, cl_int* err = NULL) const

And with this code:

cl::array<size_t, 3> kernel_global_work_size;
my_kernel.getWorkGroupInfo<cl::array<size_t, 3>>(my_device, CL_KERNEL_GLOBAL_WORK_SIZE, &kernel_global_work_size);

I get OpenCL error -30 (CL_INVALID_VALUE).

Note that my_kernel is not a built-in kernel (e.g. cl::Kernel my_kernel = cl::Kernel(program, "my_kernel");) and my_device is not a custom device (e.g. cl::Device device = myDevices[0];).

Yes, your call matches the signature:

https://github.khronos.org/OpenCL-CLHPP/classcl_1_1_kernel.html

template <cl_int name>
typename detail::param_traits<detail::cl_kernel_work_group_info, name>::param_type
getWorkGroupInfo(const Device& device, cl_int* err = NULL) const;

It looks like the param_traits specialization, which is generated via macros, is not declared for CL_KERNEL_GLOBAL_WORK_SIZE. That would be a bug in the headers. (GitHub issue created by the OP.)

Some of the info queries documented in the specification have no corresponding param_traits entries in the header.

Alternatively, you can use the overload that returns an error code and provides the info via an output parameter; that should work around the issue:

template<typename T>
cl_int getWorkGroupInfo(const Device &device, cl_kernel_work_group_info name, T *param) const;

The call could look like:

cl::array<size_t, 3> result;
kernel.getWorkGroupInfo<decltype(result)>(device, CL_KERNEL_GLOBAL_WORK_SIZE, &result);
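Put together, a minimal self-contained sketch of that workaround could look like the following (the includes, the CL_HPP_* version defines, the function name, and the printed messages are illustrative assumptions, not part of the original answer):

#define CL_HPP_MINIMUM_OPENCL_VERSION 120
#define CL_HPP_TARGET_OPENCL_VERSION 120
#include <CL/cl2.hpp>
#include <iostream>

// Query CL_KERNEL_GLOBAL_WORK_SIZE through the overload that returns an
// error code, so the missing param_traits specialization is not needed.
void print_kernel_global_work_size(const cl::Kernel& kernel, const cl::Device& device)
{
    cl::array<size_t, 3> global_size{};
    cl_int err = kernel.getWorkGroupInfo(device, CL_KERNEL_GLOBAL_WORK_SIZE, &global_size);
    if (err == CL_SUCCESS) {
        std::cout << "max global size: " << global_size[0] << " x "
                  << global_size[1] << " x " << global_size[2] << "\n";
    } else {
        // For a regular kernel on a regular device, expect CL_INVALID_VALUE (-30);
        // see the explanation below.
        std::cout << "getWorkGroupInfo failed with error " << err << "\n";
    }
}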

My question to you would be: Did you try it yourself? Did the result not match your expectations?


Did you get a CL_INVALID_VALUE?

[...] on a custom device given by device or a built-in kernel on an OpenCL device given by device.

If device is not a custom device or kernel is not a built-in kernel, clGetKernelWorkGroupInfo returns the error CL_INVALID_VALUE.

See the OpenCL 1.2 spec, pages 14 and 15:

Built-in Kernel: A built-in kernel is a kernel that is executed on an OpenCL device or custom device by fixed-function hardware or in firmware. Applications can query the built-in kernels supported by a device or custom device. A program object can only contain kernels written in OpenCL C or built-in kernels but not both. See also Kernel and Program.

Custom Device: An OpenCL device that fully implements the OpenCL Runtime but does not support programs written in OpenCL C. A custom device may be specialized non-programmable hardware that is very power efficient and performant for directed tasks, or hardware with limited programmable capabilities such as specialized DSPs. Custom devices are not OpenCL conformant. Custom devices may support an online compiler. Programs for custom devices can be created using the OpenCL runtime APIs that allow OpenCL programs to be created from source (if an online compiler is supported) and/or binary, or from built-in kernels supported by the device. See also Device.
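To check up front whether the query can apply at all, you can look at the device type and at the device's list of built-in kernels. A minimal sketch (the helper name and the simple substring check are my own illustration, not from the spec or the original answer):

#include <CL/cl2.hpp>
#include <string>

// CL_KERNEL_GLOBAL_WORK_SIZE is only defined for custom devices or for
// kernels listed in CL_DEVICE_BUILT_IN_KERNELS (a semicolon-separated list).
bool global_work_size_query_applies(const cl::Device& device, const std::string& kernel_name)
{
    cl_device_type type = device.getInfo<CL_DEVICE_TYPE>();
    std::string built_in_kernels = device.getInfo<CL_DEVICE_BUILT_IN_KERNELS>();
    return (type & CL_DEVICE_TYPE_CUSTOM) != 0
        || built_in_kernels.find(kernel_name) != std::string::npos;
}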

For regular kernels and devices, the standard only constrains the work-group size (a device property), while the global size is constrained only by the range of size_t. See clEnqueueNDRangeKernel.
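If what you actually need are the limits that do apply to regular kernels and devices, those can be queried directly. A short sketch (the function name is mine; the info queries themselves are standard):

#define CL_HPP_MINIMUM_OPENCL_VERSION 120
#define CL_HPP_TARGET_OPENCL_VERSION 120
#include <CL/cl2.hpp>
#include <iostream>

// Print the limits that constrain a regular kernel: the device-wide maximum
// work-group size, the per-dimension work-item limits, and the maximum
// work-group size for this particular kernel on this device.
void print_work_size_limits(const cl::Kernel& kernel, const cl::Device& device)
{
    size_t device_max_wg = device.getInfo<CL_DEVICE_MAX_WORK_GROUP_SIZE>();
    auto max_item_sizes  = device.getInfo<CL_DEVICE_MAX_WORK_ITEM_SIZES>();
    size_t kernel_max_wg = kernel.getWorkGroupInfo<CL_KERNEL_WORK_GROUP_SIZE>(device);

    std::cout << "device max work-group size: " << device_max_wg << "\n";
    std::cout << "device max work-item sizes:";
    for (size_t s : max_item_sizes) std::cout << " " << s;
    std::cout << "\nkernel max work-group size: " << kernel_max_wg << "\n";
}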
