Passing a vector of custom structs by reference to a boost::compute closure or function

Question

I'm somewhat new to OpenCL and am trying to learn to use boost::compute properly. Consider the following code:

#include <iostream>
#include <vector>
#include <boost/compute.hpp>

const cl_int cell_U_size{ 4 };

#pragma pack (push,1)
struct Cell
{
    cl_double U[cell_U_size];
};
#pragma pack (pop)

BOOST_COMPUTE_ADAPT_STRUCT(Cell, Cell, (U));

int main(int argc, char* argv[])
{
    using namespace boost;
    auto device = compute::system::default_device();
    auto context = compute::context(device);
    auto queue = compute::command_queue(context, device);

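    std::vector<Cell> host_Cells;
    const std::size_t cell_count = 10;
    host_Cells.reserve(cell_count);
    // Loop over a fixed count: capacity() is only guaranteed to be >= the reserved size
    for (std::size_t j = 0; j < cell_count; ++j) {
        host_Cells.emplace_back(Cell());
        for (cl_int i = 0; i < cell_U_size; ++i) {
            host_Cells.back().U[i] = static_cast<cl_double>(i + j);
        }
    }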
    std::cout << "Before:\n";
    for (auto const& hc : host_Cells) {
        for (auto const& u : hc.U)
            std::cout << " " << u;
        std::cout << "\n";
    }
    compute::vector<Cell> device_Cells(host_Cells.size(), context);
    auto f = compute::copy_async(host_Cells.begin(), host_Cells.end(), device_Cells.begin(), queue);
    try {
        BOOST_COMPUTE_CLOSURE(Cell, Step1, (Cell cell), (cell_U_size), {
            for (int i = 0; i < cell_U_size; ++i) {
                cell.U[i] += 1.0;
            }
            return cell;
        });
        f.wait(); // Wait for data to finish being copied
        compute::transform(device_Cells.begin(), device_Cells.end(), device_Cells.begin(), Step1, queue);

        //BOOST_COMPUTE_CLOSURE(void, Step2, (Cell &cell), (cell_U_size), {
        //  for (int i = 0; i < cell_U_size; ++i) {
        //      cell.U[i] += 1.0;
        //  }
        //});
        //compute::for_each(device_Cells.begin(), device_Cells.end(), Step2, queue);

        compute::copy(device_Cells.begin(), device_Cells.end(), host_Cells.begin(), queue);
    }
    catch (std::exception &e) {
        std::cout << e.what() << std::endl;
        throw;
    }
    std::cout << "After:\n";
    for (auto const& hc : host_Cells) {
        for (auto const& u : hc.U)
            std::cout << " " << u;
        std::cout << "\n";
    }
}

I have a vector of custom structs (actually much more complicated than shown here) that I want to process on the GPU. In the active (uncommented) BOOST_COMPUTE_CLOSURE, compute::transform passes each struct by value, processes it, and then copies the result back.

I would like to pass these by reference, as shown in the commented-out BOOST_COMPUTE_CLOSURE used with compute::for_each, but the kernel fails to compile (Build Program Failure) when the program is run, and I have not found any documentation describing how this should be achieved.

I know I can achieve passing by reference (pointers, actually, since OpenCL C is based on C99) by using BOOST_COMPUTE_STRINGIZE_SOURCE and passing a pointer to the entire vector of structs, but I'd like to use the compute::... functions as these seem more elegant.

Answer 1

If you define the BOOST_COMPUTE_DEBUG_KERNEL_COMPILATION macro and building an OpenCL program fails, the program source and the build log will be written to stdout.

You can't pass by reference in OpenCL C, which is what you are trying to do in the BOOST_COMPUTE_CLOSURE. I understand that you would like to pass a __global pointer to your closure and modify the value of the variable in global memory, not a local copy of that value. I don't think that is supported in Boost.Compute, because in for_each (and other algorithms) Boost.Compute always passes a value to your function/closure.
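The by-value behaviour described above can be illustrated on the host with plain C++. The sketch below is only an analogy and not Boost.Compute code: it assumes that std::transform and std::for_each stand in for their compute:: counterparts, and the names step_returning_copy, step_discarding_copy, make_cells, first_after_transform and first_after_for_each are hypothetical helpers introduced for illustration.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

const int cell_U_size = 4;

struct Cell {
    double U[cell_U_size];
};

// Same shape as Step1: receives a copy, modifies it, returns it.
Cell step_returning_copy(Cell cell) {
    for (int i = 0; i < cell_U_size; ++i) cell.U[i] += 1.0;
    return cell;
}

// Receives a copy and returns nothing, so the modification is discarded.
void step_discarding_copy(Cell cell) {
    for (int i = 0; i < cell_U_size; ++i) cell.U[i] += 1.0;
}

// Hypothetical helper: fills cell j with U[i] = i + j, as in the question.
std::vector<Cell> make_cells(std::size_t count) {
    std::vector<Cell> cells(count);
    for (std::size_t j = 0; j < count; ++j)
        for (int i = 0; i < cell_U_size; ++i)
            cells[j].U[i] = static_cast<double>(i + j);
    return cells;
}

// transform writes the returned copy back into the range.
double first_after_transform(std::vector<Cell> cells) {
    std::transform(cells.begin(), cells.end(), cells.begin(), step_returning_copy);
    return cells[0].U[0];
}

// for_each with a by-value function leaves the stored elements untouched.
double first_after_for_each(std::vector<Cell> cells) {
    std::for_each(cells.begin(), cells.end(), step_discarding_copy);
    return cells[0].U[0];
}
```

With a function that only ever sees a copy, the transform form is the one that gets results back into the container, which matches the pattern already used with Step1 in the question.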

Of course, you can always implement a workaround: add a unary & operator, or implement a custom device iterator. However, in the presented example it would just decrease performance, because it would lead to non-coalesced memory reads and writes. If you have an array of very complex structures (AoS), try changing it to a structure of arrays (SoA) and/or splitting up your structure.
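As a rough host-side sketch of the AoS-to-SoA suggestion, the code below splits the question's Cell array into one contiguous array per component. The names CellsSoA, to_soa and to_aos are hypothetical and do not come from Boost.Compute. The assumption is that each per-component array could then be copied into its own device vector of doubles and processed with an element-wise algorithm such as compute::transform.

```cpp
#include <array>
#include <cstddef>
#include <vector>

const int cell_U_size = 4;

// Array-of-structures element, as in the question.
struct Cell {
    double U[cell_U_size];
};

// Structure of arrays: U[i][j] holds component i of cell j.
struct CellsSoA {
    std::array<std::vector<double>, cell_U_size> U;
};

// Split an AoS vector into one contiguous array per component.
CellsSoA to_soa(const std::vector<Cell>& cells) {
    CellsSoA soa;
    for (int i = 0; i < cell_U_size; ++i) {
        soa.U[i].reserve(cells.size());
        for (const Cell& c : cells) soa.U[i].push_back(c.U[i]);
    }
    return soa;
}

// Reassemble the AoS form, e.g. after results are copied back to the host.
std::vector<Cell> to_aos(const CellsSoA& soa) {
    std::vector<Cell> cells(soa.U[0].size());
    for (std::size_t j = 0; j < cells.size(); ++j)
        for (int i = 0; i < cell_U_size; ++i)
            cells[j].U[i] = soa.U[i][j];
    return cells;
}
```

In this layout, neighbouring work-items processing component i would touch neighbouring doubles, which is the access pattern that coalescing favours. Whether the conversion cost pays off depends on the real structure and workload.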

Answer 1: 1 accepted, 2017-06-12 22:48:12

通过引用boost :: compute闭包或函数来传递自定义结构的向量

问题描述

1 个解决方案

解决方案1 1 已采纳 2017-06-12 22:48:12

解决方案1
1 已采纳 2017-06-12 22:48:12