
Allocating large vectors in boost::compute

While experimenting with boost::compute I've run into an issue with determining the largest vector I can allocate on a device (I'm still fairly new to boost::compute). The following snippet of code

std::vector<cl_double> host_tmp;
std::cout << "CL_DEVICE_GLOBAL_MEM_SIZE / sizeof(cl_double) = " << device.get_info<cl_ulong>(CL_DEVICE_GLOBAL_MEM_SIZE) / sizeof(cl_double) << "\n";
std::cout << "CL_DEVICE_MAX_MEM_ALLOC_SIZE / sizeof(cl_double) = " << device.get_info<cl_ulong>(CL_DEVICE_MAX_MEM_ALLOC_SIZE) / sizeof(cl_double) << "\n";
size_t num_elements = device.get_info<cl_ulong>(CL_DEVICE_MAX_MEM_ALLOC_SIZE) / sizeof(cl_double);
compute::vector<cl_double> dev_tmp(context);
std::cout << "Maximum size of vector reported by .max_size() = " << dev_tmp.max_size() << "\n";
for (auto i = 0; i < 64; ++i) {
    std::cout << "Resizing device vector to " << num_elements << "...";
    dev_tmp.resize(num_elements, queue);
    std::cout << " done.";
    std::cout << " Assigning host data...";
    host_tmp.resize(num_elements);
    std::iota(host_tmp.begin(), host_tmp.end(), 0);
    std::cout << " done.";
    std::cout << " Copying data from host to device...";
    compute::copy(host_tmp.begin(), host_tmp.end(), dev_tmp.begin(), queue);
    std::cout << " done.\n";
    num_elements += 1024 * 1024;
}

gives

CL_DEVICE_GLOBAL_MEM_SIZE / sizeof(cl_double) = 268435456
CL_DEVICE_MAX_MEM_ALLOC_SIZE / sizeof(cl_double) = 67108864
Maximum size of vector reported by .max_size() = 67108864
Resizing device vector to 67108864... done. Assigning host data... done. Copying data from host to device... done.
Resizing device vector to 68157440... done. Assigning host data... done. Copying data from host to device... done.
...
Resizing device vector to 101711872...Memory Object Allocation Failure

so clearly the reported max_size() is neither a hard limit nor enforced. I assume that to be safe I should stick to the reported max_size(); however, if I allocate multiple vectors of size max_size() on the device, I also receive the Memory Object Allocation Failure message.

  1. What is the correct/usual way to deal with (and avoid) memory allocation failures when using boost::compute?
  2. How can I determine the largest size of a vector that I can allocate at any given moment (i.e. the device may already contain allocated data)?
  3. If I have too much data, can I get boost::compute to automatically process it in chunks or do I have to break it up myself?
  4. How do I free up memory on the device once I'm done with it?
  1. What is the correct/usual way to deal with (and avoid) memory allocation failures when using boost::compute?

You just need to follow the same rules as for OpenCL. Boost.Compute does not add any new restrictions. Keep in mind that many OpenCL platforms allocate buffer memory lazily, so even if creating a buffer larger than CL_DEVICE_MAX_MEM_ALLOC_SIZE succeeds, it can still fail later (implementation-defined behaviour).
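
One way to cope with such failures is to retry with a smaller size. The following is only a minimal sketch in standard C++: `allocate_with_backoff` is a hypothetical helper, not part of Boost.Compute, and the callable passed to it is assumed to both allocate the buffer and use it, because with lazy allocation the error may only surface on first use.

```cpp
#include <cassert>
#include <cstddef>
#include <exception>

// Hypothetical helper (not part of Boost.Compute): calls allocate_and_touch(n)
// and, if it throws, retries with half the element count. Stops once the count
// would drop below min_elements.
// Returns the element count that succeeded, or 0 if every attempt failed.
template <class AllocateAndTouch>
std::size_t allocate_with_backoff(std::size_t requested_elements,
                                  std::size_t min_elements,
                                  AllocateAndTouch&& allocate_and_touch)
{
    assert(min_elements > 0);
    for (std::size_t n = requested_elements; n >= min_elements; n /= 2) {
        try {
            allocate_and_touch(n);  // expected to throw on failure
            return n;
        } catch (const std::exception&) {
            // Allocation (or first use) failed; try a smaller size.
        }
    }
    return 0;
}
```

With Boost.Compute, the callable could resize a boost::compute::vector and then write to it and wait on the queue. Boost.Compute reports OpenCL errors as boost::compute::opencl_error, which derives from std::exception, so the catch clause above would see them. Whether writing to the buffer is enough to force the real allocation is up to the OpenCL implementation.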

  2. How can I determine the largest size of a vector that I can allocate at any given moment (i.e. the device may already contain allocated data)?

I don't think that is possible. You can always create your own allocator class (and use it with boost::compute::vector) that globally tracks this per device (using CL_DEVICE_GLOBAL_MEM_SIZE) and does whatever you want when there is not enough memory. However, you have to remember that OpenCL memory is bound to a context and not to a device.
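
The bookkeeping part of such a tracker might look like the sketch below. `DeviceMemoryBudget` is a hypothetical class written in plain C++; it does not query the OpenCL runtime and is not wired into a Boost.Compute allocator. It assumes the two limits are read once from CL_DEVICE_GLOBAL_MEM_SIZE and CL_DEVICE_MAX_MEM_ALLOC_SIZE and that every allocation and release goes through it.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical bookkeeping class: tracks bytes the application believes it has
// allocated on one device. It only gives an upper bound, since the driver and
// other contexts may use device memory that this class cannot see.
class DeviceMemoryBudget {
public:
    DeviceMemoryBudget(std::uint64_t global_mem_bytes,
                       std::uint64_t max_alloc_bytes)
        : total_(global_mem_bytes), max_alloc_(max_alloc_bytes), used_(0) {}

    // Records a reservation if it fits both limits; returns false otherwise.
    bool try_reserve(std::uint64_t bytes) {
        if (bytes > max_alloc_ || bytes > total_ - used_) return false;
        used_ += bytes;
        return true;
    }

    // Gives back a previous reservation.
    void release(std::uint64_t bytes) {
        assert(bytes <= used_);
        used_ -= bytes;
    }

    // Largest single buffer, in bytes, the tracker would currently accept.
    std::uint64_t largest_reservable() const {
        const std::uint64_t remaining = total_ - used_;
        return remaining < max_alloc_ ? remaining : max_alloc_;
    }

private:
    std::uint64_t total_;
    std::uint64_t max_alloc_;
    std::uint64_t used_;
};
```

Because OpenCL memory belongs to a context rather than a device, a tracker like this is only meaningful if all contexts using the device share it, and even then a reservation it accepts can still fail on the device.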

  3. If I have too much data, can I get boost::compute to automatically process it in chunks or do I have to break it up myself?

No, you have to implement something that takes care of that yourself. It can be done in multiple ways, depending on your OpenCL platform and the supported OpenCL version.
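
As one simple possibility, the host data can be split into fixed-size pieces that each fit into a single device buffer. The sketch below covers only the index arithmetic in standard C++. `plan_chunks` is a hypothetical helper, and the chunk size is assumed to be chosen by the caller, for example from CL_DEVICE_MAX_MEM_ALLOC_SIZE / sizeof(cl_double).

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical helper: splits the index range [0, total_elements) into
// consecutive (offset, count) pairs of at most max_chunk_elements each.
std::vector<std::pair<std::size_t, std::size_t>>
plan_chunks(std::size_t total_elements, std::size_t max_chunk_elements)
{
    assert(max_chunk_elements > 0);
    std::vector<std::pair<std::size_t, std::size_t>> chunks;
    std::size_t offset = 0;
    while (offset < total_elements) {
        // The last chunk may be shorter than the others.
        const std::size_t count =
            std::min(max_chunk_elements, total_elements - offset);
        chunks.emplace_back(offset, count);
        offset += count;
    }
    return chunks;
}
```

Each (offset, count) pair could then drive one compute::copy of host_tmp.begin() + offset up to host_tmp.begin() + offset + count into a device vector sized for a single chunk, followed by the kernel or algorithm call for that piece.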

  4. How do I free up memory on the device once I'm done with it?

boost::compute::vector's destructor releases the device memory. Each OpenCL memory object (such as a buffer) has a reference counter that is properly incremented and decremented by Boost.Compute's classes. Note: iterators do not own buffers, so after the underlying buffer is released (for example, after the boost::compute::vector that allocated it is destroyed), iterators stop working.
