OpenCL, multiple work-groups/kernels?

Question

I have some code I wrote in C++ that takes advantage of multiple threads.

I did away with an array, and the program can be summed up as follows (running on multiple threads over multiple runs), i.e. a summation of -1/+1 random numbers:

runningTotal += ((rng_1.rand_cmwc()%range + 1) <= halfRange ? 1: -1);

rng_1.rand_cmwc() refers to a member function of the cmwc class; rng_1 is the object.

I've done some reading on OpenCL (http://opencl.codeplex.com/wikipage?title=OpenCL%20Tutorials%20-%201), I have the library set up, and I have compiled my own host program.

Which leads me to question #1:

This class doesn't exist in OpenCL, so I'm thinking I need to create a kernel just to hold this class.

Variables:

runningTotal is a long

range is a const long

halfRange is a const long (i.e. range/2)

My second question: since it's not an array (and most OpenCL tutorials discuss how to have OpenCL assign multiple elements in an array simultaneously), how do I set up

  runningTotal += ((rng_1.rand_cmwc()%range + 1) <= halfRange ? 1: -1);

to run on multiple cores? Do I use a work-group?

Could someone give an example of how I would call clCreateProgramWithSource (which returns a cl_program) with source that contains multiple kernels?

I'm sure I'm going to have more questions, but I think I'm going to need two kernels, each running its own work-group(?): one for my cmwc class, and one for the runningTotal summation.

Then somehow sync all the work-items every so often into a larger total.

Answer 1

First question: I believe that only AMD supports the use of classes in kernels, through an extension called the Static C++ Kernel Language (see http://developer.amd.com/Assets/CPP_kernel_language.pdf).
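Without that extension, kernel code in OpenCL C 1.x is C99-based, so one common workaround is to turn the class into a plain struct plus free functions that take a pointer to it. The sketch below is written as C++ so it can be tried on the host first. The names RngState, rng_next, walk_step and walk_total are hypothetical, and the generator is Marsaglia's xorshift32, used only as a short stand-in because the internals of the asker's cmwc class are not shown.

```cpp
#include <cstdint>

// Hypothetical stand-in for the cmwc class: all state lives in a plain
// struct, so the same layout could be declared in OpenCL C.
struct RngState {
    std::uint32_t x;  // must be nonzero for xorshift32
};

// Marsaglia's xorshift32 step. This is NOT CMWC; it only keeps the sketch short.
inline std::uint32_t rng_next(RngState* s) {
    std::uint32_t x = s->x;
    x ^= x << 13;
    x ^= x >> 17;
    x ^= x << 5;
    s->x = x;
    return x;
}

// One +1/-1 step, mirroring the expression from the question.
// Assumes range > 0.
inline int walk_step(RngState* s, long range, long halfRange) {
    const std::uint32_t r = rng_next(s) % static_cast<std::uint32_t>(range);
    return (static_cast<long>(r) + 1) <= halfRange ? 1 : -1;
}

// What a single work-item could do: own one generator state and
// accumulate a private running total over `steps` draws.
inline long walk_total(std::uint32_t seed, long steps, long range) {
    RngState s{seed};
    const long halfRange = range / 2;
    long total = 0;
    for (long i = 0; i < steps; ++i) {
        total += walk_step(&s, range, halfRange);
    }
    return total;
}
```

Under this layout there is no need for a separate kernel to "hold" the generator: each work-item would carry its own RngState (seeded differently per work-item) and call rng_next directly from the summation kernel.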

Second question: To do the summation in parallel you have to use a parallel summation algorithm such as prefix sum (http://en.wikipedia.org/wiki/Prefix_sum) or reduction (http://developer.amd.com/Resources/documentation/articles/Pages/OpenCL-Optimization-Case-Study-Simple-Reductions.aspx). Note that libraries exist for this.
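To make the reduction idea concrete, here is a minimal host-side sketch in C++ of the pairwise (tree) pattern that such kernels typically follow. It assumes each work-item has already produced its own partial total; the function name tree_reduce is hypothetical, and the inner loop runs serially here, whereas in a kernel each index would be a separate work-item.

```cpp
#include <cstddef>
#include <vector>

// Pairwise (tree) reduction over per-work-item partial totals.
// Takes the vector by value because it is overwritten in place.
inline long tree_reduce(std::vector<long> partials) {
    if (partials.empty()) return 0;

    // Pad with zeros to a power-of-two length so every pass halves cleanly.
    std::size_t n = 1;
    while (n < partials.size()) n *= 2;
    partials.resize(n, 0);

    // Each pass adds the upper half onto the lower half.
    for (std::size_t stride = n / 2; stride > 0; stride /= 2) {
        // In a kernel, every i < stride would be handled by its own
        // work-item, with a barrier(CLK_LOCAL_MEM_FENCE) after each pass.
        for (std::size_t i = 0; i < stride; ++i) {
            partials[i] += partials[i + stride];
        }
    }
    return partials[0];
}
```

For example, tree_reduce({1, -1, 1, 1, -1}) pads to eight elements and needs three passes instead of four sequential additions; the saving grows with the number of partial totals.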

Hope that helps. Good luck :)

Answer 1
1 2012-10-25 15:07:35
