
OpenCL, multiple-workgroups/Kernels?

I have some C++ code that takes advantage of multiple threads.

I did away with an array, and can sum up the program as follows (running on multiple threads over multiple runs), i.e. a summation of -1/+1 random numbers:

runningTotal += ((rng_1.rand_cmwc()%range + 1) <= halfRange ? 1: -1);

rng_1.rand_cmwc() refers to a member function of the cmwc class; rng_1 is the object.

I've done some reading on OpenCL (http://opencl.codeplex.com/wikipage?title=OpenCL%20Tutorials%20-%201), I have the library set up, and I have compiled my own host program.

Which leads me to question #1:

This class doesn't exist in OpenCL, so I'm thinking I need to create a kernel just to hold this class.

Variables:

runningTotal is a long

range is a const long

halfRange is a const long (i.e. range/2)

My second question:

Since it's not an array (and most OpenCL tutorials discuss how to have OpenCL assign multiple elements in an array simultaneously), how do I set up

  runningTotal += ((rng_1.rand_cmwc()%range + 1) <= halfRange ? 1: -1);

to run on multiple cores? Do I use a workgroup?

Could someone give an example of how I would use the cl_program clCreateProgramWithSource command to reference multiple kernels?

I'm sure I'm going to have more questions, but I think I'm going to need two kernels, each perhaps running its own workgroup: one for my cmwc class, and one for the runningTotal summation.

Then somehow sync all the work-items every so often into a larger total.

First question: I believe that only AMD supports the use of classes in kernels, through an extension called the Static C++ Kernel Language (see http://developer.amd.com/Assets/CPP_kernel_language.pdf).
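Where that extension is not available, one common workaround is to flatten the class into a plain struct holding the generator state plus free functions that take a pointer to it; OpenCL C allows such helper functions in the same source as the kernel that calls them, so a separate kernel for the generator should not be needed. Below is a minimal sketch in plain C++, written so that it stays close to what OpenCL C accepts. The cmwc class from the question is not shown, so the generator body here is a stand-in (the two multiply-with-carry steps from Marsaglia's KISS generator), not the original CMWC; CmwcState, cmwc_seed and step are hypothetical names, and the seeding scheme is an assumption.

```cpp
#include <cassert>
#include <cstdint>

// Plain-data replacement for the rng_1 object: all generator state lives here,
// so each work-item can own one copy in private memory.
struct CmwcState {
    std::uint32_t z;
    std::uint32_t w;
};

// Hypothetical seeding helper: mixes the work-item id into the seed so that
// different work-items start from different states (an assumption, not the
// original class's seeding).
inline CmwcState cmwc_seed(std::uint32_t seed, std::uint32_t work_item_id) {
    CmwcState s;
    s.z = seed ^ (work_item_id * 2654435761u);
    s.w = seed + work_item_id;
    // A zero state would get stuck at zero, so fall back to fixed constants.
    if (s.z == 0u) s.z = 362436069u;
    if (s.w == 0u) s.w = 521288629u;
    return s;
}

// Stand-in for rng_1.rand_cmwc(): a free function taking the state by pointer.
inline std::uint32_t rand_cmwc(CmwcState* s) {
    s->z = 36969u * (s->z & 65535u) + (s->z >> 16);
    s->w = 18000u * (s->w & 65535u) + (s->w >> 16);
    return (s->z << 16) + s->w;
}

// One term of the summation from the question: returns +1 or -1.
inline int step(CmwcState* s, long range, long halfRange) {
    assert(range > 0);
    const long r = static_cast<long>(rand_cmwc(s) % static_cast<unsigned long>(range));
    return (r + 1) <= halfRange ? 1 : -1;
}
```

Inside a kernel, each work-item would build its own state with something like cmwc_seed(seed, get_global_id(0)) and then loop over step(&state, range, halfRange), adding the results to a private runningTotal.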

Second question: To do the summation in parallel you have to use a parallel summation algorithm such as prefix sum (http://en.wikipedia.org/wiki/Prefix_sum) or reduction (http://developer.amd.com/Resources/documentation/articles/Pages/OpenCL-Optimization-Case-Study-Simple-Reductions.aspx). Note that libraries exist for this.

Hope that helps. Good luck :)
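Since the original loop has no array, one way to picture the reduction is to let every work-item keep its own private runningTotal and write it into one slot of an array when it finishes; the reduction then runs over that array of partial totals. The sketch below emulates the tree-shaped reduction sequentially in plain C++ rather than OpenCL C. reduce_partials is a hypothetical name, and the inner loop stands in for work-items that would run concurrently on a device, with a barrier between strides.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Sequential emulation of a work-group tree reduction over per-work-item
// partial totals. Takes the vector by value because it is used as scratch space.
long reduce_partials(std::vector<long> partial) {
    assert(!partial.empty());

    // Pad with zeros up to a power of two so every stride pairs elements cleanly.
    std::size_t n = 1;
    while (n < partial.size()) n *= 2;
    partial.resize(n, 0);

    // Halve the active range each pass: slot i absorbs slot i + stride.
    for (std::size_t stride = n / 2; stride > 0; stride /= 2) {
        // On a device, each i would be a separate work-item, and a barrier
        // would separate one stride from the next.
        for (std::size_t i = 0; i < stride; ++i) {
            partial[i] += partial[i + stride];
        }
    }
    return partial[0];  // the combined total ends up in slot 0
}
```

For example, reduce_partials({3, -1, 4, -2, 5}) pads the input to eight slots and returns 9, the same as a plain sequential sum. With n partial totals the reduction takes about log2(n) passes instead of n additions in a row.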
