
When writing OpenCL code, how does it perform on a single-core machine without a GPU?

Original post: 2011-01-31 11:30:05 | 6 2 | c / parallel-processing / opencl / raytracing

Hey all, I am currently porting a raytracer from FORTRAN 77 to C for a research project.

After having ported the essentials, the question is how we proceed to parallelization.
In the lab, I have access to a couple of different Opteron machines, with between 2 and 8 cores, but no GPUs (for now). We are running 64-bit Gentoo.

A GPGPU version would be (very) desirable, but with only one programmer on the project, maintaining separate non-GPU and GPU versions isn't an option.
Also, the code will be GPLed, and we'd like to see it used by others who may have vastly different hardware.

So the entire program has to be easy to compile and run without a GPU or even a multicore system.
OpenCL seems like a good option, as it can run on machines without GPUs, but how will this code perform on a single-core or 32-bit system?
Would it be possible to write the code in such a way that it can easily be compiled without OpenCL?

2 Answers

Currently there are four major OpenCL implementations: AMD, NVIDIA (CUDA), Apple, and Intel, and more will probably follow soon (see: OpenCL implementations). OpenCL is not a language specifically targeted at GPU computing; it was designed as a generic computing language for heterogeneous devices. E.g. you can use the AMD implementation even with no GPU and with any non-AMD CPU (x86, of course).
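
To make the CPU-only case concrete, here is a minimal host-side sketch in C. It asks the first OpenCL platform how many CPU devices it exposes and falls back to a plain C path otherwise. The `USE_OPENCL` build flag and the names `opencl_cpu_device_count` and `backend_name` are my own assumptions for illustration, not something from the question; only `clGetPlatformIDs`, `clGetDeviceIDs` and `CL_DEVICE_TYPE_CPU` come from the OpenCL API.

```c
#include <stdio.h>

#ifdef USE_OPENCL                 /* hypothetical build flag, e.g. -DUSE_OPENCL */
#ifdef __APPLE__
#include <OpenCL/opencl.h>
#else
#include <CL/cl.h>
#endif
#endif

/* Number of OpenCL CPU devices on the first platform.
   Returns 0 when built without OpenCL or when no such device is found. */
int opencl_cpu_device_count(void)
{
#ifdef USE_OPENCL
    cl_platform_id platform;
    cl_uint num_platforms = 0;
    cl_uint num_devices = 0;

    if (clGetPlatformIDs(1, &platform, &num_platforms) != CL_SUCCESS ||
        num_platforms == 0)
        return 0;

    /* num_entries = 0 and devices = NULL: only query the count. */
    if (clGetDeviceIDs(platform, CL_DEVICE_TYPE_CPU, 0, NULL,
                       &num_devices) != CL_SUCCESS)
        return 0;

    return (int)num_devices;
#else
    return 0;
#endif
}

/* Label for the code path a program could take at startup. */
const char *backend_name(void)
{
    return opencl_cpu_device_count() > 0 ? "opencl-cpu" : "plain-c";
}
```

Calling `puts(backend_name());` from `main` would print which path was chosen. Built without the flag, the file has no OpenCL dependency at all.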

Would it be possible to write the code in such a way that it can easily be compiled without OpenCL?

Since you say it's a one-man project, I doubt it will be worth the effort.

How will this code perform on a single-core or 32-bit system?

As well as any native program would. You have access to SIMD through the OpenCL vector types, and you can handle the threading through the work-group configuration.

But don't expect to get 100% performance out of every device with the same kernel and work-group settings. A lot of device-specific tweaking is possible (see the OpenCL CPU Tutorial for a start).

I would say go for OpenCL. It provides more possibilities for your application, and it's platform independent.

It may well be feasible to exploit the commonality of OpenCL and C99, and use the preprocessor to handle the differences. Then you would have a C99 and an OpenCL codebase in one. This is the approach taken in SmallPT-GPU.

However, the OpenCL implementations for CPU should be pretty much as good as any portable scalar C code, and better if you use the OpenCL vector types to get portable SIMD.
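
A rough sketch of the preprocessor idea from this answer, assuming a toy ray/sphere test rather than anything from the actual raytracer. `__OPENCL_VERSION__` is predefined by OpenCL C compilers; the `RT_GLOBAL` macro and the function names are hypothetical. The same file is meant to be handed to the OpenCL compiler as kernel source or to a C99 compiler as ordinary code.

```c
#ifdef __OPENCL_VERSION__          /* set when compiled as OpenCL C */
#define RT_GLOBAL __global
#else                              /* plain C99 build */
#include <assert.h>
#include <stddef.h>
#define RT_GLOBAL
#endif

/* Shared code: 1 if a ray starting at (ox, 0, -3) and travelling along +z
   hits the unit sphere at the origin, else 0. */
int ray_hits_unit_sphere(float ox)
{
    float b = -3.0f;                  /* dot(origin, direction) */
    float c = ox * ox + 9.0f - 1.0f;  /* dot(origin, origin) - radius^2 */
    return (b * b - c) >= 0.0f;       /* sign of the discriminant */
}

/* Shared per-item body: processes the ray with index i. */
void trace_one(RT_GLOBAL const float *origin_x, RT_GLOBAL int *hit, int i)
{
    hit[i] = ray_hits_unit_sphere(origin_x[i]);
}

#ifdef __OPENCL_VERSION__
/* OpenCL entry point: one work-item per ray. */
__kernel void trace(__global const float *origin_x, __global int *hit)
{
    trace_one(origin_x, hit, (int)get_global_id(0));
}
#else
/* C99 entry point: a serial loop stands in for the work-items. */
void trace(const float *origin_x, int *hit, int n)
{
    assert(origin_x != NULL && hit != NULL && n >= 0);
    for (int i = 0; i < n; ++i)
        trace_one(origin_x, hit, i);
}
#endif
```

In the C99 build, `trace(ox, hit, n)` fills `hit` with one flag per ray. Real code would need the macro layer to cover more differences, for example math functions: OpenCL C overloads `sqrt` for `float`, while C99 uses `sqrtf`.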