
Vector type and one-dimensional array in OpenCL

I'd like to implement two versions of my kernel, a vector and a scalar version. Now I'm wondering whether, say, the double4 type is similar in terms of memory access to an array of four doubles.
What I have in mind is to use the same data type for both kernels; in the scalar one I would just work on each component individually (.s0 .. .s3), like with a regular array.
In other words, I'd like to use OpenCL vector types for storage only in the scalar kernel, and take advantage of the vector properties in the vector kernel.
I honestly don't want to have different variable types for each kernel.
Does that make sense to you guys?
Any hints here?
Thank you,

Éric.

2, 4, 8 and 16 element vectors are laid out in memory just like 2/4/8/16 scalars. The exception is 3 element vectors, which use as much memory as 4 element vectors. The main benefit of using vectors in my experience has been that all devices support some form of instruction level parallelism, either through SIMD instructions, as on CPUs, or through executing independent instructions simultaneously, which happens on GPUs.
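For example, here is a minimal sketch (kernel and buffer names are made up for illustration) showing that the same __global double buffer can be consumed either as four consecutive scalars or as one double4 via vload4, since both views cover the same 32 bytes:

#pragma OPENCL EXTENSION cl_khr_fp64 : enable   // may be required on pre-1.2 OpenCL devices

__kernel void sum_scalar(__global const double *data, __global double *out)
{
    size_t i = get_global_id(0);
    // four consecutive scalars, element-wise indexing
    out[i] = data[4 * i + 0] + data[4 * i + 1]
           + data[4 * i + 2] + data[4 * i + 3];
}

__kernel void sum_vector(__global const double *data, __global double *out)
{
    size_t i = get_global_id(0);
    // the same 32 bytes fetched as one double4
    double4 v = vload4(i, data);
    out[i] = v.s0 + v.s1 + v.s2 + v.s3;
}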

Regarding the memory access pattern:

This depends first and foremost on your OpenCL kernel compiler: a reasonable compiler would use a single memory transaction to fetch the data for multiple array cells used in a single work item, or even multiple cells used in multiple work items. On NVidia GPUs, global device memory is read in units of 128 bytes, which makes it worthwhile to coalesce as many as (Edit:) 32 float values for every read; see

NVidia CUDA Best Practices Guide: Coalesced Access to Global Memory

So using float4 might not even be enough to maximize your bandwidth utilization.
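As an illustration, here is a minimal sketch of coalesced versus strided access (kernel names are illustrative): in the first kernel, consecutive work items read consecutive floats, so a group of 32 work items covers one contiguous 128-byte segment; in the second, the same 32 work items scatter their reads across many segments.

// Coalesced: consecutive work items read consecutive floats, so 32 work
// items cover one contiguous 128-byte segment.
__kernel void copy_coalesced(__global const float *in, __global float *out)
{
    size_t i = get_global_id(0);
    out[i] = in[i];
}

// Strided: the same 32 work items hit many different 128-byte segments,
// wasting most of each memory transaction.
__kernel void copy_strided(__global const float *in, __global float *out,
                           const int stride)
{
    size_t i = get_global_id(0);
    out[i] = in[i * stride];
}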

Regarding the use of vector types in kernels:

I believe that these would be useful mostly, if not only, on CPUs with vector instructions, and not on GPUs, where work items are inherently scalar; the vectorization is over multiple work items.
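A minimal sketch of the scenario from the question, under the assumption that both kernels take the same __global double4 buffer (kernel names are made up): the vector flavour uses vector arithmetic, while the scalar flavour touches the components one by one.

#pragma OPENCL EXTENSION cl_khr_fp64 : enable   // may be required on pre-1.2 OpenCL devices

// Vector flavour: one vector operation covers all four lanes.
__kernel void scale_vec(__global double4 *data, const double factor)
{
    size_t i = get_global_id(0);
    data[i] *= factor;
}

// Scalar flavour: same argument type, components handled one by one.
__kernel void scale_scalar(__global double4 *data, const double factor)
{
    size_t i = get_global_id(0);
    data[i].s0 *= factor;
    data[i].s1 *= factor;
    data[i].s2 *= factor;
    data[i].s3 *= factor;
}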

Not sure if I get your question. I'll give it a try with a bunch of general hints & tricks.

You don't have arrays in private memory, so here vectors can come in handy. As described by the others, the memory alignment is comparable. See http://streamcomputing.eu/blog/2013-11-30/basic-concepts-malloc-kernel/ for some information.

The option you are missing is using structs. Read the second part of the first answer to Arranging memory for OpenCL to know more.
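For instance, a minimal sketch of the struct option (the type, its fields and the kernel are purely illustrative; the host-side struct must match this layout):

typedef struct {
    float position;
    float velocity;
    float mass;
    float charge;
} particle_t;

__kernel void advance(__global particle_t *p, const float dt)
{
    size_t i = get_global_id(0);
    // plain scalar math on the struct fields of this work item's element
    p[i].position += p[i].velocity * dt;
}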

Another thing that could be handy:

__attribute__((vec_type_hint(vectortype)))

Intel has various explanations: http://software.intel.com/sites/products/documentation/ioclsdk/2013XE/OG/Writing_Kernels_to_Directly_Target_the_Intel_Architecture_Processors.htm
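A minimal sketch of how the hint is attached to a kernel, following the form shown in the OpenCL specification (the kernel body is illustrative only):

// The hint declares the computation width the kernel was written for, so an
// auto-vectorizing compiler (e.g. Intel's) can decide whether to widen it
// further across work items.
__kernel __attribute__((vec_type_hint(float4)))
void saxpy4(__global const float4 *x, __global float4 *y, const float a)
{
    size_t i = get_global_id(0);
    y[i] = a * x[i] + y[i];
}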

It is quite tricky to write multiple kernels in one. You can use macro tricks as described in http://streamcomputing.eu/blog/2013-10-17/writing-opencl-code-single-double-precision/
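One common shape of such a macro trick, along the lines of the linked post (the macro and type names are illustrative): the same source is built once with -DUSE_DOUBLE and once without, so a single kernel body serves both precisions.

#ifdef USE_DOUBLE
    #pragma OPENCL EXTENSION cl_khr_fp64 : enable
    typedef double  real;
    typedef double4 real4;
#else
    typedef float   real;
    typedef float4  real4;
#endif

__kernel void axpy(__global const real4 *x, __global real4 *y, const real a)
{
    size_t i = get_global_id(0);
    y[i] = a * x[i] + y[i];
}

The precision is then chosen per build by passing (or omitting) -DUSE_DOUBLE in the options string given to clBuildProgram.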
