How to abstract SIMD code to handle different data types without affecting performance

Question

I have been writing code for performing matrix operations. Initially it was just for x86, and now I am porting it to different architectures. I also want it to support data types other than float.

Consider the following code for adding two float arrays:

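#include <immintrin.h>

// Requires AVX; a, b and dst must be 32-byte aligned for _mm256_load_ps/_mm256_store_ps.
void add(const float *a, const float *b, float *dst, int len)
{
        int k = 0;
        // Vector loop: 8 floats per iteration (<= so the last full block is processed)
        for(; k + 8 <= len; k += 8, a += 8, b += 8, dst += 8){
            __m256 x = _mm256_load_ps(a);
            __m256 y = _mm256_load_ps(b);
            __m256 z = _mm256_add_ps(x, y);
            _mm256_store_ps(dst, z);
        }
        // Scalar tail for the remaining len % 8 elements
        for(; k < len; ++k)
            *dst++ = *a++ + *b++;
}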

Here is what I have thought of to make the code support several platforms and data types:

1. For the different data types, I was going to change the function to a template function.
2. For the SIMD instructions, I thought of having macros that rename all architecture-specific intrinsic functions to generic SIMD instructions such as SIMD_ADD. The problem is that different data types require different intrinsic functions, and the return type of the intrinsic depends on the data type too.
3. Also, if I were to write a subtract function, I would end up copying most of the code just to replace the SIMD_ADD macro with the SIMD_SUB macro. Is there a neat way to avoid repeating the same code for all element-wise operations such as multiplication, division and subtraction?

How would one tackle points 2 and 3 without abstracting to the point of hurting the code's performance?
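For point 3, one common pattern is to write the loop once as a template and pass the operation as a compile-time type. Because Op is a template parameter, each instantiation calls Op::vec directly, with no function pointer in the way, so the compiler is free to inline it. This is a minimal sketch under stated assumptions: the functor names AddOp, SubOp, MulOp and the elementwise template are hypothetical, and it uses unaligned SSE intrinsics (baseline on x86-64) so it builds without extra compiler flags; the same structure applies to the __m256 / _mm256_* versions.

```cpp
#include <immintrin.h>
#include <cassert>

// Hypothetical operation functors: a vector form (4 floats) and a scalar form.
struct AddOp {
    static __m128 vec(__m128 x, __m128 y) { return _mm_add_ps(x, y); }
    static float scalar(float x, float y) { return x + y; }
};
struct SubOp {
    static __m128 vec(__m128 x, __m128 y) { return _mm_sub_ps(x, y); }
    static float scalar(float x, float y) { return x - y; }
};
struct MulOp {
    static __m128 vec(__m128 x, __m128 y) { return _mm_mul_ps(x, y); }
    static float scalar(float x, float y) { return x * y; }
};

// The loop is written once. Op is known at compile time, so every
// instantiation calls Op::vec / Op::scalar directly and they can be inlined.
template <typename Op>
void elementwise(const float *a, const float *b, float *dst, int len)
{
    assert(len >= 0);
    int k = 0;
    for (; k + 4 <= len; k += 4) {            // 4 floats per SSE register
        __m128 x = _mm_loadu_ps(a + k);
        __m128 y = _mm_loadu_ps(b + k);
        _mm_storeu_ps(dst + k, Op::vec(x, y));
    }
    for (; k < len; ++k)                      // scalar tail: len % 4 elements
        dst[k] = Op::scalar(a[k], b[k]);
}

// Each named operation is now a one-line wrapper around the shared loop.
inline void add(const float *a, const float *b, float *dst, int len) { elementwise<AddOp>(a, b, dst, len); }
inline void sub(const float *a, const float *b, float *dst, int len) { elementwise<SubOp>(a, b, dst, len); }
inline void mul(const float *a, const float *b, float *dst, int len) { elementwise<MulOp>(a, b, dst, len); }
```

Division would be one more functor wrapping _mm_div_ps; the loop itself never changes.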

Answer 1

I ended up with template classes for the SIMD instructions, with a specialization for each data type. Unfortunately, the compiler will not inline them automatically, so you must use compiler-specific attributes to force inlining (for example __attribute__((always_inline)) on GCC/Clang or __forceinline on MSVC).
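A minimal sketch of what such specialized classes might look like, assuming SSE/SSE2 intrinsics so it compiles on any x86-64 compiler without extra flags. The names Simd and SIMD_INLINE and the member functions are hypothetical, not taken from the answer. Simd<T>::reg carries the register type, which addresses the return-type problem from point 2, and SIMD_INLINE applies the compiler-specific force-inline attribute.

```cpp
#include <immintrin.h>
#include <cassert>

// Hypothetical force-inline macro; the attribute spelling is compiler-specific.
#if defined(_MSC_VER)
#define SIMD_INLINE __forceinline
#else
#define SIMD_INLINE inline __attribute__((always_inline))
#endif

// Primary template left undefined: an unsupported element type fails to compile.
template <typename T> struct Simd;

template <> struct Simd<float> {
    using reg = __m128;                       // 4 floats per register
    static constexpr int width = 4;
    static SIMD_INLINE reg load(const float *p) { return _mm_loadu_ps(p); }
    static SIMD_INLINE void store(float *p, reg v) { _mm_storeu_ps(p, v); }
    static SIMD_INLINE reg add(reg x, reg y) { return _mm_add_ps(x, y); }
};

template <> struct Simd<double> {
    using reg = __m128d;                      // 2 doubles per register
    static constexpr int width = 2;
    static SIMD_INLINE reg load(const double *p) { return _mm_loadu_pd(p); }
    static SIMD_INLINE void store(double *p, reg v) { _mm_storeu_pd(p, v); }
    static SIMD_INLINE reg add(reg x, reg y) { return _mm_add_pd(x, y); }
};

// One generic function; register type and vector width come from Simd<T>.
template <typename T>
void add(const T *a, const T *b, T *dst, int len)
{
    assert(len >= 0);
    using S = Simd<T>;
    int k = 0;
    for (; k + S::width <= len; k += S::width)
        S::store(dst + k, S::add(S::load(a + k), S::load(b + k)));
    for (; k < len; ++k)                      // scalar tail
        dst[k] = a[k] + b[k];
}
```

Supporting another type (for example int32_t with __m128i and _mm_add_epi32) means adding one more specialization; the generic add stays unchanged.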

Answer 1: accepted, score 0, 2018-06-03 08:48:33
