I want to build a datatype that represents multiple (say N) arithmetic types and provides the same interface as an arithmetic type using operator overloading, such that I get a datatype like Agner Fog's vectorclass.
Please look at this example: Godbolt
#include <array>
using std::size_t;
template<class T, size_t S>
class LoopSIMD : std::array<T,S>
{
public:
    friend LoopSIMD operator*(const T a, const LoopSIMD& x){
        LoopSIMD result;
        for(size_t i=0;i<S;++i)
            result[i] = a*x[i];
        return result;
    }
    LoopSIMD& operator +=(const LoopSIMD& x){
        for(size_t i=0;i<S;++i){
            (*this)[i] += x[i];
        }
        return *this;
    }
};
constexpr size_t N = 7;
typedef LoopSIMD<double,N> SIMD;
SIMD foo(double a, SIMD x, SIMD y){
    x += a*y;
    return x;
}
That seems to work pretty well up to a certain number of elements, which is 6 for gcc-10 and 27 for clang-11. For a larger number of elements the compilers no longer use FMA operations (e.g. vfmadd213pd). Instead they perform the multiplications (e.g. vmulpd) and additions (e.g. vaddpd) separately.
Questions:
Thank you!
I did the following and was able to get some pretty good results for gcc 10.2 with the same -Ofast -march=skylake -ffast-math as your godbolt link.
friend LoopSIMD operator*(const T a, const LoopSIMD& x) {
    LoopSIMD result;
    std::transform(x.cbegin(), x.cend(), result.begin(),
                   [a](auto const& i) { return a * i; });
    return result;
}
LoopSIMD& operator+=(const LoopSIMD& x) {
    std::transform(this->cbegin(), this->cend(), x.cbegin(), this->begin(),
                   [](auto const& a, auto const& b) { return a + b; });
    return *this;
}
std::transform has some crazy overloads, so I think I need to explain.
The first call uses the unary overload: the lambda captures a, multiplies each element of x by it, and writes the products to result, starting at result.begin().
The second call uses the binary overload, which acts as a zip: it adds the corresponding elements of this and x and stores each sum back into this.
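To make the two overloads concrete outside the class, here is a minimal sketch on plain std::array. The helper names scale, add_in_place and transform_example are made up for this illustration and are not part of the code above.

```cpp
#include <algorithm>
#include <array>
#include <cassert>

using Lanes = std::array<double, 4>;  // hypothetical alias, for illustration only

// Unary overload: one input range, one output range; out[i] = a * in[i].
inline Lanes scale(double a, const Lanes& in) {
    Lanes out{};
    std::transform(in.cbegin(), in.cend(), out.begin(),
                   [a](double v) { return a * v; });
    return out;
}

// Binary overload: two input ranges walked in lockstep ("zip");
// acc[i] = acc[i] + x[i], written back over the first input.
inline void add_in_place(Lanes& acc, const Lanes& x) {
    std::transform(acc.cbegin(), acc.cend(), x.cbegin(), acc.begin(),
                   [](double l, double r) { return l + r; });
}

// Usage: x += 2 * y, spelled with the two helpers.
inline void transform_example() {
    Lanes x{10.0, 20.0, 30.0, 40.0};
    const Lanes y{1.0, 2.0, 3.0, 4.0};
    add_in_place(x, scale(2.0, y));
    assert(x[0] == 12.0 && x[1] == 24.0 && x[2] == 36.0 && x[3] == 48.0);
}
```

Writing the output over the first input range is allowed for std::transform, which is why the in-place += form works.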
If you're not married to operator+= and operator*, you can create your own fma like so:
LoopSIMD& fma(const LoopSIMD& x, double a ){
    std::transform_inclusive_scan(
        x.cbegin(),
        x.cend(),
        this->begin(),
        std::plus{},
        [a](auto const& i){return i * a;},
        0.0);
    return *this;
}
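This requires C++17 (std::transform_inclusive_scan lives in <numeric>). For N = 40 the loop keeps an FMA instruction, although it is the scalar vfmadd231sd rather than a packed one, and because an inclusive scan writes running sums, the result is a prefix sum of a*x[i] rather than an elementwise update: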
foo(double, LoopSIMD<double, 40ul>&, LoopSIMD<double, 40ul> const&):
        xor eax, eax
        vxorpd xmm1, xmm1, xmm1
.L2:
        vfmadd231sd xmm1, xmm0, QWORD PTR [rsi+rax]
        vmovsd QWORD PTR [rdi+rax], xmm1
        add rax, 8
        cmp rax, 320
        jne .L2
        ret
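The single accumulator register xmm1 in that loop is worth a closer look. As a rough sketch (plain std::array, C++17, and helper names scaled_prefix_sums, elementwise_update and scan_example invented for this example), this is what the scan-based call writes compared with an elementwise x += a*y:

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <functional>
#include <numeric>

using Lanes = std::array<double, 3>;  // hypothetical alias, for illustration only

// Same call shape as the scan-based fma member: out[i] = a*x[0] + ... + a*x[i].
inline Lanes scaled_prefix_sums(double a, const Lanes& x) {
    Lanes out{};
    std::transform_inclusive_scan(x.cbegin(), x.cend(), out.begin(),
                                  std::plus<>{},
                                  [a](double v) { return v * a; },
                                  0.0);
    return out;
}

// Elementwise update for comparison: acc[i] = acc[i] + a * x[i].
inline Lanes elementwise_update(double a, const Lanes& x, Lanes acc) {
    for (std::size_t i = 0; i < acc.size(); ++i)
        acc[i] += a * x[i];
    return acc;
}

inline void scan_example() {
    const Lanes x{1.0, 2.0, 3.0};
    const Lanes sums = scaled_prefix_sums(2.0, x);
    assert(sums[0] == 2.0 && sums[1] == 6.0 && sums[2] == 12.0);  // running sums
    const Lanes upd = elementwise_update(2.0, x, Lanes{10.0, 20.0, 30.0});
    assert(upd[0] == 12.0 && upd[1] == 24.0 && upd[2] == 36.0);   // per-lane update
}
```

Each output of the scan depends on the previous one, so the two functions are not interchangeable, and the scan's loop-carried dependency is a plausible reason the compiler emits a scalar FMA here.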
You could also simply make your own fma function:
template<class T, size_t S>
class LoopSIMD : std::array<T,S>
{
public:
    friend LoopSIMD fma(const LoopSIMD& x, const T y, const LoopSIMD& z) {
        LoopSIMD result;
        for (size_t i = 0; i < S; ++i) {
            result[i] = std::fma(x[i], y, z[i]);
        }
        return result;
    }
    friend LoopSIMD fma(const T y, const LoopSIMD& x, const LoopSIMD& z) {
        LoopSIMD result;
        for (size_t i = 0; i < S; ++i) {
            result[i] = std::fma(y, x[i], z[i]);
        }
        return result;
    }
    // And more variants, taking `const LoopSIMD&, const LoopSIMD&, const T`, `const LoopSIMD&, const T, const T`, etc
};
SIMD foo(double a, SIMD x, SIMD y){
    return fma(a, y, x);
}
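For anyone who wants to try the fma approach in isolation, here is a cut-down sketch with only the scalar-first overload. The public using-declaration for operator[] and the function fma_example are additions made for this example so the lanes can be filled and read from outside; they are not part of the answer's class.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>

template <class T, std::size_t S>
class LoopSIMD : std::array<T, S> {
public:
    // Assumption for this sketch: expose element access publicly.
    using std::array<T, S>::operator[];

    // result[i] = y * x[i] + z[i]; std::fma rounds once per lane.
    friend LoopSIMD fma(const T y, const LoopSIMD& x, const LoopSIMD& z) {
        LoopSIMD result;
        for (std::size_t i = 0; i < S; ++i)
            result[i] = std::fma(y, x[i], z[i]);
        return result;
    }
};

// Usage: r = 2 * y + x, found through argument-dependent lookup.
inline void fma_example() {
    LoopSIMD<double, 4> x, y;
    for (std::size_t i = 0; i < 4; ++i) {
        x[i] = 10.0 * static_cast<double>(i + 1);
        y[i] = static_cast<double>(i + 1);
    }
    const LoopSIMD<double, 4> r = fma(2.0, y, x);
    for (std::size_t i = 0; i < 4; ++i)
        assert(r[i] == 12.0 * static_cast<double>(i + 1));
}
```

Since std::fma asks for the fused operation explicitly, this does not rely on the optimiser contracting a separate multiply and add. On a target without hardware FMA it may fall back to a much slower library routine, so it is worth checking the generated code for your -march setting.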
But to allow for better optimisations in the first place, you should align your array. Your original code optimises well if you do:
constexpr size_t next_power_of_2_not_less_than(size_t n) {
    size_t pow = 1;
    while (pow < n) pow *= 2;
    return pow;
}
template<class T, size_t S>
class LoopSIMD : std::array<T,S>
{
public:
    // operators
} __attribute__((aligned(next_power_of_2_not_less_than(sizeof(T[S])))));
// Or with a c++11 attribute
/*
template<class T, size_t S>
class [[gnu::aligned(next_power_of_2_not_less_than(sizeof(T[S])))]] LoopSIMD : std::array<T,S>
{
public:
    // operators
};
*/
SIMD foo(double a, SIMD x, SIMD y){
    x += a * y;
    return x;
}
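To see what the alignment does to the type's layout, here is a small sketch that swaps the GNU attribute for standard alignas (my substitution, not what the answer uses) on a stand-in struct called AlignedLanes, which is a made-up name. It assumes an 8-byte double and needs C++14 for the loop in the constexpr function.

```cpp
#include <array>
#include <cstddef>

constexpr std::size_t next_power_of_2_not_less_than(std::size_t n) {
    std::size_t pow = 1;
    while (pow < n) pow *= 2;
    return pow;
}

// Hypothetical stand-in for LoopSIMD, using alignas instead of gnu::aligned.
template <class T, std::size_t S>
struct alignas(next_power_of_2_not_less_than(sizeof(T[S]))) AlignedLanes {
    std::array<T, S> lanes;
};

// Assumption: sizeof(double) == 8, as on common x86-64 ABIs.
static_assert(sizeof(double[7]) == 56, "7 doubles occupy 56 bytes");
static_assert(alignof(AlignedLanes<double, 7>) == 64, "aligned to 64 bytes");
static_assert(sizeof(AlignedLanes<double, 7>) == 64, "padded up to 64 bytes");
```

With N = 7 the object is padded from 56 to 64 bytes, which is exactly two 256-bit registers' worth of doubles. That padding is presumably what lets the compiler treat the whole object with full-width vector operations, though I have not verified this beyond the sizes above.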
I've found an improvement for the example given.
With #pragma omp simd added before the loops, GCC manages to make the FMA optimization up to N=71.
https://godbolt.org/z/Y3T1rs37W
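For reference, a sketch of where the pragma goes in the class from the question. The public using-declaration for operator[] and the function omp_simd_example are additions for this example only. With GCC the pragma takes effect under -fopenmp-simd (or -fopenmp); without either flag it is ignored and the code behaves like the original.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

template <class T, std::size_t S>
class LoopSIMD : std::array<T, S> {
public:
    // Assumption for this sketch: expose element access publicly.
    using std::array<T, S>::operator[];

    friend LoopSIMD operator*(const T a, const LoopSIMD& x) {
        LoopSIMD result;
        #pragma omp simd
        for (std::size_t i = 0; i < S; ++i)
            result[i] = a * x[i];
        return result;
    }

    LoopSIMD& operator+=(const LoopSIMD& x) {
        #pragma omp simd
        for (std::size_t i = 0; i < S; ++i)
            (*this)[i] += x[i];
        return *this;
    }
};

// Usage: the call site is unchanged, only the loops carry the hint.
inline void omp_simd_example() {
    LoopSIMD<double, 7> x, y;
    for (std::size_t i = 0; i < 7; ++i) {
        x[i] = 10.0 * static_cast<double>(i + 1);
        y[i] = static_cast<double>(i + 1);
    }
    x += 2.0 * y;
    for (std::size_t i = 0; i < 7; ++i)
        assert(x[i] == 12.0 * static_cast<double>(i + 1));
}
```

The pragma only changes how the loops are vectorised, not what they compute, so whether the multiply and add end up fused still depends on the compiler flags and is best confirmed in the assembly.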
The size could be improved even further if AVX-512 is used.