How to enable GCC to vectorize this loop
I have this loop, where b2 is a float, x1 is an Eigen (C++) vector of float, and a1 and a0 are int.
for(int i=1;i<9;i++)
b2+=a0*(float)0.5*(std::log(fabs(x1(a1+a0*(i-1))))+std::log(fabs(x1(a1+a0*i))));
GCC returns:
analyze_innermost: failed: evolution of base is not affine.
I was wondering whether there is a simple way to rewrite the loop so that GCC can vectorize it. (I'm compiling with all the unsafe options enabled; I'm doing this to learn.)
x1 is an Eigen construct. I'm using GCC 4.8.1 with the -O3 flag.
Your example cannot be easily vectorized because you are not accessing the entries of x1 sequentially: the index a1+a0*i advances in steps of a0 rather than 1.
With sequential access, it could be vectorized like this:
ArrayXf x1;
b2 = (x1.segment(i,9).abs().log() + x1.segment(j,9).abs().log()).sum() * a0;
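For readers without Eigen at hand, the same shape of computation can be sketched in plain C++. This is only an illustration under stated assumptions: the function name paired_log_sum and its parameters are hypothetical, the data lives in a std::vector<float> rather than an Eigen array, and both ranges are read with unit stride. Whether a given GCC version actually vectorizes it depends on the flags in use (a float reduction generally needs the unsafe math options the question mentions) and on a vectorized log being available.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of Eigen): sums log|x[i+k]| + log|x[j+k]|
// for k = 0..n-1 and scales the result, mirroring the Eigen expression above.
float paired_log_sum(const std::vector<float>& x, std::size_t i, std::size_t j,
                     std::size_t n, float scale) {
    assert(i + n <= x.size() && j + n <= x.size());  // both ranges must fit
    const float* p = x.data() + i;  // first contiguous range
    const float* q = x.data() + j;  // second contiguous range
    float sum = 0.0f;
    for (std::size_t k = 0; k < n; ++k)  // unit-stride reads only
        sum += std::log(std::fabs(p[k])) + std::log(std::fabs(q[k]));
    return sum * scale;
}
```

The point of the sketch is only the access pattern: both pointers advance by one element per iteration, which is what the vectorizer's address analysis expects.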
I would break this up into three loops:
float t1[9];
float t2[9];
for (int i = 0; i < 9; ++i)              // (1) - gather input terms
    t1[i] = x1(a1+a0*i);
for (int i = 0; i < 9; ++i)              // (2) - do expensive log/fabs operations
    t2[i] = std::log(std::fabs(t1[i]));  //       with minimum redundancy
for (int i = 1; i < 9; ++i)              // (3) - wrap it all up
    b2 += a0*0.5f*(t2[i-1] + t2[i]);
I suspect that (1) may not be vectorizable (unless you have AVX2 with gather loads), but (2) and (3) have a reasonable chance.
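To check that the split does not change the result, the two formulations can be compared side by side. The following is a minimal sketch, assuming x1 is replaced by a std::vector<float> so that it compiles without Eigen; the function names trapezoid_log_fused and trapezoid_log_split are made up for this illustration.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Original formulation: one loop, strided reads, two log calls per iteration.
float trapezoid_log_fused(const std::vector<float>& x1, int a1, int a0) {
    float b2 = 0.0f;
    for (int i = 1; i < 9; ++i)
        b2 += a0 * 0.5f *
              (std::log(std::fabs(x1[static_cast<std::size_t>(a1 + a0 * (i - 1))])) +
               std::log(std::fabs(x1[static_cast<std::size_t>(a1 + a0 * i)])));
    return b2;
}

// Three-loop formulation: gather, then log/fabs once per element, then combine.
float trapezoid_log_split(const std::vector<float>& x1, int a1, int a0) {
    // Assumption for this sketch: non-negative offset and stride, indices in range.
    assert(a1 >= 0 && a0 >= 0 &&
           static_cast<std::size_t>(a1 + 8 * a0) < x1.size());
    float t1[9];
    float t2[9];
    for (int i = 0; i < 9; ++i)  // (1) strided gather into a contiguous buffer
        t1[i] = x1[static_cast<std::size_t>(a1 + a0 * i)];
    for (int i = 0; i < 9; ++i)  // (2) 9 log calls instead of 16
        t2[i] = std::log(std::fabs(t1[i]));
    float b2 = 0.0f;
    for (int i = 1; i < 9; ++i)  // (3) combine neighbouring terms
        b2 += a0 * 0.5f * (t2[i - 1] + t2[i]);
    return b2;
}
```

Besides giving the vectorizer simpler loops to analyze, the split version evaluates each logarithm once, whereas the fused loop computes every interior term twice.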