How to enable GCC to vectorize this loop

I have this loop, where b2 is a float, x1 is an Eigen (C++) vector of float, and a1 and a0 are int:

for(int i=1;i<9;i++)
    b2+=a0*(float)0.5*(std::log(fabs(x1(a1+a0*(i-1))))+std::log(fabs(x1(a1+a0*i))));

GCC reports:

analyze_innermost: failed: evolution of base is not affine.

I was wondering if there is a simple way to rewrite the loop so that GCC can vectorize it (I'm compiling with all the unsafe options enabled; I'm doing this to learn).

Edit:

x1 is an Eigen object. I'm using GCC 4.8.1 with the -O3 flag.
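
Writing L_k for log|x1(a1 + a0*k)|, k = 0..8 (notation introduced here for illustration only), each iteration adds half of two neighbouring terms. The eight iterations therefore involve only nine distinct logarithms, and the increment to b2 collapses into a weighted sum:

$$
\sum_{i=1}^{8} \frac{a_0}{2}\left(L_{i-1} + L_i\right)
  = a_0\left(\frac{L_0}{2} + \sum_{k=1}^{7} L_k + \frac{L_8}{2}\right),
\qquad L_k = \log\lvert x_1(a_1 + a_0 k)\rvert .
$$

The loop as written evaluates 16 logarithms, computing each interior value L_1 ... L_7 twice.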

Your example cannot easily be vectorized because you're not accessing the entries of x1 sequentially: consecutive iterations read elements a0 apart, and a0 is only known at run time.

With sequential access, it could be vectorized like this:

ArrayXf x1;
// sequential case (a0 == 1): the entries read are x1(a1) .. x1(a1+8)
b2 += 0.5f * a0 * (x1.segment(a1, 8).abs().log() + x1.segment(a1 + 1, 8).abs().log()).sum();
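
To make the sequential case concrete without pulling in Eigen, here is a minimal sketch under stated assumptions: x1 is replaced by a hypothetical std::vector<float>, a0 == 1, and the helper names sequential_sum and original_loop_a0_1 are illustrative only. The two unit-stride loops mirror the two segment(...) terms and should match the question's loop up to float rounding.

```cpp
#include <cmath>
#include <vector>

// Hypothetical stand-in for the Eigen array: a contiguous std::vector<float>,
// i.e. the sequential case a0 == 1. Mirrors
//   0.5f * (x1.segment(a1, 8).abs().log() + x1.segment(a1 + 1, 8).abs().log()).sum()
// Assumes a1 + 8 is a valid index and no accessed entry is zero.
float sequential_sum(const std::vector<float>& x1, int a1) {
    float s = 0.0f;
    for (int k = 0; k < 8; ++k)   // first segment: x1[a1] .. x1[a1+7]
        s += std::log(std::fabs(x1[a1 + k]));
    for (int k = 0; k < 8; ++k)   // second segment: x1[a1+1] .. x1[a1+8]
        s += std::log(std::fabs(x1[a1 + 1 + k]));
    return 0.5f * s;
}

// The question's loop, specialised to a0 == 1, for comparison.
float original_loop_a0_1(const std::vector<float>& x1, int a1) {
    const int a0 = 1;
    float b2 = 0.0f;
    for (int i = 1; i < 9; i++)
        b2 += a0 * 0.5f * (std::log(std::fabs(x1[a1 + a0 * (i - 1)])) +
                           std::log(std::fabs(x1[a1 + a0 * i])));
    return b2;
}
```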

I would break this up into three loops:

float t1[9];
float t2[9];

for (int i = 0; i < 9; ++i)            // (1) - gather input terms
    t1[i] = x1(a1+a0*i);

for (int i = 0; i < 9; ++i)            // (2) - do expensive log/fabs operations
    t2[i] = std::log(std::fabs(t1[i])); //       with minimum redundancy (9 logs instead of 16)

for (int i = 1; i < 9; ++i)            // (3) - wrap it all up
    b2 += a0*0.5f*(t2[i-1] + t2[i]);
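
A self-contained version of the three loops, useful for checking that the split gives the same result as the original loop. This is a sketch: x1 is a hypothetical std::vector<float> stand-in for the Eigen vector (x1(k) becomes x1[k]), and original_loop / split_loops are illustrative names.

```cpp
#include <cmath>
#include <vector>

// Hypothetical stand-in for the Eigen vector x1: x1(k) becomes x1[k].
// Assumes a1 + a0*8 is a valid index and no accessed entry is zero.

// The loop from the question, for comparison.
float original_loop(const std::vector<float>& x1, int a0, int a1) {
    float b2 = 0.0f;
    for (int i = 1; i < 9; i++)
        b2 += a0 * 0.5f * (std::log(std::fabs(x1[a1 + a0 * (i - 1)])) +
                           std::log(std::fabs(x1[a1 + a0 * i])));
    return b2;
}

// The three-loop split: returns the amount to add to b2.
float split_loops(const std::vector<float>& x1, int a0, int a1) {
    float t1[9];
    float t2[9];
    for (int i = 0; i < 9; ++i)        // (1) strided gather into contiguous storage
        t1[i] = x1[a1 + a0 * i];
    for (int i = 0; i < 9; ++i)        // (2) one log/fabs per gathered entry
        t2[i] = std::log(std::fabs(t1[i]));
    float b2 = 0.0f;
    for (int i = 1; i < 9; ++i)        // (3) reduction over adjacent pairs
        b2 += a0 * 0.5f * (t2[i - 1] + t2[i]);
    return b2;
}
```

Because the sketch is plain C++, each of the three loops can be compiled at -O3 and inspected separately in GCC's vectorizer report.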

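I suspect that (1) may not be vectorizable (unless you have AVX2, which provides gather loads), but (2) and (3) have a reasonable chance. Note that (2) can only be vectorized if GCC has a vector version of log to call, e.g. from an external vector math library selected with -mveclibabi=svml.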
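
Loop (3) can be simplified further: each interior t2[i] appears in two adjacent pairs with weight 1/2, while t2[0] and t2[8] appear in only one, so the pairwise loop collapses into a plain sum with end weights of 1/2. A minimal sketch, again assuming a hypothetical std::vector<float> in place of the Eigen vector; weighted_sum is an illustrative name. GCC only vectorizes a float reduction like the final loop when reassociation is allowed (e.g. -ffast-math), which the question says is enabled.

```cpp
#include <cmath>
#include <vector>

// Hypothetical stand-in: x1 is a std::vector<float> instead of an Eigen vector.
// Assumes a1 + a0*8 is a valid index and none of the gathered entries is zero.
// Returns the amount to add to b2.
float weighted_sum(const std::vector<float>& x1, int a0, int a1) {
    float t1[9];
    float t2[9];
    for (int i = 0; i < 9; ++i)          // (1) strided gather
        t1[i] = x1[a1 + a0 * i];
    for (int i = 0; i < 9; ++i)          // (2) one log per gathered entry
        t2[i] = std::log(std::fabs(t1[i]));
    float s = 0.5f * (t2[0] + t2[8]);    // end points belong to a single pair
    for (int i = 1; i < 8; ++i)          // interior points belong to two pairs
        s += t2[i];
    return a0 * s;                       // equals the sum of a0*0.5f*(t2[i-1]+t2[i])
}
```

Usage would be b2 += weighted_sum(x1, a0, a1); with the same inputs as the original loop.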

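Disclaimer: the technical posts on this site are licensed under CC BY-SA 4.0. If you repost them, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.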

Guangdong ICP Filing No. 18138465 © 2020-2024 STACKOOM.COM