Why does GCC not auto-vectorize this loop?
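I am trying to optimize a loop that accounts for a large share of my program's computation time.
However, when I turn on auto-vectorization with -O3 -ffast-math -ftree-vectorizer-verbose=6, GCC reports that it cannot vectorize the loop.
I am using GCC 4.4.5.
Code: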
/// Find the point in the path with the largest v parameter
void prediction::find_knife_edge(
    const float * __restrict__ const elevation_path,
    float * __restrict__ const diff_path,
    const float path_res,
    const unsigned a,
    const unsigned b,
    const float h_a,
    const float h_b,
    const float f,
    const float r_e
    ) const
{
    float wavelength = (speed_of_light * 1e-6f) / f;
    float d_ab = path_res * static_cast<float>(b - a);
    for (unsigned n = a + 1; n <= b - 1; n++)
    {
        float d_an = path_res * static_cast<float>(n - a);
        float d_nb = path_res * static_cast<float>(b - n);
        float h = elevation_path[n] + (d_an * d_nb) / (2.0f * r_e) - (h_a * d_nb + h_b * d_an) / d_ab;
        float v = h * std::sqrt((2.0f * d_ab) / (wavelength * d_an * d_nb));
        diff_path[n] = v;
    }
}
Messages from GCC:
note: not vectorized: number of iterations cannot be computed.
note: not vectorized: unhandled data-ref
The page on auto-vectorization ( http://gcc.gnu.org/projects/tree-ssa/vectorization.html ) states that unknown loop bounds are supported.
If I replace the for statement with
for (unsigned n = 0; n <= 100; n++)
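then the loop is vectorized.
What am I doing wrong?
The lack of detailed documentation on what exactly these messages mean, and on the ins and outs of GCC auto-vectorization, is quite frustrating.
Edit:
Thanks to David, I changed the loop to: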
for (unsigned n = a + 1; n < b; n++)
Now GCC attempts to vectorize the loop, but reports the following:
note: not vectorized: unhandled data-ref
note: Alignment of access forced using peeling.
note: Vectorizing an unaligned access.
note: vect_model_induction_cost: inside_cost = 1, outside_cost = 2 .
note: not vectorized: relevant stmt not supported: D.76777_65 = (float) n_34;
What does "D.76777_65 = (float) n_34;" mean?
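A note on the first message, "number of iterations cannot be computed": the most likely cause is that b is unsigned, so in the condition n <= b - 1 the bound wraps around to the largest unsigned value when b is 0, and the loop would then never terminate. With n < b the trip count is always finite and can be computed before the loop starts. The sketch below only illustrates that arithmetic; inclusive_bound, trip_count_exclusive and demo are hypothetical helper names, not part of the original code.

```cpp
#include <cassert>
#include <limits>

// Upper bound as written in the original condition `n <= b - 1`.
// For unsigned b == 0 the subtraction wraps around to UINT_MAX,
// which makes `n <= bound` true for every possible n.
unsigned inclusive_bound(unsigned b)
{
    return b - 1u;
}

// Trip count of `for (unsigned n = a + 1; n < b; n++)`.
// It is finite for every a and b, so it can be computed up front.
unsigned trip_count_exclusive(unsigned a, unsigned b)
{
    const unsigned first = a + 1u;
    return b > first ? b - first : 0u;
}

// Small self-check of the two cases discussed above.
void demo()
{
    // b == 0: the inclusive bound is the largest unsigned value.
    assert(inclusive_bound(0u) == std::numeric_limits<unsigned>::max());
    // a == 3, b == 10: n takes the values 4..9, six iterations.
    assert(trip_count_exclusive(3u, 10u) == 6u);
    // Empty range: no iterations.
    assert(trip_count_exclusive(5u, 5u) == 0u);
}
```

Calling demo() passes silently on a platform where unsigned arithmetic wraps, which the C++ standard guarantees.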
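I may have fudged the details slightly, but this is how you need to restructure the loop to get it vectorized. The trick is to precompute the number of iterations and iterate from 0 up to one short of that number. Do not change the for statement. You may need to fix the two lines before it and the two lines at the top of the loop; they are approximately right. ;)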
const unsigned it = (b - a) - 1;  // number of interior points a+1 .. b-1
const unsigned diff = b - a;
for (unsigned n = 0; n < it; n++)
{
    // n is zero-based here, so the original index is a + 1 + n
    float d_an = path_res * static_cast<float>(n + 1);
    float d_nb = path_res * static_cast<float>(diff - 1 - n);
    float h = elevation_path[a + 1 + n] + (d_an * d_nb) / (2.0f * r_e) - (h_a * d_nb + h_b * d_an) / d_ab;
    float v = h * std::sqrt((2.0f * d_ab) / (wavelength * d_an * d_nb));
    diff_path[a + 1 + n] = v;
}
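The restructuring above can be sketched as a self-contained function and checked against the original loop. This is only a sketch under stated assumptions: find_knife_edge_sketch and find_knife_edge_reference are hypothetical free functions, wavelength is passed in directly instead of being derived from speed_of_light and f, and b > a is assumed. The sketch also uses a signed int counter, because the last GCC note appears to point at the conversion of the unsigned counter to float, and SSE2 only offers a packed conversion for signed integers. Whether GCC 4.4.5 actually vectorizes this version has not been verified here.

```cpp
#include <cassert>
#include <cmath>

// Original loop shape (with the `n < b` bound), kept only for comparison.
void find_knife_edge_reference(const float* __restrict__ elevation_path,
                               float* __restrict__ diff_path,
                               float path_res, unsigned a, unsigned b,
                               float h_a, float h_b,
                               float wavelength, float r_e)
{
    const float d_ab = path_res * static_cast<float>(b - a);
    for (unsigned n = a + 1; n < b; n++)
    {
        const float d_an = path_res * static_cast<float>(n - a);
        const float d_nb = path_res * static_cast<float>(b - n);
        const float h = elevation_path[n] + (d_an * d_nb) / (2.0f * r_e)
                        - (h_a * d_nb + h_b * d_an) / d_ab;
        diff_path[n] = h * std::sqrt((2.0f * d_ab) / (wavelength * d_an * d_nb));
    }
}

// Restructured loop: zero-based signed counter, precomputed trip count.
void find_knife_edge_sketch(const float* __restrict__ elevation_path,
                            float* __restrict__ diff_path,
                            float path_res, int a, int b,
                            float h_a, float h_b,
                            float wavelength, float r_e)
{
    assert(b > a);                             // precondition assumed by this sketch
    const int count = b - a - 1;               // interior points a+1 .. b-1
    const float d_ab = path_res * static_cast<float>(b - a);
    const float* in = elevation_path + a + 1;  // shift so the loop starts at 0
    float* out = diff_path + a + 1;

    for (int k = 0; k < count; ++k)
    {
        // k corresponds to the original index n = a + 1 + k
        const float d_an = path_res * static_cast<float>(k + 1);
        const float d_nb = path_res * static_cast<float>(count - k);
        const float h = in[k] + (d_an * d_nb) / (2.0f * r_e)
                        - (h_a * d_nb + h_b * d_an) / d_ab;
        out[k] = h * std::sqrt((2.0f * d_ab) / (wavelength * d_an * d_nb));
    }
}
```

Both functions write only the interior elements a+1 .. b-1 of diff_path, so running them on the same input and comparing the two output arrays is a quick way to check that the index shift was done correctly.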