
What do gcc's auto-vectorization messages mean?

I have some code that I would like to run fast, so I was hoping I could persuade gcc (g++) to vectorise some of my inner loops. My compiler flags include

-O3 -msse2 -ffast-math -ftree-vectorize -ftree-vectorizer-verbose=5

but gcc fails to vectorize the most important loops, giving me the following not-really-very-verbose-at-all messages:

Not vectorized: complicated access pattern.

and

Not vectorized: unsupported use in stmt.

My questions are (1) what exactly do these mean? (How complicated does it have to be before it's too complicated? Unsupported use of what exactly?), and (2) is there any way I can get the compiler to give me even just a tiny bit more information about what I'm doing wrong?

An example of a loop that gives the "complicated access pattern" is

for (int s=0;s<N;++s)
    a.grid[s][0][h-1] =  D[s] * (b.grid[s][0][h-2] + b.grid[s][1][h-1] - 2*b.grid[s][0][h-1]);

and one that gives "unsupported use in stmt" is the inner loop of

for (int s=0;s<N;++s)
    for (int i=1;i<w-1;++i) 
        for (int j=1;j<h-1;++j) 
            a.grid[s][i][j] = D[s] * (b.grid[s][i][j-1] + b.grid[s][i][j+1] + b.grid[s][i-1][j] + b.grid[s][i+1][j] - 4*b.grid[s][i][j]);

(This is the one that really needs to be optimised.) Here, a.grid and b.grid are three-dimensional arrays of floats, D is a 1D array of floats, and N, w and h are const ints.

Not vectorized: complicated access pattern.

The "uncomplicated" access patterns are accesses to consecutive elements, or strided accesses that meet certain restrictions (a single element of the group is accessed in the loop, the group element count is a power of 2, and the group size is a multiple of the vector type).

b.grid[s][0][h-2] + b.grid[s][1][h-1] - 2*b.grid[s][0][h-1]);

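Here the loop variable is s, the outermost index, so successive iterations are a whole plane of the grid apart. That is not consecutive access, and it does not fit the restricted strided form described above either.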
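To make the distinction concrete, here is a minimal sketch of the two kinds of loop. The function names scale_consecutive and scale_strided are made up for illustration, and the arrays are assumed to be plain contiguous float buffers; whether gcc actually vectorizes either one depends on the compiler version and target.

```cpp
#include <cassert>
#include <cstddef>

// Consecutive access: iteration i touches element i of each array,
// so a vector load/store can cover several iterations at once.
void scale_consecutive(float* out, const float* in, float k, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i)
        out[i] = k * in[i];
}

// Strided access: iteration i touches element i * stride.
// With stride == w * h this mirrors indexing grid[s][0][h-1] by s,
// where neighbouring iterations are a whole plane apart in memory.
void scale_strided(float* out, const float* in, float k,
                   std::size_t n, std::size_t stride) {
    assert(stride > 0);
    for (std::size_t i = 0; i < n; ++i)
        out[i * stride] = k * in[i * stride];
}
```

In the loop from the question, the role of stride is played by the size of one [w][h] plane, which is why the accesses end up far apart.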

Not vectorized: unsupported use in stmt.

Here "use" is meant in the data-flow sense: reading the value of a variable (a register or a compiler temporary). In this case the "supported uses" are variables defined in the current iteration of the loop, constants, and loop invariants.

a.grid[s][i][j] = D[s] * (b.grid[s][i][j-1] + b.grid[s][i][j+1] + b.grid[s][i-1][j] + b.grid[s][i+1][j] - 4*b.grid[s][i][j]);

In this example, I think the "unsupported use" is because b.grid[s][i][j-1] and b.grid[s][i][j+1] are assigned ("defined") by a previous iteration of the loop.
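One way to test that guess is to pull the inner loop out into a helper that tells the compiler the output row cannot overlap the input rows. The following is only a sketch under assumptions: stencil_row is a hypothetical name, the rows are assumed to be contiguous float arrays of length h, and __restrict is a compiler extension (accepted by gcc) rather than standard C++. It is not guaranteed to make gcc vectorize the loop.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: computes one row of the 5-point stencil.
// up/mid/down are rows i-1, i and i+1 of b.grid[s]; out is row i of a.grid[s].
// __restrict promises the compiler that out does not alias the inputs.
void stencil_row(float* __restrict out,
                 const float* __restrict up,
                 const float* __restrict mid,
                 const float* __restrict down,
                 float d, std::size_t h) {
    assert(h >= 2);
    // Interior points only, matching j = 1 .. h-2 in the original loop.
    for (std::size_t j = 1; j + 1 < h; ++j)
        out[j] = d * (mid[j - 1] + mid[j + 1] + up[j] + down[j] - 4.0f * mid[j]);
}
```

The caller would pass D[s] as d and the appropriate row pointers for each s and i. On the second question, newer gcc releases report missed vectorization through -fopt-info-vec-missed, which replaced -ftree-vectorizer-verbose; check the documentation for the version you are using.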
