
Automatic vectorization GCC

I'm trying to get GCC 4.7 to automatically vectorize some parts of my code for a speed increase, but it is proving difficult.

Here is some code that I would like to vectorize:

void VideoLine::WriteOut(unsigned short * __restrict__  start_of_line, const int  number_of_sub_pixels_to_write)
{
  unsigned short * __restrict__ write_pointer = (unsigned short *)__builtin_assume_aligned (start_of_line, 16);
  unsigned short * __restrict__ line = (unsigned short *)__builtin_assume_aligned (_line, 16);
  for (int i = 0; i < number_of_sub_pixels_to_write; i++)
  {
    write_pointer[i] = line[i];
  }
}

I am using the following GCC switches:

-std=c++0x \
-o3 \
-msse \
-msse2 \
-msse3 \
-msse4.1 \
-msse4.2 \
-ftree-vectorizer-verbose=5 \
-funsafe-loop-optimizations \
-march=corei7-avx \
-mavx \
-fdump-tree-vect-details \
-fdump-tree-optimized \

I'm aware that some override others.

I do not get any output from the vectorizer at all. However, looking at the .optimized dump file, I can see that the loop has not been vectorized. Can anyone point me in the right direction to get this to vectorize?

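Edit: It turned out the issue was using -o3 rather than -O3. Lowercase -o names the output file, so the optimization level was never raised and the vectorizer never ran.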

Try to guarantee that number_of_sub_pixels_to_write is a multiple of 4, for example by masking it as is done here: http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.dht0002a/ch01s04s03.html
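A minimal sketch of that masking idea, assuming the caller can accept a count rounded down to a multiple of 4. The names round_down_to_multiple_of_4 and copy_multiple_of_4 are hypothetical and not part of the original code; any leftover 1 to 3 elements are deliberately not copied here, so this only fits when the count really is a multiple of 4.

```cpp
#include <cassert>

// Hypothetical helper: clearing the two low bits rounds a non-negative
// count down to the nearest multiple of 4.
int round_down_to_multiple_of_4(int count)
{
    assert(count >= 0);
    return count & ~3;
}

// Hypothetical free-function version of the copy loop. Because the trip
// count is visibly a multiple of 4, the compiler does not have to
// generate a scalar clean-up loop for leftover elements.
void copy_multiple_of_4(unsigned short* __restrict__ dst,
                        const unsigned short* __restrict__ src,
                        int count)
{
    const int rounded = round_down_to_multiple_of_4(count);
    for (int i = 0; i < rounded; i++)
    {
        dst[i] = src[i];
    }
}
```

Whether this changes what GCC emits depends on the version and flags, so it is worth checking the vectorizer dump again after trying it.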

The compiler is free to do what it pleases. Therefore, if you really want to use SIMD functionality without relying on the compiler, you should use the intrinsic functions (see the manual).
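As a rough illustration of the intrinsics route, here is a sketch of the same copy written with SSE2 intrinsics from <emmintrin.h>. The function name copy_line_sse2 is hypothetical, and the sketch assumes an x86 target and that both pointers are 16-byte aligned, as the question's __builtin_assume_aligned calls suggest.

```cpp
#include <cassert>
#include <cstdint>
#include <emmintrin.h>  // SSE2 intrinsics

// Hypothetical free-function version of the copy loop.
// Assumption: dst and src are both 16-byte aligned.
void copy_line_sse2(unsigned short* dst, const unsigned short* src, int count)
{
    assert(reinterpret_cast<std::uintptr_t>(dst) % 16 == 0);
    assert(reinterpret_cast<std::uintptr_t>(src) % 16 == 0);

    int i = 0;
    // Each 128-bit register holds eight 16-bit values.
    for (; i + 8 <= count; i += 8)
    {
        const __m128i v = _mm_load_si128(reinterpret_cast<const __m128i*>(src + i));
        _mm_store_si128(reinterpret_cast<__m128i*>(dst + i), v);
    }
    // Scalar tail for the remaining 0 to 7 elements.
    for (; i < count; i++)
    {
        dst[i] = src[i];
    }
}
```

For a plain copy like this, memcpy or the auto-vectorized loop is likely to perform just as well; intrinsics tend to pay off when the loop body does real per-element work.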
