
Understanding numpy's vectorization of loops

I want to verify that I've understood the concept of vectorized code that is mentioned in many Machine Learning lectures/notes/videos.

I did some reading on this and found that CPUs and GPUs have an instruction set category called SIMD: Single Instruction, Multiple Data.

This works, for example, by loading two operands into special 64/128-bit registers and then operating on all of the packed lanes at once.

I've also read that most modern compilers, GCC for example, can do this automatically if you turn on optimization with the -Ofast flag, which is documented as:

-Ofast - Disregard strict standards compliance. -Ofast enables all -O3 optimizations. It also enables optimizations that are not valid for all standard-compliant programs. It turns on -ffast-math and the Fortran-specific -fno-protect-parens and -fstack-arrays.

With -Ofast, the compiler should then auto-vectorize loops written in C/C++ into SIMD instructions when possible.

I tested this out on my own code and got a significant speedup on the MNIST dataset: from 45 minutes down to 5 minutes.

I am also aware that numpy is written in C and wrapped in Python objects (PyObject). I read through a lot of its code, but it is difficult to follow.

My question is then: is my understanding above correct, and does numpy do the same thing, or does it use explicit pragmas or other special instructions/register names for its vectorization?

numpy doesn't do anything like that.

The term "vectorization" in the numpy context means that you make numpy work on your array directly rather than writing a loop yourself. The work is usually dispatched to what are called "universal functions", or "ufuncs" for short. These are C functions that apply the intended operation to the array elements in a plain C for loop.
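As a minimal sketch (array sizes chosen arbitrarily for illustration), the two styles look like this — the first loop iterates in the Python interpreter, while the second runs entirely inside the `np.add` ufunc's C loop:

```python
import numpy as np

a = np.arange(1_000_000, dtype=np.float64)
b = np.arange(1_000_000, dtype=np.float64)

# Explicit Python loop: one interpreted iteration per element
out_loop = np.empty_like(a)
for i in range(len(a)):
    out_loop[i] = a[i] + b[i]

# "Vectorized" call: the whole loop runs in C inside the np.add ufunc
out_ufunc = a + b  # equivalent to np.add(a, b)

assert np.array_equal(out_loop, out_ufunc)
```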

But numpy usually cannot rely on ISA-level vectorization. The reason is that these functions are universal: they must work for all kinds of arrays, dense arrays as well as views onto dense arrays. Because of the strided access pattern this requires, you cannot expect the compiler to vectorize the loop.
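The point about views can be seen directly: the very same ufunc has to handle both a contiguous array and a strided view of it, so its inner loop is written over arbitrary byte strides. A small sketch:

```python
import numpy as np

a = np.arange(20, dtype=np.float64)
view = a[::3]          # non-contiguous view: every 3rd element
dense = view.copy()    # contiguous copy of the same values

# The same np.add ufunc handles both memory layouts
assert np.array_equal(np.add(view, 1.0), np.add(dense, 1.0))

# The layouts differ: the view steps 24 bytes per element, the copy 8
print(view.strides, dense.strides)
```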

If you want ISA-vectorized numpy-style code, you can use numba, which can JIT-compile your functions (and thus really ISA-vectorize them). There is another project that uses one of Intel's libraries, but I can't find it anymore.
