
Are REP instructions considered vector operations?

I am trying to understand the concepts of SIMD and vector instructions. If I understand correctly:

  • Vector instructions are instructions that operate on a one dimensional array of data (=vector), as opposed to scalar instructions which operate on a single data item.
  • SIMD instructions are Single Instruction, Multiple Data instructions, which seem like the same thing as vector instructions. I don't know whether there is any difference between the two.

REP instructions operate on an array of data, therefore it seems like they are actually SIMD/vector instructions. I haven't seen any article describing them as vector instructions, and I know that REP instructions are not part of the SIMD extensions of x86.

My questions are:

  1. Is REP considered a vector operation?
  2. Is REP considered a SIMD instruction?
  3. Is there any actual difference between vector and SIMD instructions?

A quick Google search for my third question led me to this:

Vector-processing architectures are now considered separate from SIMD computers, based on the fact that vector computers processed the vectors one word at a time through pipelined processors (though still based on a single instruction), whereas modern SIMD computers process all elements of the vector simultaneously.

In the articles I've actually read I haven't seen this distinction made; the terms "vector" and "SIMD" were used interchangeably, which is what led me to think that there is no actual difference.

"Vector" and "SIMD" mean much the same thing, but in common usage the terms typically point to different implementation approaches. This distinction derives from the history of the terms in computing. Both "vector" and "SIMD" instruction sets are based on the concept of performing the same operation on multiple data elements in cases for which there are no data dependencies within the sequence of operations. When there are no data dependencies, the operations can be performed in any order, including simultaneously.
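To make the "no data dependencies" condition concrete, here is a minimal C sketch. The function names are hypothetical and chosen only for illustration. The element-wise add gives the same result whether the elements are visited forward or backward, so in principle it could also be done all at once; the running sum does not have that freedom, because each element needs the already-updated previous one.

```c
#include <assert.h>
#include <stddef.h>

/* Independent elements: c[i] depends only on a[i] and b[i]. */
void add_forward(const int *a, const int *b, int *c, size_t n) {
    assert(n == 0 || (a && b && c));
    for (size_t i = 0; i < n; i++)
        c[i] = a[i] + b[i];
}

/* The same operation visiting the elements in reverse order.
   Because no element depends on another, the result is identical. */
void add_backward(const int *a, const int *b, int *c, size_t n) {
    assert(n == 0 || (a && b && c));
    for (size_t i = n; i-- > 0; )
        c[i] = a[i] + b[i];
}

/* Dependent elements: a running sum, where a[i] needs the
   already-updated a[i - 1], so the order of operations matters. */
void prefix_sum(int *a, size_t n) {
    assert(n == 0 || a);
    for (size_t i = 1; i < n; i++)
        a[i] += a[i - 1];
}
```

Only the first kind of loop maps directly onto a vector or SIMD instruction; the second needs a different algorithm before it can be vectorized.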

Historically, "vector" is the older term, and "vector" instructions are thought of as single instructions that perform some operation on a sequence of elements by pipelining the operations through a single functional unit. The "single functional unit" has nothing to do with vectorization as a concept; it was simply the way vector machines were implemented when transistors were very expensive (mid-1960s through mid-1990s). More recent "vector" architectures use a single vector instruction to pipeline operations across multiple functional units. E.g., the NEC SX-Aurora TSUBASA processor has 256-element vector registers and 32 vector functional units, with each 256-element vector sending 8 elements to each vector functional unit.

I don't know when the term "SIMD" was first used, but I don't recall seeing it in common use before the mid-1990s, when "SIMD" instructions were first developed as a means of performing multiple parallel operations on smaller data sizes within existing register widths. For example, the Intel MMX instruction set (1997) enables the processor to perform independent 8/16/32-bit operations on the contents of a 64-bit register. Later SIMD instruction sets (SSE, etc.) provide new registers that are wider than any single supported data type, to allow operation on independent fields up to 64 bits wide within the register. The design of the instruction set allows the operations to be performed simultaneously across the entire SIMD register width, but this is not required. AMD, for example, has produced several generations of processors that support instructions on SIMD registers wider than the functional units that execute them. E.g., AMD's first-generation EPYC processors support 256-bit SIMD instructions, but these are dispatched to 128-bit-wide functional units over two consecutive cycles. ARM's Scalable Vector Extension (SVE) further decouples the concepts of vector width and number of parallel functional units.
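As a rough illustration of "independent 8-bit operations on the contents of a 64-bit register", the following C sketch emulates a packed byte add in a plain 64-bit integer. It is not MMX code: the function names are my own, and the bit trick is a generic software technique standing in for what a packed-add instruction does in hardware. The point is only that the eight byte lanes do not interact, so no carry leaks from one lane into the next.

```c
#include <assert.h>
#include <stdint.h>

/* Reference version: add the eight byte lanes one at a time (scalar). */
uint64_t packed_add_u8_reference(uint64_t x, uint64_t y) {
    uint64_t result = 0;
    for (int lane = 0; lane < 8; lane++) {
        uint8_t xb = (uint8_t)(x >> (8 * lane));
        uint8_t yb = (uint8_t)(y >> (8 * lane));
        uint8_t sum = (uint8_t)(xb + yb);          /* wraps modulo 256 */
        result |= (uint64_t)sum << (8 * lane);
    }
    return result;
}

/* All eight lanes in one pass over the 64-bit word. */
uint64_t packed_add_u8(uint64_t x, uint64_t y) {
    const uint64_t high = 0x8080808080808080ULL;   /* top bit of each lane */
    /* Add the low 7 bits of every lane; a carry can reach the lane's own
       top bit but cannot cross into the neighbouring lane. */
    uint64_t low_sum = (x & ~high) + (y & ~high);
    /* Fold in the top bits of each lane without generating a carry. */
    uint64_t out = low_sum ^ ((x ^ y) & high);
    assert(out == packed_add_u8_reference(x, y));  /* cross-check */
    return out;
}
```

For example, adding 0xFF and 0x01 in the lowest lane yields 0x00 in that lane and leaves the next lane untouched, whereas an ordinary 64-bit add would produce 0x100.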

The "REP" instructions in the x86 architecture offer limited vector-like functionality for the "string instructions" and the "in/out" instructions. They are not a general mechanism, and I am sure that many of the Intel processor designers wish they could have been dropped from the instruction set. Some interesting historical notes are in the forum discussion at https://software.intel.com/en-us/forums/intel-fortran-compiler/topic/275765
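To show why this is only "vector-like", here is a simplified C model of what a REP-prefixed byte move does, as I understand its semantics. The struct and function names are hypothetical, and the model assumes the forward direction only (direction flag clear). The instruction is specified as a counted loop that repeats one scalar string operation, not as an operation on independent lanes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of the state a REP MOVSB uses.
   Assumption: forward direction only (direction flag clear). */
struct movs_state {
    const uint8_t *src;  /* plays the role of the source index register */
    uint8_t *dst;        /* plays the role of the destination index register */
    size_t count;        /* plays the role of the count register */
};

/* Repeat a one-byte move until the count reaches zero,
   advancing both pointers after every byte. */
void rep_movsb_model(struct movs_state *s) {
    assert(s != NULL);
    while (s->count != 0) {
        *s->dst = *s->src;
        s->src += 1;
        s->dst += 1;
        s->count -= 1;
    }
}
```

A real processor may move many bytes per cycle internally, but that is an implementation detail, in the same way that the width of the functional units is an implementation detail for the SIMD instruction sets above.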
