简体   繁体   中英

MUL/DIV instructions vs. MOV & SHL/SHR (Pentium Pro)

Why would you want to use:

MOV EAX, 22 
SHL EAX, 2

...when multiplying by 4 opposed to just using the MUL instruction?
I understand that this can also be done with SHR instead of DIV as well.

What are the advantages of doing this?
Also can you do this with odd numbers or can it only be even numbers?

There are a number of code idioms that are faster than "MUL constant".

Modern x86 CPUs execute a MUL in several clocks, minimum. So any code sequence that computes the product in 1-2 clocks will outperform the MUL. You can use fast instructions (ADD, SHL, LEA, NEG) and the fact that the processor may execute some of these instructions in parallel in a single clock to replace MUL. Arguably this means you can perform 4 of these instructions in many combinations in 2 clocks if you avoid some data dependencies.

The LEA instruction is particularly interesting because it can multiply by some small constants (1,2,3,4,5,8,9) as well as move the product to another register, which is one easy way to break data dependencies. That allows you to compute a sub-product without destroying the original operand.

Some examples:

Multiply EAX by 5, move product to ESI:

   LEA ESI, [EAX+4*EAX]    ; this takes 1 clock

Multiply EAX by 18:

   LEA  EAX, [EAX + 8*EAX]
   SHL  EAX, 1

Multiply EAX by 7, move result to EBX:

   LEA  EBX, [8*EAX]
   SUB  EBX, EAX

Multiply EAX by 28:

   LEA  EBX, [8*EAX]
   LEA  ECX, [EAX+4*EAX]  ; this and previous should be executed in parallel
   LEA  EAX, [EBX+4*ECX]

Multiply by 1020:

   LEA  ECX, [4*EAX]
   SHL  EAX, 10         ; this and previous instruction should be executed in parallel
   SUB  EAX, ECX

Multiply by 35

   LEA  ECX, [EAX+8*EAX]
   NEG  EAX             ; = -EAX
   LEA  EAX, [EAX+ECX*4]

So, when you want to achieve the effect of multiplying by a modest size constant, you have to think about how it can be "factored" into various products that the LEA instruction can produce, and how one might shift, add, or subtract a partial result to get the final answer.

It is remarkable how many multiply-by-constants can be produced this way. You might think this is only useful for really small constants but as you can see from the 1020 example above you can get some surprisingly medium size ones, too. This turns out be be really handy when indexing into arrays-of-structs because you have to multiply an index by the size of the struct. Often when indexing an array like this, you want to compute the element address and fetch the value; in this case you can merge a final LEA instruction into a MOV instruction, which you cannot do with a real MUL. This buys you additional clock cycle(s) in which to do the MUL by this type of idiom.

[I have built a compiler that computes the "best multiply by constant" using these instructions by doing a small exhaustive search of instruction combinations; it then caches that answer for later reuse].

Using the SHL / SHR instruction is, generally speaking, much faster than MUL / DIV .

To answer your second question, you can do this with odd numbers as well, but you do have to add another instruction. So you can't technically just do it using the SHL / SHR .

For example: the following code multiplies by 5 without using the MUL instruction:

mov num, 5
mov eax, num
mov ebx, num
shl eax, 2    ; MULs by 4
add eax, ebx  ; ADD the x1 to make = 5

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM