
Is it necessary or easier to code x86 assembly by following the purpose of each general-purpose register?

In general, is it necessary or easier to write x86 assembly by following the intended purpose of each register?

The registers in the x86 architecture were each originally designed for a special purpose, but modern compilers don't seem to care about that intended usage (except in special cases such as REP MOVS or MUL).

So, is it easier, or does it give better-optimized code, to choose registers according to their purpose? (Leaving aside the special instructions or encodings that are tied to particular registers.)

For instance (I could use REP MOVSB or LODSB/STOSB instead, but this is just to demonstrate):

1st Code:

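LEA ESI,[AddressOfSomething]        ; ESI = source pointer
LEA EDI,[AddressOfSomethingElse]    ; EDI = destination pointer
MOV ECX,NUMBER_OF_LOOP              ; ECX = loop counter
LoopHere:
MOV AL,[ESI]                        ; load one byte
ADD AL,8
MOV [EDI],AL                        ; store the modified byte
ADD ESI,1
ADD EDI,1
DEC ECX                             ; count down; the loop ends when ECX reaches 0
JNZ LoopHere
TheEnd:
;...

2nd Code:

LEA ECX,[AddressOfSomething]        ; ECX = source pointer
LEA EDX,[AddressOfSomethingElse]    ; EDX = destination pointer
MOV EBX,NUMBER_OF_LOOP              ; EBX = loop counter
LoopHere:
MOV AL,[ECX]                        ; load one byte
ADD AL,8
MOV [EDX],AL                        ; store the modified byte
ADD ECX,1
ADD EDX,1
DEC EBX                             ; count down; the loop ends when EBX reaches 0
JNZ LoopHere
TheEnd:
;...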

The compiler I use, Visual Studio 2015, usually takes the 2nd approach for tasks like this. It doesn't pick registers according to their purpose; instead, it chooses a register based only on whether it is "volatile" or "non-volatile" (that is, whether its value survives a function call). Because of this, the disassembly of software written in high-level languages looks like the 2nd code.
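To illustrate the volatile/non-volatile distinction, here is a minimal sketch in NASM syntax. It assumes the 32-bit Windows calling conventions, where EAX, ECX and EDX are volatile (a callee may overwrite them) and EBX, ESI, EDI and EBP are non-volatile (a callee must preserve them). The routine names `copy_add8_a` and `copy_add8_b` and their cdecl-style stack arguments are hypothetical, chosen only for this example.

```nasm
bits 32

; Hypothetical routine: copy_add8(src, dst, count), with count > 0.

; Version A: "traditional" roles. ESI and EDI are non-volatile,
; so both have to be saved and restored.
copy_add8_a:
        push    esi
        push    edi
        mov     esi, [esp+12]       ; src   (two saved registers + return address below it)
        mov     edi, [esp+16]       ; dst
        mov     ecx, [esp+20]       ; count
.next:  mov     al, [esi]
        add     al, 8
        mov     [edi], al
        inc     esi
        inc     edi
        dec     ecx
        jnz     .next
        pop     edi
        pop     esi
        ret

; Version B: pointers in the volatile ECX and EDX.
; Only EBX (non-volatile) has to be saved.
copy_add8_b:
        push    ebx
        mov     ecx, [esp+8]        ; src   (one saved register + return address below it)
        mov     edx, [esp+12]       ; dst
        mov     ebx, [esp+16]       ; count
.next:  mov     al, [ecx]
        add     al, 8
        mov     [edx], al
        inc     ecx
        inc     edx
        dec     ebx
        jnz     .next
        pop     ebx
        ret
```

Under these assumptions, version A saves two registers and version B only one, which is one plausible reason a compiler would reach for ECX and EDX first.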

Another interesting fact is that in ARM assembly the GPRs all serve the same purpose and are simply numbered (R0, R1, and so on), which means that code written for ARM will look more like the 2nd code.

All in all, my opinion is that since these two versions use the same instructions, they should run at the same speed regardless of which registers I use. Am I correct? And which style is easier to write?

Following the purpose of each register primarily achieves:

  • Code density

    For example, using the A register [1] usually reduces the code size of common operations such as moves, arithmetic, logic and I/O [2].
    Using the C register for counting lets you exploit the jcxz family of instructions, avoiding an explicit compare.
    movsd and similar are very "dense" instructions: they perform complex operations that would otherwise require a lot of code.

    However, code density doesn't mean "faster". x86 is blatantly CISC, so a complex instruction can take more time to execute than an equivalent series of simpler instructions [3].

  • Readability

    An instruction like rep movsd is effectively a "high-level" way of coding a loop that moves data from a source to a destination.
    Parsing the loop

        push eax
        pushf
    .loop:
        mov eax, DWORD [esi]
        mov DWORD [es:edi], eax
        add esi, 4*(1-D*2)          ; D stands for the direction flag (0 or 1)
        add edi, 4*(1-D*2)
        dec ecx
        jnz .loop
        popf
        pop eax

    is a lot more difficult.

  • Idiomatic programming

    The use of SP as a stack pointer is assumed by a lot of instructions (call, ret, push, ...).
    It is possible to avoid using SP as a stack pointer, but it wouldn't be very idiomatic (nor efficient).

  • Less data moving

    In real mode, with its 16-bit addressing, only a few registers could be used to form an address: BX and BP as a base, SI and DI as an index.
    Keeping addresses in B from the beginning avoids moving them into it later. Register-to-register moves often don't need an execution unit on today's CPUs, but they still make the source harder to read [4].
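The code-density and stack-pointer points in the list above can be checked against the instruction encodings. The following is a minimal sketch in NASM syntax for 32-bit mode, meant to be assembled and inspected in a listing (for example with `nasm -f bin -l`), not executed. The address 0x12345678, the label `done` and the use of EBX as a substitute stack pointer are illustrative assumptions only.

```nasm
bits 32

; Accumulator short forms vs. the general ModRM forms
        add     eax, 0x12345678     ; 05 78 56 34 12      (5 bytes)
        add     ebx, 0x12345678     ; 81 C3 78 56 34 12   (6 bytes)
        cmp     al, 0               ; 3C 00               (2 bytes)
        cmp     bl, 0               ; 80 FB 00            (3 bytes)
        mov     eax, [0x12345678]   ; A1 78 56 34 12      (5 bytes)
        mov     ebx, [0x12345678]   ; 8B 1D 78 56 34 12   (6 bytes)

; Zero test on a counter: ECX has a dedicated branch instruction
        jecxz   done                ; E3 rel8             (2 bytes)
        test    ebx, ebx            ; 85 DB               (2 bytes)
        jz      done                ; 74 rel8             (2 bytes)

; Pushing a value: ESP is implied by push
        push    eax                 ; 50                  (1 byte)
; The same store through another register (EBX, hypothetically)
        sub     ebx, 4              ; 83 EB 04            (3 bytes)
        mov     [ebx], eax          ; 89 03               (2 bytes)
done:
```

In each pair the idiomatic register gives the shorter encoding. The hand-written push also modifies the flags, which the real push does not, and call, ret and interrupts would still use ESP regardless.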

Most of the idiomatic register usages have been relaxed today [5], because too many special-purpose registers reduce the optimisations a compiler can do (and spilling onto the stack is expensive).

CPUs are very complex; if you want to write code for speed, you should consider speed metrics only. Idiomatic register usage is not one of them. For one thing, there is no single A, B or C register at the micro-architecture level, so "registers" as the programmer sees them are only a human concept (well, and a front-end concept).
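The relaxation mentioned above (see also footnote 5) can be sketched by comparing 16-bit and 32-bit addressing, and the one-operand and multi-operand multiply forms. This is an illustrative NASM fragment under those assumptions, meant to be assembled and read rather than run; the register choices are arbitrary.

```nasm
bits 16
        mov     al, [bx]            ; BX is a valid 16-bit base register
        mov     al, [bx+si]         ; base + index
;       mov     al, [cx]            ; rejected by the assembler: CX cannot address memory here
        mov     bx, cx              ; extra move needed before the address in CX is usable
        mov     al, [bx]

bits 32
        mov     al, [ecx]           ; any general-purpose register can be a base
        lea     eax, [edx+ecx*4]    ; and any except ESP can be a scaled index
        mul     ecx                 ; one-operand form: EDX:EAX = EAX * ECX
        imul    ebx, ecx            ; two-operand form: EBX = EBX * ECX (low 32 bits)
        imul    ebx, ecx, 10        ; three-operand form: EBX = ECX * 10
```

With 32-bit addressing and the multi-operand imul forms, the choice of register no longer forces extra moves, which is what gives a compiler's register allocator its freedom.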


[1] In its forms AL, AX, EAX, RAX.
[2] mov A, [mem] uses the opcodes A0 or A1, while mov B, [mem] uses 8A 1E or 8B 1E (with 16-bit addressing). The same is true for add and similar instructions. in, out, div and mul enforce the use of A.
[3] But not to fetch and decode.
[4] Is there an equivalent of "spaghetti code" for data moves into registers?
[5] Consider for example the various addressing modes or the imul instruction.


 