
Why does this LEA instruction not compile?

I am porting 32-bit Delphi BASM code to 64-bit FPC (Win64 target OS) and wonder why the following instruction does not compile in 64-bit FPC:

{$IFDEF FPC}
  {$ASMMODE INTEL}
{$ENDIF}

procedure DoesNotCompile;
asm
      LEA   ECX,[ECX + ESI + $265E5A51]
end;

// Error: Asm: 16 or 32 Bit references not supported

Possible workarounds are:

procedure Compiles1;
asm
      ADD   ECX,ESI
      ADD   ECX,$265E5A51
end;

procedure Compiles2;
asm
      LEA   ECX,[RCX + RSI + $265E5A51]
end;

I just don't understand what is wrong with a 32-bit LEA instruction on the Win64 target (it compiles fine in 32-bit Delphi, so it is a valid CPU instruction).


Optimization remarks:

The following code, compiled by 64-bit FPC 2.6.2,

  {$MODE DELPHI}
  {$ASMMODE INTEL}

procedure Test;
asm
        LEA     ECX,[RCX + RSI + $265E5A51]
        NOP
        LEA     RCX,[RCX + RSI + $265E5A51]
        NOP
        ADD     ECX,$265E5A51
        ADD     ECX,ESI
        NOP
end;

generates the following assembler output:

00000000004013F0 4883ec08                 sub    $0x8,%rsp
                         project1.lpr:10  LEA     ECX,[RCX + RSI + $265E5A51]
00000000004013F4 8d8c31515a5e26           lea    0x265e5a51(%rcx,%rsi,1),%ecx
                         project1.lpr:11  NOP
00000000004013FB 90                       nop
                         project1.lpr:12  LEA     RCX,[RCX + RSI + $265E5A51]
00000000004013FC 488d8c31515a5e26         lea    0x265e5a51(%rcx,%rsi,1),%rcx
                         project1.lpr:13  NOP
0000000000401404 90                       nop
                         project1.lpr:14  ADD     ECX,$265E5A51
0000000000401405 81c1515a5e26             add    $0x265e5a51,%ecx
                         project1.lpr:15  ADD     ECX,ESI
000000000040140B 01f1                     add    %esi,%ecx
                         project1.lpr:16  NOP
000000000040140D 90                       nop
                         project1.lpr:17  end;
000000000040140E 4883c408                 add    $0x8,%rsp

The shortest encoding (7 bytes long) is:

LEA     ECX,[RCX + RSI + $265E5A51]

All three alternatives (including LEA ECX,[ECX + ESI + $265E5A51], which 64-bit FPC does not compile) are 8 bytes long.

I am not sure that the shortest encoding is also the fastest.

I would regard this as a bug in the FPC assembler. The asm code you present is valid, and in 64 bit mode it is perfectly valid to use LEA with 32 bit registers, as you have done. The Intel processor documents are clear on the matter. The Delphi 64 bit inline assembler accepts this code.

To work around this you will need to hand-assemble the code:

DQ    $265e5a510e8c8d67

In the Delphi CPU view this comes out as:

Project1.dpr.12: DQ    $265e5a510e8c8d67
0000000000424160 678D8C0E515A5E26 lea ecx,[esi+ecx+$265e5a51]
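To see how the DQ constant maps onto the instruction bytes shown in the CPU view, here is a minimal sketch that prints the constant in memory order (x86 is little-endian, so the lowest byte comes first). The program and the helper name BytesInMemoryOrder are made up for illustration. Reading the result from left to right: 67 is the address-size prefix, 8D is the LEA opcode, 8C is the ModRM byte (destination ECX, SIB byte and 32-bit displacement follow), 0E is the SIB byte (base ESI, index ECX, scale 1), and 51 5A 5E 26 is the displacement $265E5A51.

```pascal
program ShowLeaBytes;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

uses
  SysUtils; // assumes Delphi's default unit scope names, so this resolves to System.SysUtils

// Hypothetical helper: returns the 8 bytes of Value as hex digits,
// lowest byte first, which is the order they occupy in memory on x86.
function BytesInMemoryOrder(Value: UInt64): string;
var
  I: Integer;
begin
  Result := '';
  for I := 0 to 7 do
  begin
    Result := Result + IntToHex(Byte(Value and $FF), 2);
    Value := Value shr 8;
  end;
end;

begin
  // Should print the same byte sequence the CPU view shows: 678D8C0E515A5E26
  Writeln(BytesInMemoryOrder($265E5A510E8C8D67));
end.
```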

I ran a very simple benchmark to compare the use of 32-bit and 64-bit operands with a version using two ADDs. The code looks like this:

{$APPTYPE CONSOLE}

uses
  System.Diagnostics;

function BenchWithTwoAdds: Integer;
asm
    MOV   EDX,ESI
    XOR   EAX,EAX
    MOV   ESI,$98C34
    MOV   ECX,$ffffffff
@loop:
    ADD   EAX,ESI
    ADD   EAX,$265E5A51
    DEC   ECX
    CMP   ECX,0
    JNZ   @loop
    MOV   ESI,EDX
end;

function BenchWith32bitOperands: Integer;
asm
    MOV   EDX,ESI
    XOR   EAX,EAX
    MOV   ESI,$98C34
    MOV   ECX,$ffffffff
@loop:
    LEA   EAX,[EAX + ESI + $265E5A51]
    DEC   ECX
    CMP   ECX,0
    JNZ   @loop
    MOV   ESI,EDX
end;

{$IFDEF CPUX64}
function BenchWith64bitOperands: Integer;
asm
    MOV   EDX,ESI
    XOR   EAX,EAX
    MOV   ESI,$98C34
    MOV   ECX,$ffffffff
@loop:
    LEA   EAX,[RAX + RSI + $265E5A51]
    DEC   ECX
    CMP   ECX,0
    JNZ   @loop
    MOV   ESI,EDX
end;
{$ENDIF}

var
  Stopwatch: TStopwatch;

begin
{$IFDEF CPUX64}
  Writeln('64 bit');
{$ELSE}
  Writeln('32 bit');
{$ENDIF}
  Writeln;

  Writeln('BenchWithTwoAdds');
  Stopwatch := TStopwatch.StartNew;
  Writeln('Value = ', BenchWithTwoAdds);
  Writeln('Elapsed time = ', Stopwatch.ElapsedMilliseconds);
  Writeln;

  Writeln('BenchWith32bitOperands');
  Stopwatch := TStopwatch.StartNew;
  Writeln('Value = ', BenchWith32bitOperands);
  Writeln('Elapsed time = ', Stopwatch.ElapsedMilliseconds);
  Writeln;

{$IFDEF CPUX64}
  Writeln('BenchWith64bitOperands');
  Stopwatch := TStopwatch.StartNew;
  Writeln('Value = ', BenchWith64bitOperands);
  Writeln('Elapsed time = ', Stopwatch.ElapsedMilliseconds);
{$ENDIF}

  Readln;
end.

The output on my Intel i5-2300:

32 bit

BenchWithTwoAdds
Value = -644343429
Elapsed time = 2615

BenchWith32bitOperands
Value = -644343429
Elapsed time = 3915

----------------------

64 bit

BenchWithTwoAdds
Value = -644343429
Elapsed time = 2612

BenchWith32bitOperands
Value = -644343429
Elapsed time = 3917

BenchWith64bitOperands
Value = -644343429
Elapsed time = 3918

As you can see, there is nothing to choose between the two LEA options based on this. The differences between their times are well inside the variability of the measurement. On this machine, however, the variant using ADD twice wins hands down.

Other machines give different results. Here's the output on a Xeon E5530:

64 bit

BenchWithTwoAdds
Value = -644343429
Elapsed time = 3434

BenchWith32bitOperands
Value = -644343429
Elapsed time = 3295

BenchWith64bitOperands
Value = -644343429
Elapsed time = 3279

And on a Xeon E5-4640 v2:

64 bit

BenchWithTwoAdds
Value = -644343429
Elapsed time = 4102

BenchWith32bitOperands
Value = -644343429
Elapsed time = 5868

BenchWith64bitOperands
Value = -644343429
Elapsed time = 5868
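As a sanity check that all variants compute the same thing, the printed Value can be reproduced without any assembler. Each iteration adds $98C34 + $265E5A51 to EAX, the loop runs $FFFFFFFF times, and the result wraps modulo 2^32 before being read as a signed Integer. The following is only a sketch of that arithmetic; the program and the name ExpectedBenchValue are made up for illustration.

```pascal
program ExpectedValue;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}
{$Q-}{$R-} // the casts below deliberately reinterpret bits

// Hypothetical helper: the value left in EAX after the benchmark loop,
// reduced modulo 2^32 and reinterpreted as a signed 32-bit Integer.
function ExpectedBenchValue: Integer;
const
  Step: UInt64 = $98C34 + $265E5A51; // added to EAX on every iteration
  Iterations: UInt64 = $FFFFFFFF;    // ECX counts down from $FFFFFFFF to 0
  Mask32: UInt64 = $FFFFFFFF;
var
  Total: UInt64;
begin
  Total := Step * Iterations;        // small enough to fit in 64 bits
  Result := Integer(Cardinal(Total and Mask32));
end;

begin
  Writeln(ExpectedBenchValue); // -644343429, matching the benchmark output
end.
```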

Separately from the size of the operands themselves, the components of a memory operand (the registers used to form the address) have a default size. In 64-bit mode it is 64 bits, meaning you should use the 64-bit registers for components of memory operands unless you have a particular reason not to.

The x86 ISA does allow changing that size for a given instruction with the prefix byte 0x67, but you probably don't want to do that (and apparently your assembler doesn't even support it).

To make the distinction between operand and operand component a little clearer:

lea eax, dword ptr [rax + rdx * 4]
    ^^^  ^^^^^ ^^^                   operands: can be any size you like
                    ^^^   ^^^        operand components: usually 64-bit
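One reason the 64-bit components are safe to use with a 32-bit destination is that truncation commutes with addition: the low 32 bits of a sum computed in 64 bits equal the sum computed in 32 bits. The sketch below models both forms in plain Pascal. The function names and the register contents are hypothetical and chosen only for illustration; the displacement is the one from the question.

```pascal
program LeaLow32;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}
{$Q-}{$R-} // wrap-around arithmetic is intended here

const
  Disp: UInt64 = $265E5A51;
  Mask32: UInt64 = $FFFFFFFF;

// Models LEA r32,[r32 + r32 + disp]: the address is formed in 32 bits.
function LeaWith32BitComponents(BaseReg, IndexReg: Cardinal): Cardinal;
begin
  Result := Cardinal((UInt64(BaseReg) + UInt64(IndexReg) + Disp) and Mask32);
end;

// Models LEA r32,[r64 + r64 + disp]: the address is formed in 64 bits
// and only its low 32 bits are written to the destination.
function LeaWith64BitComponents(BaseReg, IndexReg: UInt64): Cardinal;
begin
  Result := Cardinal((BaseReg + IndexReg + Disp) and Mask32);
end;

const
  // Hypothetical register contents with non-zero upper halves.
  RcxValue: UInt64 = $7FFFFFFFF0000001;
  RsiValue: UInt64 = $123456789ABCDEF0;

begin
  // Should print TRUE: the upper halves do not affect the 32-bit result.
  Writeln(LeaWith64BitComponents(RcxValue, RsiValue) =
          LeaWith32BitComponents(Cardinal(RcxValue and Mask32),
                                 Cardinal(RsiValue and Mask32)));
end.
```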

