Assembler function on 64-bit platform on Delphi

I have the following function and need to make it compatible with the 64-bit platform:

procedure ExecuteAsm(Tab, Buf: Pointer; Len: DWORD);
asm
     mov   ebx, Tab
     mov   ecx, Len
     mov   edx, Buf
@1:  mov   al,  [edx]
     xlat
     mov   [edx], al
     inc   edx
     dec   ecx
     jnz @1
end;

Delphi XE5 raises the error [dcc64 Error] E2107 Operand size mismatch on the lines with the Tab and Len parameters.

Unfortunately, I don't know assembler well enough to fix the issue myself. What should I change to compile the function successfully?

That assembly code is essentially just doing the following, which would work in both 32-bit and 64-bit:

procedure ExecuteAsm(Tab, Buf: Pointer; Len: DWORD);
var
  pBuf: PByte;
begin
  pBuf := PByte(Buf);
  repeat
    pBuf^ := PByte(Tab)[pBuf^];
    Inc(pBuf);
    Dec(Len);
  until Len = 0;
end;

So why not just use plain Delphi code and let the compiler deal with the assembly?

Why are you using assembler?

There is no good reason!

This is a direct translation of your asm code to Delphi Pascal:

procedure ExecuteAsm(Tab, Buf: PByte; Len: DWORD);
begin
  repeat
    Buf^ := Tab[Buf^];
    inc(Buf);
    dec(Len);
  until Len = 0;
end;

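But as you can now see, if Len is 0 the procedure will corrupt program memory: with overflow checking off, dec(Len) wraps around to $FFFFFFFF before the loop condition is tested, so the loop keeps writing far beyond the buffer.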
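The wrap-around behind that warning can be seen in isolation. This is a minimal sketch, assuming overflow checking is disabled ({$Q-}); the program name is made up for illustration.

```pascal
program WrapAroundDemo;
{$APPTYPE CONSOLE}
{$Q-} // assumption: overflow checking is off

var
  Len: Cardinal;
begin
  Len := 0;
  Dec(Len);      // unsigned 32-bit wrap-around, not a negative value
  Writeln(Len);  // prints 4294967295
end.
```

A repeat..until loop that starts with Len = 0 therefore sees a huge count after its first pass instead of stopping.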

...

This code looks better, because the while loop tests Len first and never executes the body when it is 0.

procedure ExecuteAsm(Tab, Buf: PByte; Len: cardinal);
begin
  while Len > 0 do
  begin
    Buf^ := Tab[Buf^];
    inc(Buf);
    dec(Len);
  end;
end;
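To show what the routine is for, here is a small usage sketch that runs the while-loop version over a lookup table. The upper-casing table, the sample data, and the program name are assumptions for illustration; none of them come from the question.

```pascal
program XlatUsageDemo;
{$APPTYPE CONSOLE}

// Same routine as above, repeated so the sketch compiles on its own.
procedure ExecuteAsm(Tab, Buf: PByte; Len: Cardinal);
begin
  while Len > 0 do
  begin
    Buf^ := Tab[Buf^];
    Inc(Buf);
    Dec(Len);
  end;
end;

var
  Table: array[0..255] of Byte;  // hypothetical table: maps 'a'..'z' to 'A'..'Z'
  Data: array[0..4] of Byte = (Ord('h'), Ord('e'), Ord('l'), Ord('l'), Ord('o'));
  i: Integer;
begin
  // Start from the identity mapping, then override the lower-case letters.
  for i := 0 to 255 do
    Table[i] := Byte(i);
  for i := Ord('a') to Ord('z') do
    Table[i] := Byte(i - 32);

  // Every byte of Data is replaced in place by Table[byte].
  ExecuteAsm(@Table[0], @Data[0], Length(Data));

  for i := Low(Data) to High(Data) do
    Write(Chr(Data[i]));
  Writeln;  // prints HELLO
end.
```

Tab must point to a full 256-byte table, because any byte value in the buffer can be used as an index.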

However, if you still prefer assembler, you must preserve the ebx/rbx register, like this:

procedure ExecuteAsm(Tab, Buf: Pointer; Len: DWORD);
asm
    push    ebx   //rbx

//... your code

    pop     ebx   //rbx
end;

EDIT: Added 32-bit and 64-bit tests

Because HeartWare didn't do the homework David Heffernan set, I did. David Heffernan wrote the original test (see the comments on HeartWare's answer). I made only a few small changes and added two more test cases. This directive is important: {$O+} //Turn on compiler optimisation... :)

{$APPTYPE CONSOLE}

uses
  Diagnostics;

 {$O+} //Turn on compiler optimisation... :)

procedure _asm_GJ(Tab, Buf : PByte; Len : Cardinal);
//    32-bit   eax edx           ecx
//    64-bit   rcx rdx           r8
asm
{$IFDEF CPUX64 }
        test    Len, Len
        jz      @exit
@loop:
        movzx   rax, [Buf]
        mov     al, byte ptr[Tab + rax]
        mov     [Buf],al
        inc     Buf
        dec     Len
        jnz     @loop
{$ELSE }
        test    Len, Len
        jz      @exit
        push    ebx
@loop:
        movzx   ebx, [Buf]
        mov     bl,byte ptr[Tab + ebx]
        mov     [Buf], bl
        inc     Buf
        dec     Len
        jnz     @loop
        pop     ebx
{$ENDIF }
@exit:
end;

procedure _asm_HeartWare(Tab, Buf : PByte; Len : Cardinal);
//  32-bit     EAX EDX           ECX
//  64-bit     RCX RDX           R8
asm
    {$IFDEF CPUX64 }
        XCHG    R8,RCX
        JECXZ   @OUT
        XOR     RAX,RAX
    @LOOP:
        MOV     AL,[RDX]
        MOV     AL,[R8+RAX]
        MOV     [RDX],AL
        INC     RDX
        DEC     ECX
        JNZ     @LOOP
        // LOOP @LOOP
    {$ELSE }
        JECXZ   @OUT
        PUSH    EBX
        XCHG    EAX,EBX
        XOR     EAX,EAX
    @LOOP:
        MOV     AL,[EDX+ECX-1]
        MOV     AL,[EBX+EAX]
        MOV     [EDX+ECX-1],AL
        DEC     ECX
        JNZ     @LOOP
        // LOOP @LOOP
        POP     EBX
    {$ENDIF }
    @OUT:
end;

procedure _pas_normal(Tab, Buf: PByte; Len: Cardinal);
begin
  while Len > 0 do begin
    Buf^ := Tab[Buf^];
    inc(Buf);
    dec(Len);
  end;
end;

procedure _pas_inline(Tab, Buf: PByte; Len: Cardinal); inline;
begin
  while Len > 0 do begin
    Buf^ := Tab[Buf^];
    inc(Buf);
    dec(Len);
  end;
end;

var
  Stopwatch: TStopwatch;
  i: Integer;
  x, y: array [0 .. 1023] of Byte;

procedure refresh;
var
  i: Integer; // a for-loop control variable must be local to the procedure (E1019)
begin
  for i := low(x) to high(x) do
  begin
    x[i] := i mod 256;
    y[i] := (i + 20) mod 256;
  end;
end;

begin
{$IFDEF CPUX64 }
  Writeln('64 bit mode');
{$ELSE }
  Writeln('32 bit mode');
{$ENDIF }
  refresh;
  Stopwatch := TStopwatch.StartNew;
  for i := 1 to 1000000 do
  begin
    _asm_HeartWare(PByte(@x), PByte(@y), SizeOf(x));
  end;
  Writeln('asm HeartWare : ', Stopwatch.ElapsedMilliseconds, 'ms');

  refresh;
  Stopwatch := TStopwatch.StartNew;
  for i := 1 to 1000000 do
  begin
    _asm_GJ(PByte(@x), PByte(@y), SizeOf(x));
  end;
  Writeln('asm GJ        : ', Stopwatch.ElapsedMilliseconds, 'ms');

  refresh;
  Stopwatch := TStopwatch.StartNew;
  for i := 1 to 1000000 do
  begin
    _pas_normal(PByte(@x), PByte(@y), SizeOf(x));
  end;
  Writeln('pas normal    : ', Stopwatch.ElapsedMilliseconds, 'ms');

  refresh;
  Stopwatch := TStopwatch.StartNew;
  for i := 1 to 1000000 do
  begin
    _pas_inline(PByte(@x), PByte(@y), SizeOf(x));
  end;
  Writeln('pas inline    : ', Stopwatch.ElapsedMilliseconds, 'ms');

  Readln;
end.

And the results...

(Image: benchmark results; not reproduced here.)

Conclusion...

There is almost nothing to say! The numbers speak for themselves...

The Delphi compiler is good, hmm, very good!

I have included another asm-optimised procedure in the test, because HeartWare's asm optimisation isn't a real optimisation.
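Before comparing timings, it is worth checking that the routines under test actually produce the same output. The following is a rough sketch of such a check and was not part of the original test; XlatWhile, XlatFor, TXlatProc and SameResult are hypothetical names, and the two Pascal routines merely stand in for whichever pair (for example _asm_GJ and _pas_normal) is being compared.

```pascal
program XlatEquivalenceCheck;
{$APPTYPE CONSOLE}

uses
  SysUtils; // CompareMem

type
  // Signature shared by every routine in the benchmark.
  TXlatProc = procedure(Tab, Buf: PByte; Len: Cardinal);

// Stand-in 1: while-loop version.
procedure XlatWhile(Tab, Buf: PByte; Len: Cardinal);
begin
  while Len > 0 do
  begin
    Buf^ := Tab[Buf^];
    Inc(Buf);
    Dec(Len);
  end;
end;

// Stand-in 2: for-loop version.
procedure XlatFor(Tab, Buf: PByte; Len: Cardinal);
var
  i: Cardinal;
begin
  for i := 1 to Len do
  begin
    Buf^ := Tab[Buf^];
    Inc(Buf);
  end;
end;

// Runs both routines on identical input and reports whether the outputs agree.
function SameResult(A, B: TXlatProc): Boolean;
var
  Tab: array[0..255] of Byte;
  BufA, BufB: array[0..1023] of Byte;
  i: Integer;
begin
  for i := 0 to 255 do
    Tab[i] := Byte(255 - i);        // arbitrary non-identity table
  for i := Low(BufA) to High(BufA) do
  begin
    BufA[i] := Byte(i mod 256);
    BufB[i] := BufA[i];
  end;
  A(@Tab[0], @BufA[0], Length(BufA));
  B(@Tab[0], @BufB[0], Length(BufB));
  Result := CompareMem(@BufA[0], @BufB[0], SizeOf(BufA));
end;

begin
  if SameResult(XlatWhile, XlatFor) then
    Writeln('outputs match')
  else
    Writeln('outputs differ');
end.
```

A check like this also covers routines that walk the buffer backwards, since each byte is translated independently of the others.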

NOTE: Read the accepted answer by GJ, as it contains a Pascal implementation that beats the crap out of my version. I seem to confuse the compiler by using ABSOLUTE to overcome the signature problem in GJ's implementation, which is one of the reasons I didn't use it as my Pascal version. But even when recoded to match the signature, with explicit type casts inside the routine, it was still much faster than my Pascal version and on par with the optimized assembler version. So, as stated in my own reply and in all the others: use a Pascal implementation when possible, unless it is a time-critical routine called a gazillion times and an actual benchmark shows that the ASM version is significantly faster (which, in my defense, my benchmark did show).

{$IFDEF MSWINDOWS }
PROCEDURE ExecuteAsm(Tab,Buf : POINTER ; Len : DWORD); ASSEMBLER; Register;
  //      32-bit     EAX EDX             ECX
  //      64-bit     RCX RDX             R8
  ASM
    {$IFDEF CPUX64 }
        XCHG    R8,RCX
        JECXZ   @OUT
        XOR     RAX,RAX
    @LOOP:
        MOV     AL,[RDX]
        MOV     AL,[R8+RAX]
        MOV     [RDX],AL
        INC     RDX
        DEC     ECX
        JNZ     @LOOP
        // LOOP @LOOP
    {$ELSE }
        JECXZ   @OUT
        PUSH    EBX
        XCHG    EAX,EBX
        XOR     EAX,EAX
    @LOOP:
        MOV     AL,[EDX+ECX-1]
        MOV     AL,[EBX+EAX]
        MOV     [EDX+ECX-1],AL
        DEC     ECX
        JNZ     @LOOP
        // LOOP @LOOP
        POP     EBX
    {$ENDIF }
    @OUT:
  END;
{$ELSE }
PROCEDURE ExecuteAsm(Tab,Buf : POINTER ; Len : DWORD);
  VAR
    TabP    : PByte ABSOLUTE Tab;
    BufP    : PByte ABSOLUTE Buf;
    I       : Cardinal;

  BEGIN
    FOR I:=1 TO Len DO BEGIN
      BufP^:=TabP[BufP^];
      INC(BufP)
    END
  END;
{$ENDIF }

This should be a valid substitution for all currently supported compilers and platforms. While I agree that it might be better to use the pure Pascal version, it does lead to some horrendous assembly code with lots of unnecessary reloading of registers (at least in 32-bit), so the pure assembly version is definitely faster.

However, unless you run it a gazillion times, you probably won't notice the difference in actual use, and the pure Pascal routine will most likely perform adequately. Only you can determine whether the speed improvement is necessary.

Anyway, here are the timings for executing the PROCEDURE 100.000 times on a 256 byte array (using XE5):

32-bit ASM: 47 ms
64-bit ASM: 47 ms
32-bit PAS: 63 ms
64-bit PAS: 78 ms

and the timings for running it 10.000.000 times in RELEASE configuration:

32-bit ASM: 5281 ms
64-bit ASM: 5281 ms
32-bit PAS: 7765 ms
64-bit PAS: 10031 ms

Still, the ASM version beats the Pascal version in all cases...

And the hand-optimized assembly version performed even better:

32-bit ASM: 1906 ms
64-bit ASM: 1859 ms
32-bit PAS: 7781 ms
64-bit PAS: 10015 ms

And with 10.000 calls on 25.600 bytes instead:

32-bit ASM: 218 ms
64-bit ASM: 172 ms
32-bit PAS: 734 ms
64-bit PAS: 937 ms

In ALL cases, my ASM routine beats the crap out of the compiler's. I simply can't reproduce your timings... What code and compiler did you use?

The actual code that computes the time is as follows (for the 10.000 calls on 25.600 bytes):

T:=GetTickCount;
FOR I:=1 TO 10000 DO ExecuteAsm(TAB,BUF,25600);
T:=GetTickCount-T;
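For reference, the same measurement can be written as a complete program using TStopwatch, which the other benchmark in this thread already uses. This is only a sketch under assumptions: the table contents, the zero-filled buffer, and the Pascal stand-in for ExecuteAsm are not the code that produced the numbers above. GetTickCount is typically limited to the system timer resolution, roughly 10 to 16 ms, so short runs in the tens of milliseconds are coarse when measured with it.

```pascal
program TimingSketch;
{$APPTYPE CONSOLE}

uses
  Diagnostics; // TStopwatch

// Pascal stand-in; swap in the routine that is actually being measured.
procedure ExecuteAsm(Tab, Buf: PByte; Len: Cardinal);
begin
  while Len > 0 do
  begin
    Buf^ := Tab[Buf^];
    Inc(Buf);
    Dec(Len);
  end;
end;

var
  Tab: array[0..255] of Byte;
  Buf: array[0..25599] of Byte;
  SW: TStopwatch;
  i: Integer;
begin
  for i := 0 to 255 do
    Tab[i] := Byte(255 - i);       // arbitrary table contents
  FillChar(Buf, SizeOf(Buf), 0);   // arbitrary buffer contents

  SW := TStopwatch.StartNew;
  for i := 1 to 10000 do
    ExecuteAsm(@Tab[0], @Buf[0], SizeOf(Buf));
  Writeln(SW.ElapsedMilliseconds, ' ms');
end.
```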

I'm not at all sure that it will work correctly, but it compiles successfully:

procedure ExecuteAsm(Tab, Buf: Pointer; Len: DWORD);
asm
     mov   rbx, Tab
     mov   ecx, Len
     mov   rdx, Buf
@1:  mov   al,  [rdx]
     xlat
     mov   [rdx], al
     inc   rdx
     dec   ecx
     jnz @1
end;

Is it the correct answer?
