Assembler function on 64-bit platform in Delphi
I have the following function and need to make it compatible with the 64-bit platform:
procedure ExecuteAsm(Tab, Buf: Pointer; Len: DWORD);
asm
mov ebx, Tab
mov ecx, Len
mov edx, Buf
@1: mov al, [edx]
xlat
mov [edx], al
inc edx
dec ecx
jnz @1
end;
Delphi XE5 raises the error [dcc64 Error] E2107 Operand size mismatch on the lines with the Tab and Len parameters.
Unfortunately, I don't know assembler well enough to fix the issue myself. What should I change to successfully compile the function?
That assembly code is essentially just doing the following, which would work in both 32-bit and 64-bit:
procedure ExecuteAsm(Tab, Buf: Pointer; Len: DWORD);
var
pBuf: PByte;
begin
pBuf := PByte(Buf);
repeat
pBuf^ := PByte(Tab)[pBuf^];
Inc(pBuf);
Dec(Len);
until Len = 0;
end;
So why not just use plain Delphi code and let the compiler deal with the assembly?
Why are you using assembler? There is no good reason!
This is a direct translation of your asm code to Delphi Pascal:
procedure ExecuteAsm(Tab, Buf: PByte; Len: DWORD);
begin
repeat
Buf^ := Tab[Buf^];
inc(Buf);
dec(Len);
until Len = 0;
end;
But as you can see now, if Len is 0, the procedure will corrupt program memory: the loop body runs once before Len is tested, and dec(Len) then wraps around to $FFFFFFFF (with overflow checking off), so the loop keeps running far past the end of the buffer.
...
This code looks better, because the while loop tests for 0 first and never executes the loop body in that case.
procedure ExecuteAsm(Tab, Buf: PByte; Len: cardinal);
begin
while Len > 0 do
begin
Buf^ := Tab[Buf^];
inc(Buf);
dec(Len);
end;
end;
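As a minimal sketch of how the while-loop version behaves, the console program below runs it over a small buffer and then calls it again with Len = 0. The program name, the Table and Data arrays, and the lower-to-upper-case mapping are assumptions made up for illustration; they are not part of the original question.

```pascal
program TranslateDemo;
{$APPTYPE CONSOLE}
{$POINTERMATH ON} // permits Tab[Buf^] indexing on a typed pointer

procedure ExecuteAsm(Tab, Buf: PByte; Len: Cardinal);
begin
  // The test comes first, so Len = 0 never touches the buffer.
  while Len > 0 do
  begin
    Buf^ := Tab[Buf^]; // replace the byte by its table entry
    Inc(Buf);
    Dec(Len);
  end;
end;

var
  Table: array[0..255] of Byte; // hypothetical table: 'a'..'z' -> 'A'..'Z'
  Data: array[0..4] of Byte;
  I: Integer;
begin
  // Identity mapping first, then override the lower-case letters.
  for I := 0 to 255 do
    Table[I] := I;
  for I := Ord('a') to Ord('z') do
    Table[I] := I - 32;

  Data[0] := Ord('h');
  Data[1] := Ord('e');
  Data[2] := Ord('l');
  Data[3] := Ord('l');
  Data[4] := Ord('o');

  ExecuteAsm(@Table[0], @Data[0], Length(Data));
  for I := 0 to High(Data) do
    Write(Chr(Data[I]));
  Writeln; // prints HELLO

  // With Len = 0 the call returns immediately and Data is unchanged.
  ExecuteAsm(@Table[0], @Data[0], 0);
  for I := 0 to High(Data) do
    Write(Chr(Data[I]));
  Writeln; // prints HELLO again
end.
```

Because the table is indexed by a Byte, it should always hold 256 entries; a shorter table would be read out of bounds for some input values.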
However, if you still like assembler, you must preserve the ebx/rbx register, like this:
procedure ExecuteAsm(Tab, Buf: Pointer; Len: DWORD);
asm
push ebx //rbx
//... your code
pop ebx //rbx
end;
EDIT: Added 32-bit and 64-bit tests
Because HeartWare didn't do the homework set by David Heffernan, I did. The original test was written by David Heffernan; see the comments on HeartWare's answer. I have made just a few changes and added two more test cases. This directive is important: {$O+} //Turn on compiler optimisation... :)
{$APPTYPE CONSOLE}
uses
Diagnostics;
{$O+} //Turn on compiler optimisation... :)
procedure _asm_GJ(Tab, Buf : PByte; Len : Cardinal);
// 32-bit eax edx ecx
// 64-bit rcx rdx r8
asm
{$IFDEF CPUX64 }
test Len, Len
jz @exit
@loop:
movzx rax, [Buf]
mov al, byte ptr[Tab + rax]
mov [Buf],al
inc Buf
dec Len
jnz @loop
{$ELSE }
test Len, Len
jz @exit
push ebx
@loop:
movzx ebx, [Buf]
mov bl,byte ptr[Tab + ebx]
mov [Buf], bl
inc Buf
dec Len
jnz @loop
pop ebx
{$ENDIF }
@exit:
end;
procedure _asm_HeartWare(Tab, Buf : PByte; Len : Cardinal);
// 32-bit EAX EDX ECX
// 64-bit RCX RDX R8
asm
{$IFDEF CPUX64 }
XCHG R8,RCX
JECXZ @OUT
XOR RAX,RAX
@LOOP:
MOV AL,[RDX]
MOV AL,[R8+RAX]
MOV [RDX],AL
INC RDX
DEC ECX
JNZ @LOOP
// LOOP @LOOP
{$ELSE }
JECXZ @OUT
PUSH EBX
XCHG EAX,EBX
XOR EAX,EAX
@LOOP:
MOV AL,[EDX+ECX-1]
MOV AL,[EBX+EAX]
MOV [EDX+ECX-1],AL
DEC ECX
JNZ @LOOP
// LOOP @LOOP
POP EBX
{$ENDIF }
@OUT:
end;
procedure _pas_normal(Tab, Buf: PByte; Len: Cardinal);
begin
while Len > 0 do begin
Buf^ := Tab[Buf^];
inc(Buf);
dec(Len);
end;
end;
procedure _pas_inline(Tab, Buf: PByte; Len: Cardinal); inline;
begin
while Len > 0 do begin
Buf^ := Tab[Buf^];
inc(Buf);
dec(Len);
end;
end;
var
Stopwatch: TStopwatch;
i: Integer;
x, y: array [0 .. 1023] of Byte;
procedure refresh;
begin
for i := low(x) to high(x) do
begin
x[i] := i mod 256;
y[i] := (i + 20) mod 256;
end;
end;
begin
{$IFDEF CPUX64 }
Writeln('64 bit mode');
{$ELSE }
Writeln('32 bit mode');
{$ENDIF }
refresh;
Stopwatch := TStopwatch.StartNew;
for i := 1 to 1000000 do
begin
_asm_HeartWare(PByte(@x), PByte(@y), SizeOf(x));
end;
Writeln('asm HeartWare : ', Stopwatch.ElapsedMilliseconds, 'ms');
refresh;
Stopwatch := TStopwatch.StartNew;
for i := 1 to 1000000 do
begin
_asm_GJ(PByte(@x), PByte(@y), SizeOf(x));
end;
Writeln('asm GJ : ', Stopwatch.ElapsedMilliseconds, 'ms');
refresh;
Stopwatch := TStopwatch.StartNew;
for i := 1 to 1000000 do
begin
_pas_normal(PByte(@x), PByte(@y), SizeOf(x));
end;
Writeln('pas normal : ', Stopwatch.ElapsedMilliseconds, 'ms');
refresh;
Stopwatch := TStopwatch.StartNew;
for i := 1 to 1000000 do
begin
_pas_inline(PByte(@x), PByte(@y), SizeOf(x));
end;
Writeln('pas inline : ', Stopwatch.ElapsedMilliseconds, 'ms');
Readln;
end.
And the results...
Conclusion...
There is almost nothing to say! The numbers speak for themselves...
The Delphi compiler is good, hmm, very good!
I have built another asm-optimised procedure into the test, because HeartWare's asm optimisation isn't a real optimisation.
NOTE: Read the accepted answer by GJ, as it contains a Pascal implementation that beats the crap out of my version. I seem to confuse the compiler by using ABSOLUTE to overcome the signature problem that GJ's implementation has, which is one of the reasons why I didn't use it as the Pascal version. But even when recoded to match the signature and using explicit type casts within the routine, it was still much faster than my Pascal version, and on par with the optimized assembler version. So, as stated in my own reply and all the others, use a Pascal implementation when possible, unless it is a time-critical routine called a gazillion times and an actual benchmark shows that the ASM version is significantly faster, which (in my defense) my benchmark did show.
{$IFDEF MSWINDOWS }
PROCEDURE ExecuteAsm(Tab,Buf : POINTER ; Len : DWORD); ASSEMBLER; Register;
// 32-bit EAX EDX ECX
// 64-bit RCX RDX R8
ASM
{$IFDEF CPUX64 }
XCHG R8,RCX
JECXZ @OUT
XOR RAX,RAX
@LOOP:
MOV AL,[RDX]
MOV AL,[R8+RAX]
MOV [RDX],AL
INC RDX
DEC ECX
JNZ @LOOP
// LOOP @LOOP
{$ELSE }
JECXZ @OUT
PUSH EBX
XCHG EAX,EBX
XOR EAX,EAX
@LOOP:
MOV AL,[EDX+ECX-1]
MOV AL,[EBX+EAX]
MOV [EDX+ECX-1],AL
DEC ECX
JNZ @LOOP
// LOOP @LOOP
POP EBX
{$ENDIF }
@OUT:
END;
{$ELSE }
PROCEDURE ExecuteAsm(Tab,Buf : POINTER ; Len : DWORD);
VAR
TabP : PByte ABSOLUTE Tab;
BufP : PByte ABSOLUTE Buf;
I : Cardinal;
BEGIN
FOR I:=1 TO Len DO BEGIN
BufP^:=TabP[BufP^];
INC(BufP)
END
END;
{$ENDIF }
This should be a valid substitution for all currently supported compilers and platforms. While I agree that it might be better to use the pure Pascal version, it does lead to some horrendous assembly code with lots of unnecessary reloading of registers (at least in 32-bit), so the pure assembly version is definitely faster.
However, unless you run it a gazillion times, you probably won't notice it in actual use, and the pure Pascal routine will most likely perform adequately. Only you can determine whether the speed improvement is necessary.
Anyway, here are the timings for executing the PROCEDURE 100.000 times on a 256-byte array (using XE5):
32-bit ASM: 47 ms
64-bit ASM: 47 ms
32-bit PAS: 63 ms
64-bit PAS: 78 ms
and the timings for running it 10.000.000 times in RELEASE configuration:
32-bit ASM: 5281 ms
64-bit ASM: 5281 ms
32-bit PAS: 7765 ms
64-bit PAS: 10031 ms
Still, the ASM version beats the Pascal version in all cases...
And the hand-optimized assembly version performed even better:
32-bit ASM: 1906 ms
64-bit ASM: 1859 ms
32-bit PAS: 7781 ms
64-bit PAS: 10015 ms
And with 10.000 times 25.600 bytes instead:
32-bit ASM: 218 ms
64-bit ASM: 172 ms
32-bit PAS: 734 ms
64-bit PAS: 937 ms
In ALL cases, my ASM routine beats the crap out of the compiler's. I simply can't reproduce your timings... What code and compiler did you use?
The actual code that computes the time is as follows (for the 10.000 times 25.600 bytes):
T:=GetTickCount;
FOR I:=1 TO 10000 DO ExecuteAsm(TAB,BUF,25600);
T:=GetTickCount-T;
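Timing alone does not show that two implementations produce the same bytes. The following is a minimal sketch of a correctness check, assuming a reference routine and a candidate with the same signature. The names TTranslateProc, TranslateForward, TranslateBackward and SameResult are hypothetical and do not appear in any of the answers. The backward variant mirrors the 32-bit asm loop above, which walks the buffer from the end; since every byte is translated independently, the direction should not change the result.

```pascal
program CompareDemo;
{$APPTYPE CONSOLE}
{$POINTERMATH ON} // permits Buf[Len] indexing on a typed pointer

type
  // Hypothetical procedural type matching the signature used in this thread.
  TTranslateProc = procedure(Tab, Buf: PByte; Len: Cardinal);

// Reference: walks the buffer from front to back.
procedure TranslateForward(Tab, Buf: PByte; Len: Cardinal);
begin
  while Len > 0 do
  begin
    Buf^ := Tab[Buf^];
    Inc(Buf);
    Dec(Len);
  end;
end;

// Candidate: walks the buffer from back to front.
procedure TranslateBackward(Tab, Buf: PByte; Len: Cardinal);
begin
  while Len > 0 do
  begin
    Dec(Len);
    Buf[Len] := Tab[Buf[Len]];
  end;
end;

// Runs both routines on identical copies of the same input and compares.
function SameResult(Reference, Candidate: TTranslateProc): Boolean;
var
  Tab: array[0..255] of Byte;
  A, B: array[0..1023] of Byte;
  I: Integer;
begin
  for I := 0 to 255 do
    Tab[I] := Byte(255 - I); // arbitrary test table
  for I := 0 to High(A) do
  begin
    A[I] := Byte(I * 7);     // arbitrary test data
    B[I] := A[I];
  end;
  Reference(@Tab[0], @A[0], SizeOf(A));
  Candidate(@Tab[0], @B[0], SizeOf(B));
  Result := True;
  for I := 0 to High(A) do
    if A[I] <> B[I] then
      Result := False;
end;

begin
  if SameResult(TranslateForward, TranslateBackward) then
    Writeln('results match')
  else
    Writeln('results differ');
end.
```

An assembler routine declared with the same parameters and the default register calling convention could be passed as Candidate in the same way.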
I am not at all sure that it will work correctly, but it compiles successfully:
procedure ExecuteAsm(Tab, Buf: Pointer; Len: DWORD);
asm
mov rbx, Tab
mov ecx, Len
mov rdx, Buf
@1: mov al, [rdx]
xlat
mov [rdx], al
inc rdx
dec ecx
jnz @1
end;
Is it the correct answer?