A simple while-loop in GCC inline assembly
I want to write the following loop using GCC extended inline ASM:
long* arr = new long[ARR_LEN]();
long* act_ptr = arr;
long* end_ptr = arr + ARR_LEN;
while (act_ptr < end_ptr)
{
    *act_ptr = SOME_VALUE;
    act_ptr += STEP_SIZE;
}
delete[] arr;
An array of type long with length ARR_LEN is allocated and zero-initialized. The loop walks through the array with an increment of STEP_SIZE. Every touched element is set to SOME_VALUE.
Well, this was my first attempt in GAS:
long* arr = new long[ARR_LEN]();
asm volatile
(
    "loop:"
    "movl %[sval], (%[aptr]);"
    "leal (%[aptr], %[incr], 4), %[aptr];"
    "cmpl %[eptr], %[aptr];"
    "jl loop;"
    : // no output
    : [aptr] "r" (arr),
      [eptr] "r" (arr + ARR_LEN),
      [incr] "r" (STEP_SIZE),
      [sval] "i" (SOME_VALUE)
    : "cc", "memory"
);
delete[] arr;
As mentioned in the comments, it is true that this assembler code is more of a do {...} while loop, but it does in fact do the same work.
The strange thing about that piece of code is that it worked fine for me at first. But when I later tried to make it work in another project, it seemed to do nothing at all. I even made some 1:1 copies of the working project, compiled again and... still the result is random.
Maybe I chose the wrong constraints for the input operands, but I have tried nearly all of them by now and have no real idea left. What puzzles me in particular is that it still works in some cases.
I am not an expert at ASM whatsoever, although I learned it when I was still at university. Please note that I am not looking for optimization; I am just trying to understand how inline assembly works. So here is my question: Is there anything fundamentally wrong with my attempt, or did I make a more subtle mistake here? Thanks in advance.
(Working with g++ MinGW Win32 x86 v.4.8.1)
Update
I have already tried out every single suggestion that has been contributed here so far. In particular, I tried
... : [aptr] "=r" (arr) : "0" (arr) ...
instead, with the same result, and
... : [aptr] "+r" (arr) : ...
which is still the same. By now I know the official documentation pretty much by heart, but I still can't see my error.
You are modifying an input operand (aptr), which is not allowed. Either constrain it to match an output operand or change it to an input/output operand.
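To make the input/output suggestion concrete, here is a minimal sketch, not a definitive implementation. The function name fill_strided and the local cur are hypothetical names chosen for illustration. It assumes GCC or Clang on x86 or x86-64 and falls back to plain C++ elsewhere. The pointer is passed as a "+r" operand on a copy, so the caller's pointer is not advanced, and the value is passed in a register so that the store width follows sizeof(long) instead of being hard-coded.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: writes `value` to every `step`-th element of [first, last).
void fill_strided(long* first, long* last, std::size_t step, long value) {
    assert(step != 0);
    if (first >= last) return;  // the asm body is a do-while, so guard the empty case
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    long* cur = first;          // the asm advances this copy only
    asm volatile(
        "1:\n\t"                                         // local label, safe if the asm is duplicated
        "mov %[sval], (%[aptr])\n\t"                     // *cur = value (width taken from the register)
        "lea (%[aptr], %[incr], %c[size]), %[aptr]\n\t"  // cur += step
        "cmp %[eptr], %[aptr]\n\t"                       // flags from cur - last
        "jb 1b\n\t"                                      // repeat while cur < last (unsigned)
        : [aptr] "+r"(cur)                               // read AND written, hence "+r"
        : [eptr] "r"(last),
          [incr] "r"(step),
          [sval] "r"(value),
          [size] "i"(sizeof(long))
        : "cc", "memory");
#else
    for (long* p = first; p < last; p += step) *p = value;  // portable fallback
#endif
}
```

Calling fill_strided(arr, arr + ARR_LEN, STEP_SIZE, SOME_VALUE) leaves arr itself untouched, so a later delete[] arr still receives the original pointer.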
Here is complete code that has the intended behavior. Note that it is written for a 64-bit machine.
%%rbx is used instead of %%ebx as the base address for the array. For the same reason, leaq and cmpq should be used instead of leal and cmpl.
movq should be used since the array is of type long. long is 8 bytes, not 4 bytes, on a 64-bit machine with an LP64 ABI such as Linux (on 64-bit Windows, long remains 4 bytes).
jl in the question should be changed to jg, because the operands of cmpq are swapped in the code below.
The named operands from the question are replaced with fixed registers, since they would otherwise be substituted with 32-bit registers (e.g., ebx).
Constraint "r" can not be used. "r" means any register can be used, but not every combination of registers is acceptable for leaq. See: x86 addressing modes
#include <iostream>
using namespace std;

int main() {
    int ARR_LEN = 20;
    long STEP_SIZE = 2;     // 64-bit, so that all of %rcx is defined
    long SOME_VALUE = 100;
    long* arr = new long[ARR_LEN];
    int i;
    for (i = 0; i < ARR_LEN; i++) {
        arr[i] = 0;
    }
    long* ptr = arr;        // the asm advances this copy, not arr itself
    __asm__ __volatile__ (
        "loop:"
        "movq %%rdx, (%%rbx);"
        "leaq (%%rbx, %%rcx, 8), %%rbx;"
        "cmpq %%rbx, %%rax;"
        "jg loop;"
        : "+b" (ptr)        // %rbx is modified, so it is an input/output operand
        : "a" (arr+ARR_LEN), "c" (STEP_SIZE), "d" (SOME_VALUE)
        : "cc", "memory"
    );
    for (i = 0; i < ARR_LEN; i++) {
        cout << "element " << i << " is " << arr[i] << endl;
    }
    delete[] arr;
    return 0;
}
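The switch from jl to jg comes down to AT&T operand order: cmp src, dst sets the flags from dst - src, so a following jl is taken when dst < src. The sketch below is only an illustration of that rule, with hypothetical function names. It uses setl, which tests the same condition as jl, and assumes GCC or Clang on x86 or x86-64, with a plain C++ fallback elsewhere.

```cpp
#include <cassert>

// Hypothetical helper: reports whether `jl` would be taken right after
// the AT&T instruction `cmp src, dst`.
bool jl_taken_after_cmp(long src, long dst) {
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    unsigned char taken;
    asm("cmp %[src], %[dst]\n\t"  // flags from dst - src
        "setl %[taken]"           // same signed "less" condition that jl tests
        : [taken] "=q"(taken)     // "q": a register with an 8-bit low part
        : [src] "r"(src), [dst] "r"(dst)
        : "cc");
    return taken != 0;
#else
    return dst < src;             // the condition the asm path evaluates
#endif
}

// Small self-check of the operand order.
void check_operand_order() {
    assert(jl_taken_after_cmp(5, 3));   // dst (3) <  src (5): taken
    assert(!jl_taken_after_cmp(3, 5));  // dst (5) >  src (3): not taken
    assert(!jl_taken_after_cmp(4, 4));  // equal: not taken
}
```

With the pointer as dst, as in the question, jl loops while the pointer is below the end. With the end address as dst, as in the code above, the matching test is jg.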
How about an answer that works for both x86 and x64 (although it does assume longs are always 4 bytes, a la Windows)? The main change from the OP is using "+r" and (temp).
#include <cstddef>
#include <iostream>
using namespace std;

int main() {
    int ARR_LEN = 20;
    size_t STEP_SIZE = 2;
    const long SOME_VALUE = 100;  // const, so it can satisfy the "i" (immediate) constraint
    long* arr = new long[ARR_LEN];
    for (int i = 0; i < ARR_LEN; i++) {
        arr[i] = 0;
    }
    long* temp = arr;             // the asm advances temp, so arr stays valid for delete[]
    asm volatile (
        "loop:\n\t"
        "movl %[sval], (%[aptr])\n\t"
        "lea (%[aptr], %[incr], %c[size]), %[aptr]\n\t"
        "cmp %[eptr], %[aptr]\n\t"
        "jl loop\n\t"
        : [aptr] "+r" (temp)
        : [eptr] "r" (arr + ARR_LEN),
          [incr] "r" (STEP_SIZE),
          [sval] "i" (SOME_VALUE),
          [size] "i" (sizeof(long))
        : "cc", "memory"
    );
    for (int i = 0; i < ARR_LEN; i++) {
        cout << "element " << i << " is " << arr[i] << endl;
    }
    delete[] arr;
    return 0;
}