
AT&T asm inline C++ problem

My Code

const int howmany = 5046;
char buffer[howmany];
    asm("lea     buffer,%esi"); //Get the address of buffer
    asm("mov     howmany,%ebx");         //Set the loop number
    asm("buf_loop:");                      //Label for beginning of loop
    asm("movb     (%esi),%al");             //Copy buffer[x] to al
    asm("inc     %esi");                   //Increment buffer address
    asm("dec     %ebx");                   //Decrement loop count
    asm("jnz     buf_loop");              //jump to buf_loop if(ebx>0)

My Problem

I am using the gcc compiler. For some reason my buffer/howmany variables are undefined in the eyes of my asm. I'm not sure why. I just want to move the beginning address of my buffer array into the esi register and loop 'howmany' times, copying each element to the al register.

Are you using the inline assembler in gcc? (If not, in what other C++ compiler, exactly?)

If gcc, see the details here, and in particular this example:

    asm ("leal (%1,%1,4), %0"
         : "=r" (five_times_x)
         : "r" (x) 
         );

%0 and %1 refer to the C-level variables, and they're listed specifically as the second (for outputs) and third (for inputs) parameters to asm. In your example you have only "inputs", so you'd have an empty second operand (traditionally one uses a comment after that colon, such as /* no output registers */, to indicate that more explicitly).
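For reference, here is that example wrapped in a complete function, as a minimal sketch. The function name five_times is only illustrative, and the code assumes GCC or Clang targeting x86; on any other target it falls back to plain C++.

```cpp
// Hypothetical wrapper around the leal example above.
// Assumes GCC or Clang on x86; other targets use the plain C++ fallback.
int five_times(int x)
{
#if defined(__i386__) || defined(__x86_64__)
    int five_times_x;
    asm("leal (%1,%1,4), %0"      // %0 = %1 + %1*4
        : "=r" (five_times_x)     // operand %0: write-only output, any register
        : "r" (x)                 // operand %1: input, any register
        );
    return five_times_x;
#else
    return x * 5;                 // fallback so the sketch still builds elsewhere
#endif
}
```

Calling five_times(3) should give 15. Note that the template never names a C variable directly; it only refers to the operands by number.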

The part that declares an array like that

int howmany = 5046;
char buffer[howmany];

is not valid C++. In C++ it is impossible to declare an array that has "variable" or run-time size. In C++ array declarations the size is always a compile-time constant.

If your compiler allows this array declaration, it means that it implements it as an extension. In that case you have to do your own research to figure out how it implements such a run-time sized array internally. I would guess that internally buffer will be implemented as a pointer, not as a true array. If my guess is correct and it is really a pointer, then the proper way to load the address of the array into esi might be

mov buffer,%esi

and not a lea, as in your code. lea will only work with "normal" compile-time sized arrays, but not with run-time sized arrays.

Another question is whether you really need a run-time sized array in your code. Could it be that you just made it so by mistake? If you simply change the howmany declaration to

const int howmany = 5046;

the array will turn into a "normal" C++ array and your code might start working as is (i.e. with lea).

All of those asm instructions need to be in the same asm statement if you want to be sure they're contiguous (without compiler-generated code between them), and you need to declare input / output / clobber operands or you will step on the compiler's registers.

You can't use lea or mov to/from a C variable name (except for global / static symbols which are actually defined in the compiler's asm output, but even then you usually shouldn't).

Instead of using mov instructions to set up inputs, ask the compiler to do it for you using input operand constraints. If the first or last instruction of a GNU C inline asm statement is a mov, usually that means you're doing it wrong and writing inefficient code.
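As a rough illustration of letting constraints do the setup, the sketch below uses specific-register constraints so the compiler loads the destination pointer, the count and the fill byte itself; the template contains no mov at all. The function name fill_bytes is hypothetical, and the asm path assumes GCC or Clang on x86 with the direction flag clear, as the usual calling conventions require.

```cpp
#include <cstddef>

// Hypothetical helper: store `value` into n bytes starting at dst.
void fill_bytes(void *dst, unsigned char value, std::size_t n)
{
#if defined(__i386__) || defined(__x86_64__)
    asm volatile("rep stosb"        // store AL to [(E/R)DI], (E/R)CX times
        : "+D" (dst)                // pointer in (e/r)di, advanced by the instruction
        , "+c" (n)                  // count in (e/r)cx, counted down to zero
        : "a" (value)               // fill byte in al
        : "memory"                  // the asm writes memory that is not an operand
        );
#else
    // Fallback so the sketch still builds on non-x86 targets.
    unsigned char *p = static_cast<unsigned char *>(dst);
    for (std::size_t i = 0; i < n; ++i)
        p[i] = value;
#endif
}
```

Because dst and n are read-write operands, the compiler knows both registers are modified and will not assume they still hold the original values afterwards.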

And BTW, GNU C++ allows C99-style variable-length arrays, so howmany is allowed to be non-const and even set in a way that doesn't optimize away to a constant. Any compiler that can compile GNU-style inline asm will also support variable-length arrays.


How to write your loop properly

If this looks over-complicated, then see https://gcc.gnu.org/wiki/DontUseInlineAsm. Write a stand-alone function in asm so you can just learn asm instead of also having to learn about gcc and its complex but powerful inline-asm interface. You basically have to know asm and understand compilers to use it correctly (with the right constraints to prevent breakage when optimization is enabled).

Note the use of named operands like %[ptr] instead of %2 or %%ebx. Letting the compiler choose which registers to use is normally a good thing, but for x86 there are letters other than "r" you can use, like "=a" for rax/eax/ax/al specifically. See https://gcc.gnu.org/onlinedocs/gcc/Extended-Asm.html, and also other links in the inline-assembly tag wiki.

I also used buf_loop%=: to append a unique number to the label, so if the optimizer clones the function or inlines it in multiple places, the file will still assemble.

Source + compiler asm output on the Godbolt compiler explorer.

void ext(char *);

int foo(void) 
{
    int howmany = 5046;   // could be a function arg
    char buffer[howmany];
    //ext(buffer);

    const char *bufptr = buffer;  // copy the pointer to a C var we can use as a read-write operand
    unsigned char result;
    asm("buf_loop%=:  \n\t"                 // do {
        "   movb     (%[ptr]), %%al \n\t"   // Copy buffer[x] to al
        "   inc     %[ptr]        \n\t"
        "   dec     %[count]      \n\t"
        "   jnz     buf_loop%=    \n\t"      // } while(--count != 0)
       :   [res]"=a"(result)      // al = write-only output
         , [count] "+r" (howmany) // input/output operand, any register
         , [ptr] "+r" (bufptr)
       : // no input-only operands
       : "memory"   // we read memory that isn't an input operand, only pointed to by inputs
    );
    return result;
}

I used %%al as an example of how to write register names explicitly: Extended Asm (with operands) needs a double % to get a literal % in the asm output. You could also use %[res] or %0 and let the compiler substitute %al in its asm output. (And then you'd have no reason to use a specific-register constraint unless you wanted to take advantage of cbw or lodsb or something like that.) result is unsigned char, so the compiler will pick a byte register for it. If you want the low byte of a wider operand, you could use %b[count] for example.
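To make the %b modifier concrete, here is a small sketch that copies the low byte of a 32-bit operand into a byte-sized output. The function name low_byte and the operand names are made up for illustration. The "q" constraint is used on the assumption that the code may be built as 32-bit, where only a/b/c/d have addressable low bytes; on x86-64 it allows any integer register.

```cpp
// Hypothetical helper: return the low 8 bits of value.
// Assumes GCC or Clang on x86; other targets use the plain C++ fallback.
unsigned char low_byte(unsigned value)
{
#if defined(__i386__) || defined(__x86_64__)
    unsigned char result;
    asm("movb %b[val], %[res]"     // %b[val] prints the byte-sized name of val's register
        : [res] "=q" (result)      // byte-capable register chosen by the compiler
        : [val] "q" (value)        // wider operand, also in a byte-capable register
        );
    return result;
#else
    return static_cast<unsigned char>(value & 0xFFu);
#endif
}
```

Since result is an unsigned char, %[res] already expands to a byte register name such as %al, so only the wider operand needs the modifier.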

This uses a "memory" clobber, which is inefficient. You don't need the compiler to spill everything to memory, only to make sure that the contents of buffer[] in memory match the C abstract machine state. (This is not guaranteed by passing a pointer to it in a register.)

gcc7.2 -O3 output:

    pushq   %rbp
    movl    $5046, %edx
    movq    %rsp, %rbp
    subq    $5056, %rsp
    movq    %rsp, %rcx         # compiler-emitted to satisfy our "+r" constraint for bufptr
    # start of the inline-asm block
    buf_loop18:  
       movb     (%rcx), %al 
       inc     %rcx        
       dec     %edx      
       jnz     buf_loop18
    # end of the inline-asm block

    movzbl  %al, %eax
    leave
    ret

Without a memory clobber or input constraint, leave appears before the inline asm block, releasing that stack memory before the inline asm uses the now-stale pointer. A signal-handler running at the wrong time would clobber it.


A more efficient way is to use a dummy memory operand which tells the compiler that the entire array is a read-only memory input to the asm statement. See get string length in inline GNU Assembler for more about this flexible-array-member trick for telling the compiler you read all of an array without specifying the length explicitly.

In C you can define a new type inside a cast, but you can't in C++, hence the using instead of a really complicated input operand.

int bar(unsigned howmany)
{
    //int howmany = 5046;
    char buffer[howmany];
    //ext(buffer);
    buffer[0] = 1;
    buffer[100] = 100;   // test whether we got the input constraints right

    //using input_t = const struct {char a[howmany];};  // requires a constant size
    using flexarray_t = const struct {char a; char x[];};
    const char *dummy;
    unsigned char result;
    asm("buf_loop%=:  \n\t"                 // do {
        "   movb     (%[ptr]), %%al \n\t"   // Copy buffer[x] to al
        "   inc     %[ptr]        \n\t"
        "   dec     %[count]      \n\t"
        "   jnz     buf_loop%=    \n\t"      // } while(--count != 0)
       : [res]"=a"(result)        // al = write-only output
         , [count] "+r" (howmany) // input/output operand, any register
         , "=r" (dummy)           // output operand in the same register as buffer input, so we can modify the register
       : [ptr] "2" (buffer)     // matching constraint for the dummy output
         , "m" (*(flexarray_t *) buffer)  // whole buffer as an input operand

           //, "m" (*buffer)        // just the first element: doesn't stop the buffer[100]=100 store from sinking past the inline asm, even if you used asm volatile
       : // no clobbers
    );
    buffer[100] = 101;
    return result;
}

I also used a matching constraint so buffer could be an input directly, and the output operand in the same register means we can modify that register. We got the same effect in foo() by using const char *bufptr = buffer; and then using a read-write constraint to tell the compiler that the new value of that C variable is what we leave in the register. Either way we leave a value in a dead C variable that goes out of scope without being read, but the matching constraint way can be useful for macros where you don't want to modify the value of your input (and don't need the type of your input: int dummy would work fine, too).

The buffer[100] = 100; and buffer[100] = 101; assignments are there to show that they both appear in the asm, instead of being merged across the inline-asm (which does happen if you leave out the "m" input operand). IDK why the buffer[100] = 101; isn't optimized away; it's dead so it should be. Also note that asm volatile doesn't block this reordering, so it's not an alternative to a "memory" clobber or using the right constraints.
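When the array size is a compile-time constant, the dummy memory operand can be written with a pointer-to-array cast instead of the flexible-array-member type. The following is only a sketch of that variant: last_byte and the label name are hypothetical, and the asm path assumes GCC or Clang on x86.

```cpp
#include <cstddef>

// Hypothetical helper: run the question's loop over a fixed-size array
// and return the last byte that was loaded.
template <std::size_t N>
unsigned char last_byte(const char (&buf)[N])
{
#if defined(__i386__) || defined(__x86_64__)
    const char *p = buf;          // copy we are allowed to modify
    std::size_t count = N;        // N > 0 for any valid C++ array
    unsigned char result;
    asm("byte_loop%=:              \n\t"   // %= makes the label unique per asm instance
        "   movb  (%[ptr]), %[res] \n\t"   // load the current byte
        "   inc   %[ptr]           \n\t"
        "   dec   %[count]         \n\t"
        "   jnz   byte_loop%=      \n\t"   // repeat until count reaches zero
        : [res] "=&q" (result)             // early-clobber byte register
        , [ptr] "+r" (p)
        , [count] "+r" (count)
        : "m" (*(const char (*)[N]) buf)   // all N bytes are a memory input
        : "cc"
        );
    return result;
#else
    return static_cast<unsigned char>(buf[N - 1]);   // fallback for non-x86 targets
#endif
}
```

With this operand in place, no "memory" clobber is needed: the compiler knows exactly which bytes the asm reads and only has to keep those in sync.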
