How to access C structs/variables from inline asm?

Question

Consider the following code:

    int bn_div(bn_t *bn1, bn_t *bn2, bn_t *bnr)
  {
    uint32 q, m;        /* Division Result */
    uint32 i;           /* Loop Counter */
    uint32 j;           /* Loop Counter */

    /* Check Input */
    if (bn1 == NULL) return(EFAULT);
    if (bn1->dat == NULL) return(EFAULT);
    if (bn2 == NULL) return(EFAULT);
    if (bn2->dat == NULL) return(EFAULT);
    if (bnr == NULL) return(EFAULT);
    if (bnr->dat == NULL) return(EFAULT);


    #if defined(__i386__) || defined(__amd64__)
    __asm__ (".intel_syntax noprefix");
    __asm__ ("pushl %eax");
    __asm__ ("pushl %edx");
    __asm__ ("pushf");
    __asm__ ("movl %eax, (bn1->dat[i])");
    __asm__ ("xorl %edx, %edx");
    __asm__ ("divl (bn2->dat[j])");
    __asm__ ("movl (q), %eax");
    __asm__ ("movl (m), %edx");
    __asm__ ("popf");
    __asm__ ("popl %edx");
    __asm__ ("popl %eax");
    #else
    q = bn->dat[i] / bn->dat[j];
    m = bn->dat[i] % bn->dat[j];
    #endif
    /* Return */
    return(0);
  }

The data type uint32 is basically an unsigned long int or a uint32_t (an unsigned 32-bit integer). The type bnint is either an unsigned short int (uint16_t) or a uint32_t, depending on whether 64-bit data types are available: if they are, bnint is a uint32; otherwise it's a uint16. This was done in order to capture carry/overflow in other parts of the code. The structure bn_t is defined as follows:

typedef struct bn_data_t bn_t;
struct bn_data_t
  {
    uint32 sz1;         /* Bit Size */
    uint32 sz8;         /* Byte Size */
    uint32 szw;         /* Word Count */
    bnint *dat;         /* Data Array */
    uint32 flags;       /* Operational Flags */
  };

The function starts on line 300 in my source code. When I try to compile it with make, I get the following errors:

system:/home/user/c/m3/bn 1036 $$$ ->make
clang -I. -I/home/user/c/m3/bn/.. -I/home/user/c/m3/bn/../include  -std=c99 -pedantic -Wall -Wextra -Wshadow -Wpointer-arith -Wcast-align -Wstrict-prototypes  -Wmissing-prototypes -Wnested-externs -Wwrite-strings -Wfloat-equal  -Winline -Wunknown-pragmas -Wundef -Wendif-labels  -c /home/user/c/m3/bn/bn.c
/home/user/c/m3/bn/bn.c:302:12: warning: unused variable 'q' [-Wunused-variable]
    uint32 q, m;        /* Division Result */
           ^
/home/user/c/m3/bn/bn.c:302:15: warning: unused variable 'm' [-Wunused-variable]
    uint32 q, m;        /* Division Result */
              ^
/home/user/c/m3/bn/bn.c:303:12: warning: unused variable 'i' [-Wunused-variable]
    uint32 i;           /* Loop Counter */
           ^
/home/user/c/m3/bn/bn.c:304:12: warning: unused variable 'j' [-Wunused-variable]
    uint32 j;           /* Loop Counter */
           ^
/home/user/c/m3/bn/bn.c:320:14: error: unknown token in expression
    __asm__ ("movl %eax, (bn1->dat[i])");
             ^
<inline asm>:1:18: note: instantiated into assembly here
        movl %eax, (bn1->dat[i])
                        ^
/home/user/c/m3/bn/bn.c:322:14: error: unknown token in expression
    __asm__ ("divl (bn2->dat[j])");
             ^
<inline asm>:1:12: note: instantiated into assembly here
        divl (bn2->dat[j])
                  ^
4 warnings and 2 errors generated.
*** [bn.o] Error code 1

Stop in /home/user/c/m3/bn.
system:/home/user/c/m3/bn 1037 $$$ ->

What I know:

I consider myself to be fairly well versed in x86 assembler (as evidenced by the code that I wrote above). However, the last time that I mixed a high-level language and assembler was with Borland Pascal about 15-20 years ago, when writing graphics drivers for games (pre-Windows 95 era). My familiarity is with Intel syntax.

What I don't know:

How do I access members of bn_t (especially *dat) from asm? Since *dat is a pointer to uint32, I am accessing the elements as an array (e.g. bn1->dat[i]).

How do I access local variables that are declared on the stack?

I am using push/pop to restore clobbered registers to their previous values so as not to upset the compiler. However, do I also need to put the volatile keyword on the local variables?

Or is there a better way that I am not aware of? I don't want to put this in a separate function because of the call overhead; this function is performance critical.

Additional:

Right now, I'm just starting to write this function, so it is nowhere near complete. There are missing loops and other such support/glue code. But the main gist is accessing local variables/structure elements.

EDIT 1:

The syntax that I am using seems to be the only one that clang supports. I tried the following code and clang gave me all sorts of errors:

__asm__ ("pushl %%eax",
    "pushl %%edx",
    "pushf",
    "movl (bn1->dat[i]), %%eax",
    "xorl %%edx, %%edx",
    "divl ($0x0c + bn2 + j)",
    "movl %%eax, (q)",
    "movl %%edx, (m)",
    "popf",
    "popl %%edx",
    "popl %%eax"
    );

It wants me to put a closing parenthesis on the first line, replacing the comma. I switched to using %% instead of % because I read somewhere that inline assembly requires %% to denote CPU registers, and clang was telling me that I was using an invalid escape sequence.

Answer 1

If you only need 32b / 32b => 32bit division, let the compiler use both outputs of div, which gcc, clang and icc all do just fine, as you can see on the Godbolt compiler explorer:

uint32_t q = bn1->dat[i] / bn2->dat[j];
uint32_t m = bn1->dat[i] % bn2->dat[j];

Compilers are quite good at CSEing that into one div. Just make sure you don't store the division result somewhere that gcc can't prove won't affect the input of the remainder.

e.g. *m = dat[i] / dat[j] might overlap (alias) dat[i] or dat[j], so gcc would have to reload the operands and redo the div for the % operation. See the godbolt link for bad/good examples.
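Since the godbolt link is not reproduced here, the following is a minimal sketch of the two cases. The function names and signatures are made up for illustration; whether a second div is actually emitted depends on the compiler and optimization level.

```c
#include <stdint.h>

/* Good: both results land in locals first. The stores happen after both
   operations, so the compiler can reuse one div for / and %. */
void divmod_locals(const uint32_t *dat, uint32_t i, uint32_t j,
                   uint32_t *q_out, uint32_t *m_out)
{
    uint32_t q = dat[i] / dat[j];
    uint32_t m = dat[i] % dat[j];
    *q_out = q;
    *m_out = m;
}

/* Potentially worse: *q_out may alias dat[i] or dat[j] (const on dat does
   not rule that out), so after the first store the compiler has to reload
   the operands and may redo the div for the % operation. */
void divmod_through_pointer(const uint32_t *dat, uint32_t i, uint32_t j,
                            uint32_t *q_out, uint32_t *m_out)
{
    *q_out = dat[i] / dat[j];
    *m_out = dat[i] % dat[j];
}
```

Both versions return the same values when the pointers do not overlap; the difference is only in the code the compiler is allowed to generate.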

Using inline asm for 32bit / 32bit = 32bit div doesn't gain you anything, and actually makes worse code with clang (see the godbolt link).

If you need 64bit / 32bit = 32bit, you probably need asm, though, if there isn't a compiler built-in for it. (GNU C doesn't have one, AFAICT.) The obvious way in C (casting operands to uint64_t) generates a call to a 64bit / 64bit = 64bit libgcc function, which has branches and multiple div instructions. gcc isn't good at proving the result will fit in 32 bits, so it can't assume that a single div instruction won't cause a #DE.
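As a rough sketch of what such a 64bit / 32bit = 32bit wrapper might look like: the helper name div64_32 and its signature are hypothetical, and the caller is assumed to guarantee hi < d so that the quotient fits in 32 bits (otherwise div raises #DE). The non-x86 branch is only a portable fallback.

```c
#include <stdint.h>

/* Hypothetical helper: divides the 64-bit value hi:lo by d.
   Precondition (caller's responsibility): hi < d. */
static inline uint32_t div64_32(uint32_t hi, uint32_t lo, uint32_t d,
                                uint32_t *rem)
{
#if defined(__i386__) || defined(__x86_64__)
    uint32_t q, r;
    __asm__ ("divl %[d]"                 /* edx:eax / d -> eax, remainder edx */
             : "=a" (q), "=d" (r)
             : "a" (lo), "d" (hi), [d] "rm" (d)
             : "cc");                    /* div leaves the flags undefined */
    *rem = r;
    return q;
#else
    /* Portable fallback for other targets. */
    uint64_t n = ((uint64_t)hi << 32) | lo;
    *rem = (uint32_t)(n % d);
    return (uint32_t)(n / d);
#endif
}
```

The compiler sees only the constraints, not the instruction, so the hi < d precondition has to be enforced in the surrounding C code.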

For a lot of other instructions, you can avoid writing inline asm a lot of the time by using builtin functions for things like popcount. With -mpopcnt, it compiles to the popcnt instruction (and accounts for the false dependency on the output operand that Intel CPUs have). Without, it compiles to a libgcc function call.
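For instance, a minimal sketch using the GNU C builtin __builtin_popcount (the wrapper name count_set_bits is made up here):

```c
#include <stdint.h>

/* Counts the set bits in x. The compiler picks the instruction sequence:
   a single popcnt when the target allows it, a fallback otherwise. */
static inline unsigned count_set_bits(uint32_t x)
{
    return (unsigned)__builtin_popcount(x);
}
```

Because this is ordinary C as far as the optimizer is concerned, a call such as count_set_bits(0xFF) with a constant argument can be folded at compile time, which an inline-asm popcnt could not be.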

Always prefer builtins, or pure C that compiles to good asm, so the compiler knows what the code does. When inlining makes some of the arguments known at compile time, pure C can be optimized away or simplified, but code using inline asm will just load constants into registers and do a div at run time. Inline asm also defeats CSE between similar computations on the same data, and of course can't auto-vectorize.

Using GNU C syntax the right way

https://gcc.gnu.org/onlinedocs/gcc/Extended-Asm.html explains how to tell the compiler which variables you want in registers, and what the outputs are.

You can use Intel/MASM-like syntax and mnemonics, and non-% register names if you like, preferably by compiling with -masm=intel. The AT&T syntax bug (fsub and fsubr mnemonics are reversed) might still be present in intel-syntax mode; I forget.

Most software projects that use GNU C inline asm use AT&T syntax only.

See also the bottom of this answer for more GNU C inline asm info, and the x86 tag wiki.

An asm statement takes one string arg, and 3 sets of constraints (outputs, inputs, and clobbers). The easiest way to make it multi-line is by making each asm line a separate string ending with \n, and letting the compiler implicitly concatenate them.
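A small sketch of that layout, assuming AT&T syntax and an x86 target (the function add_then_double is invented purely to have something to show; plain C would obviously be better for this computation):

```c
#include <stdint.h>

/* Illustrative only: computes 2 * (a + b) with a three-line asm body. */
static inline uint32_t add_then_double(uint32_t a, uint32_t b)
{
#if defined(__i386__) || defined(__x86_64__)
    uint32_t out;
    __asm__ ("movl %[a], %[out]\n\t"     /* one string per asm line...      */
             "addl %[b], %[out]\n\t"     /* ...concatenated by the compiler */
             "addl %[out], %[out]"
             : [out] "=&r" (out)         /* outputs; & = written before all inputs are read */
             : [a] "r" (a), [b] "r" (b)  /* inputs */
             : "cc");                    /* clobbers: add modifies the flags */
    return out;
#else
    return (a + b) + (a + b);            /* fallback for non-x86 targets */
#endif
}
```

Note that this is a single asm statement. Splitting it into separate statements, or separating the strings with commas as in the question's EDIT 1, does not work: the compiler is free to put code between separate statements, and a comma after the first string ends the template.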

Also, you tell the compiler which registers you want stuff in. Then if variables are already in registers, the compiler doesn't have to spill them and have you load and store them. Doing that would really be shooting yourself in the foot. The tutorial Brett Hale linked in comments hopefully covers all this.
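To illustrate letting the compiler choose the register, here is a minimal sketch with a read-write "+r" operand. The function byte_swap32 is hypothetical and exists only to show the constraint; in real code __builtin_bswap32 would be the better choice, for the reasons given earlier.

```c
#include <stdint.h>

/* Illustrative only: reverses the byte order of x. */
static inline uint32_t byte_swap32(uint32_t x)
{
#if defined(__i386__) || defined(__x86_64__)
    /* "+r": x is both input and output, in whatever register the compiler
       already has it in. No push/pop, no hard-coded register name. */
    __asm__ ("bswap %0" : "+r" (x));
    return x;
#else
    /* Fallback for non-x86 targets. */
    return (x >> 24) | ((x >> 8) & 0x0000FF00u) |
           ((x << 8) & 0x00FF0000u) | (x << 24);
#endif
}
```

Because the operand is described by a constraint, the local variable needs no volatile qualifier and no manual save/restore of registers.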

Correct example of `div` with GNU C inline asm

You can see the compiler asm output for this on godbolt.

uint32_t q, m;  // this is unsigned int on every compiler that supports x86 inline asm with this syntax, but not when writing portable code.

asm ("divl %[bn2dat_j]\n"
      : "=a" (q), "=d" (m) // results are in eax, edx registers
      : "d" (0),           // zero edx for us, please
        "a" (bn1->dat[i]), // "a" means EAX / RAX
        [bn2dat_j] "mr" (bn2->dat[j]) // register or memory, compiler chooses which is more efficient
      : // no register clobbers, and we don't read/write "memory" other than operands
    );

"divl %4" would have worked too, but named inputs/outputs don't change name when you add more input/output constraints.

Answer 1 (accepted, score 6), 2015-09-23 18:49:59