How to access C struct/variables from inline asm?

Question

Consider the following code:

    int bn_div(bn_t *bn1, bn_t *bn2, bn_t *bnr)
  {
    uint32 q, m;        /* Division Result */
    uint32 i;           /* Loop Counter */
    uint32 j;           /* Loop Counter */

    /* Check Input */
    if (bn1 == NULL) return(EFAULT);
    if (bn1->dat == NULL) return(EFAULT);
    if (bn2 == NULL) return(EFAULT);
    if (bn2->dat == NULL) return(EFAULT);
    if (bnr == NULL) return(EFAULT);
    if (bnr->dat == NULL) return(EFAULT);


    #if defined(__i386__) || defined(__amd64__)
    __asm__ (".intel_syntax noprefix");
    __asm__ ("pushl %eax");
    __asm__ ("pushl %edx");
    __asm__ ("pushf");
    __asm__ ("movl %eax, (bn1->dat[i])");
    __asm__ ("xorl %edx, %edx");
    __asm__ ("divl (bn2->dat[j])");
    __asm__ ("movl (q), %eax");
    __asm__ ("movl (m), %edx");
    __asm__ ("popf");
    __asm__ ("popl %edx");
    __asm__ ("popl %eax");
    #else
    q = bn->dat[i] / bn->dat[j];
    m = bn->dat[i] % bn->dat[j];
    #endif
    /* Return */
    return(0);
  }

The data types uint32 is basically an unsigned long int or a uint32_t unsigned 32-bit integer. The type bnint is either a unsigned short int (uint16_t) or a uint32_t depending on if 64-bit data types are available or not. If 64-bit is available, then bnint is a uint32, otherwise it's a uint16. This was done in order to capture carry/overflow in other parts of the code. The structure bn_t is defined as follows:

typedef struct bn_data_t bn_t;
struct bn_data_t
  {
    uint32 sz1;         /* Bit Size */
    uint32 sz8;         /* Byte Size */
    uint32 szw;         /* Word Count */
    bnint *dat;         /* Data Array */
    uint32 flags;       /* Operational Flags */
  };

The function starts on line 300 in my source code. So when I try to compile/make it, I get the following errors:

system:/home/user/c/m3/bn 1036 $$$ ->make
clang -I. -I/home/user/c/m3/bn/.. -I/home/user/c/m3/bn/../include  -std=c99 -pedantic -Wall -Wextra -Wshadow -Wpointer-arith -Wcast-align -Wstrict-prototypes  -Wmissing-prototypes -Wnested-externs -Wwrite-strings -Wfloat-equal  -Winline -Wunknown-pragmas -Wundef -Wendif-labels  -c /home/user/c/m3/bn/bn.c
/home/user/c/m3/bn/bn.c:302:12: warning: unused variable 'q' [-Wunused-variable]
    uint32 q, m;        /* Division Result */
           ^
/home/user/c/m3/bn/bn.c:302:15: warning: unused variable 'm' [-Wunused-variable]
    uint32 q, m;        /* Division Result */
              ^
/home/user/c/m3/bn/bn.c:303:12: warning: unused variable 'i' [-Wunused-variable]
    uint32 i;           /* Loop Counter */
           ^
/home/user/c/m3/bn/bn.c:304:12: warning: unused variable 'j' [-Wunused-variable]
    uint32 j;           /* Loop Counter */
           ^
/home/user/c/m3/bn/bn.c:320:14: error: unknown token in expression
    __asm__ ("movl %eax, (bn1->dat[i])");
             ^
<inline asm>:1:18: note: instantiated into assembly here
        movl %eax, (bn1->dat[i])
                        ^
/home/user/c/m3/bn/bn.c:322:14: error: unknown token in expression
    __asm__ ("divl (bn2->dat[j])");
             ^
<inline asm>:1:12: note: instantiated into assembly here
        divl (bn2->dat[j])
                  ^
4 warnings and 2 errors generated.
*** [bn.o] Error code 1

Stop in /home/user/c/m3/bn.
system:/home/user/c/m3/bn 1037 $$$ ->

What I know:

I consider myself to be fairly well versed in x86 assembler (as evidenced from the code that I wrote above). However, the last time that I mixed a high level language and assembler was using Borland Pascal about 15-20 years ago when writing graphics drivers for games (pre-Windows 95 era). My familiarity is with Intel syntax.

What I don't know:

How do I access members of bn_t (especially *dat) from asm? Since *dat is a pointer to uint32, I am accessing the elements as an array (eg. bn1->dat[i]).

How do I access local variables that are declared on the stack?

I am using push/pop to restore clobbered registers to their previous values so as to not upset the compiler. However, do I also need to include the volatile keyword on the local variables as well?

Or, is there a better way that I am not aware of? I don't want to put this in a separate function call because of the calling overhead as this function is performance critical.

Additional:

Right now, I'm just starting to write this function so it is no where complete. There are missing loops and other such support/glue code. But, the main gist is accessing local variables/structure elements.

EDIT 1:

The syntax that I am using seems to be the only one that clang supports. I tried the following code and clang gave me all sorts of errors:

__asm__ ("pushl %%eax",
    "pushl %%edx",
    "pushf",
    "movl (bn1->dat[i]), %%eax",
    "xorl %%edx, %%edx",
    "divl ($0x0c + bn2 + j)",
    "movl %%eax, (q)",
    "movl %%edx, (m)",
    "popf",
    "popl %%edx",
    "popl %%eax"
    );

It wants me to put a closing parenthesis on the first line, replacing the comma. I switched to using %% instead of % because I read somewhere that inline assembly requires %% to denote CPU registers, and clang was telling me that I was using an invalid escape sequence.

Answer 1

If you only need 32b / 32b => 32bit division, let the compiler use both outputs of div , which gcc, clang and icc all do just fine, as you can see on the Godbolt compiler explorer :

uint32_t q = bn1->dat[i] / bn2->dat[j];
uint32_t m = bn1->dat[i] % bn2->dat[j];

Compilers are quite good at CSE ing that into one div . Just make sure you don't store the division result somewhere that gcc can't prove won't affect the input of the remainder.

eg *m = dat[i] / dat[j] might overlap (alias) dat[i] or dat[j] , so gcc would have to reload the operands and redo the div for the % operation. See the godbolt link for bad/good examples.

Using inline asm for 32bit / 32bit = 32bit div doesn't gain you anything, and actually makes worse code with clang (see the godbolt link).

If you need 64bit / 32bit = 32bit, you probably need asm, though, if there isn't a compiler built-in for it. (GNU C doesn't have one, AFAICT). The obvious way in C (casting operands to uint64_t ) generates a call to a 64bit/64bit = 64bit libgcc function, which has branches and multiple div instructions. gcc isn't good at proving the result will fit in 32bits, so a single div instruction don't cause a #DE .

For a lot of other instructions, you can avoid writing inline asm a lot of the time with builtin functions for things like popcount . With -mpopcnt , it compiles to the popcnt instruction (and accounts for the false-dependency on the output operand that Intel CPUs have.) Without, it compiles to a libgcc function call.

Always prefer builtins, or pure C that compiles to good asm, so the compiler knows what the code does . When inlining makes some of the arguments known at compile-time, pure C can be optimized away or simplified , but code using inline asm will just load constants into registers and do a div at run-time. Inline asm also defeats CSE between similar computations on the same data, and of course can't auto-vectorize.

Using GNU C syntax the right way

https://gcc.gnu.org/onlinedocs/gcc/Extended-Asm.html explains how to tell the assembler which variables you want in registers, and what the outputs are.

You can use Intel/MASM-like syntax and mnemonics, and non-% register names if you like , preferably by compiling with -masm=intel . The AT&T syntax bug ( fsub and fsubr mnemonics are reversed ) might still be present in intel-syntax mode; I forget.

Most software projects that use GNU C inline asm use AT&T syntax only.

See also the bottom of this answer for more GNU C inline asm info, and the x86 tag wiki.

An asm statement takes one string arg, and 3 sets of constraints. The easiest way to make it multi-line is by making each asm line a separate string ending with \\n , and let the compiler implicitly concatenate them.

Also, you tell the compiler which registers you want stuff in. Then if variables are already in registers, the compiler doesn't have to spill them and have you load and store them. Doing that would really shoot yourself in the foot. The tutorial Brett Hale linked in comments hopefully covers all this.

Correct example of `div` with GNU C inline asm

You can see the compiler asm output for this on godbolt .

uint32_t q, m;  // this is unsigned int on every compiler that supports x86 inline asm with this syntax, but not when writing portable code.

asm ("divl %[bn2dat_j]\n"
      : "=a" (q), "=d" (m) // results are in eax, edx registers
      : "d" (0),           // zero edx for us, please
        "a" (bn1->dat[i]), // "a" means EAX / RAX
        [bn2dat_j] "mr" (bn2->dat[j]) // register or memory, compiler chooses which is more efficient
      : // no register clobbers, and we don't read/write "memory" other than operands
    );

"divl %4" would have worked too, but named inputs/outputs don't change name when you add more input/output constraints.

How to access C struct/variables from inline asm?

Question

1 answers

solution1
6 ACCPTED 2015-09-23 18:49:59

Using GNU C syntax the right way

Correct example of `div` with GNU C inline asm

How to access C struct/variables from inline asm?

Question

1 answers

solution1 6 ACCPTED 2015-09-23 18:49:59

Using GNU C syntax the right way

Correct example of div with GNU C inline asm

solution1
6 ACCPTED 2015-09-23 18:49:59

Correct example of `div` with GNU C inline asm