Trying to translate a C function to x86_64 AT&T assembly

Question

I've been trying to translate this function to assembly:

void foo (int a[], int n) {
  int i;
  int s = 0;
  for (i=0; i<n; i++) {
    s += a[i];
    if (a[i] == 0) {
      a[i] = s;
      s = 0;
    }
  }
}

But something is going wrong.

That's what I've done so far:

.section .text
.globl foo
foo:
.L1:
    pushq %rbp 
    movq %rsp, %rbp 
    subq $16, %rsp 

    movl $0, -16(%rbp) /*s*/ 
    movl $0, -8(%rbp) /*i*/

    jmp .L2

.L2:
    cmpl -8(%rbp), %esi 
    jle .L4 

    leave
    ret

.L3:
    addl $1, -8(%rbp) 
    jmp .L2

.L4:
    
    movl -8(%rbp), %eax 
    imull $4, %eax 
    movslq %eax, %rax 
    
    addq %rdi, %rax 
    
    movl (%rax), %eax 
    addl %eax, -16(%rbp) 

    cmpl $0, %eax
    jne .L3 

    /*      if      */
    leaq (%rax), %rdx 
    
    movl -16(%rbp), %eax 
    movl %eax, (%rdx) 
    movl $0, -16(%rbp) 
    jmp .L3

I am compiling the.s module with a.c module, for example, with an int nums [5] = {65, 23, 11, 0, 34} and I'm getting back the same array instead of {65, 23, 11, 99, 34} .

Could someone help me?

Answer 1

Presumably you have a compiler that can generate AT&T syntax. It might be more instructive to look at what assembly output the compiler generates. Here's my re-formulation of your demo:

#include <stdio.h>

void foo (int a[], int n)
{
    for (int s = 0, i = 0; i < n; i++)
    {
        if (a[i] != 0)
            s += a[i];
        else
            a[i] = s, s = 0;
    }
}

int main (void)
{
    int nums[] = {65, 23, 11, 0, 34};
    int size = sizeof(nums) / sizeof(int);

    foo(nums, size);
    for (int i = 0; i < size; i++)
        fprintf(stdout, i < (size - 1) ? "%d, " : "%d\n", nums[i]);

    return (0);
}

Compiling without optimizations enabled is typically harder to work through than optimized code, since it loads from and spills results to memory. You won't learn much from it if you're investing time in learning how to write efficient assembly.

Compiling with the Godbolt compiler explorer with -O2 optimizations yields much more efficient code; it's also useful for cutting out unnecessary directives, labels, etc., that would be visual noise in this case.

In my experience, using -O2 optimizations are clever enough to make you rethink your use of registers, refactoring, etc. -O3 can sometimes optimize too agressively - unrolling loops, vectorizing, etc., to easily follow.

Finally, for the case you have presented, there's a perfect compromise: -Os , which enables many of the optimizations of -O2 , but not at the expense of increased code size. I'll paste the assembly here just for comparative purposes:

foo:
        xorl    %eax, %eax
        xorl    %ecx, %ecx
.L2:
        cmpl    %eax, %esi
        jle     .L7
        movl    (%rdi,%rax,4), %edx
        testl   %edx, %edx
        je      .L3
        addl    %ecx, %edx
        jmp     .L4
.L3:
        movl    %ecx, (%rdi,%rax,4)
.L4:
        incq    %rax
        movl    %edx, %ecx
        jmp     .L2
.L7:
        ret

Remember that the calling convention passes the pointer to (a) in %rdi , and the 'count' (n) in %rsi . These are the calling conventions being used. Notice that your code does not 'dereference' or 'index' any elements through %rdi . It's definitely worth going stepping through the code - even with pen and paper if it helps - to understand the branch conditions and how reading and writing is performed on element a[i] .

Curiously, using the inner loop of your code:

s += a[i];
if (a[i] == 0)
    a[i] = s, s = 0;

Appears to generate more efficient code with -Os than the inner loop I used:

foo:
        xorl    %eax, %eax
        xorl    %edx, %edx
.L2:
        cmpl    %eax, %esi
        jle     .L6
        movl    (%rdi,%rax,4), %ecx
        addl    %ecx, %edx
        testl   %ecx, %ecx
        jne     .L3
        movl    %edx, (%rdi,%rax,4)
        xorl    %edx, %edx
.L3:
        incq    %rax
        jmp     .L2
.L6:
        ret

A reminder for me to keep things simple!

Trying to translate a C function to x86_64 AT&T assembly

Question

1 answers

solution1
1 2020-10-08 08:37:25

Trying to translate a C function to x86_64 AT&T assembly

Question

1 answers

solution1 1 2020-10-08 08:37:25

solution1
1 2020-10-08 08:37:25