Trying to translate a C function to x86_64 AT&T assembly

Question

I've been trying to translate this function to assembly:

void foo (int a[], int n) {
  int i;
  int s = 0;
  for (i=0; i<n; i++) {
    s += a[i];
    if (a[i] == 0) {
      a[i] = s;
      s = 0;
    }
  }
}

But something is going wrong.

This is what I've done so far:

.section .text
.globl foo
foo:
.L1:
    pushq %rbp 
    movq %rsp, %rbp 
    subq $16, %rsp 

    movl $0, -16(%rbp) /*s*/ 
    movl $0, -8(%rbp) /*i*/

    jmp .L2

.L2:
    cmpl -8(%rbp), %esi 
    jle .L4 

    leave
    ret

.L3:
    addl $1, -8(%rbp) 
    jmp .L2

.L4:
    
    movl -8(%rbp), %eax 
    imull $4, %eax 
    movslq %eax, %rax 
    
    addq %rdi, %rax 
    
    movl (%rax), %eax 
    addl %eax, -16(%rbp) 

    cmpl $0, %eax
    jne .L3 

    /*      if      */
    leaq (%rax), %rdx 
    
    movl -16(%rbp), %eax 
    movl %eax, (%rdx) 
    movl $0, -16(%rbp) 
    jmp .L3

I am compiling the .s module together with a .c module, for example with int nums[5] = {65, 23, 11, 0, 34}, and I'm getting back the same array instead of {65, 23, 11, 99, 34}.

Could someone help me?

Answer 1

Presumably you have a compiler that can generate AT&T syntax. It might be more instructive to look at the assembly output the compiler generates. Here's my re-formulation of your demo:

#include <stdio.h>

void foo (int a[], int n)
{
    for (int s = 0, i = 0; i < n; i++)
    {
        if (a[i] != 0)
            s += a[i];
        else
            a[i] = s, s = 0;
    }
}

int main (void)
{
    int nums[] = {65, 23, 11, 0, 34};
    int size = sizeof(nums) / sizeof(int);

    foo(nums, size);
    for (int i = 0; i < size; i++)
        fprintf(stdout, i < (size - 1) ? "%d, " : "%d\n", nums[i]);

    return (0);
}

Code compiled without optimizations is typically harder to work through than optimized code, since it loads its operands from memory and spills its results back to memory. You won't learn much from it if you're investing time in learning how to write efficient assembly.

Compiling on the Godbolt Compiler Explorer with -O2 optimizations yields much more efficient code; it's also useful for cutting out unnecessary directives, labels, etc., that would be visual noise in this case.

In my experience, -O2 optimizations are clever enough to make you rethink your use of registers, refactoring, etc. -O3 can sometimes optimize too aggressively (unrolling loops, vectorizing, etc.) for the result to be easy to follow.

Finally, for the case you have presented, there's a perfect compromise: -Os, which enables many of the optimizations of -O2, but not at the expense of increased code size. I'll paste the assembly here just for comparative purposes:

foo:
        xorl    %eax, %eax
        xorl    %ecx, %ecx
.L2:
        cmpl    %eax, %esi
        jle     .L7
        movl    (%rdi,%rax,4), %edx
        testl   %edx, %edx
        je      .L3
        addl    %ecx, %edx
        jmp     .L4
.L3:
        movl    %ecx, (%rdi,%rax,4)
.L4:
        incq    %rax
        movl    %edx, %ecx
        jmp     .L2
.L7:
        ret

Remember that the calling convention passes the pointer (a) in %rdi and the count (n) in %esi, the low 32 bits of %rsi, since n is an int. Notice that the compiler reads and writes a[i] directly through %rdi with scaled-index addressing, (%rdi,%rax,4), whereas your code builds the address in %rax by hand and then overwrites %rax with the loaded value before the address is used again for the store. It's definitely worth stepping through the code, even with pen and paper if it helps, to understand the branch conditions and how reading and writing is performed on element a[i].
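One way to do that stepping-through is to transcribe the first -Os listing into C, one statement per instruction. The following is only a rough sketch, not compiler output: the function name foo_goto and the variables named after registers are my own, chosen for illustration. Keep the AT&T operand order in mind: cmpl %eax, %esi followed by jle branches when %esi <= %eax, that is, when n <= i.

```c
/* Hypothetical hand-translation of the first -Os listing.
   Variable names mirror the registers; they are illustrative only. */
void foo_goto(int *rdi, int esi)
{
    long rax = 0;               /* xorl %eax, %eax : i = 0 */
    int ecx = 0;                /* xorl %ecx, %ecx : s = 0 */
    int edx;

L2:
    /* cmpl %eax, %esi ; jle .L7 : leave the loop when n <= i */
    if (esi <= (int)rax) goto L7;
    edx = rdi[rax];             /* movl (%rdi,%rax,4), %edx : load a[i] */
    if (edx == 0) goto L3;      /* testl %edx, %edx ; je .L3 */
    edx += ecx;                 /* addl %ecx, %edx : a[i] + s */
    goto L4;
L3:
    rdi[rax] = ecx;             /* movl %ecx, (%rdi,%rax,4) : a[i] = s */
L4:
    rax++;                      /* incq %rax : i++ */
    ecx = edx;                  /* movl %edx, %ecx : new s (0 after a store) */
    goto L2;
L7:
    return;                     /* ret */
}
```

Calling foo_goto on {65, 23, 11, 0, 34} with n = 5 should leave {65, 23, 11, 99, 34}. In the listing, the jle after the compare leads to the exit label (.L7). In the code from the question, the same compare-and-jle pair leads to the loop body (.L4) instead, so with i = 0 and n = 5 the branch is not taken and execution falls straight through to leave and ret, which would be consistent with getting the array back unchanged.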

Curiously, using the inner loop of your code:

s += a[i];
if (a[i] == 0)
    a[i] = s, s = 0;

appears to generate more efficient code with -Os than the inner loop I used:

foo:
        xorl    %eax, %eax
        xorl    %edx, %edx
.L2:
        cmpl    %eax, %esi
        jle     .L6
        movl    (%rdi,%rax,4), %ecx
        addl    %ecx, %edx
        testl   %ecx, %ecx
        jne     .L3
        movl    %edx, (%rdi,%rax,4)
        xorl    %edx, %edx
.L3:
        incq    %rax
        jmp     .L2
.L6:
        ret

A reminder for me to keep things simple!
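Since the two listings come from two differently written loop bodies, it may be worth checking that both C versions really compute the same thing. Below is a minimal sketch of such a check; the names foo_question, foo_answer and same_result are hypothetical, and the 16-element limit is an arbitrary assumption to keep the helper simple.

```c
#include <stdbool.h>
#include <string.h>

/* Loop body as written in the question. */
void foo_question(int a[], int n)
{
    int s = 0;
    for (int i = 0; i < n; i++) {
        s += a[i];
        if (a[i] == 0) {
            a[i] = s;
            s = 0;
        }
    }
}

/* Loop body as re-formulated in the answer. */
void foo_answer(int a[], int n)
{
    for (int s = 0, i = 0; i < n; i++) {
        if (a[i] != 0)
            s += a[i];
        else
            a[i] = s, s = 0;
    }
}

/* Hypothetical helper: runs both versions on copies of the same input
   (at most 16 elements) and reports whether the results are identical. */
bool same_result(const int *input, int n)
{
    int x[16], y[16];

    if (n < 0 || n > 16)
        return false;
    memcpy(x, input, n * sizeof *x);
    memcpy(y, input, n * sizeof *y);
    foo_question(x, n);
    foo_answer(y, n);
    return memcmp(x, y, n * sizeof *x) == 0;
}
```

The two bodies agree because adding a zero element to s leaves s unchanged, so it does not matter whether the addition happens before or only outside the a[i] == 0 branch.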

Answer 1 (2020-10-08 08:37:25)
