Trying to translate a C function to x86_64 AT&T assembly

Question

I have been trying to translate this function to assembly:

void foo (int a[], int n) {
  int i;
  int s = 0;
  for (i=0; i<n; i++) {
    s += a[i];
    if (a[i] == 0) {
      a[i] = s;
      s = 0;
    }
  }
}

But something is going wrong.

This is what I have done so far:

.section .text
.globl foo
foo:
.L1:
    pushq %rbp 
    movq %rsp, %rbp 
    subq $16, %rsp 

    movl $0, -16(%rbp) /*s*/ 
    movl $0, -8(%rbp) /*i*/

    jmp .L2

.L2:
    cmpl -8(%rbp), %esi 
    jle .L4 

    leave
    ret

.L3:
    addl $1, -8(%rbp) 
    jmp .L2

.L4:
    
    movl -8(%rbp), %eax 
    imull $4, %eax 
    movslq %eax, %rax 
    
    addq %rdi, %rax 
    
    movl (%rax), %eax 
    addl %eax, -16(%rbp) 

    cmpl $0, %eax
    jne .L3 

    /*      if      */
    leaq (%rax), %rdx 
    
    movl -16(%rbp), %eax 
    movl %eax, (%rdx) 
    movl $0, -16(%rbp) 
    jmp .L3

I am compiling the .s module together with a .c module. For example, with int nums[5] = {65, 23, 11, 0, 34} I get the same array back instead of {65, 23, 11, 99, 34}.

Can someone help me?

Answer 1

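Presumably you have a compiler that can generate AT&T syntax. It might be more instructive to look at the assembly output the compiler generates. Here is my reformulation of your demo: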

#include <stdio.h>

void foo (int a[], int n)
{
    for (int s = 0, i = 0; i < n; i++)
    {
        if (a[i] != 0)
            s += a[i];
        else
            a[i] = s, s = 0;
    }
}

int main (void)
{
    int nums[] = {65, 23, 11, 0, 34};
    int size = sizeof(nums) / sizeof(int);

    foo(nums, size);
    for (int i = 0; i < size; i++)
        fprintf(stdout, i < (size - 1) ? "%d, " : "%d\n", nums[i]);

    return (0);
}

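Output compiled without optimization is typically harder to work through than optimized code, because it loads every value from memory and spills every result back to memory. If your goal is to learn how to write efficient assembly, you won't learn much from it.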

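Compiling with -O2 optimization in the Godbolt Compiler Explorer yields far more efficient code. The tool is also useful for filtering out needless directives, labels, and so on, which would only be visual noise in this case.

In my experience, -O2 is clever enough to make you rethink your own use of registers, refactoring, and so on. -O3 can sometimes optimize too aggressively (unrolling loops, vectorizing, etc.) for the output to be easy to follow.

Finally, for the case you have presented, there is a perfect compromise: -Os, which enables many of the -O2 optimizations, but not at the expense of increased code size. I'll paste the assembly here for comparison: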

foo:
        xorl    %eax, %eax
        xorl    %ecx, %ecx
.L2:
        cmpl    %eax, %esi
        jle     .L7
        movl    (%rdi,%rax,4), %edx
        testl   %edx, %edx
        je      .L3
        addl    %ecx, %edx
        jmp     .L4
.L3:
        movl    %ecx, (%rdi,%rax,4)
.L4:
        incq    %rax
        movl    %edx, %ecx
        jmp     .L2
.L7:
        ret

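Keep in mind that the calling convention passes the pointer to the array (a) in %rdi and the "count" (n) in %rsi (as a 32-bit int, so the code reads %esi); this is the x86-64 System V calling convention that the compiler is using. Notice that your code doesn't index elements directly through %rdi with a scaled-index operand such as (%rdi,%rax,4). It's definitely worth stepping through the code (with pen and paper, if that helps) to understand the branch conditions and how reads and writes of the element a[i] are performed.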

Curiously, using the inner loop from your code:

s += a[i];
if (a[i] == 0)
    a[i] = s, s = 0;

seems to generate more efficient code with -Os than the inner loop I used:

foo:
        xorl    %eax, %eax
        xorl    %edx, %edx
.L2:
        cmpl    %eax, %esi
        jle     .L6
        movl    (%rdi,%rax,4), %ecx
        addl    %ecx, %edx
        testl   %ecx, %ecx
        jne     .L3
        movl    %edx, (%rdi,%rax,4)
        xorl    %edx, %edx
.L3:
        incq    %rax
        jmp     .L2
.L6:
        ret

A reminder to myself to keep it simple!
