Why is this simple loop faster in Go than in C?

I was trying to find out whether Go's loop performance is as good as C's, but was surprised to find that, for my simple test, the C version takes twice as long as the Go version.

C version:

#include <stdio.h>

int main() {
  int i = 0, a = 0;

  while (i < 1e9) {
    a = (a + i) % 42;
    i = i + 1;
  }
  printf("%d\n", a);
}

$ gcc -o main main.c && time ./main # tried -O0 as well; the result is similar
36
./main  10.53s user 0.08s system 98% cpu 10.769 total

Go version:

package main

import "fmt"

func main() {
    a := int32(0)
    for i := int32(0); i < 1e9; i++ {
        a = (a + i) % 42
    }
    fmt.Println(a)
}

$ time go run main.go
36
colorgo run main.go  5.27s user 0.14s system 93% cpu 5.816 total

(tested on Darwin, amd64)

For this simple algorithm, shouldn't both of them produce nearly identical machine code? Is this due to compiler optimization? Cache efficiency?

Please help me understand! Thanks!

It all boils down to the assembly generated.

go tool 6g -S (21 instructions):

MOVL    $0,SI
MOVL    SI,"".a+8(FP)
MOVL    $0,CX
CMPL    CX,$1000000000
JGE     $0,58
ADDL    CX,SI
MOVL    $818089009,BP
MOVL    SI,AX
IMULL   BP,
MOVL    DX,BX
SARL    $3,BX
MOVL    SI,BP
SARL    $31,BP
SUBL    BP,BX
IMULL   $42,BX
SUBL    BX,SI
MOVL    SI,"".a+8(FP)
INCL    ,CX #point A
NOP     ,
CMPL    CX,$1000000000
JLT     $0,16
RET     ,
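
Note that neither compiler emits a division instruction for % 42: both multiply by the constant 818089009 and shift instead. The following is only a sketch of what that sequence appears to compute, written back in Go. The function name mod42 is made up for illustration, and reading the constant as ceil(2^35 / 42) is my interpretation of the listing, not something the compiler output states.

```go
package main

import "fmt"

// mod42 is a hypothetical helper that computes n % 42 for an int32 without
// a division instruction, mirroring the multiply-and-shift sequence in the
// listing above.
func mod42(n int32) int32 {
	const magic = 818089009 // ceil(2^35 / 42), the constant moved into BP

	// The listing takes the high 32 bits of the 64-bit product (DX) and
	// then shifts right by 3, which together is a right shift by 35.
	q := int32((int64(n) * magic) >> 35)

	// n >> 31 is -1 for negative n and 0 otherwise. Subtracting it makes
	// the quotient round toward zero, as Go's / and % do.
	q -= n >> 31

	return n - q*42
}

func main() {
	for _, n := range []int32{0, 41, 42, 43, 999999999, -1, -42, -43} {
		fmt.Println(n, mod42(n), n%42)
	}
}
```

The last two columns printed by main should agree on every row, which is a quick way to convince yourself that the multiply, shift, and subtract steps really are a remainder operation.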

gcc -O3 -march=native -S (17 instructions):

leal    (%rsi,%rcx), %edi
addl    $1, %ecx
vxorpd  %xmm0, %xmm0, %xmm0
vcvtsi2sd       %ecx, %xmm0, %xmm0
movl    %edi, %eax
imull   %r8d
movl    %edi, %eax
sarl    $31, %eax
sarl    $3, %edx
movl    %edx, %esi
subl    %eax, %esi
imull   $42, %esi, %esi
subl    %esi, %edi
vucomisd        %xmm0, %xmm1
movl    %edi, %esi
ja      .L2
subq    $8, %rsp

gcc -O3 -march=native -S (14 instructions, after replacing 1e9 with 1000000000):

leal    (%rdx,%rcx), %esi
addl    $1, %ecx
movl    %esi, %eax
imull   %edi
movl    %esi, %eax
sarl    $31, %eax
sarl    $3, %edx
subl    %eax, %edx
imull   $42, %edx, %edx
subl    %edx, %esi
movl    %esi, %edx
cmpl    $1000000000, %ecx
jne     .L2
subq    $8, %rsp
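
The only difference between the two gcc listings is the loop condition. In C, 1e9 is a double literal, so i < 1e9 converts i to floating point on every iteration (the vcvtsi2sd and vucomisd instructions in the first listing). In Go, 1e9 is an untyped constant that takes the type of the other operand, so the same source text compiles to an integer compare. A small sketch of the Go side, where the names limit and below are chosen only for illustration:

```go
package main

import "fmt"

// limit is an untyped floating-point constant. It has no fixed type until
// it is used, and it can become an int32 because its value is a whole
// number that fits in 32 bits.
const limit = 1e9

// below reports whether i is under the limit. The constant is converted to
// int32 at compile time, so this is a plain integer comparison.
func below(i int32) bool {
	return i < limit
}

func main() {
	var asInt int32 = limit // legal: 1e9 is exactly representable as int32
	fmt.Printf("%T %v\n", asInt, asInt)
	fmt.Println(below(999999999), below(1000000000))
	// Something like `var bad int32 = 1.5` would be rejected by the
	// compiler, because the constant would have to be truncated.
}
```

This is why the Go version never needed the 1e9 to 1000000000 change that the C version did.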

Timing:

$ gcc -O3 -march=native loop.c; and time ./a.out
36
2.92user 0.00system 0:02.93elapsed 99%CPU
$ go build -o loop loop.go; and time ./loop
36
2.89user 0.00system 0:02.90elapsed 99%CPU
$ gcc -O3 -march=native loop_nofp.c; and time ./a.out
36
2.92user 0.00system 0:02.94elapsed 99%CPU (0avgtext+0avgdata 1312maxresident)
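
The numbers above come from the shell's time on prebuilt binaries. If you would rather measure only the loop from inside the program, something along these lines should work. It is a rough sketch, not a rigorous benchmark, and the function name sumMod is made up for illustration.

```go
package main

import (
	"fmt"
	"time"
)

// sumMod is a hypothetical wrapper that runs the loop from the question
// for n iterations and returns the final value of a.
func sumMod(n int32) int32 {
	a := int32(0)
	for i := int32(0); i < n; i++ {
		a = (a + i) % 42
	}
	return a
}

func main() {
	start := time.Now()
	a := sumMod(1e9) // same iteration count as the question
	// Print the result so the loop's work is actually used, followed by
	// the elapsed wall-clock time.
	fmt.Println(a, time.Since(start))
}
```

Unlike time go run main.go, this excludes the time the go tool spends compiling and linking before the program starts.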

I have no idea why; I'm leaving this here for now until a proper answer is posted.

//edit

Changing the C code to use a for loop and int32_t, to match the Go version, produced different assembly but exactly the same timing.

#include <stdint.h> /* int32_t */
#include <stdio.h>  /* printf */

int main() {
    int32_t i = 0, a = 0;
    for (i = 0; i < 1e9; i++) {
        a = (a + i) % 42;
    }
    printf("%d\n", a);
    return 0;
}

With optimization enabled, the two take about the same time. For example,

Go:

$ cat t.go
package main

import "fmt"

func main() {
    a := int32(0)
    for i := int32(0); i < 1e9; i++ {
        a = (a + i) % 42
    }
    fmt.Println(a)
}
$ go version
go version devel +e1a081e6ddf8 Sat Sep 27 11:56:54 2014 -0700 linux/amd64
$ go build t.go && time ./t
36
real    0m15.809s
user    0m15.815s
sys 0m0.061s

C:

$ cat t.c
#include <stdio.h>

int main() {
  int i = 0, a = 0;

  while (i < 1e9) {
    a = (a + i) % 42;
    i = i + 1;
  }
  printf("%d\n", a);
}
$ gcc --version
gcc (Ubuntu 4.8.2-19ubuntu1) 4.8.2
$ gcc -O3 t.c && time ./a.out
36
real    0m16.538s
user    0m16.528s
sys 0m0.021s
