Vector performance in Go and C++

Question

Please consider these two snippets in Go and C++11. In C++, std::vector is a doubling array, which gives amortized O(1) insertion. How can I achieve the same performance in Go? The problem is that this Go code is about 3 times slower on my hardware, consistently across many runs.
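For reference, the doubling strategy described above can be sketched by hand in Go. This is a minimal illustration, not how std::vector or Go's append is actually implemented: the type `Vec`, its `Push` and `Len` methods, and the `copies` counter are hypothetical names introduced here. It counts element copies to show why doubling keeps insertion at amortized O(1).

```go
package main

import "fmt"

// X mirrors the element type from the question.
type X struct {
	x int32
	y float64
}

// Vec is a hypothetical, hand-written growable array (not a standard
// library type) that doubles its capacity whenever it is full, which is
// the strategy the question attributes to std::vector.
type Vec struct {
	data   []X // backing storage; len(data) is the capacity
	n      int // number of elements in use
	copies int // elements copied during all reallocations so far
}

// Push stores e at the end, doubling the backing array first if it is full.
func (v *Vec) Push(e X) {
	if v.n == len(v.data) {
		newCap := 2 * len(v.data)
		if newCap == 0 {
			newCap = 1
		}
		bigger := make([]X, newCap)
		v.copies += copy(bigger, v.data[:v.n])
		v.data = bigger
	}
	v.data[v.n] = e
	v.n++
}

// Len reports the number of stored elements.
func (v *Vec) Len() int { return v.n }

func main() {
	var v Vec
	const n = 1 << 20
	for i := 0; i < n; i++ {
		v.Push(X{123, 2.64})
	}
	// Each reallocation copies every stored element once, but doubling keeps
	// the total below 2 copies per element, so Push is amortized O(1).
	fmt.Printf("len=%d copies=%d copies/element=%.3f\n",
		v.Len(), v.copies, float64(v.copies)/float64(v.Len()))
}
```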

Compiled with:

go build vec.go (go version go1.2.1 linux/amd64)
g++ -O2 -std=gnu++11 -o vec vec.cc (g++ (Ubuntu 4.8.2-19ubuntu1) 4.8.2)

Go version (vec.go):

package main

type X struct {
    x int32
    y float64
}

const N int = 80000000

func main() {
    x := X{123, 2.64}
    s := make([]X, 1)
    for i := 0; i < N; i++ {
        s = append(s, x)
    }
}

C++11 version (vec.cc):

#include <vector>

const int N = 80000000;

struct X {
        int x;
        double y;
};

int main(void)
{
        X x{123, 2.64};
        std::vector<X> s(1);
        for (int i = 0; i < N; ++i) {
                s.push_back(x);
        }
}

Answer 1

Go's specification doesn't require any particular complexity for append(), but in practice it is also implemented in amortized constant time, as described in the answer to this question.
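One way to see append's growth in practice is to watch `cap` change as elements are appended. The sketch below uses a hypothetical helper, `countGrowths`; it only observes what the runtime does, so the printed capacities will vary between Go versions.

```go
package main

import "fmt"

// X mirrors the element type from the question.
type X struct {
	x int32
	y float64
}

// countGrowths is a hypothetical helper: it appends n elements to a nil
// slice and records the new capacity each time append had to allocate a
// larger backing array.
func countGrowths(n int) []int {
	var s []X
	var caps []int
	prev := cap(s)
	for i := 0; i < n; i++ {
		s = append(s, X{123, 2.64})
		if c := cap(s); c != prev { // capacity changed: append reallocated
			prev = c
			caps = append(caps, c)
		}
	}
	return caps
}

func main() {
	caps := countGrowths(1000000)
	// Exact capacities depend on the Go version and on allocator size
	// classes; what matters is that the number of reallocations grows only
	// logarithmically with n.
	fmt.Println("reallocations:", len(caps))
	fmt.Println("capacities:", caps)
}
```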

At the time of writing, the implementation works as follows: for slices with fewer than 1024 elements, append doubles the capacity as needed, and above 1024 it grows the capacity to 1.25x its previous size. Growing by 1.25x is still amortized constant time, but it imposes a higher amortized constant factor than an implementation that always doubles. On the other hand, 1.25x growth wastes less memory overall.
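The trade-off between doubling and 1.25x growth can be illustrated with a simplified model that counts element copies and unused capacity. The function `simulate` is hypothetical, and its growth rule (multiply the capacity by a fixed factor, starting from 1) is an assumption: it ignores the 1024-element switch-over and the allocator's size-class rounding that the real runtime applies.

```go
package main

import "fmt"

// simulate is a hypothetical, simplified model of a growable array: it
// starts with capacity 1 and, when full, grows to int(capacity*factor)
// (at least capacity+1). It returns the total number of element copies
// and the final capacity after n appends. This is NOT the Go runtime's
// exact growth rule.
func simulate(n int, factor float64) (copies, capacity int) {
	capacity = 1
	length := 0
	for i := 0; i < n; i++ {
		if length == capacity {
			newCap := int(float64(capacity) * factor)
			if newCap <= capacity {
				newCap = capacity + 1
			}
			copies += length // every existing element moves to the new array
			capacity = newCap
		}
		length++
	}
	return copies, capacity
}

func main() {
	const n = 80000000
	for _, f := range []float64{2.0, 1.25} {
		c, capacity := simulate(n, f)
		fmt.Printf("factor %.2f: %.2f copies/append, %d unused slots\n",
			f, float64(c)/n, capacity-n)
	}
}
```

In this model the 1.25x factor should report noticeably more copies per append than doubling, while the unused headroom right after a reallocation is at most roughly 25% of the length instead of up to 100%.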

If the performance differs by only a few times (even at very large N), then you're seeing different constant factors at play. I've noticed myself that the machine code produced by the gc compiler is much more efficient than that generated by gccgo.

To verify for yourself that Go is operating in amortized constant time, try plotting the time it takes to run your algorithm for several different values of N.
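A minimal timing harness along those lines might look like this. `appendN` is a hypothetical helper, and wall-clock timing of a single run is noisy, so treat the numbers as rough; for careful measurements, benchmarks run with Go's `testing` package (`go test -bench`) are the usual tool.

```go
package main

import (
	"fmt"
	"time"
)

// X mirrors the element type from the question.
type X struct {
	x int32
	y float64
}

// appendN is a hypothetical helper that builds a slice of n elements with
// append, starting from an empty slice, and returns it so the work is kept.
func appendN(n int) []X {
	x := X{123, 2.64}
	var s []X
	for i := 0; i < n; i++ {
		s = append(s, x)
	}
	return s
}

func main() {
	// If append is amortized O(1), the time per element should stay roughly
	// flat as n doubles, even though individual reallocations get larger.
	for n := 1000000; n <= 16000000; n *= 2 {
		start := time.Now()
		s := appendN(n)
		elapsed := time.Since(start)
		fmt.Printf("n=%9d  total=%v  per element=%.2f ns (len %d)\n",
			n, elapsed, float64(elapsed.Nanoseconds())/float64(n), len(s))
	}
}
```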

Answer 2

I've already answered your computational complexity question: append complexity. It is amortized constant time.

Here are my results from your benchmark:

$ rm vec
$ cat vec.cc
#include <vector>

const int N = 80000000;

struct X {
        int x;
        double y;
};

int main(void)
{
        X x{123, 2.64};
        std::vector<X> s(1);
        for (int i = 0; i < N; ++i) {
                s.push_back(x);
        }
}
$ g++ -O2 -std=gnu++11 -o vec vec.cc
$ time ./vec
real    0m1.360s
user    0m0.536s
sys 0m0.816s
$ rm vec
$ cat vec.go
package main

type X struct {
    x int32
    y float64
}

const N int = 80000000

func main() {
    x := X{123, 2.64}
    s := make([]X, 1)
    for i := 0; i < N; i++ {
        s = append(s, x)
    }
}
$ go version
go version devel +6b696a34e0af Sun Aug 03 15:14:59 2014 -0700 linux/amd64
$ go build vec.go
$ time ./vec
real    0m2.590s
user    0m1.192s
sys 0m1.388s
$

Answer 3

If you know the number of elements beforehand, you can preallocate the slice:

s := make([]X, 0, N) // length 0, capacity N: one allocation up front
for i := 0; i < N; i++ {
    s = append(s, x)
}
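The effect of preallocation can be checked by counting heap allocations. This sketch compares a grow-as-needed loop with the preallocated loop from the answer using `testing.AllocsPerRun` from the standard library; the helpers `appendGrow` and `appendPrealloc` and the package-level `sink` variable are hypothetical names for illustration.

```go
package main

import (
	"fmt"
	"testing"
)

// X mirrors the element type from the question.
type X struct {
	x int32
	y float64
}

// sink is a hypothetical package-level variable that keeps results alive so
// the compiler cannot discard the work being measured.
var sink []X

// appendGrow starts from an empty slice and lets append reallocate as needed.
func appendGrow(n int, x X) []X {
	var s []X
	for i := 0; i < n; i++ {
		s = append(s, x)
	}
	return s
}

// appendPrealloc reserves capacity for all n elements up front (length 0,
// capacity n), so append never needs to reallocate.
func appendPrealloc(n int, x X) []X {
	s := make([]X, 0, n)
	for i := 0; i < n; i++ {
		s = append(s, x)
	}
	return s
}

func main() {
	const n = 1000000
	x := X{123, 2.64}
	// testing.AllocsPerRun reports the average number of heap allocations
	// performed by one call of the function.
	grow := testing.AllocsPerRun(5, func() { sink = appendGrow(n, x) })
	pre := testing.AllocsPerRun(5, func() { sink = appendPrealloc(n, x) })
	fmt.Printf("allocations per run: grow-as-needed=%.0f preallocated=%.0f\n", grow, pre)
}
```

The preallocated version should report a single allocation, while the grow-as-needed version reports one allocation per reallocation.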

Also, use Go 1.3; its compiler received some optimizations.

For better vectorization, try gccgo.

3 answers

Answer 1
Score 11, accepted, 2014-08-03 22:54:20

Answer 2
Score 1, 2014-08-03 23:01:27

Answer 3
Score 0, 2014-08-03 20:09:59
