Go system call vs. C system call

Question

Go and C both involve system calls directly (technically, C will call a stub).

Technically, write is both a system call and a C function (at least on many systems). However, the C function is just a stub which invokes the system call. Go does not call this stub; it invokes the system call directly, which means that C is not involved here.

From Differences between C write call and Go syscall.Write

My benchmark shows that a pure C system call is 15.82% faster than a pure Go system call in the latest release (go1.11).

What did I miss? What could be the reason, and how can I optimize this?

Benchmarks:

Go:

package main_test

import (
    "syscall"
    "testing"
)

// writeAll keeps calling write(2) until the whole buffer has been sent.
func writeAll(fd int, buf []byte) error {
    for len(buf) > 0 {
        n, err := syscall.Write(fd, buf)
        if err != nil {
            return err
        }
        buf = buf[n:]
    }
    return nil
}

func BenchmarkReadWriteGoCalls(b *testing.B) {
    fds, err := syscall.Socketpair(syscall.AF_UNIX, syscall.SOCK_STREAM, 0)
    if err != nil {
        b.Fatal(err)
    }
    message := "hello, world!"
    buffer := make([]byte, 13)
    b.ResetTimer() // exclude the setup above from the measurement
    for i := 0; i < b.N; i++ {
        writeAll(fds[0], []byte(message))
        syscall.Read(fds[1], buffer)
    }
}

C:

#include <time.h>
#include <stdio.h>
#include <unistd.h>
#include <sys/socket.h>

int write_all(int fd, void* buffer, size_t length) {
    char *p = buffer; /* pointer arithmetic on void* is a GNU extension */
    while (length > 0) {
        ssize_t written = write(fd, p, length); /* write() returns ssize_t */
        if (written < 0)
            return -1;
        length -= written;
        p += written;
    }
    return length;
}

int read_call(int fd, void *buffer, size_t length) {
    return read(fd, buffer, length);
}

struct timespec timer_start(){
    struct timespec start_time;
    clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &start_time);
    return start_time;
}

long timer_end(struct timespec start_time){
    struct timespec end_time;
    clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &end_time);
    long diffInNanos = (end_time.tv_sec - start_time.tv_sec) * (long)1e9 + (end_time.tv_nsec - start_time.tv_nsec);
    return diffInNanos;
}

int main() {
    int i = 0;
    int N = 500000;
    int fds[2];
    char message[14] = "hello, world!\0";
    char buffer[14] = {0};

    socketpair(AF_UNIX, SOCK_STREAM, 0, fds);
    struct timespec vartime = timer_start();
    for(i = 0; i < N; i++) {
        write_all(fds[0], message, sizeof(message));
        read_call(fds[1], buffer, 14);
    }
    long time_elapsed_nanos = timer_end(vartime);
    printf("BenchmarkReadWritePureCCalls\t%d\t%.2ld ns/op\n", N, time_elapsed_nanos/N);
}

340 separate runs; each C run contains 500000 executions, and each Go run contains b.N executions (mostly 500000, occasionally 1000000):

T-Test for 2 Independent Means: The t-value is -22.45426. The p-value is < .00001. The result is significant at p < .05.

T-Test Calculator for 2 Dependent Means: The value of t is 15.902782. The value of p is < 0.00001. The result is significant at p ≤ 0.05.

Update: I implemented the proposal from the answer and wrote another benchmark. It shows that the proposed approach significantly degrades the performance of massive I/O calls; its performance is close to that of CGO calls.

Benchmark:

func BenchmarkReadWriteNetCalls(b *testing.B) {
    cs, _ := socketpair()
    message := "hello, world!"
    buffer := make([]byte, 13)
    for i := 0; i < b.N; i++ {
        cs[0].Write([]byte(message))
        cs[1].Read(buffer)
    }
}

func socketpair() (conns [2]net.Conn, err error) {
    fds, err := syscall.Socketpair(syscall.AF_LOCAL, syscall.SOCK_STREAM, 0)
    if err != nil {
        return
    }
    conns[0], err = fdToFileConn(fds[0])
    if err != nil {
        return
    }
    conns[1], err = fdToFileConn(fds[1])
    if err != nil {
        conns[0].Close()
        return
    }
    return
}

func fdToFileConn(fd int) (net.Conn, error) {
    f := os.NewFile(uintptr(fd), "")
    defer f.Close()
    return net.FileConn(f)
}

The above figure shows 100 separate runs; each C run contains 500000 executions, and each Go run contains b.N executions (mostly 500000, occasionally 1000000).

Answer 1

My benchmark shows, pure C system call is 15.82% faster than pure Go system call in the latest release (go1.11).

What did I miss? What could be a reason and how to optimize them?

The reason is that while both C and Go (on a typical platform Go supports, such as Linux, *BSD or Windows) are compiled down to machine code, Go-native code runs in an environment quite different from that of C.

The two chief differences from C are:

Go code runs in the context of so-called goroutines, which are freely scheduled by the Go runtime on different OS threads.
Goroutines use their own (growable and reallocatable) lightweight stacks, which have nothing to do with the OS-supplied stack that C code uses.

So, when Go code wants to make a syscall, quite a lot has to happen:

The goroutine which is about to enter a syscall must be "pinned" to the OS thread on which it's currently running.
The execution must be switched to use the OS-supplied C stack.
The necessary preparations in the Go runtime's scheduler are made.
The goroutine enters the syscall.
Upon exiting the syscall, execution of the goroutine has to be resumed. This is a relatively involved process in itself, and it may be additionally hampered if the goroutine was in the syscall for too long and the scheduler removed the so-called "processor" from under that goroutine, spawned another OS thread and made that processor run another goroutine ("processors", or Ps, are the things which run goroutines on OS threads).

Update to answer the OP's comment

<…> Thus there is no way to optimize and I must suffer that if I make massive IO calls, mustn't I?

It heavily depends on the nature of the "massive I/O" you're after.

If your example (with socketpair(2)) is not a toy, there is simply no reason to use syscalls directly: the FDs returned by socketpair(2) are "pollable" and hence the Go runtime may use its native "netpoller" machinery to perform I/O on them. Here is working code from one of my projects which properly "wraps" FDs produced by socketpair(2) so that they can be used as "regular" sockets (produced by functions from the net standard package):

func socketpair() (net.Conn, net.Conn, error) {
       fds, err := syscall.Socketpair(syscall.AF_LOCAL, syscall.SOCK_STREAM, 0)
       if err != nil {
               return nil, nil, err
       }

       c1, err := fdToFileConn(fds[0])
       if err != nil {
               return nil, nil, err
       }

       c2, err := fdToFileConn(fds[1])
       if err != nil {
               c1.Close()
               return nil, nil, err
       }

       return c1, c2, nil
}

func fdToFileConn(fd int) (net.Conn, error) {
       f := os.NewFile(uintptr(fd), "")
       defer f.Close()
       return net.FileConn(f)
}
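For reference, a complete round trip over such a wrapped pair might look like the sketch below. It is an assumption-laden illustration rather than the author's project code: the helper names fdToConn and roundTrip are invented here, it targets Unix-like systems, and it uses io.ReadFull because a stream socket may return fewer bytes than requested.

```go
package main

import (
	"fmt"
	"io"
	"net"
	"os"
	"syscall"
)

// fdToConn wraps a raw socket descriptor in a net.Conn. net.FileConn
// duplicates the descriptor, so the temporary *os.File can be closed.
func fdToConn(fd int) (net.Conn, error) {
	f := os.NewFile(uintptr(fd), "")
	defer f.Close()
	return net.FileConn(f)
}

// roundTrip sends msg across a socketpair and returns what the peer read.
func roundTrip(msg string) (string, error) {
	fds, err := syscall.Socketpair(syscall.AF_UNIX, syscall.SOCK_STREAM, 0)
	if err != nil {
		return "", err
	}
	c1, err := fdToConn(fds[0])
	if err != nil {
		syscall.Close(fds[1]) // the second descriptor is not wrapped yet
		return "", err
	}
	defer c1.Close()
	c2, err := fdToConn(fds[1])
	if err != nil {
		return "", err
	}
	defer c2.Close()

	// Both ends now go through the runtime's netpoller, not raw syscalls.
	if _, err := c1.Write([]byte(msg)); err != nil {
		return "", err
	}
	buf := make([]byte, len(msg))
	if _, err := io.ReadFull(c2, buf); err != nil {
		return "", err
	}
	return string(buf), nil
}

func main() {
	got, err := roundTrip("hello, world!")
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Println(got)
}
```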

If you're talking about some other sort of I/O, the answer is that yes, syscalls are not really cheap, and if you must do lots of them, there are ways to work around their cost. One is offloading to some C code (linked in or hooked up as an external process) which would somehow batch them, so that each call to that C code would result in several syscalls done by the C side.
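The batching idea can also be tried without any C. The sketch below is a different technique from the C offloading mentioned above: it coalesces many small writes in user space with bufio.Writer so that fewer calls reach the underlying writer. The countingWriter type is a hypothetical stand-in for a file descriptor, where each Write would correspond to one write(2); the 4096-byte buffer size is an arbitrary choice.

```go
package main

import (
	"bufio"
	"fmt"
	"io"
)

// countingWriter stands in for a file descriptor: each Write call here
// would be one write(2) system call on a real descriptor.
type countingWriter struct {
	calls int
	bytes int
}

func (w *countingWriter) Write(p []byte) (int, error) {
	w.calls++
	w.bytes += len(p)
	return len(p), nil
}

// send writes msg to w n times.
func send(w io.Writer, msg []byte, n int) error {
	for i := 0; i < n; i++ {
		if _, err := w.Write(msg); err != nil {
			return err
		}
	}
	return nil
}

// unbatched passes every message straight to the underlying writer.
func unbatched(msg []byte, n int) (calls, total int) {
	cw := &countingWriter{}
	_ = send(cw, msg, n) // countingWriter never fails
	return cw.calls, cw.bytes
}

// batched collects messages in a 4096-byte buffer before writing them out.
func batched(msg []byte, n int) (calls, total int) {
	cw := &countingWriter{}
	bw := bufio.NewWriterSize(cw, 4096)
	_ = send(bw, msg, n)
	_ = bw.Flush() // push out whatever is still buffered
	return cw.calls, cw.bytes
}

func main() {
	msg := []byte("hello, world!")
	c1, b1 := unbatched(msg, 1000)
	c2, b2 := batched(msg, 1000)
	fmt.Printf("unbatched: %d writes, %d bytes\n", c1, b1)
	fmt.Printf("batched:   %d writes, %d bytes\n", c2, b2)
}
```

Batching only helps when the peer does not need each message immediately; a strict write-then-read ping-pong like the benchmark in the question cannot be coalesced this way.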

Answer 1
18, accepted, 2018-09-12 15:14:25