Go syscall vs. C system call
Go and C both invoke system calls directly (technically, C calls a stub).
Technically, write is both a system call and a C function (at least on many systems). However, the C function is just a stub which invokes the system call. Go does not call this stub; it invokes the system call directly, which means that C is not involved here.
From "Differences between C write call and Go syscall.Write"
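To make the quoted distinction concrete, here is a minimal sketch of issuing write(2) from Go without any C stub. It assumes a Unix-like target where syscall.SYS_WRITE is defined; rawWrite and roundTrip are hypothetical names used only for illustration, and the pipe is just a convenient way to read the bytes back.

```go
package main

import (
	"fmt"
	"syscall"
	"unsafe"
)

// rawWrite issues write(2) through syscall.Syscall, with no C stub involved.
// Assumption: a Unix-like target where syscall.SYS_WRITE is defined.
func rawWrite(fd int, p []byte) (int, error) {
	if len(p) == 0 {
		return 0, nil
	}
	n, _, errno := syscall.Syscall(syscall.SYS_WRITE,
		uintptr(fd), uintptr(unsafe.Pointer(&p[0])), uintptr(len(p)))
	if errno != 0 {
		return 0, errno
	}
	return int(n), nil
}

// roundTrip writes msg into a pipe with rawWrite and reads it back.
func roundTrip(msg string) (string, error) {
	var fds [2]int
	if err := syscall.Pipe(fds[:]); err != nil {
		return "", err
	}
	defer syscall.Close(fds[0])
	defer syscall.Close(fds[1])
	if _, err := rawWrite(fds[1], []byte(msg)); err != nil {
		return "", err
	}
	buf := make([]byte, len(msg))
	n, err := syscall.Read(fds[0], buf)
	if err != nil {
		return "", err
	}
	return string(buf[:n]), nil
}

func main() {
	got, err := roundTrip("hello, world!")
	fmt.Println(got, err)
}
```

syscall.Write does essentially the same thing internally, plus the runtime bookkeeping around entering and leaving the system call.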
My benchmark shows that a pure C system call is 15.82% faster than a pure Go system call in the latest release (go1.11).
What did I miss? What could be the reason, and how can I optimize it?
Benchmarks:
Go:
package main_test

import (
	"syscall"
	"testing"
)

func writeAll(fd int, buf []byte) error {
	for len(buf) > 0 {
		n, err := syscall.Write(fd, buf)
		if n < 0 {
			return err
		}
		buf = buf[n:]
	}
	return nil
}

func BenchmarkReadWriteGoCalls(b *testing.B) {
	fds, _ := syscall.Socketpair(syscall.AF_UNIX, syscall.SOCK_STREAM, 0)
	message := "hello, world!"
	buffer := make([]byte, 13)
	for i := 0; i < b.N; i++ {
		writeAll(fds[0], []byte(message))
		syscall.Read(fds[1], buffer)
	}
}
C:
#include <time.h>
#include <stdio.h>
#include <unistd.h>
#include <sys/socket.h>

int write_all(int fd, void* buffer, size_t length) {
    while (length > 0) {
        int written = write(fd, buffer, length);
        if (written < 0)
            return -1;
        length -= written;
        buffer += written;
    }
    return length;
}

int read_call(int fd, void *buffer, size_t length) {
    return read(fd, buffer, length);
}

struct timespec timer_start(){
    struct timespec start_time;
    clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &start_time);
    return start_time;
}

long timer_end(struct timespec start_time){
    struct timespec end_time;
    clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &end_time);
    long diffInNanos = (end_time.tv_sec - start_time.tv_sec) * (long)1e9 + (end_time.tv_nsec - start_time.tv_nsec);
    return diffInNanos;
}

int main() {
    int i = 0;
    int N = 500000;
    int fds[2];
    char message[14] = "hello, world!\0";
    char buffer[14] = {0};
    socketpair(AF_UNIX, SOCK_STREAM, 0, fds);
    struct timespec vartime = timer_start();
    for(i = 0; i < N; i++) {
        write_all(fds[0], message, sizeof(message));
        read_call(fds[1], buffer, 14);
    }
    long time_elapsed_nanos = timer_end(vartime);
    printf("BenchmarkReadWritePureCCalls\t%d\t%.2ld ns/op\n", N, time_elapsed_nanos/N);
}
340 separate runs; each C run contains 500000 executions, and each Go run contains b.N executions (mostly 500000, occasionally 1000000):
T-Test for 2 Independent Means: the t-value is -22.45426. The p-value is < .00001. The result is significant at p < .05.
T-Test Calculator for 2 Dependent Means: the value of t is 15.902782. The value of p is < 0.00001. The result is significant at p ≤ 0.05.
Update: I tried the approach proposed in the answers and wrote another benchmark. It shows that the proposed approach significantly degrades the performance of massive I/O calls; its performance is close to that of CGO calls.
Benchmark:
func BenchmarkReadWriteNetCalls(b *testing.B) {
	cs, _ := socketpair()
	message := "hello, world!"
	buffer := make([]byte, 13)
	for i := 0; i < b.N; i++ {
		cs[0].Write([]byte(message))
		cs[1].Read(buffer)
	}
}

func socketpair() (conns [2]net.Conn, err error) {
	fds, err := syscall.Socketpair(syscall.AF_LOCAL, syscall.SOCK_STREAM, 0)
	if err != nil {
		return
	}
	conns[0], err = fdToFileConn(fds[0])
	if err != nil {
		return
	}
	conns[1], err = fdToFileConn(fds[1])
	if err != nil {
		conns[0].Close()
		return
	}
	return
}

func fdToFileConn(fd int) (net.Conn, error) {
	f := os.NewFile(uintptr(fd), "")
	defer f.Close()
	return net.FileConn(f)
}
The figure above shows 100 separate runs: each C run contains 500000 executions, and each Go run contains b.N executions (mostly 500000, occasionally 1000000).
My benchmark shows, pure C system call is 15.82% faster than pure Go system call in the latest release (go1.11).
What did I miss? What could be a reason and how to optimize them?
The reason is that while both C and Go (on a typical platform Go supports, such as Linux, *BSD or Windows) are compiled down to machine code, Go-native code runs in an environment quite different from that of C.
The two chief differences to C are:
So, when Go code wants to make a syscall, quite a lot should happen:
(… Ps are thingies which run goroutines on OS threads.)
Update to answer the OP's comment
<…> Thus there is no way to optimize and I must suffer that if I make massive IO calls, mustn't I?
It heavily depends on the nature of the "massive I/O" you're after.
If your example (with socketpair(2)) is not a toy, there is simply no reason to use syscalls directly: the FDs returned by socketpair(2) are "pollable", and hence the Go runtime may use its native "netpoller" machinery to perform I/O on them. Here is working code from one of my projects which properly "wraps" FDs produced by socketpair(2) so that they can be used as "regular" sockets (like those produced by functions from the net standard package):
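// socketpair returns both ends of a connected Unix socket pair as net.Conn values.
func socketpair() (net.Conn, net.Conn, error) {
	fds, err := syscall.Socketpair(syscall.AF_LOCAL, syscall.SOCK_STREAM, 0)
	if err != nil {
		return nil, nil, err
	}
	c1, err := fdToFileConn(fds[0])
	if err != nil {
		return nil, nil, err
	}
	c2, err := fdToFileConn(fds[1])
	if err != nil {
		c1.Close()
		return nil, nil, err
	}
	return c1, c2, nil
}

// fdToFileConn wraps a raw descriptor in a net.Conn.
// net.FileConn returns a copy of the connection, so closing f
// afterwards does not affect the returned net.Conn.
func fdToFileConn(fd int) (net.Conn, error) {
	f := os.NewFile(uintptr(fd), "")
	defer f.Close()
	return net.FileConn(f)
}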
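For completeness, here is a minimal self-contained sketch of how such a wrapper might be used. The names connPair, fdToConn and echoOnce are hypothetical stand-ins, not taken from the project mentioned above, and the sketch assumes a Unix-like system where syscall.Socketpair is available (it uses AF_UNIX, which has the same value as AF_LOCAL on Linux).

```go
package main

import (
	"fmt"
	"io"
	"net"
	"os"
	"syscall"
)

// fdToConn wraps a raw descriptor in a net.Conn. net.FileConn works on a
// copy of the descriptor, so the temporary *os.File is closed here.
func fdToConn(fd int) (net.Conn, error) {
	f := os.NewFile(uintptr(fd), "socketpair")
	defer f.Close()
	return net.FileConn(f)
}

// connPair (hypothetical name) returns both ends of a Unix socket pair.
func connPair() (net.Conn, net.Conn, error) {
	fds, err := syscall.Socketpair(syscall.AF_UNIX, syscall.SOCK_STREAM, 0)
	if err != nil {
		return nil, nil, err
	}
	c1, err := fdToConn(fds[0])
	if err != nil {
		syscall.Close(fds[1]) // the second descriptor was never wrapped
		return nil, nil, err
	}
	c2, err := fdToConn(fds[1])
	if err != nil {
		c1.Close()
		return nil, nil, err
	}
	return c1, c2, nil
}

// echoOnce sends msg through one end and reads it back from the other.
func echoOnce(msg string) (string, error) {
	c1, c2, err := connPair()
	if err != nil {
		return "", err
	}
	defer c1.Close()
	defer c2.Close()
	if _, err := c1.Write([]byte(msg)); err != nil {
		return "", err
	}
	buf := make([]byte, len(msg))
	if _, err := io.ReadFull(c2, buf); err != nil {
		return "", err
	}
	return string(buf), nil
}

func main() {
	got, err := echoOnce("hello, world!")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(got)
}
```

Because the connections go through the netpoller, a goroutine blocked in Read or Write here parks inside the runtime instead of tying up an OS thread in a blocking syscall.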
If you're talking about some other sort of I/O, the answer is that yes, syscalls are not really cheap, and if you must do lots of them, there are ways to work around their cost. One is offloading to some C code (linked in or hooked up as an external process) which would somehow batch them, so that each call to that C code would result in several syscalls done by the C side.
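A related way to cut the number of syscalls, shown here as a sketch rather than as the C offloading described above, is to batch small writes in user space with bufio.Writer. In this illustration countingWriter and countWrites are hypothetical names, and each Write call that reaches the sink stands in for one write(2); no real descriptor is involved.

```go
package main

import (
	"bufio"
	"fmt"
	"io"
)

// countingWriter counts Write calls; each call stands in for one write(2).
type countingWriter struct {
	w     io.Writer
	calls int
}

func (c *countingWriter) Write(p []byte) (int, error) {
	c.calls++
	return c.w.Write(p)
}

// countWrites sends n small messages to a sink and reports how many Write
// calls reached it, with or without a bufio.Writer in front.
func countWrites(n int, buffered bool) (int, error) {
	sink := &countingWriter{w: io.Discard}
	msg := []byte("hello, world!")
	var out io.Writer = sink
	var bw *bufio.Writer
	if buffered {
		bw = bufio.NewWriterSize(sink, 4096)
		out = bw
	}
	for i := 0; i < n; i++ {
		if _, err := out.Write(msg); err != nil {
			return 0, err
		}
	}
	if bw != nil {
		// Push out whatever is still sitting in the buffer.
		if err := bw.Flush(); err != nil {
			return 0, err
		}
	}
	return sink.calls, nil
}

func main() {
	direct, _ := countWrites(1000, false)
	batched, _ := countWrites(1000, true)
	fmt.Println("direct:", direct, "batched:", batched)
}
```

This only helps when the data does not have to reach the peer immediately; a strict request/response exchange like the benchmark in the question still needs one write and one read per round trip.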