
Fastest way to read a pipe from C/C++ program?

Original post: 2014-02-21 18:58:35. Tags: c++, linux, shell, unix, pipe

If I want to pipe bytes of data into a C/C++ program on Linux like this:

cat my_file | ./my_app

but:

We cannot assume the piped data is going to originate from a file
We wish to interpret the data as raw bytes (as opposed to strings)

what would be the fastest technique to read the pipe from the C/C++ application?

I have done a little research and found:

read()
std::cin.read()
popen()

but I am not sure if there is a better way, or which of the above would be better.

EDIT: There is a performance requirement on this, which is why I am asking for the technique with the smallest overhead.

2 answers

Why do you care that much about performance?

1 gigabyte from /dev/urandom can be piped into wc in about 1 minute (and wc is busy only about 15% of that time; the rest is spent waiting for data)! Just try time (head -1000000000c /dev/urandom|wc)

But the fastest way would be to use the read(2) syscall with a fairly big buffer (e.g. 64 Kbytes to 256 Kbytes).
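A minimal sketch of that read(2) loop, assuming a POSIX system. The helper name drain_fd and the byte sum it computes are hypothetical, chosen only so that the data is actually touched; the buffer size is whatever the caller passes in (64 * 1024 matches the low end of the range suggested above).

```cpp
#include <cassert>
#include <cerrno>
#include <cstddef>
#include <cstdint>
#include <vector>
#include <unistd.h>   // read(2), STDIN_FILENO

// Hypothetical helper (not from the original answer): drains `fd` with
// read(2) into one reusable buffer. Reports the number of bytes seen and
// a simple sum of their values. Returns false on a read error.
bool drain_fd(int fd, std::size_t buf_size,
              std::uint64_t& total_bytes, std::uint64_t& byte_sum) {
    assert(buf_size > 0);
    std::vector<unsigned char> buf(buf_size);
    total_bytes = 0;
    byte_sum = 0;
    for (;;) {
        ssize_t n = read(fd, buf.data(), buf.size());
        if (n > 0) {
            // A pipe may deliver fewer bytes than requested; use only n.
            for (ssize_t i = 0; i < n; ++i) byte_sum += buf[i];
            total_bytes += static_cast<std::uint64_t>(n);
        } else if (n == 0) {
            return true;    // EOF: every write end of the pipe is closed
        } else if (errno != EINTR) {
            return false;   // real error; EINTR just retries the read
        }
    }
}
```

For piped input the call would be something like drain_fd(STDIN_FILENO, 64 * 1024, total, sum). Short reads are normal on a pipe, so the loop never assumes the buffer came back full.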

Of course, read Advanced Linux Programming and, carefully, the man pages related to syscalls(2).

For inspiration, study the source code of the Linux kernel, of GNU libc, and of musl-libc. They are all open source projects, so feel free to contribute to them and to improve them.

But I bet that in practice using popen, or stdin, or reading from std::cin won't add much overhead.

You could also increase the stdio buffer with setvbuf(3).
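A rough sketch of the setvbuf(3) idea, assuming the program reads its input in small pieces through stdio. The helper name count_bytes_stdio and the 256-byte record size are hypothetical. The buffer is supplied by the caller here, on the assumption that a library may not honor the requested size when it allocates the buffer itself.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <vector>

// Hypothetical helper (not from the original answer): gives `in` a large,
// fully buffered stdio buffer, then counts the bytes read through fread(3).
// `stdio_buf` must stay alive until the stream is closed.
// Returns the byte count, or -1 on error.
long long count_bytes_stdio(std::FILE* in, std::vector<char>& stdio_buf) {
    assert(in != nullptr && !stdio_buf.empty());
    // setvbuf has to be called before any other operation on the stream.
    if (std::setvbuf(in, stdio_buf.data(), _IOFBF, stdio_buf.size()) != 0) {
        return -1;
    }
    unsigned char record[256];   // small reads, served from the big buffer
    long long total = 0;
    std::size_t n;
    while ((n = std::fread(record, 1, sizeof record, in)) > 0) {
        total += static_cast<long long>(n);
    }
    return std::ferror(in) ? -1 : total;
}
```

With piped input this would be called as count_bytes_stdio(stdin, buf), where buf is, say, a std::vector<char> of 256 * 1024 elements. The larger buffer mainly pays off when the program itself reads small records; if it already calls fread with big chunks, the gain is likely to be small.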

See also this question.

(If you read from stdin, the file descriptor is STDIN_FILENO, which is 0.)

You might be interested in time(7), vdso(7), syscalls(2).

You certainly should read the documentation of GCC and this draft report.

You could use machine learning techniques to optimize performance.

Look into the MILEPOST GCC and Ctuning projects. Consider joining the RefPerSys project. And of course read Understanding Machine Learning: From Theory to Algorithms, ISBN 978-1-107-05713-5.

When you pipe data in like that, the piped input is the standard input. Just read from cin (or stdin) like a normal console program.

Just use std::cin.read(). There's no reason to deal with popen() or its ilk.
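As a sketch of what that might look like for raw bytes: the helper name count_bytes_stream and its chunk_size parameter are hypothetical, and it takes any std::istream so that std::cin is just one possible argument.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <istream>
#include <vector>

// Hypothetical helper (not from the original answer): reads `in` in
// fixed-size chunks with unformatted read() and returns the byte count.
std::uint64_t count_bytes_stream(std::istream& in, std::size_t chunk_size) {
    assert(chunk_size > 0);
    std::vector<char> chunk(chunk_size);
    std::uint64_t total = 0;
    // read() sets failbit when the last chunk is short, so gcount() is
    // checked as well to avoid dropping the final partial chunk.
    while (in.read(chunk.data(), static_cast<std::streamsize>(chunk.size()))
           || in.gcount() > 0) {
        total += static_cast<std::uint64_t>(in.gcount());
    }
    return total;
}
```

In a program fed by a pipe this would be called as count_bytes_stream(std::cin, 64 * 1024). Calling std::ios::sync_with_stdio(false) before the first read is commonly reported to speed up std::cin, though how much it helps depends on the standard library in use.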

Just to clarify... there is no pipe-specific way to read the input. As far as your program is concerned, there's cin and that's it.

This question might help you out on the speed front though... Why is reading lines from stdin much slower in C++ than Python?