Best way to send multiple bytes to stdout in C/C++

Question

Profiling my program shows that the function print takes a lot of time to run. How can I send "raw" bytes directly to stdout instead of using fwrite, and make it faster? (All 9 bytes in print() need to be sent to stdout at the same time.)

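#include <stdio.h>

unsigned char matrix[3][3];  /* defined globally */

void print(){
    unsigned char temp[9];

    /* copy the matrix row by row into a flat 9-byte buffer */
    temp[0] = matrix[0][0];
    temp[1] = matrix[0][1];
    temp[2] = matrix[0][2];
    temp[3] = matrix[1][0];
    temp[4] = matrix[1][1];
    temp[5] = matrix[1][2];
    temp[6] = matrix[2][0];
    temp[7] = matrix[2][1];
    temp[8] = matrix[2][2];

    fwrite(temp,1,9,stdout);  /* all 9 bytes in a single call */
}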

Matrix is defined globally as unsigned char matrix[3][3];

Answer 1

IO is not an inexpensive operation. It is, in fact, a blocking operation. When you call write, the OS can preempt your process and let more CPU-bound processes run until the IO device you're writing to completes the operation.

The only lower-level function you can use (if you're developing on a *nix machine) is the raw write function, but even then your performance will not be much faster than it is now. Simply put: IO is expensive.
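If you do switch to the raw write call, keep in mind that write may transfer fewer bytes than requested (for example on a pipe, or after a signal). A raw-write version of print() therefore usually loops until everything has been written. Here is a minimal sketch, assuming a POSIX system. The name write_all is a hypothetical helper, not a standard function:

```c
#include <errno.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical helper: write all len bytes of buf to fd using the raw
 * write(2) system call, retrying on partial writes and on EINTR.
 * Returns 0 on success, -1 on error (errno is left set by write). */
int write_all(int fd, const void *buf, size_t len)
{
    const unsigned char *p = buf;

    while (len > 0) {
        ssize_t n = write(fd, p, len);
        if (n < 0) {
            if (errno == EINTR)
                continue;       /* interrupted by a signal: try again */
            return -1;          /* real error, e.g. EPIPE or EBADF */
        }
        p += n;                 /* skip the bytes that were accepted */
        len -= (size_t)n;
    }
    return 0;
}
```

In print() this would replace the fwrite call with write_all(STDOUT_FILENO, temp, sizeof temp). That is still one system call per 9-byte record, which is exactly the cost described above.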

Answer 2

The top-rated answer claims that IO is slow.

Here's a quick benchmark that uses a buffer large enough to take the OS out of the critical performance path. This only works if you're willing to receive your output in giant bursts. If latency to the first byte is your problem, you need to run in "dribs" mode.

Writing 10 million records from a nine-byte array

Mint 12 AMD64 on 3GHz CoreDuo under gcc 4.6.1

   340ms   to /dev/null 
   710ms   to 90MB output file 
 15254ms   to 90MB output file in "dribs" mode

FreeBSD 9 AMD64 on 2.4GHz CoreDuo under clang 3.0

   450ms   to /dev/null 
   550ms   to 90MB output file on ZFS triple mirror
  1150ms   to 90MB output file on FFS system drive
 22154ms   to 90MB output file in "dribs" mode

There's nothing slow about IO if you can afford to buffer properly.

#include <stdio.h> 
#include <assert.h> 
#include <stdlib.h>
#include <string.h>

int main (int argc, char* argv[]) 
{
    int dribs = argc > 1 && 0==strcmp (argv[1], "dribs");
    int err;
    int i; 
    enum { BigBuf = 4*1024*1024 };
    char* outbuf = malloc (BigBuf); 
    assert (outbuf != NULL); 
    err = setvbuf (stdout, outbuf, _IOFBF, BigBuf); // full buffering (_IOLBF would be line buffering)
    assert (err == 0);

    enum { ArraySize = 9 };
    char temp[ArraySize] = {0}; // contents don't matter for the benchmark, so just zero them
    enum { Count = 10*1000*1000 }; 

    for (i = 0; i < Count; ++i) {
        fwrite (temp, 1, ArraySize, stdout);    
        if (dribs) fflush (stdout); 
    }
    fflush (stdout);  // write out whatever is still buffered (fclose would also flush)
    fclose (stdout);
    if (outbuf) { free (outbuf); outbuf = NULL; }
}

Answer 3

The rawest form of output you can do is probably the write system call, like this:

#include <unistd.h>

write (1, matrix, 9);  /* fd 1 is stdout; write may accept fewer than 9 bytes */

1 is the file descriptor for standard output (0 is standard input, and 2 is standard error). Your standard output can only be written as fast as whatever reads it at the other end (i.e. the terminal, or the program you're piping into), which might be rather slow.

I'm not 100% sure, but you could try setting non-blocking IO on fd 1 (using fcntl) and hope the OS will buffer the output for you until the other end can consume it. It's been a while, but I think it works like this:

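#include <fcntl.h>

/* read the current status flags first so flags such as O_APPEND are kept */
fcntl (1, F_SETFL, fcntl (1, F_GETFL) | O_NONBLOCK);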

YMMV, though. Please correct me if I'm wrong on the syntax; as I said, it's been a while.
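Two details matter if you try this.

1. F_SETFL replaces all of the file status flags. The usual idiom therefore reads the current flags with F_GETFL first and adds O_NONBLOCK to them.
2. In non-blocking mode, a write that cannot make progress fails with EAGAIN (or EWOULDBLOCK) instead of waiting. The caller has to handle that; the OS will not buffer without limit.

Here is a minimal sketch assuming POSIX. The name set_nonblocking is a hypothetical helper:

```c
#include <fcntl.h>

/* Hypothetical helper: switch fd to non-blocking mode while keeping
 * its other status flags (such as O_APPEND) intact.
 * Returns 0 on success, -1 on error. */
int set_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL);   /* current file status flags */
    if (flags == -1)
        return -1;
    if (fcntl(fd, F_SETFL, flags | O_NONBLOCK) == -1)
        return -1;
    return 0;
}
```

set_nonblocking(1) applies this to standard output. The flag belongs to the open file description, so other processes sharing the same terminal or pipe may see it too.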

Answer 4

Perhaps your problem is not that fwrite() is slow, but that it is buffered. Try calling fflush(stdout) after the fwrite().

This all really depends on your definition of slow in this context.
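If the concern is when the bytes show up rather than raw throughput, the fix is to flush after each record. Here is a minimal sketch. The name write_record_now is hypothetical, and it takes a FILE * so the same code works for stdout or any other stream:

```c
#include <stdio.h>

/* Hypothetical helper: write one 9-byte record and push it out of the
 * stdio buffer immediately. Returns 0 on success, EOF on error. */
int write_record_now(FILE *out, const unsigned char rec[9])
{
    if (fwrite(rec, 1, 9, out) != 9)
        return EOF;             /* short write: stream error */
    return fflush(out);         /* 0 on success, EOF on error */
}
```

Calling write_record_now(stdout, temp) in print() makes each record visible right away. It also costs one system call per record, and the "dribs" timings in Answer 2 show how expensive that can become.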

Answer 5

All printing is fairly slow, but iostreams are especially slow.

Your best bet would be to use printf, something along the lines of:

printf("%c%c%c%c%c%c%c%c%c\n", matrix[0][0], matrix[0][1], matrix[0][2], matrix[1][0],
  matrix[1][1], matrix[1][2], matrix[2][0], matrix[2][1], matrix[2][2]);

Answer 6

As everyone has pointed out, IO in a tight inner loop is expensive. When I needed to debug a matrix, I have normally ended up printing it with cout only when certain criteria are met.

If your app is a console app, try redirecting its output to a file; that will be a lot faster than refreshing the console. For example: app.exe > matrixDump.txt

Answer 7

What's wrong with:

fwrite(matrix,1,9,stdout);

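Both the one-dimensional and the two-dimensional array have the same memory layout. matrix is stored contiguously, row by row, which is exactly the order in which print() copies it into temp.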
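Because a 2-D array is laid out contiguously row by row, print() can pass matrix straight to fwrite and skip the copy. Here is a minimal sketch. The function name print_matrix and its FILE * parameter are illustrative additions, not part of the original code:

```c
#include <stdio.h>

/* Hypothetical variant of print(): the nine bytes of a 3x3 matrix are
 * contiguous in row-major order (m[0][0], m[0][1], ..., m[2][2]), the
 * same order print() copies them into temp. */
int print_matrix(FILE *out, unsigned char m[3][3])
{
    size_t size = 3 * sizeof m[0];   /* 3 rows of 3 bytes = 9 bytes */
    return fwrite(m, 1, size, out) == size ? 0 : EOF;
}
```

With the global from the question, the whole function body reduces to print_matrix(stdout, matrix).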

Answer 8

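You can simply:

#include <iostream>

// temp is not NUL-terminated, so `std::cout << temp` would treat it as a
// C string and read past its end; write() sends exactly 9 bytes instead
std::cout.write(reinterpret_cast<const char*>(temp), sizeof temp);

printf is more C-style.

Yet IO operations are costly, so use them wisely.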

Answer 9

Try running the program twice, once with output and once without. You will notice that overall, the run without IO is the fastest. Also, you could fork the process (or create a thread): one writing to a file (stdout), and one doing the computation.

Answer 10

So first, don't print on every entry. Basically, what I am saying is: don't do this:

for(int i = 0; i<100; i++){
    printf("Your stuff");
}

Instead, allocate a buffer either on the stack or on the heap, store your information there, and then write the whole buffer to stdout in one go, like this:

#include <stdlib.h>
#include <unistd.h>

char *buffer = malloc(100);      /* room for 100 bytes */
for (int i = 0; i < 100; i++) {
    buffer[i] = 1; //your byte value goes here
}

//once you are done, print it to the console with
write(1, buffer, 100);
free(buffer);

But in your case, just use write(1, temp, 9);

Answer 11

I am pretty sure you can improve output performance by increasing the buffer size, so that you make fewer fwrite calls. write might be faster, but I am not sure. Just try this:

$ yes | dd of=/dev/null count=1000000
1000000+0 records in
1000000+0 records out
512000000 bytes (512 MB, 488 MiB) copied, 2.18338 s, 234 MB/s

vs.

$ yes | dd of=/dev/null count=100000 bs=50KB iflag=fullblock
100000+0 records in
100000+0 records out
5000000000 bytes (5.0 GB, 4.7 GiB) copied, 2.63986 s, 1.9 GB/s

The same applies to your code. Tests I ran over the last few days suggest that good buffer sizes are probably somewhere between 1 << 12 (= 4096) and 1 << 16 (= 65536) bytes.
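One way to apply this to print() is to append each 9-byte record to a user-space buffer and hand the buffer to write only when it is full. That cuts the number of system calls by roughly the buffer size divided by 9. Here is a minimal sketch assuming POSIX write. The names struct batch, batch_put and batch_flush are hypothetical. The BATCH_CAP value of 1 << 16 is taken from the range suggested above, not a measured optimum:

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>
#include <unistd.h>

#define BATCH_CAP ((size_t)1 << 16)   /* hypothetical capacity: 65536 bytes */

/* Hypothetical user-space output buffer in front of a file descriptor. */
struct batch {
    int fd;
    size_t len;                       /* bytes currently buffered */
    unsigned char data[BATCH_CAP];
};

/* Write out everything buffered so far. Returns 0 on success, -1 on error. */
int batch_flush(struct batch *b)
{
    size_t done = 0;
    while (done < b->len) {
        ssize_t n = write(b->fd, b->data + done, b->len - done);
        if (n < 0) {
            if (errno == EINTR)
                continue;             /* interrupted by a signal: retry */
            return -1;
        }
        done += (size_t)n;
    }
    b->len = 0;
    return 0;
}

/* Append len bytes, flushing first if they would not fit.
 * Returns 0 on success, -1 on error or if len exceeds the capacity. */
int batch_put(struct batch *b, const void *src, size_t len)
{
    if (len > BATCH_CAP)
        return -1;
    if (b->len + len > BATCH_CAP && batch_flush(b) != 0)
        return -1;
    memcpy(b->data + b->len, src, len);
    b->len += len;
    return 0;
}
```

To use it in print():

- Declare out as static struct batch out = { .fd = STDOUT_FILENO };.
- Call batch_put(&out, temp, sizeof temp) in place of the fwrite call.
- Call batch_flush(&out) once before the program exits, or the tail of the output is lost.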

11 answers

Answer 1: 10 (accepted), 2009-02-09 15:24:07
Answer 2: 9, 2012-04-27 17:02:56
Answer 3: 3, 2009-02-09 15:35:41
Answer 4: 3, 2009-02-09 15:58:33
Answer 5: 1, 2009-02-09 15:23:45
Answer 6: 1, 2009-02-09 16:14:35
Answer 7: 0
Answer 8: 0, 2009-02-09 15:27:51
Answer 9: 0, 2009-02-09 16:02:22
Answer 10: 0, 2020-04-04 15:43:45
Answer 11: 0, 2020-05-06 08:47:35
