C++ program performs better when piped

Question

I haven't done any programming in a decade. I wanted to get back into it, so I made this pointless little program as practice. The easiest way to describe what it does is with the output of its --help option:

./prng_bench --help

./prng_bench: usage: ./prng_bench $N $B [$T]

   This program will generate an N digit base(B) random number until
all N digits are the same. 

Once a repeating N digit base(B) number is found, the following statistics are displayed:
  -Decimal value of all N digits.
  -Time & number of tries taken to randomly find.

Optionally, this process is repeated T times. 
   When running multiple repititions, averages for all N digit base(B)
numbers are displayed at the end, as well as total time and total tries.

My "problem" is that when the task is "easy", say a 3-digit base-10 number, and I have it do a large number of passes, the "total time" is lower when the output is piped to grep, i.e.:

command ; command | grep took :

./prng_bench 3 10 999999 ; ./prng_bench 3 10 999999|grep took

....
Pass# 999999: All 3 base(10) digits =  3 base(10).   Time:    0.00005 secs.   Tries: 23
It took 191.86701 secs & 99947208 tries to find 999999 repeating 3 digit base(10) numbers.
An average of 0.00019 secs & 99 tries was needed to find each one. 

It took 159.32355 secs & 99947208 tries to find 999999 repeating 3 digit base(10) numbers.

If I run the same command many times without grep, the time is always VERY close. I'm using srand(1234) for now, to test. The code between my calls to clock_gettime() for start and stop does not involve any stream manipulation, which would obviously affect the time. I realize this is an exercise in futility, but I'd like to know why it behaves this way. Below is the heart of the program. Here's a link to the full source on Dropbox if anybody wants to compile and test: https://www.dropbox.com/s/bczggar2pqzp9g1/prng_bench.cpp (clock_gettime() requires -lrt).

for (int pass_num=1; pass_num<=passes; pass_num++) {   //Executes $passes # of times.
  clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &temp_time);  //get time
  start_time = timetodouble(temp_time);                //convert time to double, store as start_time
  for(i=1, tries=0; i!=0; tries++) {    //loops until 'comparison for' fully completes. counts reps as 'tries'.  <------------
    for (i=0; i<Ndigits; i++)      //Move forward through array.                                                              |
      results[i]=(rand()%base);    //assign random num of base to element (digit).                                            |
    /*for (i=0; i<Ndigits; i++)     //---Debug Lines---------------                                                           |
      std::cout<<" "<<results[i];   //---a LOT of output.----------                                                           |
    std::cout << "\n";              //---Comment/uncomment to disable/enable.*/  //                                           |
    for (i=Ndigits-1; i>0 && results[i]==results[0]; i--); //Move through array, != element breaks & i!=0, new digits drawn. -|
  }                                                        //If all are equal i will be 0, nested for condition satisfied.  -|
  clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &temp_time);  //get time
  draw_time = (timetodouble(temp_time) - start_time);  //convert time to dbl, subtract start_time, set draw_time to diff.
  total_time += draw_time;    //add time for this pass to total.
  total_tries += tries;       //add tries for this pass to total.
  /*Formated output for each pass:
    Pass# ---: All -- base(--) digits = -- base(10)   Time:   ----.---- secs.    Tries: ----- (LINE) */
  std::cout<<"Pass# "<<std::setw(width_pass)<<pass_num<<": All "<<Ndigits<<" base("<<base<<") digits = "
           <<std::setw(width_base)<<results[0]<<" base(10).   Time: "<<std::setw(width_time)<<draw_time
           <<" secs.   Tries: "<<tries<<"\n";
}
if(passes==1) return 0;        //No need for totals and averages of 1 pass.
/* It took ----.---- secs & ------ tries to find --- repeating -- digit base(--) numbers. (LINE)
 An average of ---.---- secs & ---- tries was needed to find each one. (LINE)(LINE) */
 std::cout<<"It took "<<total_time<<" secs & "<<total_tries<<" tries to find "
          <<passes<<" repeating "<<Ndigits<<" digit base("<<base<<") numbers.\n"
          <<"An average of "<<total_time/passes<<" secs & "<<total_tries/passes
          <<" tries was needed to find each one. \n\n";
return 0;
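
The excerpt above calls timetodouble(), whose definition appears only in the linked source. A minimal sketch of what such a helper might look like, assuming it simply converts a timespec to seconds (the body and the process_cpu_seconds() wrapper are guesses for illustration, not the author's code):

```cpp
#include <time.h>  // timespec, clock_gettime (POSIX)

// Hypothetical stand-in for the question's timetodouble():
// converts a timespec to seconds as a double.
double timetodouble(const timespec& t) {
    return static_cast<double>(t.tv_sec) + static_cast<double>(t.tv_nsec) / 1e9;
}

// Hypothetical convenience wrapper: CPU time consumed by this process so far,
// in seconds, using the same clock as the question.
double process_cpu_seconds() {
    timespec now{};
    clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &now);
    return timetodouble(now);
}
```

Note that CLOCK_PROCESS_CPUTIME_ID counts CPU time charged to the process, not wall-clock time, so the two can differ when the process spends time blocked on output.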

Answer 1

Printing to the screen is very slow compared with writing to a pipe or not printing at all. Piping to grep keeps the program from printing to the screen.

Answer 2

It is not about printing to the screen; it is about the output being a terminal (tty).

According to the POSIX spec:

When opened, the standard error stream is not fully buffered; the standard input and standard output streams are fully buffered if and only if the stream can be determined not to refer to an interactive device.

Linux interprets this to make the FILE * (i.e. stdio) stdout line-buffered when the output is a tty (e.g. your terminal window), and block-buffered otherwise (e.g. your pipe).
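
To see which case a given run falls into, a program can ask whether stdout is a terminal, and it can request block buffering explicitly. The sketch below assumes a POSIX system; the function names and the 64 KiB buffer size are made up for illustration:

```cpp
#include <cstdio>
#include <unistd.h>  // isatty, STDOUT_FILENO (POSIX)

// Returns true when standard output is attached to a terminal (tty),
// false when it is a pipe or a regular file.
bool stdout_is_terminal() {
    return isatty(STDOUT_FILENO) == 1;
}

// Requests full (block) buffering for stdio's stdout regardless of where it
// points. Should be called before any output is written.
// The default size is an arbitrary choice for this sketch.
bool force_block_buffering(std::size_t size = 1 << 16) {
    return std::setvbuf(stdout, nullptr, _IOFBF, size) == 0;
}
```

Calling force_block_buffering() at the top of main() should make a terminal run buffer the same way a piped run does, which is one way to check whether the buffering mode accounts for the difference.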

The reason sync_with_stdio makes a difference is that when it is enabled, the C++ cout stream inherits this behavior. When you set it to false, it is no longer bound by that behavior and thus becomes block-buffered.

Block buffering is faster because it avoids the overhead of flushing the buffer on every newline.
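
The cost difference comes from how often the buffer is flushed. The sketch below does not touch the terminal at all; it models per-line flushing with std::endl and deferred flushing with '\n', and counts the flush requests using a hypothetical CountingBuf class introduced only for this illustration:

```cpp
#include <ostream>
#include <sstream>

// A string buffer that counts how many times it is asked to flush.
class CountingBuf : public std::stringbuf {
public:
    int flushes = 0;
protected:
    int sync() override {
        ++flushes;
        return std::stringbuf::sync();
    }
};

// Writes `lines` lines ending in std::endl, which flushes after every line
// (comparable to line buffering). Returns the number of flushes.
int flushes_with_endl(int lines) {
    CountingBuf buf;
    std::ostream out(&buf);
    for (int i = 0; i < lines; ++i) out << "Pass# " << i << std::endl;
    return buf.flushes;
}

// Writes the same lines ending in '\n' and flushes once at the end
// (comparable to block buffering). Returns the number of flushes.
int flushes_with_newline(int lines) {
    CountingBuf buf;
    std::ostream out(&buf);
    for (int i = 0; i < lines; ++i) out << "Pass# " << i << '\n';
    out.flush();
    return buf.flushes;
}
```

With a real terminal each flush becomes a write system call, so a run with 999999 passes would issue roughly that many writes when line-buffered, versus far fewer when block-buffered.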

You can further verify this by piping to cat instead of grep. The difference is the pipe itself, not the screen per se.

Answer 3

Thank you Collin & Nemo. I was certain that because I wasn't calling std::cout between getting the start and stop times, it wouldn't have an effect. Not so. I think this is due to optimizations that the compiler performs even with -O0 or 'defaults'.

What I think is happening...? I think that, as Collin suggested, the compiler is trying to be clever about when it writes to the TTY. And, as Nemo pointed out, cout inherits the line-buffered properties of stdio.

I'm able to reduce the effect, but not eliminate it, by using:

std::cout.sync_with_stdio(false);

From my limited reading on this, it should be called before any output operations are done. Here's the source for the no_sync version: https://www.dropbox.com/s/wugo7hxvu9ao8i3/prng_bench_no_sync.cpp

./no_sync 3 10 999999;./no_sync 3 10 999999|grep took
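
As a small sketch of that call, the helper below (the name disable_stdio_sync is made up for this example) wraps std::ios_base::sync_with_stdio, which is the same static function that std::cout.sync_with_stdio(false) invokes. It returns the previous setting, which is true the first time it is called in a program because synchronization is on by default:

```cpp
#include <iostream>

// Turns off synchronization between the C++ standard streams and C stdio.
// Returns the previous setting: true on the first call in a program.
bool disable_stdio_sync() {
    return std::ios_base::sync_with_stdio(false);
}
```

A typical placement would be the first statement of main(), before anything is written to std::cout.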

Compiled with -O0

999999: All 3 base(10) digits =  3 base(10)  Time:    0.00004 secs.  Tries: 23
It took 166.30801 secs & 99947208 tries to find 999999 repeating 3 digit base(10) numbers.
An average of 0.00017 secs & 99 tries was needed to find each one. 

It took 163.72914 secs & 99947208 tries to find 999999 repeating 3 digit base(10) numbers.

Compiled with -O3

999999: All 3 base(10) digits =  3 base(10)  Time:    0.00003 secs.  Tries: 23
It took 143.23234 secs & 99947208 tries to find 999999 repeating 3 digit base(10) numbers.
An average of 0.00014 secs & 99 tries was needed to find each one. 

It took 140.36195 secs & 99947208 tries to find 999999 repeating 3 digit base(10) numbers.

Specifying not to sync with stdio changed my delta between piped and non-piped from over 30 seconds to less than 3. See the original question for the original delta; it was ~191 - ~160.

To further test, I created another version that uses a struct to store stats about each pass. This method does all output after all passes are complete. I want to emphasize that this is probably a terrible idea. I'm allowing a command line argument to determine the size of a dynamically allocated array of structs containing an int, a double and an unsigned long. I can't even run this version with 999,999 passes; I get a segmentation fault. https://www.dropbox.com/s/785ntsm622q9mwd/prng_bench_struct.cpp

./struct_prng 3 10 99999;./struct_prng 3 10 99999|grep took
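
For comparison, here is a sketch of the same "collect first, print later" idea using std::vector instead of a hand-allocated array. It is not the linked source: the names PassStats and run_passes are invented, std::mt19937 is swapped in for rand() so that results are reproducible across platforms, and it makes no claim about what caused the segmentation fault.

```cpp
#include <cstddef>
#include <random>
#include <time.h>  // clock_gettime (POSIX)
#include <vector>

// Hypothetical per-pass record: an int, a double and an unsigned long.
struct PassStats {
    int digit;            // the repeated digit that ended the pass
    double seconds;       // CPU time spent on the pass
    unsigned long tries;  // number of N-digit draws needed
};

static double cpu_seconds() {
    timespec ts{};
    clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &ts);
    return static_cast<double>(ts.tv_sec) + static_cast<double>(ts.tv_nsec) / 1e9;
}

// Runs `passes` passes and stores the stats; no output happens in here.
std::vector<PassStats> run_passes(int ndigits, int base, int passes, unsigned seed) {
    std::mt19937 gen(seed);  // assumption: replaces the question's rand()
    std::uniform_int_distribution<int> draw(0, base - 1);
    std::vector<PassStats> stats;
    stats.reserve(static_cast<std::size_t>(passes));
    for (int p = 0; p < passes; ++p) {
        const double start = cpu_seconds();
        unsigned long tries = 0;
        int first = 0;
        bool all_equal = false;
        while (!all_equal) {
            ++tries;
            first = draw(gen);
            all_equal = true;
            // Draw every remaining digit, as the original loop does.
            for (int i = 1; i < ndigits; ++i) {
                if (draw(gen) != first) all_equal = false;
            }
        }
        stats.push_back({first, cpu_seconds() - start, tries});
    }
    return stats;
}
```

The caller can then loop over the returned vector and print everything after the timed work is done. If an allocation of this size fails, std::vector reports it with an exception rather than undefined behavior.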

Pass# 99999: All 3 base(10) digits =  6 base(10)  Time:    0.00025 secs.  Tries: 193
It took 13.10071 secs & 9970298 tries to find 99999 repeating 3 digit base(10) numbers.
An average of 0.00013 secs & 99 tries was needed to find each one. 

It took 13.12466 secs & 9970298 tries to find 99999 repeating 3 digit base(10) numbers.

What I've learned from this is that you can't count on the order you've coded things in being the order they're executed in. In future programs I'll probably use getopt instead of writing my own parse_args function. This would allow me to suppress extraneous output in high-repetition loops by requiring users to pass the -v switch if they want to see it.

I hope the further testing proves useful to anybody wondering about piping and output in loops. All of the results I've posted were obtained on a RasPi. All of the linked source code is GPL, just because that's the first license I could think of... I really have no self-aggrandizing need for the copyleft provisions of the GPL; I just wanted to be clear that it's free, but without warranty or liability.

Note that all of the sources linked have the call to srand(...) commented out, so all of your pseudo-random results will be exactly the same.
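
A minimal sketch of how the -v idea might look with POSIX getopt. The Options struct and parse_options function are hypothetical names for this example, and the function is meant to be called once at startup:

```cpp
#include <unistd.h>  // getopt, optind (POSIX)

// Hypothetical result of option parsing.
struct Options {
    bool verbose = false;   // true when -v was given
    bool ok = true;         // false when an unrecognized option was seen
    int first_operand = 1;  // index in argv of the first non-option argument
};

// Parses a single optional -v switch; the remaining arguments ($N $B [$T])
// start at argv[first_operand].
Options parse_options(int argc, char* argv[]) {
    Options opts;
    int c;
    while ((c = getopt(argc, argv, "v")) != -1) {
        if (c == 'v') {
            opts.verbose = true;
        } else {
            opts.ok = false;  // getopt returned '?' for an unknown option
        }
    }
    opts.first_operand = optind;
    return opts;
}
```

The per-pass output line would then be wrapped in a check of opts.verbose, so that a 999999-pass run prints only the totals unless -v is supplied.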

3 answers

Answer 1: score 5, accepted, 2012-09-05 23:54:58

Answer 2: score 2, 2012-09-07 03:36:06

Answer 3: score 0, 2012-09-07 02:22:35
