std::fstream buffering vs manual buffering (why 10x gain with manual buffering)?

I have tested two writing configurations:

  1. Fstream buffering:

     // Initialization
     const unsigned int length = 8192;
     char buffer[length];
     std::ofstream stream;
     stream.rdbuf()->pubsetbuf(buffer, length);
     stream.open("test.dat", std::ios::binary | std::ios::trunc);

     // To write I use:
     stream.write(reinterpret_cast<char*>(&x), sizeof(x));

  2. Manual buffering:

     // Initialization
     const unsigned int length = 8192;
     char buffer[length];
     std::ofstream stream("test.dat", std::ios::binary | std::ios::trunc);

     // Then I manually put the data in the buffer

     // To write I use:
     stream.write(buffer, length);

I expected the same result...

But my manual buffering improves performance by a factor of 10 when writing a 100MB file, and the fstream buffering changes nothing compared to the normal situation (without redefining a buffer).

Does anyone have an explanation for this?

EDIT: Here is some news: a benchmark just run on a supercomputer (Linux 64-bit architecture, latest Intel Xeon 8-core, Lustre filesystem and ... hopefully well-configured compilers):

[Figure: benchmark]

(and I cannot explain the reason for the "resonance" at a 1 kB manual buffer...)

EDIT 2: And the resonance at 1024 B (if someone has an idea about that, I'm interested):

[Figure: resonance at 1024 B]

This is basically due to function call overhead and indirection. The ofstream::write() method is inherited from ostream. That function is not inlined in libstdc++, which is the first source of overhead. Then ostream::write() has to call rdbuf()->sputn() to do the actual writing, which ends up in a virtual function call.

On top of that, libstdc++ forwards sputn() to the virtual function xsputn(), which adds another level of call indirection.

If you put the characters into the buffer yourself, you can avoid that overhead.
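
As a rough illustration of the two write patterns being compared, here is a minimal C++ sketch. The function names (write_per_element, write_manually_buffered, file_size) and the choice of double as the element type are assumptions made for this example; they do not come from the question. The first function goes through stream.write() once per value, the second copies values into a local buffer and calls stream.write() only when that buffer is full.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <fstream>
#include <string>
#include <vector>

// Pattern 1: one stream.write() call per value, so the call chain
// described above is paid for every element.
inline void write_per_element(const std::string& path,
                              const std::vector<double>& values) {
    std::ofstream stream(path, std::ios::binary | std::ios::trunc);
    for (const double& x : values) {
        stream.write(reinterpret_cast<const char*>(&x), sizeof(x));
    }
}

// Pattern 2: values are copied into a local buffer with memcpy and
// stream.write() is called once per full buffer.
inline void write_manually_buffered(const std::string& path,
                                    const std::vector<double>& values,
                                    std::size_t length = 8192) {
    assert(length >= sizeof(double));
    std::vector<char> buffer(length);
    std::size_t used = 0;
    std::ofstream stream(path, std::ios::binary | std::ios::trunc);
    for (const double& x : values) {
        if (used + sizeof(x) > buffer.size()) {  // buffer full: hand it over
            stream.write(buffer.data(), static_cast<std::streamsize>(used));
            used = 0;
        }
        std::memcpy(buffer.data() + used, &x, sizeof(x));
        used += sizeof(x);
    }
    if (used > 0) {  // write the partially filled tail
        stream.write(buffer.data(), static_cast<std::streamsize>(used));
    }
}

// Helper for checking results: file size in bytes, or -1 on failure.
inline long long file_size(const std::string& path) {
    std::ifstream in(path, std::ios::binary | std::ios::ate);
    return in ? static_cast<long long>(in.tellg()) : -1;
}
```

Both functions should produce identical files; to compare their speed, one could wrap each call in std::chrono::steady_clock timestamps on the target machine.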

I would like to explain the cause of the peak in the second chart.

The virtual functions used by std::ofstream do cause a performance drop similar to what we see in the first picture, but that does not explain why performance was highest when the manual buffer size was less than 1024 bytes.

The problem relates to the high cost of the writev() and write() system calls and to the internal implementation of std::filebuf, the internal class of std::ofstream.

To show how write() influences performance, I did a simple test using the dd tool on my Linux machine, copying a 10MB file with different buffer sizes (bs option):

test@test$ time dd if=/dev/zero of=zero bs=256 count=40000
40000+0 records in
40000+0 records out
10240000 bytes (10 MB) copied, 2.36589 s, 4.3 MB/s

real    0m2.370s
user    0m0.000s
sys     0m0.952s

test@test$ time dd if=/dev/zero of=zero bs=512 count=20000
20000+0 records in
20000+0 records out
10240000 bytes (10 MB) copied, 1.31708 s, 7.8 MB/s

real    0m1.324s
user    0m0.000s
sys     0m0.476s

test@test$ time dd if=/dev/zero of=zero bs=1024 count=10000
10000+0 records in
10000+0 records out
10240000 bytes (10 MB) copied, 0.792634 s, 12.9 MB/s

real    0m0.798s
user    0m0.008s
sys     0m0.236s

test@test$ time dd if=/dev/zero of=zero bs=4096 count=2500
2500+0 records in
2500+0 records out
10240000 bytes (10 MB) copied, 0.274074 s, 37.4 MB/s

real    0m0.293s
user    0m0.000s
sys     0m0.064s

As you can see, the smaller the buffer, the lower the write speed and the more time dd spends in system space. So read/write speed decreases as the buffer size decreases.
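
The same relationship between block size and number of system calls can be sketched in C++ with the POSIX open()/write()/close() calls. This is only an illustration, assuming a POSIX system; the function name write_in_blocks and its parameters are made up for this example. It writes a given number of zero bytes in fixed-size blocks and returns how many write() calls were needed.

```cpp
#include <fcntl.h>
#include <unistd.h>

#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Writes `total` zero bytes to `path` in blocks of `block_size` bytes.
// Returns the number of write() calls issued, or -1 on error.
inline long write_in_blocks(const char* path, std::size_t total,
                            std::size_t block_size) {
    assert(block_size > 0);
    const int fd = ::open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0) {
        return -1;
    }
    const std::vector<char> block(block_size, 0);
    long calls = 0;
    std::size_t written = 0;
    while (written < total) {
        const std::size_t want = std::min(block_size, total - written);
        const ssize_t n = ::write(fd, block.data(), want);
        if (n <= 0) {  // treat any failure (including EINTR) as an error here
            ::close(fd);
            return -1;
        }
        written += static_cast<std::size_t>(n);
        ++calls;  // one system call per block (more if a write is short)
    }
    ::close(fd);
    return calls;
}
```

With total = 10240000, a block size of 256 corresponds to the 40000 records of the first dd run and a block size of 4096 to the 2500 records of the last one, assuming no short writes.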

But why did the speed peak when the manual buffer size was less than 1024 bytes in the topic creator's manual buffer tests? Why was it almost constant?

The explanation relates to the std::ofstream implementation, especially to std::basic_filebuf.

By default it uses a 1024-byte buffer (the BUFSIZ value). So, when you write your data in pieces smaller than 1024 bytes, the writev() (not write()) system call is issued once per two or more ofstream::write() operations (take pieces of size 1023 < 1024: the first is written to the buffer, and the second forces both the first and the second to be written out). From this we can conclude that the ofstream::write() speed does not depend on the manual buffer size before the peak (the system call is issued at least two times less often than ofstream::write()).

When you write 1024 bytes or more at once with a single ofstream::write() call, the writev() system call is issued for each ofstream::write(). So the speed increases as the manual buffer grows beyond 1024 bytes (after the peak).

Moreover, if you set the std::ofstream buffer to something larger than 1024 bytes (for example, 8192 bytes) using streambuf::pubsetbuf() and call ostream::write() with pieces of 1024 bytes, you may be surprised that the write speed is the same as with a 1024-byte buffer. This is because the implementation of std::basic_filebuf, the internal class of std::ofstream, is hard-coded to force a writev() system call for each ofstream::write() call when the passed buffer is greater than or equal to 1024 bytes (see the basic_filebuf::xsputn() source code). There is also an open issue in the GCC bugzilla, reported on 2014-11-05.
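
The decision described above can be mimicked with a small model. This is a simplified, hypothetical model and not the libstdc++ source: the type FilebufModel, its members, and the exact rule "bypass the buffer when the request is at least min(1024, free space)" are assumptions made to illustrate the behaviour described in this answer.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Hypothetical model of the buffering decision; NOT libstdc++ code.
struct FilebufModel {
    std::size_t capacity;           // size of the stream buffer
    std::size_t fill = 0;           // bytes currently held in the buffer
    std::size_t direct_writes = 0;  // simulated writev() system calls

    explicit FilebufModel(std::size_t cap) : capacity(cap) {
        assert(capacity > 0);
    }

    // Simulates one ofstream::write() of n bytes.
    void write(std::size_t n) {
        const std::size_t chunk = 1024;  // hard-coded threshold (assumed)
        const std::size_t avail = capacity - fill;
        const std::size_t limit = std::min(chunk, avail);
        if (n >= limit) {
            // Large request: buffered bytes and new data go out together
            // in one system call, and the buffer is left empty.
            ++direct_writes;
            fill = 0;
        } else {
            // Small request: only copied into the buffer, no system call.
            fill += n;
        }
    }
};
```

In this model, ten writes of 1024 bytes cost ten simulated system calls whether the buffer is 1024 or 8192 bytes, which matches the observation that enlarging the buffer with pubsetbuf() does not help for pieces of that size.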

So, this problem can be addressed in the following ways:

  • replace std::filebuf with your own class and redefine std::ofstream
  • divide the buffer that has to be passed to ofstream::write() into pieces smaller than 1024 bytes and pass them to ofstream::write() one by one
  • don't pass small pieces of data to ofstream::write(), to avoid the performance cost of the virtual functions of std::ofstream
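
The second option in the list above could look roughly like the following sketch. The helper name write_in_pieces and the default piece size of 1023 bytes are assumptions chosen for this example; whether this actually improves throughput depends on the standard library implementation and should be measured.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <fstream>
#include <string>

// Passes `size` bytes to stream.write() in pieces of at most `piece`
// bytes, so that no single call reaches the 1024-byte threshold.
inline void write_in_pieces(std::ostream& stream, const char* data,
                            std::size_t size, std::size_t piece = 1023) {
    assert(piece > 0);
    for (std::size_t offset = 0; offset < size; offset += piece) {
        const std::size_t n = std::min(piece, size - offset);
        stream.write(data + offset, static_cast<std::streamsize>(n));
    }
}
```

The output is byte-for-byte the same as a single stream.write(data, size) call; only the number and size of the individual write() calls change.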

I'd like to add to the existing responses that this performance behavior (all the overhead from the virtual method calls and indirection) is typically not an issue when writing large blocks of data. What seems to have been omitted from the question and the prior answers (although probably implicitly understood) is that the original code was writing a small number of bytes each time. Just to clarify for others: if you are writing large blocks of data (~kB+), there is no reason to expect manual buffering to perform significantly differently from std::fstream's own buffering.
