
Is std::ifstream significantly slower than FILE?

Posted 2009-01-25 06:03:15. Tags: c++, optimization, file-io, ifstream

I've been informed that my library is slower than it should be: parsing one particular file (a 326 KB text file) is on the order of 30+ times too slow. The user suggested that the cause may be my use of std::ifstream (presumably instead of FILE).

I'd rather not blindly rewrite, so I thought I'd check here first, since my guess is that the bottleneck is elsewhere. I'm reading character by character, so the only functions I'm using are get(), peek(), and tellg()/seekg().
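To make the access pattern concrete, here is a hypothetical sketch (try_match and count_keyword are illustrative names, not the asker's library) of a scanner that uses only get(), peek(), tellg() and seekg(). Every character costs at least one stream call, and every failed match costs a tellg()/seekg() pair.

```cpp
#include <cstddef>
#include <fstream>
#include <string>

// Hypothetical char-by-char matcher: returns true (and consumes the keyword)
// if the stream is positioned at `keyword`; otherwise rewinds and returns false.
bool try_match(std::ifstream& in, const std::string& keyword) {
    const std::streampos start = in.tellg();   // remember where we were
    for (char expected : keyword) {
        if (in.get() != static_cast<unsigned char>(expected)) {
            in.clear();        // get() may have hit EOF and set failbit
            in.seekg(start);   // backtrack to the saved position
            return false;
        }
    }
    return true;
}

// Counts non-overlapping occurrences of `keyword`, one character at a time.
std::size_t count_keyword(const std::string& path, const std::string& keyword) {
    if (keyword.empty()) return 0;             // an empty keyword would never advance
    std::ifstream in(path, std::ios::binary);
    std::size_t count = 0;
    while (in.peek() != std::ifstream::traits_type::eof()) {
        if (try_match(in, keyword))
            ++count;
        else
            in.get();          // no match here: skip one character
    }
    return count;
}
```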

Update:

I profiled, and got confusing output: gprof didn't appear to think that reading took so long. I rewrote the program to read the entire file into a buffer first, and it sped up by about 100x. I think the problem may have been the tellg()/seekg() calls taking a long time, but gprof may have been unable to see that for some reason. In any case, ifstream does not appear to buffer the entire file, even at this size.
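A minimal sketch of the fix described in the update, assuming the whole file fits comfortably in memory (326 KB easily does): copy the file into a std::string with a single rdbuf() transfer, then replace get()/peek()/tellg()/seekg() with index operations on that buffer. The names slurp and Cursor are hypothetical.

```cpp
#include <cstddef>
#include <fstream>
#include <sstream>
#include <stdexcept>
#include <string>
#include <utility>

// Reads the entire file into memory with a single rdbuf() copy.
std::string slurp(const std::string& path) {
    std::ifstream in(path, std::ios::binary);
    if (!in) throw std::runtime_error("cannot open " + path);
    std::ostringstream ss;
    ss << in.rdbuf();
    return ss.str();
}

// In-memory stand-in for get()/peek()/tellg()/seekg(): every call is a plain
// index operation on the buffer. Returns -1 at end of data, like EOF.
class Cursor {
public:
    explicit Cursor(std::string data) : data_(std::move(data)) {}
    int peek() const {
        return pos_ < data_.size() ? static_cast<unsigned char>(data_[pos_]) : -1;
    }
    int get() {
        const int c = peek();
        if (c != -1) ++pos_;
        return c;
    }
    std::size_t tell() const { return pos_; }
    void seek(std::size_t pos) { pos_ = pos; }
private:
    std::string data_;
    std::size_t pos_ = 0;
};
```

Backtracking becomes saving and restoring a std::size_t, which costs essentially nothing.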

6 answers

I don't think that would make a difference. Especially if you're reading char by char, the overhead of the I/O calls is likely to completely dominate everything else. Why do you read single bytes at a time? Do you know how inefficient that is?

On a 326 KB file, the fastest solution will most likely be to just read the whole file into memory at once.

The difference between std::ifstream and the C equivalents is basically a virtual function call or two. That may make a difference if executed a few tens of millions of times per second; otherwise, not really. File I/O is generally so slow that the API used to access it doesn't really matter. What matters far more is the read/write pattern: lots of seeks are bad, sequential reads/writes are good.
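For comparison, here is a sketch of the same four operations through the C stdio API (CFileReader, read_with_ifstream and read_with_stdio are illustrative names). stdio has no peek(), so fgetc() followed by ungetc() stands in for it, while ftell() and fseek() play the roles of tellg() and seekg(). Both read loops do the same sequential, one-character-at-a-time work; only the API differs.

```cpp
#include <cstdio>
#include <fstream>
#include <string>

// The question's four operations mapped onto stdio calls.
struct CFileReader {
    std::FILE* f;
    int get() { return std::fgetc(f); }
    int peek() {                                // fgetc + ungetc = peek
        const int c = std::fgetc(f);
        if (c != EOF) std::ungetc(c, f);
        return c;
    }
    long tell() { return std::ftell(f); }
    void seek(long pos) { std::fseek(f, pos, SEEK_SET); }  // also clears EOF
};

// Reads every character with std::ifstream::get().
std::string read_with_ifstream(const char* path) {
    std::ifstream in(path, std::ios::binary);
    std::string out;
    for (int c = in.get(); c != std::ifstream::traits_type::eof(); c = in.get())
        out.push_back(static_cast<char>(c));
    return out;
}

// Reads every character with fgetc().
std::string read_with_stdio(const char* path) {
    std::string out;
    std::FILE* f = std::fopen(path, "rb");
    if (!f) return out;
    CFileReader r{f};
    for (int c = r.get(); c != EOF; c = r.get())
        out.push_back(static_cast<char>(c));
    std::fclose(f);
    return out;
}
```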

It should be slightly slower, but as you said, it might not be the bottleneck. Why don't you profile your program and see whether that's the case?
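A profiler is the right tool, but a crude wall-clock timer around the suspect code is a quick first check. It counts elapsed time wherever it is spent, including time that a sampling profiler such as gprof may not attribute to your own functions (for example, time inside the kernel). A minimal sketch; average_ms is a hypothetical helper.

```cpp
#include <chrono>

// Runs f() `reps` times (reps must be > 0) and returns the average
// wall-clock time per run in milliseconds.
template <class F>
double average_ms(F&& f, int reps = 10) {
    using clock = std::chrono::steady_clock;   // monotonic, unaffected by clock changes
    const auto start = clock::now();
    for (int i = 0; i < reps; ++i) f();
    const std::chrono::duration<double, std::milli> elapsed = clock::now() - start;
    return elapsed.count() / reps;
}
```

Usage would look like average_ms([&] { parse(path); }), where parse stands for your own (hypothetical here) parsing routine. Make sure its result is used somewhere, or the optimizer may discard the work being timed.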

I think it is unlikely that your problem will be fixed by switching from fstream to FILE*; usually both are buffered by the runtime library. The OS can also cache reads (Linux is very good in that respect). Given the size of the file you are accessing, it is quite likely to be entirely in RAM.

As PolyThinker says, your best bet is to run your program through a profiler and determine where the problem is.

Also, you are using seekg/tellg, which can cause notable delays if your disk is heavily fragmented, because when the file is read for the first time the disk has to move its heads to the correct position.

All benchmarks are evil. Just profile your code on the data you expect.

I once performed an I/O performance comparison between Ruby, Python, Perl, and C++. For my data, language versions, etc., the C++ variant was several times slower (it was a big surprise at the time).

I agree that you should profile. But if you're reading the file a character at a time, how about creating a memory-mapped file? That way you can treat the file like an array of characters, and the OS should take care of all the low-level buffering for you. The simplest and probably fastest solution is a win in my book. :)
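A POSIX-only sketch of the memory-mapped approach (MappedFile is a hypothetical wrapper; Windows would need CreateFileMapping/MapViewOfFile instead). The file is mapped read-only and then indexed like a char array, with the kernel paging it in on demand.

```cpp
#include <cstddef>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

// Maps a whole file read-only. On any failure (or for an empty file, which
// mmap rejects) data() is nullptr and size() is 0.
class MappedFile {
public:
    explicit MappedFile(const char* path) {
        const int fd = ::open(path, O_RDONLY);
        if (fd < 0) return;
        struct stat st;
        if (::fstat(fd, &st) == 0 && st.st_size > 0) {
            void* p = ::mmap(nullptr, static_cast<std::size_t>(st.st_size),
                             PROT_READ, MAP_PRIVATE, fd, 0);
            if (p != MAP_FAILED) {
                data_ = static_cast<const char*>(p);
                size_ = static_cast<std::size_t>(st.st_size);
            }
        }
        ::close(fd);  // the mapping stays valid after the descriptor is closed
    }
    ~MappedFile() {
        if (data_) ::munmap(const_cast<char*>(data_), size_);
    }
    MappedFile(const MappedFile&) = delete;
    MappedFile& operator=(const MappedFile&) = delete;

    const char* data() const { return data_; }
    std::size_t size() const { return size_; }
    char operator[](std::size_t i) const { return data_[i]; }  // no bounds check

private:
    const char* data_ = nullptr;
    std::size_t size_ = 0;
};
```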

Here is an excellent benchmark which shows that under extreme conditions, fstreams are actually quite slow... unless:

You use buffering (I cannot stress that enough).
You manipulate the buffer yourself (that is, if you need performance like the OP in the linked question), which is not so different from using FILE*.
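One way to read "manipulate the buffer yourself" with std::ifstream, sketched under assumptions: read large blocks with read(), then scan each block from a buffer you own, so the per-character work is array indexing rather than a stream call. count_char is a hypothetical example task, and the 64 KiB block size is an arbitrary choice.

```cpp
#include <cstddef>
#include <fstream>
#include <string>
#include <vector>

// Counts occurrences of `target` by reading the file in blocks.
// block_size must be > 0.
std::size_t count_char(const std::string& path, char target,
                       std::size_t block_size = 64 * 1024) {
    std::ifstream in(path, std::ios::binary);
    std::vector<char> buf(block_size);
    std::size_t count = 0;
    while (in) {
        in.read(buf.data(), static_cast<std::streamsize>(buf.size()));
        const std::streamsize got = in.gcount();   // the last block may be short
        for (std::streamsize i = 0; i < got; ++i)
            if (buf[static_cast<std::size_t>(i)] == target) ++count;
    }
    return count;
}
```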

You shouldn't optimize prematurely, though. fstreams are generally better, and if you need to optimize them down the road, you can always do it later at little cost. To prepare for the worst in advance, I suggest creating a minimal proxy for fstream now, so that you can optimize it later without needing to touch anything else.
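A sketch of such a proxy, assuming the parser needs only the four operations from the question. CharSource is a hypothetical name: the parser talks only to it, so the std::ifstream inside could later be swapped for a FILE*, an in-memory buffer, or a memory-mapped file without editing the parser.

```cpp
#include <fstream>
#include <string>

// Minimal proxy over std::ifstream exposing get/peek/tell/seek only.
class CharSource {
public:
    explicit CharSource(const std::string& path) : in_(path, std::ios::binary) {}
    bool is_open() const { return in_.is_open(); }

    static int eof() { return std::ifstream::traits_type::eof(); }
    int get() { return in_.get(); }
    int peek() { return in_.peek(); }

    std::streamoff tell() { return static_cast<std::streamoff>(in_.tellg()); }
    void seek(std::streamoff pos) {
        in_.clear();                     // allow seeking after hitting EOF
        in_.seekg(std::streampos(pos));
    }

private:
    std::ifstream in_;
};
```

Because callers only ever see CharSource, a later optimization changes this one class and leaves the parsing code untouched.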