Reading directly from the FILE buffer

Question

The core of my app looks approximately as follows:

size_t bufsize;
char*  buf1;
size_t r1; 
FILE* f1=fopen("/path/to/file","rb");
...
do{
  r1=fread(buf1, 1, bufsize, f1);
  processChunk(buf1,r1);
} while (!feof(f1) && !ferror(f1)); /* also stop on a read error, otherwise the loop never ends */
...

(In reality, I have multiple FILE*s and multiple bufNs.) Now, I hear that FILE is quite ready to manage a buffer (referred to as a "stream buffer") all by itself, and this behavior appears to be quite tweakable: https://www.gnu.org/software/libc/manual/html_mono/libc.html#Controlling-Buffering

How can I refactor the above piece of code to ditch the buf1 buffer and use f1's internal stream buffer instead (while setting its size to bufsize)?

Answer 1

If you don't want opaquely buffered I/O, don't use FILE *. Use lower-level APIs that let you manage all the application-side buffering yourself, such as plain POSIX open() and read().
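A minimal sketch of that approach, assuming a POSIX system. The names read_chunks, chunk_fn and count_bytes are made up for illustration; count_bytes merely stands in for the processChunk() from the question.

```c
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical callback type standing in for processChunk(). */
typedef void (*chunk_fn)(const char *data, size_t len, void *ctx);

/* Example callback: adds each chunk's length to the size_t that ctx points to. */
void count_bytes(const char *data, size_t len, void *ctx)
{
    (void)data;
    *(size_t *)ctx += len;
}

/* Reads `path` in chunks of at most `bufsize` bytes straight into the
   caller's buffer, with no stdio layer in between.
   Returns the total number of bytes read, or -1 on error. */
long long read_chunks(const char *path, char *buf, size_t bufsize,
                      chunk_fn fn, void *ctx)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    long long total = 0;
    for (;;) {
        ssize_t n = read(fd, buf, bufsize);
        if (n < 0) {
            if (errno == EINTR)
                continue;          /* interrupted by a signal: retry */
            close(fd);
            return -1;
        }
        if (n == 0)
            break;                 /* end of file */
        fn(buf, (size_t)n, ctx);
        total += n;
    }
    close(fd);
    return total;
}
```

Here the only application-side buffer is the one the caller passes in, so each chunk is copied once, from the kernel into buf.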

Answer 2

So I've read a little bit of the C standard and run some benchmarks, and here are my findings:

1) Doing it as in the above example does involve unnecessary in-memory copying, which roughly doubles the user time of a simple cmp program based on the above example. Nevertheless, user time is insignificant for most IO-heavy programs, unless the source of the file is extremely fast. On in-memory file sources (/dev/shm on Linux), however, turning off FILE buffering (setvbuf(f1, NULL, _IONBF, 0);) does yield a nice and consistent speed increase of about 10–15% on my machine when using buffer sizes close to BUFSIZ (again, measured on the IO-heavy cmp utility based on the above snippet, which I ran on 2 identical 700MB files 100 times).

2) Whereas there is an API for setting the FILE buffer, I haven't found any standardized API for reading it, so I'm going to stick with the tried and tested way of doing it, but with the FILE buffer off (setvbuf(f1, NULL, _IONBF, 0);).
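Roughly, the loop from the question with stream buffering switched off could look like the sketch below. The names read_unbuffered, chunk_cb and add_length are invented for this example; add_length stands in for processChunk().

```c
#include <stdio.h>

/* Hypothetical callback type standing in for processChunk(). */
typedef void (*chunk_cb)(const char *data, size_t len, void *ctx);

/* Example callback: adds each chunk's length to the size_t that ctx points to. */
void add_length(const char *data, size_t len, void *ctx)
{
    (void)data;
    *(size_t *)ctx += len;
}

/* Reads `path` through stdio with the stream buffer turned off, so that
   fread() fills the caller's buffer without an intermediate stream buffer.
   Returns 0 on success, -1 on failure. */
int read_unbuffered(const char *path, char *buf, size_t bufsize,
                    chunk_cb fn, void *ctx)
{
    FILE *f = fopen(path, "rb");
    if (!f)
        return -1;

    /* setvbuf() has to be called before any other operation on the stream. */
    if (setvbuf(f, NULL, _IONBF, 0) != 0) {
        fclose(f);
        return -1;
    }

    size_t r;
    /* fread() returns 0 on both end of file and error, so the loop always ends. */
    while ((r = fread(buf, 1, bufsize, f)) > 0)
        fn(buf, r, ctx);

    int failed = ferror(f);   /* tell a read error apart from plain EOF */
    fclose(f);
    return failed ? -1 : 0;
}
```

Whether this is faster than the buffered version depends on the file source and chunk size, as the measurements above suggest; it is worth benchmarking for the actual workload.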

(But I guess I could solve my question by setting my own buffer as the FILE stream buffer with the _IONBF mode option (= turn off buffering), and then I could just access it via some unstandardized pointer in the FILE struct.)
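For comparison, here is a sketch of handing a caller-allocated array of bufsize bytes to the stream. It assumes full buffering (_IOFBF) instead of _IONBF, on the assumption that an unbuffered stream does not use the supplied array. The function name count_bytes_buffered is made up for this example. The C standard says the contents of such an array are indeterminate at any time, so the sketch only reads through getc() and never looks into the array directly.

```c
#include <stdio.h>
#include <stdlib.h>

/* Counts the bytes in `path`, reading one character at a time through a
   stream whose buffer is a caller-allocated array of `bufsize` bytes.
   Returns the byte count, or -1 on failure. */
long count_bytes_buffered(const char *path, size_t bufsize)
{
    FILE *f = fopen(path, "rb");
    if (!f)
        return -1;

    char *streambuf = malloc(bufsize);
    if (!streambuf) {
        fclose(f);
        return -1;
    }

    /* Must come before any other operation on the stream. */
    if (setvbuf(f, streambuf, _IOFBF, bufsize) != 0) {
        fclose(f);
        free(streambuf);
        return -1;
    }

    long n = 0;
    int c;
    /* Each getc() is normally served from the stream buffer, which is
       refilled from the file as needed. */
    while ((c = getc(f)) != EOF)
        n++;

    int failed = ferror(f);
    fclose(f);         /* close first: the array must outlive the stream */
    free(streambuf);
    return failed ? -1 : n;
}
```

This keeps a single application-side buffer of the requested size while staying within the standard API, at the cost of going through getc() (or fread()) rather than touching the buffer itself.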

Answer 1
1 2014-02-03 12:23:04

Answer 2
0 2014-02-03 16:06:00
