
Reading directly from a FILE buffer

The core of my app looks approximately as follows:

size_t bufsize;
char*  buf1;
size_t r1; 
FILE* f1=fopen("/path/to/file","rb");
...
/* Loop on fread's return value: testing only feof() never terminates after a read error. */
while ((r1=fread(buf1, 1, bufsize, f1)) > 0)
  processChunk(buf1,r1);
...

(In reality, I have multiple FILE*s and multiple bufNs.) Now, I hear that FILE is perfectly capable of managing a buffer (referred to as a "stream buffer") all by itself, and this behavior appears to be quite tweakable: https://www.gnu.org/software/libc/manual/html_mono/libc.html#Controlling-Buffering .

How can I refactor the above piece of code to ditch the buf1 buffer and use f1's internal stream buffer instead (while setting its size to bufsize)?

If you don't want opaquely buffered I/O, don't use FILE * . Use lower-level APIs that let you manage all the application-side buffering yourself, such as plain POSIX open() and read().
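A minimal sketch of what that suggestion might look like, assuming a POSIX system. The names read_chunks_posix, chunk_fn and count_bytes are hypothetical and not from the original post; chunk_fn stands in for processChunk from the question.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical callback type standing in for processChunk from the question. */
typedef void (*chunk_fn)(const char *data, size_t len, void *ctx);

/* Example callback: adds each chunk's length to the size_t that ctx points to. */
void count_bytes(const char *data, size_t len, void *ctx)
{
    (void)data;
    *(size_t *)ctx += len;
}

/* Reads the file at `path` in chunks of at most `bufsize` bytes using
 * open()/read(), so the only application-side buffer is `buf`.
 * Returns 0 on success, -1 on error. */
int read_chunks_posix(const char *path, size_t bufsize, chunk_fn fn, void *ctx)
{
    assert(bufsize > 0);

    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    char *buf = malloc(bufsize);
    if (buf == NULL) {
        close(fd);
        return -1;
    }

    int rc = 0;
    for (;;) {
        ssize_t n = read(fd, buf, bufsize);
        if (n > 0) {
            fn(buf, (size_t)n, ctx);   /* n may be smaller than bufsize */
        } else if (n == 0) {
            break;                     /* end of file */
        } else if (errno != EINTR) {
            rc = -1;                   /* real error; EINTR simply retries */
            break;
        }
    }

    free(buf);
    close(fd);
    return rc;
}
```

Calling read_chunks_posix(path, bufsize, count_bytes, &total) with a size_t total initialised to 0 would leave the file's size in total. Note that read() may return fewer bytes than requested, so the callback has to cope with chunks shorter than bufsize.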

So I've read a little bit of the C standard and run some benchmarks, and here are my findings:

1) Doing it as in the above example does involve unnecessary in-memory copying, which roughly doubles the user time of a simple cmp program based on the above example. Nevertheless, user time is insignificant for most IO-heavy programs, unless the source of the file is extremely fast. On in-memory file sources ( /dev/shm on Linux), however, turning off FILE buffering ( setvbuf(f1, NULL, _IONBF, 0); ) does yield a nice and consistent speed increase of about 10–15% on my machine when using buffer sizes close to BUFSIZ (again measured with the cmp utility based on the above snippet, which I ran 100 times on two identical 700MB files).

2) Whereas there is an API for setting the FILE buffer, I haven't found any standardized API for reading it, so I'm going to stick with the tried and tested way of doing it, but with the FILE buffer off ( setvbuf(f1, NULL, _IONBF, 0); ).
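The approach settled on here can be sketched roughly as follows. The function name read_chunks_unbuffered is hypothetical, and the byte counter is only a stand-in for processChunk from the question. Whether disabling the stream buffer actually avoids the extra copy depends on the C library, so this is an illustration rather than a guarantee.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Reads `path` in chunks of up to `bufsize` bytes with the stream's own
 * buffer turned off, adding the number of bytes read to *total.
 * Returns 0 on success, -1 on failure. */
int read_chunks_unbuffered(const char *path, size_t bufsize, size_t *total)
{
    assert(bufsize > 0);

    FILE *f = fopen(path, "rb");
    if (f == NULL)
        return -1;

    /* setvbuf must be called before any other operation on the stream. */
    if (setvbuf(f, NULL, _IONBF, 0) != 0) {
        fclose(f);
        return -1;
    }

    char *buf = malloc(bufsize);
    if (buf == NULL) {
        fclose(f);
        return -1;
    }

    size_t r;
    while ((r = fread(buf, 1, bufsize, f)) > 0)
        *total += r;               /* stand-in for processChunk(buf, r) */

    /* fread returns 0 on both EOF and error, so check which one it was. */
    int rc = ferror(f) ? -1 : 0;

    free(buf);
    fclose(f);
    return rc;
}
```

Looping on the return value of fread and checking ferror afterwards distinguishes a clean end of file from a read error, which a feof-only test cannot do.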

(But I guess I could solve my question by setting my own buffer as the FILE stream buffer with the _IONBF mode option (= turn off buffering), and then I could just access it via some unstandardized pointer in the FILE struct.)


 