How to reduce the number of I/O disk accesses while reading/writing a large file in C++

I would like to read a large file with a structure similar to the following:

        John  10  department
        Hello 14   kjezlkjzlkj
        jhfekh 144 lkjzlkjrzlj
        ........

The problem is that I want to minimize the number of disk I/O accesses while reading this file in C++. Is there a way to open the file on disk, read a large portion of it into memory (one disk access), then read a second large portion (a second disk access), and so on?

Any help will be appreciated.

Just create a large buffer and fill it with one read. Repeat if necessary.
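A minimal sketch of that idea in C++, assuming the goal is simply to pull the file in a few large reads and scan each chunk in memory. The function name count_lines_chunked and the 4 MB default are illustrative choices, not something from the question. Note that one read call is a request to the stream; how many physical disk accesses it turns into is still up to the OS and its cache.

```cpp
#include <cstddef>
#include <fstream>
#include <ios>
#include <string>
#include <vector>

// Reads `path` in chunks of `chunk_size` bytes and returns the number of
// newline characters seen. Returns 0 if the file cannot be opened.
// (Hypothetical helper, for illustration only.)
std::size_t count_lines_chunked(const std::string& path,
                                std::size_t chunk_size = 4 * 1024 * 1024) {
    std::ifstream in(path, std::ios::binary);
    if (!in) return 0;

    std::vector<char> chunk(chunk_size);  // heap-allocated, reused every pass
    std::size_t lines = 0;

    while (true) {
        // Ask for one full chunk; the last request may come back short.
        in.read(chunk.data(), static_cast<std::streamsize>(chunk.size()));
        const std::streamsize got = in.gcount();  // bytes actually read
        if (got <= 0) break;
        for (std::streamsize i = 0; i < got; ++i) {
            if (chunk[static_cast<std::size_t>(i)] == '\n') ++lines;
        }
        if (!in) break;  // short read: end of file (or an error)
    }
    return lines;
}
```

Calling count_lines_chunked("data.txt") would scan the file 4 MB at a time; replace the newline counting with whatever per-chunk processing you need, keeping in mind that a record can straddle a chunk boundary.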

The stdio streams implement this. You can use fopen and then setbuffer (a BSD/glibc extension; setvbuf is the standard C equivalent).

EDIT

It is rather simple:

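   #include <stdio.h>

   /* 5 MB; increase or decrease this to your heart's content */
   #define BUFFER_SIZE 5242880

   /* static, so the 5 MB array does not live on the stack */
   static char buffer[BUFFER_SIZE];
   FILE *file = fopen("filename", "r");
   /* must be called before any other operation on the stream */
   setbuffer(file, buffer, BUFFER_SIZE);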

Then use any of the usual read operations: fscanf, fgets, etc.
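For reference, here is a hedged sketch of the same stdio approach with the standard setvbuf instead of setbuffer, written so it also compiles as C++. The helper name sum_second_column and the 63-character field limit are assumptions made for the example; the three-column layout follows the sample rows in the question.

```cpp
#include <cstddef>
#include <cstdio>
#include <vector>

// Sums the integer in the second column of a "name number text" file,
// reading through a stdio stream with a caller-sized buffer.
// Returns -1 if the file cannot be opened or the buffer cannot be set.
// (Hypothetical helper, for illustration only.)
long sum_second_column(const char* path,
                       std::size_t buffer_size = 5 * 1024 * 1024) {
    std::FILE* file = std::fopen(path, "r");
    if (!file) return -1;

    std::vector<char> buffer(buffer_size);  // must stay alive until fclose
    // setvbuf has to come after fopen and before any other stream operation.
    // _IOFBF requests full buffering.
    if (std::setvbuf(file, buffer.data(), _IOFBF, buffer.size()) != 0) {
        std::fclose(file);
        return -1;
    }

    char name[64];
    char text[64];
    int value = 0;
    long total = 0;
    // %63s bounds each string field so it fits the arrays above.
    while (std::fscanf(file, "%63s %d %63s", name, &value, text) == 3) {
        total += value;
    }
    std::fclose(file);  // close before `buffer` is destroyed
    return total;
}
```

The parsing calls are unchanged from ordinary stdio code; only the buffer behind the stream is larger.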

EDIT

Sorry, I did not notice it was C++.

Here is the code for C++:

#include <iostream>
#include <fstream>
using namespace std;

...

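const int BUFFER_SIZE = 5242880;  // 5 MB

filebuf fb;
static char buffer[BUFFER_SIZE];  // static, so it does not live on the stack
// setbuf is a protected member; pubsetbuf is the public entry point.
// Call it before open(); whether a later call is honored varies by implementation.
fb.pubsetbuf(buffer, BUFFER_SIZE);
fb.open("test.txt", ios::in);
istream is(&fb);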

Then you can read with, for example, int i; is >> i; and so on.
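Putting the pieces together, a self-contained sketch might look like the following. The Record struct and the read_records function are hypothetical names introduced here to match the three-column sample in the question; the buffer is declared before the filebuf so that it outlives it.

```cpp
#include <cstddef>
#include <fstream>
#include <ios>
#include <istream>
#include <string>
#include <vector>

// Hypothetical record layout matching the sample rows in the question.
struct Record {
    std::string name;
    int number;
    std::string text;
};

// Parses whitespace-separated "name number text" rows through a filebuf
// that was given a large user-supplied buffer. Returns an empty vector
// if the file cannot be opened.
std::vector<Record> read_records(const std::string& path,
                                 std::size_t buffer_size = 5 * 1024 * 1024) {
    std::vector<char> buffer(buffer_size);  // declared first: destroyed last
    std::filebuf fb;
    // Install the buffer before opening the file.
    fb.pubsetbuf(buffer.data(), static_cast<std::streamsize>(buffer.size()));

    std::vector<Record> records;
    if (!fb.open(path, std::ios::in)) return records;

    std::istream is(&fb);
    Record r;
    // Extraction stops at end of file or at the first malformed row.
    while (is >> r.name >> r.number >> r.text) {
        records.push_back(r);
    }
    return records;
}
```

Whether the library actually uses the supplied buffer is implementation-defined, so it is worth measuring on your platform before relying on it.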

Happy now, Tino Didriksen?

In a C++ iostream, you can increase the buffer size with rdbuf and pubsetbuf:

ifstream f;
char buf[4096];
// Set the buffer before opening the file; a later call may be ignored.
f.rdbuf()->pubsetbuf(buf, sizeof(buf));

It depends on the operating system. First, you may want to use large buffers. See this question. (It also depends on whether the reads are sequential.)

Or you could use lower-level system calls, such as mmap on Linux or other POSIX systems (or at least call read with large, megabyte-sized buffers).
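A rough sketch of the mmap route, assuming a POSIX system and a read-only scan of the whole file. The function name count_lines_mmap is made up for the example. With mmap the kernel pages the file in on demand, so there is no explicit buffer to size.

```cpp
#include <cstddef>
#include <fcntl.h>     // open
#include <sys/mman.h>  // mmap, munmap
#include <sys/stat.h>  // fstat
#include <unistd.h>    // close

// Counts newline characters in `path` by mapping the file into memory.
// Returns -1 on error. POSIX only. (Hypothetical helper, for illustration.)
long count_lines_mmap(const char* path) {
    const int fd = open(path, O_RDONLY);
    if (fd < 0) return -1;

    struct stat st;
    if (fstat(fd, &st) != 0) {
        close(fd);
        return -1;
    }
    if (st.st_size == 0) {  // mmap rejects a zero-length mapping
        close(fd);
        return 0;
    }

    const std::size_t size = static_cast<std::size_t>(st.st_size);
    void* addr = mmap(nullptr, size, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd);  // the mapping remains valid after the descriptor is closed
    if (addr == MAP_FAILED) return -1;

    const char* data = static_cast<const char*>(addr);
    long lines = 0;
    for (std::size_t i = 0; i < size; ++i) {
        if (data[i] == '\n') ++lines;
    }
    munmap(addr, size);
    return lines;
}
```

For a strictly sequential pass, plain read calls with a large buffer are often just as fast, so it is worth timing both on your own data.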
