Read large txt file in C++

Question

I'd like to read a file of about 5 MB into memory. The file has this format (it is a text file):

ID 3:  0 itemId.1 0 itemId.2 0 itemId.5 1 itemId.7 ........................ 20 itemId.500
ID 50:  0 itemId.31 0 itemId.2 0 itemId.4 2 itemId.70 ........................ 20 itemId.2120
.....

How can I do this efficiently in C++?

Answer 1

Reading a file line by line:

#include <fstream>
#include <string>
using namespace std;

int main()
{
    ifstream fin("file.txt");
    string   myStr;

    while (getline(fin, myStr))  // Always put the read in the while condition.
    {                            // Then you only enter the loop if there is data to
        // use myStr data        // process. Otherwise you need to read and then
    }                            // test whether the read was OK.
                                 //
                                 // Note: The read of the last line succeeds, so
                                 //       that line is still processed. The loop
                                 //       ends on the next read, which fails because
                                 //       nothing is left. (If the file lacks a
                                 //       trailing newline, eofbit is already set by
                                 //       the last read, but the stream still tests OK.)
}
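A possible body for that loop, sketched under assumptions: the question does not say what the numbers mean, so the hypothetical `Record` type below just stores each `<number> itemId.<number>` pair as two integers, and the hypothetical `parseLine` function splits one line with `std::istringstream`.

```cpp
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Hypothetical record type: the meaning of the numbers is not given in the
// question, so each entry is stored as (leading number, number after "itemId.").
struct Record {
    int id = 0;
    std::vector<std::pair<int, int>> items;
};

// Parses one line such as "ID 3:  0 itemId.1 0 itemId.2 1 itemId.7".
// Returns false if the line does not start with "ID <number>:" or an item
// token is not of the form "itemId.<number>".
bool parseLine(const std::string& line, Record& out)
{
    std::istringstream in(line);
    std::string tag;
    char colon = '\0';
    if (!(in >> tag >> out.id >> colon) || tag != "ID" || colon != ':')
        return false;

    out.items.clear();
    const std::string prefix = "itemId.";
    int value = 0;
    std::string item;
    while (in >> value >> item) {
        if (item.size() <= prefix.size() ||
            item.compare(0, prefix.size(), prefix) != 0)
            return false;
        // std::stoi throws if the suffix is not a number.
        out.items.emplace_back(value, std::stoi(item.substr(prefix.size())));
    }
    return true;
}
```

Inside the `getline` loop, `Record rec; if (parseLine(myStr, rec)) { /* use rec */ }` would take the place of the placeholder comment.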

For a reason not to call close() explicitly, see:
https://codereview.stackexchange.com/questions/540/my-c-code-involving-an-fstream-failed-review/544#544
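As an illustration of the general idea (this sketch is not taken from the linked review): `std::ifstream` closes its file in its destructor, so a function that owns the stream as a local variable needs no `close()` call, even on an early return. `readLines` is a hypothetical helper.

```cpp
#include <fstream>
#include <string>
#include <vector>

// Reads every line of the file at `path`. There is no close() call:
// the std::ifstream destructor closes the file when `in` goes out of scope.
std::vector<std::string> readLines(const std::string& path)
{
    std::vector<std::string> lines;
    std::ifstream in(path);
    if (!in)
        return lines;          // could not open; nothing to close

    std::string line;
    while (std::getline(in, line))
        lines.push_back(line);
    return lines;              // `in` is closed here by its destructor
}
```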

If efficiency is your major goal (it's probably not), then read the whole file into memory and parse it from there; see Thomas's answer below.

Answer 2

Read the entire file into memory, then process the contents in memory.

A file resource (e.g., a hard drive) is most efficient when it can transfer data continuously with the motor kept spinning, so one large read of data is more efficient than five small reads.

On most platforms, memory is faster to access than a file. Using this information, one can make a program more efficient by reading the data into memory and then processing the memory.

Combining the two techniques yields even better performance: read as much data as possible into memory in one transaction, then process the memory.
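A minimal sketch of the combined approach, assuming the whole file fits comfortably in memory: the hypothetical `slurpFile` copies the file into a `std::string` through the stream's buffer (`operator<<` taking a `std::streambuf*`), and the hypothetical `splitLines` then works purely on the in-memory copy.

```cpp
#include <fstream>
#include <sstream>
#include <string>
#include <vector>

// Step 1: pull the whole file into memory with one stream-buffer copy.
// Returns an empty string if the file cannot be opened or is empty.
std::string slurpFile(const std::string& path)
{
    std::ifstream in(path);
    std::ostringstream contents;
    contents << in.rdbuf();
    return contents.str();
}

// Step 2: process the in-memory copy; here it is only split into lines.
std::vector<std::string> splitLines(const std::string& text)
{
    std::vector<std::string> lines;
    std::string::size_type start = 0;
    while (start < text.size()) {
        std::string::size_type end = text.find('\n', start);
        if (end == std::string::npos)
            end = text.size();          // last line without a trailing newline
        lines.push_back(text.substr(start, end - start));
        start = end + 1;
    }
    return lines;
}
```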

Some people declare large arrays of char, or unsigned char (for binary data). Others tell std::string or std::vector to reserve a large amount of memory, then read the data into that data structure.
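One way to do the std::vector variant, sketched with a hypothetical `readFileIntoVector` helper. Note that `reserve()` alone only allocates capacity and leaves the size at zero, so this sketch sizes the vector up front (which is what `resize()` does) to create elements that `istream::read()` can write into. Taking the size from `seekg`/`tellg` on a stream opened in binary mode is an assumption that holds for ordinary files.

```cpp
#include <cstddef>
#include <fstream>
#include <string>
#include <vector>

// Reads the whole file into a std::vector<char> with a single istream::read().
// Returns an empty vector if the file cannot be opened, is empty, or its size
// cannot be determined.
std::vector<char> readFileIntoVector(const std::string& path)
{
    std::ifstream in(path, std::ios::binary);   // binary: no newline translation
    if (!in)
        return {};

    in.seekg(0, std::ios::end);
    const std::streamoff size = in.tellg();     // -1 on failure
    if (size <= 0)
        return {};
    in.seekg(0, std::ios::beg);

    std::vector<char> buffer(static_cast<std::size_t>(size));
    in.read(buffer.data(), size);
    buffer.resize(static_cast<std::size_t>(in.gcount()));  // keep only bytes actually read
    return buffer;
}
```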

Also, block reads (a.k.a. istream::read()) bypass most of the slow parts of the C++ stream facilities.
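A sketch of block reads with `istream::read()`, assuming a fixed 64 KiB block and using newline counting as a stand-in for real processing; `countLinesInBlocks` is a hypothetical name. The key detail is that the final, shorter block makes `read()` set `failbit`, so the loop also checks `gcount()` to handle the bytes that did arrive.

```cpp
#include <algorithm>
#include <cstddef>
#include <fstream>
#include <string>
#include <vector>

// Counts '\n' characters by reading the file in fixed-size blocks instead of
// extracting it line by line. The 64 KiB block size is an arbitrary choice.
std::size_t countLinesInBlocks(const std::string& path)
{
    std::ifstream in(path, std::ios::binary);
    std::vector<char> block(64 * 1024);
    std::size_t newlines = 0;

    // A short final block fails the stream but still leaves gcount() > 0.
    while (in.read(block.data(), static_cast<std::streamsize>(block.size())) ||
           in.gcount() > 0) {
        const std::streamsize got = in.gcount();
        newlines += static_cast<std::size_t>(
            std::count(block.data(), block.data() + got, '\n'));
    }
    return newlines;
}
```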

Answer 3

Use a file stream:

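#include <iostream>
#include <fstream>
#include <string>
using namespace std;

int main() {
    string line;
    ifstream myfile ("example.txt");
    if (myfile.is_open())
    {
        // getline() returns the stream, which tests false once no line can be read.
        while ( getline(myfile, line) )
            cout << line << '\n';   // '\n' rather than endl: no flush after every line

        myfile.close();   // optional: the destructor closes the file anyway
    }
    else
    {
        cerr << "Unable to open file\n";
        return 1;
    }

    return 0;
}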

5 MB really is not a large file. The stream will take care of reading it in chunks for you, but really, almost any machine this runs on will be able to read 5 MB straight into memory without a problem.

3 answers

Answer 1: score 5, accepted, 2011-10-20 20:45:51
Answer 2: score 4, 2011-10-20 20:58:23
Answer 3: score 3, 2011-10-20 20:37:13
