
How to read from a large text file to an array in C++

I'm trying to read from a very large text file with two columns. It's a web graph, something like this (except it has 40 million rows):

1 2
1 3
2 1
...

So I wanted to read from the text file into myArray[mysize][2], and I used this code:

ifstream file("web-graph.txt");
if(file.is_open())
{
    for(int i = 0; i < mysize; i++)
    {
        file >> myArray[i][0];          
        file >> myArray[i][1];
    }
}

The problem is that it takes a long time to read such a big file. Is there any other way to read from the file that doesn't take this much time?

Yes, possibly, subject to profiling, but you won't like the answer. If you make the file smaller it might be quicker to read. How? Save it as binary rather than text. Be aware that this will stop you from using the nice high-level streaming operators. You will have to use lower-level functions instead, which might give you even more of a speedup.
It might be better to ask yourself why you are reading the whole file into memory. Again, if you made the file binary, with every record the same size, you could seek to the specific lines you are after.
If you are performing a calculation on the file, perhaps you can process it as you go, or in chunks.
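As a rough illustration of the binary-file idea, here is a minimal sketch. It assumes node IDs fit in 32 bits and that the file is written and read on the same machine, since a raw dump like this depends on the platform's byte order. The names Edge, text_to_binary, load_binary and read_edge_at are made up for this example and are not from the question.

```cpp
#include <cstddef>
#include <cstdint>
#include <fstream>
#include <string>
#include <vector>

// One row of the graph file: source node and destination node.
// Assumption: node IDs fit in an unsigned 32-bit integer.
struct Edge {
    std::uint32_t from;
    std::uint32_t to;
};
static_assert(sizeof(Edge) == 8, "Edge is expected to have no padding");

// One-off conversion: parse the text file once and dump fixed-size records.
// Returns the number of edges written.
std::size_t text_to_binary(const std::string& text_path, const std::string& bin_path)
{
    std::ifstream in(text_path);
    std::ofstream out(bin_path, std::ios::binary);
    std::size_t count = 0;
    Edge e;
    while (in >> e.from >> e.to) {
        out.write(reinterpret_cast<const char*>(&e), sizeof e);
        ++count;
    }
    return count;
}

// Load every edge with a single read() call, with no text parsing at all.
std::vector<Edge> load_binary(const std::string& bin_path)
{
    std::ifstream in(bin_path, std::ios::binary | std::ios::ate);
    if (!in) return {};
    const auto bytes = static_cast<std::size_t>(in.tellg());
    std::vector<Edge> edges(bytes / sizeof(Edge));
    in.seekg(0);
    in.read(reinterpret_cast<char*>(edges.data()),
            static_cast<std::streamsize>(edges.size() * sizeof(Edge)));
    return edges;
}

// Fetch only record number `index` (0-based) by seeking straight to it.
// Returns false if the record could not be read.
bool read_edge_at(const std::string& bin_path, std::size_t index, Edge& edge)
{
    std::ifstream in(bin_path, std::ios::binary);
    in.seekg(static_cast<std::streamoff>(index * sizeof(Edge)));
    in.read(reinterpret_cast<char*>(&edge), sizeof edge);
    return in.gcount() == static_cast<std::streamsize>(sizeof edge);
}
```

The conversion still pays the slow text-parsing cost once. The gain, if profiling confirms it, comes from every later run that loads the binary file instead.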

Yes, you're definitely doing it the slow (but pretty) way. You have 2 options to be faster:

if( you have enough memory ) { Read the entire file into memory, and then parse it }

else { Read large chunks of the file into memory one at a time, and parse each chunk }

Either way, the loading looks something like this...

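// Open in binary mode so the size reported by tellg() matches the bytes read.
std::ifstream is(filename, std::ios::binary);
is.seekg(0, std::ios::end);
auto length = is.tellg();

std::string buffer;

if(length > 0)
{
    buffer.resize(static_cast<std::string::size_type>(length));
    is.seekg(0);
    // Read the whole file into the buffer in one call.
    is.read(&buffer.front(), length);
}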

And then you would put it in a stringstream...

std::stringstream ss(buffer);

and parse it, potentially exactly as you were doing before...

for(int i = 0; i < mysize; i++)
{
    ss >> myArray[i][0];          
    ss >> myArray[i][1];
}
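For the "else" branch (not enough memory for the whole file), a chunked reader might look roughly like the sketch below. It is not part of the answer above. It swaps the stringstream for std::from_chars, which requires C++17, and it assumes the file holds whitespace-separated integers with one pair per line. The names parse_pairs and read_pairs_chunked, and the 1 MiB default chunk size, are choices made for this example only.

```cpp
#include <charconv>
#include <cstddef>
#include <fstream>
#include <string>
#include <system_error>
#include <utility>
#include <vector>

// Parse whitespace-separated "a b" pairs from [p, end) and append them to
// `edges`. Parsing stops at the first token that is not an integer.
void parse_pairs(const char* p, const char* end,
                 std::vector<std::pair<int, int>>& edges)
{
    auto skip_ws = [end](const char* q) {
        while (q != end && (*q == ' ' || *q == '\t' || *q == '\n' || *q == '\r')) ++q;
        return q;
    };
    for (;;) {
        int a = 0, b = 0;
        const auto r1 = std::from_chars(skip_ws(p), end, a);
        if (r1.ec != std::errc{}) break;
        const auto r2 = std::from_chars(skip_ws(r1.ptr), end, b);
        if (r2.ec != std::errc{}) break;
        edges.emplace_back(a, b);
        p = r2.ptr;
    }
}

// Read the file in fixed-size chunks. Only complete lines are parsed; the
// unfinished tail of each chunk is carried over and joined with the next one,
// so a number split across a chunk boundary is never parsed in two halves.
std::vector<std::pair<int, int>> read_pairs_chunked(const std::string& path,
                                                    std::size_t chunk_size = 1 << 20)
{
    std::vector<std::pair<int, int>> edges;
    std::ifstream in(path, std::ios::binary);
    std::string pending;                  // unparsed tail of earlier chunks
    std::string chunk(chunk_size, '\0');  // reusable read buffer

    while (in) {
        in.read(&chunk[0], static_cast<std::streamsize>(chunk.size()));
        const auto got = static_cast<std::size_t>(in.gcount());
        if (got == 0) break;
        pending.append(chunk, 0, got);

        const auto last_nl = pending.rfind('\n');
        if (last_nl == std::string::npos) continue;  // no complete line yet
        parse_pairs(pending.data(), pending.data() + last_nl + 1, edges);
        pending.erase(0, last_nl + 1);
    }
    // Whatever remains is a final line without a trailing newline.
    parse_pairs(pending.data(), pending.data() + pending.size(), edges);
    return edges;
}
```

The same parse_pairs function can also be pointed at the whole-file buffer from the first option, as parse_pairs(buffer.data(), buffer.data() + buffer.size(), edges). Whether either variant actually beats the stringstream on a given machine is something to measure rather than assume.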
