
Quickly parse tab-separated strings and ints in C++

I have a file that is a couple of gigabytes in size and has millions of lines. Each line has data separated like so:

string TAB int TAB int TAB int NEWLINE

My previous attempts to read this file line by line were bottlenecked by the CPU rather than by my SSD's read speed.

How can I quickly parse a massive file line by line?

Note: The files can't be parsed into a vector all at once because they are too large.

In my original code I was parsing the data into a vector of structs like this:

struct datastruct {
    std::string name;
    int year;
    int occurences;
    int volcount;
};
std::vector<datastruct> data;

Using your datastruct, you could do:

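std::ifstream file("data.tsv");  // hypothetical path; open your own file here
datastruct data;
while (file >> data.name >> data.year >> data.occurences >> data.volcount)
{
    // Do what you want with data; its contents are replaced on the next iteration.
    // Note: operator>> splits on any whitespace, so this assumes name contains no spaces.
}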

Is that really that slow?
