
C++: reading ints and floats from a file

I have a school project with a *.txt file of ~2M lines (~42 MB); each line contains a row number, a column number, and a value. I am parsing these into three vectors (int, int, float), but it takes around 45 seconds to complete, and I am looking for a way to make it faster. I guess the bottleneck is the iteration over every element, and that it would be better to load a chunk of rows/columns/values and put them into a vector at once. Unfortunately, I do not know how to do that, or whether it is even possible. I would also like to stick to the STL. Is there a way I could make it faster?

Thanks!

File example (the first line has the count of rows, columns, and non-zero values):

1092689 2331 2049148
1 654 0.272145
1 705 0.019104
2 245 0.812118
2 659 0.598012
2 1043 0.852509
2 1147 0.213949

For now I am working with:

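#include <fstream>
#include <vector>
using namespace std;

// Reads "row column value" triples into three parallel vectors.
void LoadFile(const char *NameOfFile, vector<int> &row,
    vector<int> &col, vector<float> &value) {
    unsigned int columns = 0, rows = 0, countOfValues = 0;
    int rN, cN;
    float val;
    ifstream testData(NameOfFile);
    // Header line: number of rows, columns and non-zero values.
    testData >> rows >> columns >> countOfValues;
    // Reserve up front so push_back does not reallocate.
    row.reserve(countOfValues);
    col.reserve(countOfValues);
    value.reserve(countOfValues);

    // Stops at end of file or at the first malformed record.
    while (testData >> rN >> cN >> val) {
        row.push_back(rN);
        col.push_back(cN);
        value.push_back(val);
    }
    testData.close();
}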

Before you look for a solution to the problem, I would suggest taking some steps to figure out whether the bottleneck is reading the data from the file or filling up the vectors. To that end, I would time the following operations:

  1. Read the data from the file and discard the data.
  2. Use a random number generator to generate random numbers and fill up the vectors.

If the bottleneck is (1), find ways to speed up reading the data from the file.
If the bottleneck is (2), find ways to speed up filling up the vectors.
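The two measurements above could be set up roughly as follows. This is only a sketch: ReadAndDiscard, FillRandom and SecondsTaken are hypothetical helper names, and the random seed and value ranges are arbitrary choices, not anything taken from the question.

```cpp
#include <chrono>
#include <cstddef>
#include <fstream>
#include <random>
#include <string>
#include <vector>

// Operation 1: parse every record but throw the values away.
// Returns the number of records parsed (0 if the file cannot be read).
std::size_t ReadAndDiscard(const std::string& path) {
    std::ifstream in(path);
    unsigned int rows = 0, columns = 0, count = 0;
    in >> rows >> columns >> count;  // header line
    int r, c;
    float v;
    std::size_t parsed = 0;
    while (in >> r >> c >> v) {
        ++parsed;
    }
    return parsed;
}

// Operation 2: fill the vectors with n random records, with no file I/O.
// The seed and the distribution ranges are arbitrary (assumptions).
void FillRandom(std::size_t n, std::vector<int>& row,
                std::vector<int>& col, std::vector<float>& value) {
    std::mt19937 gen(42);
    std::uniform_int_distribution<int> index(1, 1000000);
    std::uniform_real_distribution<float> real(0.0f, 1.0f);
    row.reserve(n);
    col.reserve(n);
    value.reserve(n);
    for (std::size_t i = 0; i < n; ++i) {
        row.push_back(index(gen));
        col.push_back(index(gen));
        value.push_back(real(gen));
    }
}

// Runs f once and returns the elapsed wall-clock time in seconds.
template <typename F>
double SecondsTaken(F&& f) {
    const auto start = std::chrono::steady_clock::now();
    f();
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(stop - start).count();
}
```

Calling SecondsTaken with a lambda that invokes ReadAndDiscard on the real file, and again with a lambda that calls FillRandom with the expected record count, gives two numbers to compare. Timing an optimized (release) build is advisable, since debug builds can distort both measurements.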

Improving the bottleneck of reading

Using std::istream::read to read the entire contents of the file in one call and then using a std::istringstream to extract the data should lead to some improvement.
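A minimal sketch of that idea, assuming the file fits comfortably in memory (about 42 MB here). LoadFileBuffered is a hypothetical name and the bool return value is an addition for illustration; whether this is actually faster depends on the standard library implementation, so it is worth measuring.

```cpp
#include <cstddef>
#include <fstream>
#include <sstream>
#include <string>
#include <vector>

// Reads the whole file with a single std::istream::read call, then parses
// the in-memory copy through std::istringstream.
// Returns false if the file cannot be opened or read, or the header is bad.
bool LoadFileBuffered(const char* nameOfFile, std::vector<int>& row,
                      std::vector<int>& col, std::vector<float>& value) {
    // Open at the end so tellg() reports the file size.
    std::ifstream file(nameOfFile, std::ios::binary | std::ios::ate);
    if (!file) {
        return false;
    }
    const std::streamsize size = file.tellg();
    if (size < 0) {
        return false;
    }
    file.seekg(0, std::ios::beg);

    std::string buffer(static_cast<std::size_t>(size), '\0');
    if (size > 0 && !file.read(&buffer[0], size)) {
        return false;
    }

    std::istringstream in(buffer);
    unsigned int rows = 0, columns = 0, countOfValues = 0;
    if (!(in >> rows >> columns >> countOfValues)) {
        return false;
    }
    row.reserve(countOfValues);
    col.reserve(countOfValues);
    value.reserve(countOfValues);

    int rN, cN;
    float val;
    while (in >> rN >> cN >> val) {
        row.push_back(rN);
        col.push_back(cN);
        value.push_back(val);
    }
    return true;
}
```

The parsing loop is unchanged from the question; only the source of the characters differs. If the timing shows that formatted extraction itself dominates, the buffering alone will not help much.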

Improving the bottleneck of filling up the vectors

Before adding data to the vectors, reserve a large capacity, which will reduce the number of times they are resized.

If you know there are 1M lines of text, reserve 1M elements in the vectors. If the real number of items in the vectors is a bit less or a bit more, it shouldn't matter too much from a performance standpoint.

P.S. The OP is already doing that.


 