
C++ reading ints and floats from file

I have a school project where I have a *.txt file with ~2M lines (~42 MB); each line contains a row number, a column number, and a value. I am parsing these into three vectors (int, int, float), but it takes around 45 seconds to complete, and I am looking for a way to make it faster. I guess the bottleneck is iterating over every element, and it would be better to load one chunk of rows/columns/values and put them into a vector at once. Unfortunately, I do not know how to do that, or if it's even possible. Also, I would like to stick to the STL. Is there a way I could make it faster?

Thanks!

file example (first line has the count of rows, columns and non-zero values):

1092689 2331 2049148
1 654 0.272145
1 705 0.019104
2 245 0.812118
2 659 0.598012
2 1043 0.852509
2 1147 0.213949

For now I am working with:

void LoadFile(const char *NameOfFile, vector<int> &row, 
    vector<int> &col, vector<float> &value) {
    unsigned int columns, rows, countOfValues;
    int rN, cN;
    float val;
    ifstream testData(NameOfFile);
    testData >> rows >> columns >> countOfValues;
    row.reserve(countOfValues);
    col.reserve(countOfValues);
    value.reserve(countOfValues);

    while (testData >> rN >> cN >> val) {
        row.push_back(rN);
        col.push_back(cN);
        value.push_back(val);
    }
    testData.close();
}

Before you look for a solution to the problem, I would suggest taking some steps to figure out whether the bottleneck is reading the data from the file or filling up the vectors. To that end, I would time the following operations:

  1. Read the data from the file and discard the data.
  2. Use a random number generator to generate random numbers and fill up the vectors.

If the bottleneck is (1), find ways to speed up reading the data from the file.
If the bottleneck is (2), find ways to speed up filling up the vector.

Improving bottleneck of reading

Using std::istream::read to read the entire contents of the file in one call, and then using a std::istringstream to extract the data, should lead to some improvement.
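One way to sketch that, keeping the LoadFile signature from the question (assumes the file opens successfully and fits in memory):

```cpp
#include <fstream>
#include <sstream>
#include <string>
#include <vector>

using std::vector;

// Reads the whole file into a string with a single istream::read call,
// then parses the numbers out of an in-memory istringstream.
void LoadFile(const char *NameOfFile, vector<int> &row,
              vector<int> &col, vector<float> &value) {
    std::ifstream testData(NameOfFile, std::ios::binary);

    // Determine the file size, then slurp everything in one read.
    testData.seekg(0, std::ios::end);
    std::string contents(static_cast<std::size_t>(testData.tellg()), '\0');
    testData.seekg(0, std::ios::beg);
    testData.read(&contents[0], contents.size());

    std::istringstream in(contents);
    unsigned int rows, columns, countOfValues;
    in >> rows >> columns >> countOfValues;
    row.reserve(countOfValues);
    col.reserve(countOfValues);
    value.reserve(countOfValues);

    int rN, cN;
    float val;
    while (in >> rN >> cN >> val) {
        row.push_back(rN);
        col.push_back(cN);
        value.push_back(val);
    }
}
```

The parsing loop is unchanged; only the source of the characters moves from disk-backed ifstream buffering to a single in-memory string.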

Improving bottleneck of filling up vectors

Before adding data to the vectors, reserve a large capacity, which will reduce the number of times they are resized.

If you know there are 1M lines of text, reserve 1M elements in the vectors. If the real number of items ends up a bit less or a bit more, it shouldn't matter much from a performance standpoint.

P.S. The OP is already doing that.
