
Reading text file with floating point numbers faster and storing in vector of floats

I have C++ code written in Visual Studio 2010 that reads a text file containing tens of thousands of floating-point numbers separated by spaces. The code reads the file's contents and stores them in a vector of floats. My problem is that the code takes a lot of time to read the file and copy the values into the vector. Is there a faster way to do this, something that can be done in Visual Studio C++ (using Boost libraries or mmap)?

vector<float> ReplayBuffer;
ifstream in;
in.open("fileName.txt");
if (in.is_open())
{
    in.setf(ios::fixed);
    in.precision(3);

    in.seekg(0, ios::end);
    fileSizes = in.tellg();

    in.seekg(0, ios::beg);
    while (!in.eof())
    {
        for (float f; in >> f;)
            ReplayBuffer.push_back(f);
    }
    in.close();
}

If your files are very big, consider memory-mapped files: Boost offers an excellent library for working with them cross-platform (you mentioned mmap, which is a POSIX system call, and it looks like you are developing on Windows).

Also, consider reserving space in your vector to avoid dynamic reallocations: ReplayBuffer.reserve(expected_final_size);

Note:

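  • Do not use !in.eof() to check whether you have finished reading the file. eof() only becomes true after a read has already hit the end of the file, and the outer loop never terminates if extraction fails for any other reason (for example, a token that is not a number). The inner for (float f; in >> f;) loop already stops correctly on its own.
  • If you don't need fileSizes, do not compute it.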
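The points above can be combined into a short sketch. The function name readFloats and the expectedCount parameter are made up for illustration; how you estimate the count (a known record count, file size divided by a typical token length, and so on) is up to you.

```cpp
#include <cstddef>
#include <istream>
#include <sstream>
#include <vector>

// Reads whitespace-separated floats from any input stream.
// expectedCount is a hypothetical caller-supplied estimate; 0 means "unknown".
std::vector<float> readFloats(std::istream& in, std::size_t expectedCount)
{
    std::vector<float> values;
    if (expectedCount > 0)
        values.reserve(expectedCount);   // allocate once up front

    // The extraction itself is the loop condition: it stops at end of file
    // and also when a token cannot be parsed, so no eof() test is needed.
    for (float f; in >> f; )
        values.push_back(f);

    return values;
}
```

Because it takes a std::istream&, the same function works with an ifstream opened on the file or with a std::istringstream when testing.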

If the file fits in your address space, you can mmap it and then use istrstream on the resulting memory. istrstream is formally deprecated, but it's still there, and it is the only standard stream that will work here. Or you can write your own memory streambuf, which might even be faster than istrstream, because you won't have to support seeking, etc. on it (although seeking on an istrstream is also a fairly trivial operation, and shouldn't affect the rest very much).
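A minimal sketch of the "write your own memory streambuf" option, assuming the file contents are already available as a contiguous character range (for instance the start address and length of a mapping). MemoryBuf and readFloatsFromMemory are illustrative names, not part of any library.

```cpp
#include <cstddef>
#include <istream>
#include <streambuf>
#include <vector>

// Read-only stream buffer over an existing character range [data, data + size).
// It neither copies nor owns the memory, and it does not support seeking.
class MemoryBuf : public std::streambuf
{
public:
    MemoryBuf(const char* data, std::size_t size)
    {
        // setg takes non-const pointers, but the get area is only ever read.
        char* begin = const_cast<char*>(data);
        setg(begin, begin, begin + size);
    }
};

// Extracts floats from the range through an ordinary istream.
std::vector<float> readFloatsFromMemory(const char* data, std::size_t size)
{
    MemoryBuf buf(data, size);
    std::istream in(&buf);

    std::vector<float> values;
    for (float f; in >> f; )
        values.push_back(f);
    return values;
}
```

The default underflow() reports end of file once the get area is exhausted, so nothing else needs to be overridden for plain sequential reading.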

Beyond that, every layer of abstraction generally costs something, so it will probably be even faster (although not necessarily by much) if you loop manually, using strtod.
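A manual strtod loop might look roughly like this. It is a sketch that assumes the whole text sits in a null-terminated buffer; parseWithStrtod is an illustrative name.

```cpp
#include <cstdlib>
#include <string>
#include <vector>

// Parses whitespace-separated numbers from a null-terminated buffer.
std::vector<float> parseWithStrtod(const char* text)
{
    std::vector<float> values;
    const char* p = text;
    for (;;) {
        char* end = 0;
        double d = std::strtod(p, &end);   // skips leading whitespace itself
        if (end == p)                      // nothing converted: end of data or a bad token
            break;
        values.push_back(static_cast<float>(d));
        p = end;                           // continue right after the parsed number
    }
    return values;
}
```

Note that strtod relies on the terminating '\0'. A memory-mapped file is not guaranteed to end with one, so in that case you would need to make sure the data is terminated (or copy it into a std::string) before calling this.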

In all cases, converting a generic floating-point representation into machine floating point is an expensive operation. If you know something about the values you will be seeing and their format (e.g. no scientific notation, values in a certain range, a maximum number of places after the decimal point), it's possible to write a conversion routine that is faster than strtod. This requires some care, but if you know that the total number of decimal digits in the number will always result in a value that fits in an int, you can do a very rapid int conversion, ignoring the '.', and then scale the result by multiplying by the appropriate floating-point value (e.g. 0.001 if there were 3 digits after the '.').
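As an illustration of that idea, here is a sketch under exactly those assumptions: plain [-]digits[.digits] input, no exponent, all digits together fitting in an int, and a null-terminated buffer. The names parseFixed and parseAllFixed are hypothetical, and none of the assumptions are checked.

```cpp
#include <vector>

// Converts one number of the form [-]digits[.digits] starting at p and
// advances p past it. Assumes (unchecked) no exponent, at most 9 digits
// after the '.', and that all digits together fit in an int.
float parseFixed(const char*& p)
{
    static const double scale[] = { 1.0, 1e-1, 1e-2, 1e-3, 1e-4,
                                    1e-5, 1e-6, 1e-7, 1e-8, 1e-9 };

    bool negative = (*p == '-');
    if (negative)
        ++p;

    int digits = 0;            // all digits read as one integer, '.' ignored
    int decimals = 0;          // number of digits seen after the '.'
    bool afterPoint = false;
    for (;; ++p) {
        if (*p >= '0' && *p <= '9') {
            digits = digits * 10 + (*p - '0');
            if (afterPoint)
                ++decimals;
        } else if (*p == '.' && !afterPoint) {
            afterPoint = true;
        } else {
            break;             // first character that is not part of the number
        }
    }

    float value = static_cast<float>(digits * scale[decimals]);
    return negative ? -value : value;
}

// Parses a whole null-terminated buffer of whitespace-separated numbers.
std::vector<float> parseAllFixed(const char* text)
{
    std::vector<float> values;
    const char* p = text;
    for (;;) {
        while (*p == ' ' || *p == '\t' || *p == '\n' || *p == '\r')
            ++p;
        if (*p == '\0')
            break;
        const char* start = p;
        float value = parseFixed(p);
        if (p == start)        // unexpected character: stop rather than loop forever
            break;
        values.push_back(value);
    }
    return values;
}
```

The scaling is done in double before narrowing to float, which keeps the result close to what strtod would give, though the last bit can still differ in some cases. Whether this is actually faster than strtod for your data is something to measure rather than assume.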
