
C++ Storing vector<std::string> from a file using reserve and possibly emplace_back

I am looking for a quick way to store strings from a file into a vector of strings such that I can reserve the number of lines ahead of time. What is the best way to do this? Should I count the newline characters first, or just get the total size of the file and reserve, say, size / 80 to give a rough estimate of what to reserve? Ideally I don't want the vector to reallocate each time, which would slow things down tremendously for a large file. Ideally I would count the number of items ahead of time, but should I do this by opening the file in binary mode, counting the newlines, and then reopening it? That seems wasteful, so I'm curious about some thoughts on this. Also, is there a way to use emplace_back to get rid of the temporary somestring in the getline code below? I did see the following two implementations for counting the number of lines ahead of time: Fastest way to find the number of lines in a text (C++)

std::vector<std::string> vs;
std::string somestring;
std::ifstream somefile("somefilename");
while (std::getline(somefile, somestring))
  vs.push_back(somestring);
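On the emplace_back part of the question: std::getline needs a std::string to write into, so the temporary itself cannot be eliminated, but its contents can be moved into the vector instead of copied. A minimal sketch; the helper name read_lines is hypothetical. Here emplace_back(std::move(...)) and push_back(std::move(...)) do the same thing (both call the move constructor).

```cpp
#include <fstream>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: reads every line of `path` into a vector.
std::vector<std::string> read_lines(const std::string& path) {
  std::vector<std::string> vs;
  std::string somestring;
  std::ifstream somefile(path);
  while (std::getline(somefile, somestring)) {
    // Move the line's buffer into the vector instead of copying it.
    // std::getline erases the string before reading, so reusing the
    // moved-from `somestring` on the next iteration is safe.
    vs.emplace_back(std::move(somestring));
  }
  return vs;
}
```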

Also, I could get the total size of the file ahead of time. In that case, can I transform the char* buffer directly into the vector? This goes back to my reserve hint of using size / 80, or some other constant, as an estimated size to reserve up front.

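#include <cstddef>
#include <fstream>
#include <vector>

int main() {
  // Binary mode, so the byte count matches the file's size on disk.
  std::ifstream istr("test.txt", std::ios::binary);

  if (istr)
  {
    std::streambuf* pbuf = istr.rdbuf();

    // Total size in bytes, which I can use as a reserve hint, say size / 80.
    std::streamsize size = pbuf->pubseekoff(0, std::ios::end);
    if (size < 0) return 1;  // seeking failed

    // Maybe I can construct the vector from the char buffer directly?
    pbuf->pubseekoff(0, std::ios::beg);
    // A std::vector<char> frees itself, unlike the original new char[size].
    std::vector<char> contents(static_cast<std::size_t>(size));
    pbuf->sgetn(contents.data(), size);
  }
  return 0;
}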
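A buffer like the one above can indeed be turned into the vector directly, and the newline count can be taken from the buffer in memory, so the file is neither reopened nor read from disk twice. A sketch, assuming the whole file fits in memory and that lines end in '\n' (a trailing '\r' from Windows line endings is not stripped, because the file is opened in binary mode); the function name split_file_lines is hypothetical.

```cpp
#include <algorithm>
#include <cstddef>
#include <fstream>
#include <iterator>
#include <string>
#include <vector>

// Hypothetical helper: reads `path` in one go, reserves exactly one slot
// per line, then builds each std::string straight from the buffer.
std::vector<std::string> split_file_lines(const std::string& path) {
  std::ifstream in(path, std::ios::binary);
  std::istreambuf_iterator<char> first(in), last;
  std::string buf(first, last);  // whole file in memory

  // Exact line count: one per '\n', plus a final line with no trailing '\n'.
  std::size_t n = static_cast<std::size_t>(std::count(buf.begin(), buf.end(), '\n'));
  if (!buf.empty() && buf.back() != '\n') ++n;

  std::vector<std::string> lines;
  lines.reserve(n);  // no reallocation happens below

  std::size_t start = 0;
  while (start < buf.size()) {
    std::size_t end = buf.find('\n', start);
    if (end == std::string::npos) end = buf.size();
    // Construct each string in place from a (pointer, length) pair.
    lines.emplace_back(buf.data() + start, end - start);
    start = end + 1;
  }
  return lines;
}
```

The extra pass that counts newlines runs over memory, which is usually cheap next to the disk read itself, and it produces the same lines std::getline would (a trailing '\n' does not create an extra empty line).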

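I wouldn't waste time counting the lines in advance. Instead, reserve() an initial amount, then start pushing the actual lines; if you happen to fill up the number of items you reserved, just reserve() some more space and keep pushing, repeating as needed.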
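That suggestion can be sketched as follows, assuming an initial guess (1024 by default) and doubling whenever the reserved capacity is used up; both numbers are arbitrary illustration choices, and the function name read_lines_chunked is hypothetical. Note that push_back already grows the capacity geometrically on its own, so the explicit reserve() mainly lets you choose the starting size and the growth factor.

```cpp
#include <cstddef>
#include <fstream>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: reserve an initial guess, then grow manually
// whenever the vector is full instead of relying on push_back's policy.
std::vector<std::string> read_lines_chunked(const std::string& path,
                                            std::size_t initial_guess = 1024) {
  std::vector<std::string> lines;
  lines.reserve(initial_guess);

  std::ifstream in(path);
  std::string line;
  while (std::getline(in, line)) {
    if (lines.size() == lines.capacity()) {
      // Reserved space used up: roughly double it (+1 covers a zero guess).
      lines.reserve(2 * lines.capacity() + 1);
    }
    lines.push_back(std::move(line));
  }
  return lines;
}
```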

The strategy for reserving space in a std::vector is designed to "grow on demand". That is, you will not allocate one string at a time: the capacity grows geometrically, so you first allocate room for one, then, say, ten, then one hundred, and so on (the real growth factor is smaller, typically 1.5 or 2 depending on the implementation, but that's the idea). In other words, std::vector::push_back already manages this for you, at amortized constant cost per element.
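The growth policy is easy to observe by recording every time push_back changes the capacity. A small sketch; the exact capacities depend on the standard library (doubling in some implementations, a factor of 1.5 in others), so no particular output is claimed. The function name count_reallocations is hypothetical.

```cpp
#include <cstddef>
#include <iostream>
#include <string>
#include <vector>

// Pushes `n` strings without any reserve() and returns how many times the
// capacity changed, i.e. how many reallocations push_back performed.
std::size_t count_reallocations(std::size_t n, bool verbose = false) {
  std::vector<std::string> v;
  std::size_t last_capacity = v.capacity();
  std::size_t reallocations = 0;
  for (std::size_t i = 0; i < n; ++i) {
    v.push_back("line");
    if (v.capacity() != last_capacity) {
      ++reallocations;
      last_capacity = v.capacity();
      if (verbose) {
        std::cout << "size " << v.size() << " -> capacity " << last_capacity << '\n';
      }
    }
  }
  return reallocations;
}
```

Calling count_reallocations(65007, true) prints each growth step; because growth is geometric, the count stays in the tens rather than anywhere near 65007.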

Consider the following example: I am reading the entire text of War and Peace (65007 lines) with two versions of the same reader, one without pre-allocation and one with full pre-allocation (i.e., one reserves no space and the other reserves all 65007 lines up front; text from http://www.gutenberg.org/cache/epub/2600/pg2600.txt , saved locally as wp.txt):

#include<iostream>
#include<fstream>
#include<vector>
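#include<string>
#include<cstddef>
#include<boost/timer/timer.hpp>

// Reads wp.txt line by line after reserving N slots (N = 0: no pre-allocation).
void reader(std::size_t N=0) {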
  std::string line;  
  std::vector<std::string> lines;
  lines.reserve(N);

  std::ifstream fp("wp.txt");  
  while(std::getline(fp, line)) {
    lines.push_back(line);
  }
  std::cout<<"read "<<lines.size()<<" lines"<<std::endl;
}

int main() {
  {
    std::cout<<"no pre-allocation"<<std::endl;
    boost::timer::auto_cpu_timer t;
    reader();
  }
  {
    std::cout<<"full pre-allocation"<<std::endl;    
    boost::timer::auto_cpu_timer t;    
    reader(65007);
  }

  return 0;
}

Results:

no pre-allocation
read 65007 lines
 0.027796s wall, 0.020000s user + 0.000000s system = 0.020000s CPU (72.0%)
full pre-allocation
read 65007 lines
 0.023914s wall, 0.020000s user + 0.010000s system = 0.030000s CPU (125.4%)

As you can see, for a non-trivial amount of text the difference is only a few milliseconds (about 0.028 s vs 0.024 s of wall time).
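If Boost is not available, the same comparison can be made with std::chrono from the standard library. A sketch assuming the same wp.txt sits in the working directory; time_reader is a hypothetical name, and the timings will vary by machine.

```cpp
#include <chrono>
#include <cstddef>
#include <fstream>
#include <string>
#include <vector>

// Reads `path` line by line after reserving `n` slots and returns the elapsed
// wall-clock time in milliseconds; `lines_read` receives the line count.
// The vector is destroyed after `stop` is taken, so freeing it is not timed.
double time_reader(const std::string& path, std::size_t n, std::size_t& lines_read) {
  const auto start = std::chrono::steady_clock::now();

  std::vector<std::string> lines;
  lines.reserve(n);
  std::ifstream fp(path);
  std::string line;
  while (std::getline(fp, line)) {
    lines.push_back(line);  // copy, as in the Boost version
  }
  lines_read = lines.size();

  const auto stop = std::chrono::steady_clock::now();
  return std::chrono::duration<double, std::milli>(stop - start).count();
}
```

Usage: call time_reader("wp.txt", 0, count) and time_reader("wp.txt", 65007, count) and print both results.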

Do you really need to know the number of lines beforehand? Is it really a bottleneck? Are you saving, say, one second of wall time at the cost of making your code ten times more complicated by preallocating the lines?

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address. For any questions, please contact: yoyou2525@163.com.

 
© 2020-2024 STACKOOM.COM