I am looking for a quick way to store strings from a file into a vector of strings such that I can reserve the number of lines ahead of time. What is the best way to do this? Should I cont the new line characters first or just get a total size of the file and just reserve say the size / 80 in order to give a rough estimate on what to reserve. Ideally I don't want to have the vector have to realloc each time which would slow things down tremendously for a large file. Ideally I would count the number of items ahead of time but should I do this by opening in binary mode counting the new lines and then reopening? That seems wasteful, curious on some thoughts for this. Also is there a way to use emplace_back to get rid of the temporary somestring in the getline code below. I did see the following 2 implmentations for counting the number of lines ahead of time Fastest way to find the number of lines in a text (C++)
std::vector<std::string> vs;
std::string somestring;
std::ifstream somefile("somefilename");
while (std::getline(somefile, somestring))
vs.push_back(somestring);
Also I could do something to get the total size ahead of time, can I just transform the char* in this case into the vector directly? This goes back to my reserve hint of saying size / 80 or some constant to give an estimated size to the reserve upfront.
#include <iostream>
#include <fstream>
int main () {
char* contents;
std::ifstream istr ("test.txt");
if (istr)
{
std::streambuf * pbuf = istr.rdbuf();
//which I can use as a reserve hint say size / 80
std::streamsize size = pbuf->pubseekoff(0,istr.end);
//maybe I can construct the vector from the char buf directly?
pbuf->pubseekoff(0,istr.beg);
contents = new char [size];
pbuf->sgetn (contents,size);
}
return 0;
}
我不会浪费时间提前计算行数,而是reserve()
一个初始值,然后开始推送实际行,如果你碰巧推送了保留的项目数,那么只需reserve()
一些空格,然后继续更多推动,根据需要重复。
The strategy for reserving space in a std::vector
is designed to "grow on demand". That is, you will not allocate one string at a time, you will first allocate one, then, say, ten, then, one hundred and so on (not exactly those numbers, but that's the idea). In other word, the implementation of std::vector::push_back already manages this for you.
Consider the following example: I am reading the entire text of War and Peace (65007 lines) using two versions: one which allocates and one which does not (ie, one reserves zero space, and the other reserves the full 65007 lines; text from: http://www.gutenberg.org/cache/epub/2600/pg2600.txt )
#include<iostream>
#include<fstream>
#include<vector>
#include<string>
#include<boost/timer/timer.hpp>
void reader(size_t N=0) {
std::string line;
std::vector<std::string> lines;
lines.reserve(N);
std::ifstream fp("wp.txt");
while(std::getline(fp, line)) {
lines.push_back(line);
}
std::cout<<"read "<<lines.size()<<" lines"<<std::endl;
}
int main() {
{
std::cout<<"no pre-allocation"<<std::endl;
boost::timer::auto_cpu_timer t;
reader();
}
{
std::cout<<"full pre-allocation"<<std::endl;
boost::timer::auto_cpu_timer t;
reader(65007);
}
return 0;
}
Results:
no pre-allocation
read 65007 lines
0.027796s wall, 0.020000s user + 0.000000s system = 0.020000s CPU (72.0%)
full pre-allocation
read 65007 lines
0.023914s wall, 0.020000s user + 0.010000s system = 0.030000s CPU (125.4%)
You see, for a non-trivial amount of text I have a difference of milliseconds.
Do you really need to know the lines beforehand? Is it really a bottleneck? Are you saving, say, one second of Wall time but complicating your code ten-fold by preallocating the lines?
The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.