I have a couple of ~3 MB text files that I need to parse in C++.
The text file looks like this (1024x786):
12,23 45,78 90,12 34,56 78,90 ...
12,23 45,78 90,12 34,56 78,90 ...
12,23 45,78 90,12 34,56 78,90 ...
12,23 45,78 90,12 34,56 78,90 ...
12,23 45,78 90,12 34,56 78,90 ...
That is, "number blocks" separated by tabs, where the numbers themselves use a ',' (instead of a '.') as the decimal marker.
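A minimal sketch of handling this format with only the standard library: split each line on '\t' and convert each block by swapping ',' for '.' before calling std::stof. The names parse_comma_float and parse_line are hypothetical, and the sketch assumes the program has not changed the C locale with setlocale, because std::stof (via strtof) uses it to decide the decimal point.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// Convert one "number block" such as "12,23" to a float.
// Assumption: the C locale is still the default "C" locale, so
// std::stof expects '.' as the decimal point.
float parse_comma_float(std::string block)
{
    for (char& c : block)
        if (c == ',') c = '.';
    return std::stof(block);
}

// Split one line of the file on tabs and convert every block.
std::vector<float> parse_line(const std::string& line)
{
    std::vector<float> row;
    std::istringstream iss(line);
    std::string block;
    while (std::getline(iss, block, '\t'))
        if (!block.empty())          // tolerate a trailing tab
            row.push_back(parse_comma_float(block));
    return row;
}
```

For example, parse_line("12,23\t45,78\t90,12") yields three floats, approximately 12.23, 45.78 and 90.12.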
First of all I need to read the file. Currently I'm using this:
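#include <boost/tokenizer.hpp>
#include <fstream>
#include <string>
using namespace std;

string line;
ifstream myfile(file);
if (myfile.is_open())
{
    boost::char_separator<char> sep("\t");
    // read the file line by line; each line is split into "number blocks"
    while (getline(myfile, line))
    {
        boost::tokenizer<boost::char_separator<char>> tokens(line, sep);
    }
}
myfile.close();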
This works fine for getting me the "number blocks", but I still need to convert each block to a float while treating the ',' as the decimal marker. Given the file size, I don't think it's a good idea to tokenize each block again. I also need to put all these values into a data structure that I can later access by location (e.g. [x][y]). Any ideas how to do this?
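For the [x][y] access, one option (a sketch, not the only reasonable design) is to keep all values in one contiguous std::vector<float> in row-major order and compute the index from x and y. Grid is a hypothetical name; here x is the column and y is the line of the file.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Row-major grid backed by a single buffer, indexed as grid(x, y).
// All cells start at 0.0f.
class Grid {
public:
    Grid(std::size_t width, std::size_t height)
        : width_(width), height_(height), data_(width * height) {}

    float& operator()(std::size_t x, std::size_t y)
    {
        assert(x < width_ && y < height_);   // bounds check in debug builds
        return data_[y * width_ + x];
    }
    float operator()(std::size_t x, std::size_t y) const
    {
        assert(x < width_ && y < height_);
        return data_[y * width_ + x];
    }

    std::size_t width() const { return width_; }
    std::size_t height() const { return height_; }

private:
    std::size_t width_, height_;
    std::vector<float> data_;
};
```

A single buffer avoids one allocation per line; a std::vector<std::vector<float>> gives out[y][x] indexing directly and is what the answers below produce.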
You can use Boost.Spirit to parse the content of the file and have the parser produce the data in whatever structure you like, for example a std::vector<std::vector<float>>. IMO, your files are not big, so I believe it's better to read the whole file into memory and run the parser on it. An efficient way to read a file is shown below in read_file.
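For comparison, a shorter way to load a whole file into a std::string with only the standard library is to stream the file's buffer into a std::ostringstream. slurp is a hypothetical name, and this sketch has not been benchmarked against read_file.

```cpp
#include <cassert>
#include <fstream>
#include <sstream>
#include <string>

// Read an entire file into memory by copying its stream buffer.
// Returns an empty string if the file cannot be opened.
std::string slurp(const std::string& path)
{
    std::ifstream file(path, std::ios::binary);
    if (!file) return {};
    std::ostringstream ss;
    ss << file.rdbuf();
    return ss.str();
}
```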
The qi::float_ parser parses a real number whose range and precision are limited by the float type, and it expects a '.' (dot) as the decimal separator. You can customize the separator by overriding qi::real_policies<T>::parse_dot. Below I am using a code snippet from spirit/example/qi/german_floating_point.cpp.
Take a look at this demo:
#include <boost/spirit/include/qi.hpp>
#include <fstream>
#include <iostream>
#include <string>
#include <vector>
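std::string read_file(const std::string& path)
{
    std::string str;
    // binary mode, so the size from tellg() matches the bytes actually read
    std::ifstream file(path, std::ios::binary | std::ios::ate);
    if (!file) return str;
    auto size = static_cast<std::size_t>(file.tellg());
    str.resize(size);
    file.seekg(0, std::ios::beg);
    // keep only what sgetn actually delivered
    auto got = file.rdbuf()->sgetn(&str[0], static_cast<std::streamsize>(size));
    str.resize(static_cast<std::size_t>(got));
    return str;
}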
using namespace boost::spirit;
//From Boost.Spirit example `qi/german_floating_point.cpp`
//Begin
template <typename T>
struct german_real_policies : qi::real_policies<T>
{
    template <typename Iterator>
    static bool parse_dot(Iterator& first, Iterator const& last)
    {
        if (first == last || *first != ',')
            return false;
        ++first;
        return true;
    }
};
qi::real_parser<float, german_real_policies<float> > const german_float;
//End
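int main()
{
    std::string in(read_file("input"));
    std::vector<std::vector<float>> out;
    // phrase_parse advances `first`, so it must be a named iterator;
    // passing in.begin() directly would leave nothing to compare afterwards.
    auto first = in.begin();
    // note: every line, including the last one, must end with a newline
    bool ret = qi::phrase_parse(first, in.end(),
                                +(+(german_float - qi::eol) >> qi::eol),
                                ascii::blank,   // skips spaces and tabs
                                out);
    if (ret && first == in.end())
        std::cout << "Success" << std::endl;
    else
        std::cout << "Parse failed" << std::endl;
}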
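The parser produces one inner vector per line, so a value is addressed as out[y][x] (y = line, x = column). Nothing in the grammar forces every line to have the same number of values, so a small check like the hypothetical is_rectangular below can confirm the expected shape before indexing; pass whichever width and height you expect.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// True if there are exactly `height` rows and every row has `width` values.
bool is_rectangular(const std::vector<std::vector<float>>& rows,
                    std::size_t width, std::size_t height)
{
    if (rows.size() != height) return false;
    for (const auto& row : rows)
        if (row.size() != width) return false;
    return true;
}
```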
What I would do is more straightforward (no need for boost::tokenizer at all):
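#include <fstream>
#include <locale>
#include <sstream>
#include <string>
#include <vector>

// Use ',' as the decimal point. std::setlocale only changes the C locale;
// iostreams use the C++ global locale, so set that instead. The locale name
// is platform dependent (e.g. "de_DE.UTF-8" on Linux) and throws if missing.
std::locale::global(std::locale("de_DE"));
std::ifstream myfile(file);
std::vector<std::vector<double>> dblmat;
std::string line;
while (std::getline(myfile, line)) {
    dblmat.push_back(std::vector<double>());
    std::istringstream iss(line);   // picks up the global locale
    double val;
    while (iss >> val) {
        dblmat.back().push_back(val);
    }
}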
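If a German named locale is not installed on the target machine, a custom std::numpunct facet gives the same effect without depending on locale names: only the decimal point changes, and the stream is imbued with it directly. comma_decimal and parse_matrix are hypothetical names; this is a sketch of the facet technique, not a drop-in replacement.

```cpp
#include <cassert>
#include <locale>
#include <sstream>
#include <string>
#include <vector>

// numpunct facet with ',' as decimal point. Grouping stays empty (the
// default), so the thousands separator is never accepted while parsing.
struct comma_decimal : std::numpunct<char> {
protected:
    char do_decimal_point() const override { return ','; }
    char do_thousands_sep() const override { return '.'; }
};

std::vector<std::vector<double>> parse_matrix(std::istream& in)
{
    // The locale takes ownership of the facet and deletes it.
    const std::locale loc(std::locale::classic(), new comma_decimal);
    std::vector<std::vector<double>> dblmat;
    std::string line;
    while (std::getline(in, line)) {
        std::istringstream iss(line);
        iss.imbue(loc);
        dblmat.emplace_back();
        double val;
        while (iss >> val)          // tabs are skipped as whitespace
            dblmat.back().push_back(val);
    }
    return dblmat;
}
```

Usage: open the file with std::ifstream and pass it to parse_matrix; dblmat[y][x] then holds the value in line y, column x.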