
C++ - How to use a stream to parse a file?

I have a file and I need to loop through it, assigning on each line an int foo, a string type, and a 64/128-bit value. How would I use a stream to parse these lines into the variables below? I want to stick with the stream syntax ( ifs >> foo >> type ), but in this case type would end up being the rest of the line after the 0/52, and at that point I'd just get a char* and use strtoull and such, so why use the stream in the first place? I'm hoping for readable code without horrid performance compared with char strings / strtok / strtoull.

//input file:
0ULL'04001C0180000000000000000EE317BC'
52L'04001C0180000000'
//output:
//0 ULL 0x04001C0180000000 0x000000000EE317BC
//52 L 0x04001C0180000000

  ifstream ifs("input.data");
  int foo;
  string type;
  unsigned long long ull[2];

Boost Spirit implementation

Here is the mandatory Boost Spirit (Qi) based implementation. For good measure, it includes formatting using Boost Spirit (Karma):

#include <string>
#include <vector>
#include <iostream>
#include <fstream>
#include <boost/spirit/include/qi.hpp>
#include <boost/spirit/include/karma.hpp>

namespace karma=boost::spirit::karma;
namespace qi   =boost::spirit::qi;

static qi::uint_parser<unsigned long long, 16, 16, 16> hex16_p; // parse long hex
static karma::uint_generator<unsigned long long, 16>   hex16_f; // format long hex

int main(int argc, char** args)
{
    std::ifstream ifs("input.data");
    std::string line;
    while (std::getline(ifs, line))
    {
        std::string::iterator begin = line.begin(), end = line.end();

        int                             f0;
        std::string                     f1;
        std::vector<unsigned long long> f2;

        bool ok = parse(begin, end,
                qi::int_                    // an integer
                >> *qi::alpha               // alternatively: *(qi::char_ - '\'')
                >> '\'' >> +hex16_p >> '\'' // accepts 'n x 16' hex digits
            , f0, f1, f2);

        if (ok)
            std::cout << "Parsed: " << karma::format(
                 karma::int_ 
                 << ' ' << karma::string 
                 << ' ' << ("0x" << hex16_f) % ' '
             , f0, f1, f2) << std::endl;
        else
            std::cerr << "Parse failed: " << line << std::endl;
    }

    return 0;
}

Test run:

Parsed: 0 ULL 0x4001c0180000000 0xee317bc
Parsed: 52 L 0x4001c0180000000

See "Tweaks, samples" below for info on how to tweak, e.g., the hex output.

Benchmark

I benchmarked @Cubbi's version and the above, as written, on 100,000x the sample inputs you provided. This initially gave Cubbi's version a slight advantage: 0.786s versus 0.823s.

Now, that of course wasn't a fair comparison, since my code constructs the parser on the fly each time. With that taken out of the loop like so:

typedef std::string::iterator It;

const static qi::rule<It> parser = qi::int_ >> *qi::alpha >> '\'' >> +hex16_p >> '\'';
bool ok = parse(begin, end, parser, f0, f1, f2);

Boost Spirit comes out a clear winner with only 0.093s. That is already about 8.5x faster than Cubbi's version (0.786s), even though the karma formatter is still being constructed on each iteration.

With the output formatting commented out in both versions, Boost Spirit is >11x faster.
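
The harness behind these timings is not shown. A minimal sketch of how such a comparison might be timed, assuming std::chrono::steady_clock and an in-memory vector of lines; the names Timing and time_parser are hypothetical, and the callable stands in for whichever parser is being measured:

```cpp
#include <chrono>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical result type: elapsed wall-clock time and number of
// lines the parser reported as successfully parsed.
struct Timing {
    double      seconds;
    std::size_t parsed;
};

// Hypothetical harness (not the one used for the numbers above): call
// parse_line on every line, `repeats` times over, and time the whole run.
// parse_line must be callable as bool(const std::string&).
template <typename ParseLine>
Timing time_parser(const std::vector<std::string>& lines,
                   std::size_t repeats, ParseLine parse_line)
{
    typedef std::chrono::steady_clock clock;
    std::size_t parsed = 0; // counting successes keeps the work observable
    const clock::time_point start = clock::now();
    for (std::size_t i = 0; i < repeats; ++i)
        for (std::size_t j = 0; j < lines.size(); ++j)
            if (parse_line(lines[j]))
                ++parsed;
    const std::chrono::duration<double> elapsed = clock::now() - start;
    Timing result = { elapsed.count(), parsed };
    return result;
}
```

Calling time_parser once per implementation with the same lines and repeat count gives comparable figures. It leaves out file I/O and output formatting, so the results will not match the numbers quoted above.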

Tweaks, samples

Note how you can easily tweak things:

//  >> '\'' >> +hex16_p >> '\'' // accepts 'n x 16' hex digits
    >> '\'' >> qi::repeat(1,2)[ hex16_p ] >> '\'' // accept 16 or 32 digits

Or format the hex output just like the input:

// ("0x" << hex16_f) % ' '
karma::right_align(16, '0')[ karma::upper [ hex16_f ] ] % ""

Changed sample output:

0ULL'04001C0180000000000000000EE317BC'
Parsed: 0 ULL 04001C0180000000000000000EE317BC
52L'04001C0180000000'
Parsed: 52 L 04001C0180000000
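
For comparison, roughly the same "format like the input" output can be produced with plain iostreams. This is only a sketch under the assumption that each value should appear as 16 upper-case, zero-padded hex digits with no separator; format_like_input is a hypothetical name, not part of Karma or of the code above:

```cpp
#include <cstddef>
#include <iomanip>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: write each value as 16 upper-case hex digits,
// zero-padded and without separators, mirroring the input format.
std::string format_like_input(const std::vector<unsigned long long>& values)
{
    std::ostringstream os;
    os << std::hex << std::uppercase << std::setfill('0');
    for (std::size_t i = 0; i < values.size(); ++i)
        os << std::setw(16) << values[i]; // setw applies to the next insertion only
    return os.str();
}
```

No std::showbase is used here, so the width of 16 covers the digits alone.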

HTH

This is a rather trivial task for a more sophisticated parser such as boost.spirit.

To solve this using just the standard C++ streams you would need to

  • a) treat ' as whitespace and
  • b) take an extra pass over the string "04001C0180000000000000000EE317BC", which has no separators between the values.

Borrowing Jerry Coffin's sample facet code,

#include <iostream>
#include <fstream>
#include <locale>
#include <vector>
#include <sstream>
#include <iomanip>
struct tick_is_space : std::ctype<char> {
    tick_is_space() : std::ctype<char>(get_table()) {}
    static std::ctype_base::mask const* get_table()
    {
        static std::vector<std::ctype_base::mask>
               rc(table_size, std::ctype_base::mask());
        rc['\n'] = std::ctype_base::space;
        rc['\''] = std::ctype_base::space;
        return &rc[0];
    }
};

int main()
{
    std::ifstream ifs("input.data");
    ifs.imbue(std::locale(std::locale(), new tick_is_space()));
    int foo;
    std::string type, ullstr;
    while( ifs >> foo >> type >> ullstr)
    {
        std::vector<unsigned long long> ull;
        while(ullstr.size() >= 16) // sizeof(unsigned long long)*2
        {
            std::istringstream is(ullstr.substr(0, 16));
            unsigned long long tmp;
            is >> std::hex >> tmp;
            ull.push_back(tmp);
            ullstr.erase(0, 16);
        }
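        // Write "0x" by hand: with std::showbase the prefix counts toward
        // setw(16), so short values would be padded as 00000000xee317bc.
        std::cout << std::dec << foo << " " << type << " "
                  << std::hex << std::uppercase << std::setfill('0');
        for(size_t p=0; p<ull.size(); ++p)
            std::cout << "0x" << std::setw(16) << ull[p] << ' ';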
        std::cout << '\n';
    }
}

test: https://ideone.com/lRBTq
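
Step b), the extra pass over the digit string, can also be written without constructing a std::istringstream per chunk. A minimal sketch, assuming C++11's std::stoull is available; split_hex16 is a hypothetical helper name, and like the loop above it ignores a trailing group shorter than 16 digits:

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: split a run of hex digits into 16-digit groups
// and convert each group with std::stoull (base 16).
// std::stoull throws std::invalid_argument if a group does not start
// with a hex digit; it stops silently at the first non-hex character.
std::vector<unsigned long long> split_hex16(const std::string& digits)
{
    std::vector<unsigned long long> out;
    for (std::size_t pos = 0; pos + 16 <= digits.size(); pos += 16)
        out.push_back(std::stoull(digits.substr(pos, 16), nullptr, 16));
    return out;
}
```

In the program above, the inner while loop could then become a single call such as split_hex16(ullstr). Whether that is measurably faster was not tested here.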
