
Boost spirit core dump on parsing bracketed expression

I have a simplified grammar that should parse a sequence of terminal literals: id, '<', '>' and ":action". I need to allow brackets '(' ')' that do nothing but improve readability. (The full example is at http://coliru.stacked-crooked.com/a/dca93f5c8f37a889 ) A snippet of my grammar:

    start  = expression % eol;
    expression   = (simple_def >> -expression)
    | (qi::lit('(') > expression > ')');

    simple_def = qi::lit('<') [qi::_val = Command::left] 
    | qi::lit('>') [qi::_val = Command::right] 
    | key [qi::_val = Command::id] 
    | qi::lit(":action") [qi::_val = Command::action] 
    ;
    
    key = +qi::char_("a-zA-Z_0-9");

When I try to parse const std::string s = "(a1 >:action)"; everything works like a charm. But when I add a little more complexity with brackets, "(a1 (>):action)", I get a core dump. Just for information: the core dump happens on coliru, while the MSVC-compiled example just reports a failed parse.

So my questions: (1) what's wrong with the brackets, and (2) how exactly can brackets be introduced into the expression?

P.S. This is a simplified grammar. In reality I have a more complicated case, but this is a minimal reproducible example.

You should just handle the expectation failure:

terminate called after throwing an instance of 'boost::wrapexcept<boost::spirit::qi::expectation_failure<__gnu_cxx::__normal_iterator<char const*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > >'
  what():  boost::spirit::qi::expectation_failure
Aborted (core dumped)

If you handle the expectation failure, the program will not have to terminate.

Fixing The Grammar

Your 'nested expression' rule only accepts a single expression. I think that

expression = (simple_def >> -expression)

is intended to match "1 or more `simple_def`". However, the alternative branch:

     | ('(' > expression > ')');

doesn't accept the same: it just stops after parsing `)`. This means that your input is simply invalid according to the grammar.
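To make that concrete, here is a hand-written recursive-descent mirror of the original two rules in plain C++. It is not Spirit and only approximates it (blank skipping is assumed, attributes are ignored, and the names `original::expression` and `check_original` are invented for this sketch), but it follows the same control flow: the `'('` branch returns right after `)`, and the enclosing bracket then finds `:action` where it expects its own `)`.

```cpp
#include <cctype>
#include <cstddef>
#include <stdexcept>
#include <string>

// Hand-written mirror of the ORIGINAL grammar (illustration only, not Spirit):
//   expression = (simple_def >> -expression) | ('(' > expression > ')')
namespace original {
    void skip_blanks(std::string const& s, std::size_t& pos) {
        while (pos < s.size() && (s[pos] == ' ' || s[pos] == '\t')) ++pos;
    }

    bool simple_def(std::string const& s, std::size_t& pos) {
        skip_blanks(s, pos);
        if (pos >= s.size()) return false;
        if (s[pos] == '<' || s[pos] == '>') { ++pos; return true; }
        if (s.compare(pos, 7, ":action") == 0) { pos += 7; return true; }
        std::size_t start = pos; // key = +[a-zA-Z_0-9]
        while (pos < s.size() &&
               (std::isalnum(static_cast<unsigned char>(s[pos])) || s[pos] == '_'))
            ++pos;
        return pos > start;
    }

    bool expression(std::string const& s, std::size_t& pos) {
        if (simple_def(s, pos)) {
            expression(s, pos); // -expression: optional, result ignored
            return true;
        }
        skip_blanks(s, pos);
        if (pos < s.size() && s[pos] == '(') {
            ++pos;
            if (!expression(s, pos)) // models "> expression"
                throw std::runtime_error("expected expression at offset " + std::to_string(pos));
            skip_blanks(s, pos);
            if (pos >= s.size() || s[pos] != ')') // models "> ')'"
                throw std::runtime_error("expected ')' at offset " + std::to_string(pos));
            ++pos;
            return true; // this branch ends here: nothing may follow the ')'
        }
        return false;
    }
}

// Returns "accepted", "rejected", or the expectation message.
std::string check_original(std::string const& input) {
    std::size_t pos = 0;
    try {
        bool ok = original::expression(input, pos) && pos == input.size();
        return ok ? "accepted" : "rejected";
    } catch (std::runtime_error const& e) {
        return e.what();
    }
}
```

With this sketch, `check_original("(a1 >:action)")` is accepted, while `check_original("(a1 (>):action)")` hits the expectation for the outer `)` right after the inner group closes.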

I suggest a simplification by expressing intent. You were on the right path with semantic typedefs. Let's avoid the "weasely" Line Of Lines (what even is that?):

using Id     = std::string;
using Line   = std::vector<Command>;
using Script = std::vector<Line>;

And use these typedefs consistently. Now, we can express the grammar as we "think" about it:

    start  = skip(blank)[script];
    script = line % eol;

    line   = +simple;
    simple = group | command;
    group  = '(' > line > ')';

See, by simplifying our mental model and sticking to it, we avoided the entire problem you had a hard time spotting.

Here's a quick demo that includes error handling, optional debug output and both test cases, and that encapsulates the skipper, since it is part of the grammar: Live On Compiler Explorer

#include <fmt/ranges.h>
#include <fmt/ostream.h>
#include <boost/spirit/include/qi.hpp>
#include <boost/spirit/include/phoenix.hpp>

namespace qi  = boost::spirit::qi;
namespace phx = boost::phoenix;

enum class Command { id, left, right, action };

static inline std::ostream& operator<<(std::ostream& os, Command cmd) {
    switch (cmd) {
        case Command::id: return os << "[ID]";
        case Command::left: return os << "[LEFT]";
        case Command::right: return os << "[RIGHT]";
        case Command::action: return os << "[ACTION]";
    }
    return os << "[???]";
}

using Id     = std::string;
using Line   = std::vector<Command>;
using Script = std::vector<Line>;

template <typename It>
struct ExprGrammar : qi::grammar<It, Script()> {
    ExprGrammar() : ExprGrammar::base_type(start) {
        using namespace qi;

        start  = skip(blank)[script];
        script = line % eol;

        line   = +simple;
        simple = group | command;
        group  = '(' > line > ')';

        command = 
            lit('<')       [ _val = Command::left   ] |
            lit('>')       [ _val = Command::right  ] |
            key            [ _val = Command::id     ] |
            lit(":action") [ _val = Command::action ] ;

        key = +char_("a-zA-Z_0-9");

        BOOST_SPIRIT_DEBUG_NODES((command)(line)(simple)(group)(script)(key));
    }

private:
    qi::rule<It, Script()>                 start;
    qi::rule<It, Line(), qi::blank_type>   line, simple, group;
    qi::rule<It, Script(), qi::blank_type> script;

    qi::rule<It, Command(), qi::blank_type> command;

    // lexemes
    qi::rule<It, Id()> key;
};

int main() {
    using It = std::string::const_iterator;
    ExprGrammar<It> const p;

    for (const std::string s : {
            "a1 > :action\na1 (>) :action",
            "(a1 > :action)\n(a1 (>) :action)",
            "a1 (> :action)",
        }) {

        It f(begin(s)), l(end(s));

        try {
            Script parsed;
            bool ok = qi::parse(f, l, p, parsed);

            if (ok) {
                fmt::print("Parsed {}\n", parsed);
            } else {
                fmt::print("Parse failed\n");
            }

            if (f != l) {
                fmt::print("Remaining unparsed: '{}'\n", std::string(f, l));
            }
        } catch (qi::expectation_failure<It> const& ef) {
            fmt::print("{}\n", ef.what()); // TODO add more details :)
        }
    }
}

Prints

Parsed {{[ID], [RIGHT], [ACTION]}, {[ID], [RIGHT], [ACTION]}}
Parsed {{[ID], [RIGHT], [ACTION]}, {[ID], [RIGHT], [ACTION]}}
Parsed {{[ID], [RIGHT], [ACTION]}}

BONUS

However, I think this can all be greatly simplified using qi::symbols for the commands. In fact it looks like you're only tokenizing (you confirm this when you say that the parentheses are not important).

    line   = +simple;
    simple = group | command | (omit[key] >> attr(Command::id));
    group  = '(' > line > ')';
    key    = +char_("a-zA-Z_0-9");

Now you don't need Phoenix at all: Live On Compiler Explorer, printing

ok? true {{[ID], [RIGHT], [ACTION]}, {[ID], [RIGHT], [ACTION]}}
ok? true {{[ID], [RIGHT], [ACTION]}, {[ID], [RIGHT], [ACTION]}}
ok? true {{[ID], [RIGHT], [ACTION]}}

Even Simpler?

Since I observe that you're basically tokenizing line-wise, why not simply skip the parentheses, and simplify all the way down to:

    script = line % eol;
    line   = *(command | omit[key] >> attr(Command::id));

That's all. See it Live On Compiler Explorer again:

#include <boost/spirit/include/qi.hpp>
#include <fmt/ostream.h>
#include <fmt/ranges.h>
#include <array> // std::array is used in operator<< below
namespace qi = boost::spirit::qi;

enum class Command { id, left, right, action };
using Id     = std::string;
using Line   = std::vector<Command>;
using Script = std::vector<Line>;

static inline std::ostream& operator<<(std::ostream& os, Command cmd) {
    return os << (std::array{"ID", "LEFT", "RIGHT", "ACTION"}.at(int(cmd)));
}

template <typename It>
struct ExprGrammar : qi::grammar<It, Script()> {
    ExprGrammar() : ExprGrammar::base_type(start) {
        using namespace qi;
        start = skip(skipper.alias())[line % eol];
        line  = *(command | omit[key] >> attr(Command::id));
        key   = +char_("a-zA-Z_0-9");

        BOOST_SPIRIT_DEBUG_NODES((line)(key));
    }
private:
    using Skipper = qi::rule<It>;
    qi::rule<It, Script()>        start;
    qi::rule<It, Line(), Skipper> line;

    Skipper                 skipper = qi::char_(" \t\b\f()");
    qi::rule<It /*, Id()*/> key; // omit attribute for efficiency
    struct cmdsym : qi::symbols<char, Command> {
        cmdsym() { this->add("<", Command::left)
            (">", Command::right)
            (":action", Command::action);
        }
    } command;
};

int main() {
    using It = std::string::const_iterator;
    ExprGrammar<It> const p;

    for (const std::string s : {
            "a1 > :action\na1 (>) :action",
            "(a1 > :action)\n(a1 (>) :action)",
            "a1 (> :action)",
        })
    try {
        It f(begin(s)), l(end(s));

        Script parsed;
        bool ok = qi::parse(f, l, p, parsed);

        fmt::print("ok? {} {}\n", ok, parsed);
        if (f != l)
            fmt::print(" -- Remaining '{}'\n", std::string(f, l));
    } catch (qi::expectation_failure<It> const& ef) {
        fmt::print("{}\n", ef.what()); // TODO add more details :)
    }
}

Prints

ok? true {{ID, RIGHT, ACTION}, {ID, RIGHT, ACTION}}
ok? true {{ID, RIGHT, ACTION}, {ID, RIGHT, ACTION}}
ok? true {{ID, RIGHT, ACTION}}

Note I very subtly changed +() to *() so it would accept empty lines as well. This may or may not be what you want.
