Boost Spirit core dump when parsing a bracketed expression

Question

I have a simplified grammar that should parse a sequence of terminal literals: id, '<', '>' and ":action". I need to allow brackets '(' ')' that do nothing except improve readability. (The full example is at http://coliru.stacked-crooked.com/a/dca93f5c8f37a889 ) A snippet of my grammar:

    start  = expression % eol;
    expression   = (simple_def >> -expression)
    | (qi::lit('(') > expression > ')');

    simple_def = qi::lit('<') [qi::_val = Command::left] 
    | qi::lit('>') [qi::_val = Command::right] 
    | key [qi::_val = Command::id] 
    | qi::lit(":action") [qi::_val = Command::action] 
    ;
    
    key = +qi::char_("a-zA-Z_0-9");

When I try to parse const std::string s = "(a1 >:action)"; everything works like a charm. But when I add a little more complexity with brackets, as in "(a1 (>):action)", I get a core dump. For information: the core dump happens on coliru, while the same example compiled with MSVC just reports a failed parse.

So my questions are: (1) what is wrong with the brackets, and (2) how exactly can brackets be introduced into the expression?

P.S. This is a simplified grammar. My real case is more complicated, but this is a minimal reproducible example.

Answer 1

You should just handle the expectation failure:

terminate called after throwing an instance of 'boost::wrapexcept<boost::spirit::qi::expectation_failure<__gnu_cxx::__normal_iterator<char const*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > >'
  what():  boost::spirit::qi::expectation_failure
Aborted (core dumped)

If you handle the expectation failure, the program will not have to terminate.

Fixing The Grammar

Your 'nested expression' rule only accepts a single expression. I think that

expression = (simple_def >> -expression)

is intended to match "1 or more simple_def". However, the alternative branch:

     | ('(' > expression > ')');

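doesn't accept the same: it just stops after parsing ')'. This means that your input is simply invalid according to the grammar. In "(a1 (>):action)" the inner group "(>)" ends the expression, so the outer group then expects ')' but finds ":action", and the expectation point throws.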
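To make that concrete, here is a hand-written stand-in for the question's expression rule. It is plain C++, not Spirit, and the names OriginalGrammar and check_original are made up for illustration. Only the structure of the rule is mirrored: an optional tail after simple_def, and nothing allowed after a closing ')'.

```cpp
#include <cctype>
#include <cstddef>
#include <stdexcept>
#include <string>

// Hand-written stand-in (not Spirit code) for the question's rule:
//   expression = (simple_def >> -expression) | ('(' > expression > ')')
struct OriginalGrammar {
    std::string const& in;
    std::size_t pos = 0;

    explicit OriginalGrammar(std::string const& s) : in(s) {}

    void skip() { while (pos < in.size() && in[pos] == ' ') ++pos; }

    bool lit(std::string const& s) {
        skip();
        if (in.compare(pos, s.size(), s) != 0) return false;
        pos += s.size();
        return true;
    }

    bool simple_def() {
        if (lit("<") || lit(">") || lit(":action")) return true;
        skip();
        std::size_t start = pos;
        while (pos < in.size() &&
               (std::isalnum(static_cast<unsigned char>(in[pos])) || in[pos] == '_'))
            ++pos;
        return pos > start;  // key = one or more identifier characters
    }

    bool expression() {
        if (simple_def()) {  // simple_def >> -expression
            expression();    // optional tail: not matching is fine
            return true;
        }
        if (!lit("(")) return false;
        // After '(' the rule uses expectation points, modelled here as throws.
        if (!expression())
            throw std::runtime_error("expected expression at offset " + std::to_string(pos));
        if (!lit(")"))
            throw std::runtime_error("expected ')' at offset " + std::to_string(pos));
        return true;         // note: nothing may follow the group
    }
};

// Returns "ok", "no match", "trailing input", or the expectation message.
std::string check_original(std::string const& input) {
    OriginalGrammar g(input);
    try {
        if (!g.expression()) return "no match";
        g.skip();
        return g.pos == input.size() ? "ok" : "trailing input";
    } catch (std::runtime_error const& e) {
        return e.what();
    }
}
```

With this sketch, check_original("(a1 >:action)") succeeds, while check_original("(a1 (>):action)") reports a missing ')' right after the inner group, which is the same place the real grammar gives up.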

I suggest a simplification by expressing intent. You were on the right path with semantic typedefs. Let's avoid the "weasely" Line Of Lines (what even is that?):

using Id     = std::string;
using Line   = std::vector<Command>;
using Script = std::vector<Line>;

And use these typedefs consistently. Now, we can express the grammar as we "think" about it:

    start  = skip(blank)[script];
    script = line % eol;

    line   = +simple;
    simple = group | command;
    group  = '(' > line > ')';

See, by simplifying our mental model and sticking to it, we avoided the entire problem you had a hard time spotting.
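The same mental model can be sketched without Spirit as a small recursive-descent parser. This is an illustration only, with the hypothetical names LineParser and parse_line; it follows line = +simple, simple = group | command, group = '(' > line > ')' and flattens groups into one list of commands.

```cpp
#include <cctype>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

enum class Command { id, left, right, action };

// Hand-written stand-in (not Spirit code) for:
//   line = +simple;  simple = group | command;  group = '(' > line > ')';
class LineParser {
  public:
    explicit LineParser(std::string const& s) : in_(s) {}

    // Parses one whole line; throws on malformed or trailing input.
    std::vector<Command> parse() {
        std::vector<Command> out;
        if (!line(out)) throw std::runtime_error("expected at least one command");
        skip();
        if (pos_ != in_.size())
            throw std::runtime_error("unexpected input at offset " + std::to_string(pos_));
        return out;
    }

  private:
    bool line(std::vector<Command>& out) {  // +simple
        if (!simple(out)) return false;
        while (simple(out)) {}
        return true;
    }
    bool simple(std::vector<Command>& out) { return group(out) || command(out); }
    bool group(std::vector<Command>& out) {  // '(' > line > ')'
        if (!lit("(")) return false;
        if (!line(out) || !lit(")"))
            throw std::runtime_error("malformed group at offset " + std::to_string(pos_));
        return true;
    }
    bool command(std::vector<Command>& out) {
        if (lit("<"))       { out.push_back(Command::left);   return true; }
        if (lit(">"))       { out.push_back(Command::right);  return true; }
        if (lit(":action")) { out.push_back(Command::action); return true; }
        skip();
        std::size_t start = pos_;
        while (pos_ < in_.size() &&
               (std::isalnum(static_cast<unsigned char>(in_[pos_])) || in_[pos_] == '_'))
            ++pos_;
        if (pos_ == start) return false;
        out.push_back(Command::id);  // any identifier becomes Command::id
        return true;
    }
    void skip() { while (pos_ < in_.size() && (in_[pos_] == ' ' || in_[pos_] == '\t')) ++pos_; }
    bool lit(std::string const& s) {
        skip();
        if (in_.compare(pos_, s.size(), s) != 0) return false;
        pos_ += s.size();
        return true;
    }

    std::string const& in_;
    std::size_t pos_ = 0;
};

// Convenience wrapper (hypothetical name).
std::vector<Command> parse_line(std::string const& input) {
    return LineParser(input).parse();
}
```

Because a group contains a full line and a line is a sequence of groups or commands, input may continue after a closing ')', so parse_line("(a1 (>) :action)") yields id, right, action.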

Here's a quick demo that includes error handling, optional debug output and both test cases, and that encapsulates the skipper, since it is part of the grammar: Live On Compiler Explorer

#include <fmt/ranges.h>
#include <fmt/ostream.h>
#include <boost/spirit/include/qi.hpp>
#include <boost/spirit/include/phoenix.hpp>

namespace qi  = boost::spirit::qi;
namespace phx = boost::phoenix;

enum class Command { id, left, right, action };

static inline std::ostream& operator<<(std::ostream& os, Command cmd) {
    switch (cmd) {
        case Command::id: return os << "[ID]";
        case Command::left: return os << "[LEFT]";
        case Command::right: return os << "[RIGHT]";
        case Command::action: return os << "[ACTION]";
    }
    return os << "[???]";
}

using Id     = std::string;
using Line   = std::vector<Command>;
using Script = std::vector<Line>;

template <typename It>
struct ExprGrammar : qi::grammar<It, Script()> {
    ExprGrammar() : ExprGrammar::base_type(start) {
        using namespace qi;

        start  = skip(blank)[script];
        script = line % eol;

        line   = +simple;
        simple = group | command;
        group  = '(' > line > ')';

        command = 
            lit('<')       [ _val = Command::left   ] |
            lit('>')       [ _val = Command::right  ] |
            key            [ _val = Command::id     ] |
            lit(":action") [ _val = Command::action ] ;

        key = +char_("a-zA-Z_0-9");

        BOOST_SPIRIT_DEBUG_NODES((command)(line)(simple)(group)(script)(key));
    }

private:
    qi::rule<It, Script()>                 start;
    qi::rule<It, Line(), qi::blank_type>   line, simple, group;
    qi::rule<It, Script(), qi::blank_type> script;

    qi::rule<It, Command(), qi::blank_type> command;

    // lexemes
    qi::rule<It, Id()> key;
};

int main() {
    using It = std::string::const_iterator;
    ExprGrammar<It> const p;

    for (const std::string s : {
            "a1 > :action\na1 (>) :action",
            "(a1 > :action)\n(a1 (>) :action)",
            "a1 (> :action)",
        }) {

        It f(begin(s)), l(end(s));

        try {
            Script parsed;
            bool ok = qi::parse(f, l, p, parsed);

            if (ok) {
                fmt::print("Parsed {}\n", parsed);
            } else {
                fmt::print("Parse failed\n");
            }

            if (f != l) {
                fmt::print("Remaining unparsed: '{}'\n", std::string(f, l));
            }
        } catch (qi::expectation_failure<It> const& ef) {
            fmt::print("{}\n", ef.what()); // TODO add more details :)
        }
    }
}

Prints

Parsed {{[ID], [RIGHT], [ACTION]}, {[ID], [RIGHT], [ACTION]}}
Parsed {{[ID], [RIGHT], [ACTION]}, {[ID], [RIGHT], [ACTION]}}
Parsed {{[ID], [RIGHT], [ACTION]}}

BONUS

However, I think this can all be greatly simplified using qi::symbols for the commands. In fact, it looks like you're only tokenizing (you confirm this when you say that the parentheses are not important).

    line   = +simple;
    simple = group | command | (omit[key] >> attr(Command::id));
    group  = '(' > line > ')';
    key    = +char_("a-zA-Z_0-9");

Now you don't need Phoenix at all: Live On Compiler Explorer, printing

ok? true {{[ID], [RIGHT], [ACTION]}, {[ID], [RIGHT], [ACTION]}}
ok? true {{[ID], [RIGHT], [ACTION]}, {[ID], [RIGHT], [ACTION]}}
ok? true {{[ID], [RIGHT], [ACTION]}}

Even Simpler?

Since I observe that you're basically tokenizing line-wise, why not simply skip the parentheses, and simplify all the way down to:

    script = line % eol;
    line   = *(command | omit[key] >> attr(Command::id));

That's all. See it Live On Compiler Explorer again:

#include <array> // std::array is used in operator<< below
#include <boost/spirit/include/qi.hpp>
#include <fmt/ostream.h>
#include <fmt/ranges.h>
namespace qi = boost::spirit::qi;

enum class Command { id, left, right, action };
using Id     = std::string;
using Line   = std::vector<Command>;
using Script = std::vector<Line>;

static inline std::ostream& operator<<(std::ostream& os, Command cmd) {
    return os << (std::array{"ID", "LEFT", "RIGHT", "ACTION"}.at(int(cmd)));
}

template <typename It>
struct ExprGrammar : qi::grammar<It, Script()> {
    ExprGrammar() : ExprGrammar::base_type(start) {
        using namespace qi;
        start = skip(skipper.alias())[line % eol];
        line  = *(command | omit[key] >> attr(Command::id));
        key   = +char_("a-zA-Z_0-9");

        BOOST_SPIRIT_DEBUG_NODES((line)(key));
    }
private:
    using Skipper = qi::rule<It>;
    qi::rule<It, Script()>        start;
    qi::rule<It, Line(), Skipper> line;

    Skipper                 skipper = qi::char_(" \t\b\f()");
    qi::rule<It /*, Id()*/> key; // omit attribute for efficiency
    struct cmdsym : qi::symbols<char, Command> {
        cmdsym() { this->add("<", Command::left)
            (">", Command::right)
            (":action", Command::action);
        }
    } command;
};

int main() {
    using It = std::string::const_iterator;
    ExprGrammar<It> const p;

    for (const std::string s : {
            "a1 > :action\na1 (>) :action",
            "(a1 > :action)\n(a1 (>) :action)",
            "a1 (> :action)",
        })
    try {
        It f(begin(s)), l(end(s));

        Script parsed;
        bool ok = qi::parse(f, l, p, parsed);

        fmt::print("ok? {} {}\n", ok, parsed);
        if (f != l)
            fmt::print(" -- Remaining '{}'\n", std::string(f, l));
    } catch (qi::expectation_failure<It> const& ef) {
        fmt::print("{}\n", ef.what()); // TODO add more details :)
    }
}

Prints

ok? true {{ID, RIGHT, ACTION}, {ID, RIGHT, ACTION}}
ok? true {{ID, RIGHT, ACTION}, {ID, RIGHT, ACTION}}
ok? true {{ID, RIGHT, ACTION}}

Note that I very subtly changed +() to *() so that it accepts empty lines as well. This may or may not be what you want.
