Boost spirit core dump on parsing bracketed expression
I have a simplified grammar that should parse a sequence of terminal literals: id, '<', '>' and ":action". I need to allow brackets '(' ')' that do nothing but improve readability. (The full example is at http://coliru.stacked-crooked.com/a/dca93f5c8f37a889 ) A snippet of my grammar:
start = expression % eol;
expression = (simple_def >> -expression)
| (qi::lit('(') > expression > ')');
simple_def = qi::lit('<') [qi::_val = Command::left]
| qi::lit('>') [qi::_val = Command::right]
| key [qi::_val = Command::id]
| qi::lit(":action") [qi::_val = Command::action]
;
key = +qi::char_("a-zA-Z_0-9");
When I try to parse:
const std::string s = "(a1 >:action)";
Everything works like a charm. But when I add a little more complexity with brackets:
"(a1 (>):action)"
I get a core dump. Just for information: the core dump happens on coliru, while the MSVC-compiled example merely reports a failed parse.
So my questions: (1) what's wrong with the brackets, and (2) how exactly can brackets be introduced into the expression?
P.S. This is a simplified grammar; my real case is more complicated, but this is a minimal reproducible example.
You should just handle the expectation failure:
terminate called after throwing an instance of 'boost::wrapexcept<boost::spirit::qi::expectation_failure<__gnu_cxx::__normal_iterator<char const*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > >'
what(): boost::spirit::qi::expectation_failure
Aborted (core dumped)
If you handle the expectation failure, the program will not have to terminate.
Your 'nested expression' rule only accepts a single expression. I think that
expression = (simple_def >> -expression)
is intended to match "1 or more `simple_def`". However, the alternative branch:
| ('(' > expression > ')');
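doesn't accept the same: it just stops after parsing `)`. For "(a1 (>):action)", the inner group consumes "(>)", after which the outer group expects its closing ')' but finds ":action" instead, which triggers the expectation failure. This means that your input is simply invalid according to the grammar.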
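To make that trace concrete, here is a hand-written recursive-descent stand-in for the original two-branch rule. It is not Spirit code: the `OriginalGrammar` struct, the `check` helper and the space-only skipping are assumptions for illustration, and the expectation points are modelled with a plain `std::runtime_error`.

```cpp
#include <cctype>
#include <cstddef>
#include <stdexcept>
#include <string>

// Hand-rolled illustration (not Spirit) of the rule from the question:
//   expression = (simple_def >> -expression) | ('(' > expression > ')')
struct OriginalGrammar {
    std::string in;
    std::size_t pos = 0;

    void skip() { while (pos < in.size() && in[pos] == ' ') ++pos; }

    // Matches a literal after skipping spaces; consumes nothing else on failure.
    bool lit(std::string const& s) {
        skip();
        if (in.compare(pos, s.size(), s) != 0) return false;
        pos += s.size();
        return true;
    }

    // key = +char_("a-zA-Z_0-9")
    bool key() {
        skip();
        std::size_t start = pos;
        while (pos < in.size() &&
               (std::isalnum(static_cast<unsigned char>(in[pos])) || in[pos] == '_'))
            ++pos;
        return pos > start;
    }

    bool simple_def() { return lit("<") || lit(">") || key() || lit(":action"); }

    bool expression() {
        if (simple_def()) {
            expression();  // -expression: optional, so the result is ignored
            return true;
        }
        if (lit("(")) {
            // Expectation points: failing here is an error, not a backtrack.
            if (!expression()) throw std::runtime_error("expected expression");
            if (!lit(")"))     throw std::runtime_error("expected ')'");
            return true;       // note: nothing may follow the group
        }
        return false;
    }
};

// Hypothetical driver that reports the outcome as text.
std::string check(std::string const& s) {
    OriginalGrammar g{s};
    try {
        bool ok = g.expression();
        g.skip();
        return ok && g.pos == s.size() ? "ok" : "no match";
    } catch (std::runtime_error const& e) {
        return std::string("expectation failure: ") + e.what();
    }
}
```

With this stand-in, `check("(a1 >:action)")` succeeds, while `check("(a1 (>):action)")` reports a failed expectation for `)`, mirroring the behaviour described above.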
I suggest a simplification by expressing intent. You were on the right path with semantic typedefs. Let's avoid the "weasely" Line Of Lines (what even is that?):
using Id = std::string;
using Line = std::vector<Command>;
using Script = std::vector<Line>;
And use these typedefs consistently. Now, we can express the grammar as we "think" about it:
start = skip(blank)[script];
script = line % eol;
line = +simple;
simple = group | command;
group = '(' > line > ')';
See, by simplifying our mental model and sticking to it, we avoided the entire problem you had a hard time spotting.
Here's a quick demo that includes error handling, optional debug output, both test cases, and the skipper encapsulated in the grammar (since it is part of the grammar): Live On Compiler Explorer
#include <fmt/ranges.h>
#include <fmt/ostream.h>
#include <boost/spirit/include/qi.hpp>
#include <boost/spirit/include/phoenix.hpp>
namespace qi = boost::spirit::qi;
namespace phx = boost::phoenix;
enum class Command { id, left, right, action };
static inline std::ostream& operator<<(std::ostream& os, Command cmd) {
switch (cmd) {
case Command::id: return os << "[ID]";
case Command::left: return os << "[LEFT]";
case Command::right: return os << "[RIGHT]";
case Command::action: return os << "[ACTION]";
}
return os << "[???]";
}
using Id = std::string;
using Line = std::vector<Command>;
using Script = std::vector<Line>;
template <typename It>
struct ExprGrammar : qi::grammar<It, Script()> {
ExprGrammar() : ExprGrammar::base_type(start) {
using namespace qi;
start = skip(blank)[script];
script = line % eol;
line = +simple;
simple = group | command;
group = '(' > line > ')';
command =
lit('<') [ _val = Command::left ] |
lit('>') [ _val = Command::right ] |
key [ _val = Command::id ] |
lit(":action") [ _val = Command::action ] ;
key = +char_("a-zA-Z_0-9");
BOOST_SPIRIT_DEBUG_NODES((command)(line)(simple)(group)(script)(key));
}
private:
qi::rule<It, Script()> start;
qi::rule<It, Line(), qi::blank_type> line, simple, group;
qi::rule<It, Script(), qi::blank_type> script;
qi::rule<It, Command(), qi::blank_type> command;
// lexemes
qi::rule<It, Id()> key;
};
int main() {
using It = std::string::const_iterator;
ExprGrammar<It> const p;
for (const std::string s : {
"a1 > :action\na1 (>) :action",
"(a1 > :action)\n(a1 (>) :action)",
"a1 (> :action)",
}) {
It f(begin(s)), l(end(s));
try {
Script parsed;
bool ok = qi::parse(f, l, p, parsed);
if (ok) {
fmt::print("Parsed {}\n", parsed);
} else {
fmt::print("Parse failed\n");
}
if (f != l) {
fmt::print("Remaining unparsed: '{}'\n", std::string(f, l));
}
} catch (qi::expectation_failure<It> const& ef) {
fmt::print("{}\n", ef.what()); // TODO add more details :)
}
}
}
Prints
Parsed {{[ID], [RIGHT], [ACTION]}, {[ID], [RIGHT], [ACTION]}}
Parsed {{[ID], [RIGHT], [ACTION]}, {[ID], [RIGHT], [ACTION]}}
Parsed {{[ID], [RIGHT], [ACTION]}}
However, I think this can all be greatly simplified using qi::symbols for the commands. In fact it looks like you're only tokenizing (you confirm this when you say that the parentheses are not important).
line = +simple;
simple = group | command | (omit[key] >> attr(Command::id));
group = '(' > line > ')';
key = +char_("a-zA-Z_0-9");
Now you don't need Phoenix at all: Live On Compiler Explorer, printing
ok? true {{[ID], [RIGHT], [ACTION]}, {[ID], [RIGHT], [ACTION]}}
ok? true {{[ID], [RIGHT], [ACTION]}, {[ID], [RIGHT], [ACTION]}}
ok? true {{[ID], [RIGHT], [ACTION]}}
Since I observe that you're basically tokenizing line-wise, why not simply skip the parentheses, and simplify all the way down to:
script = line % eol;
line = *(command | omit[key] >> attr(Command::id));
That's all. See it Live On Compiler Explorer again:
#include <boost/spirit/include/qi.hpp>
#include <fmt/ostream.h>
#include <fmt/ranges.h>
#include <array> // std::array is used in operator<< below
namespace qi = boost::spirit::qi;
enum class Command { id, left, right, action };
using Id = std::string;
using Line = std::vector<Command>;
using Script = std::vector<Line>;
static inline std::ostream& operator<<(std::ostream& os, Command cmd) {
return os << (std::array{"ID", "LEFT", "RIGHT", "ACTION"}.at(int(cmd)));
}
template <typename It>
struct ExprGrammar : qi::grammar<It, Script()> {
ExprGrammar() : ExprGrammar::base_type(start) {
using namespace qi;
start = skip(skipper.alias())[line % eol];
line = *(command | omit[key] >> attr(Command::id));
key = +char_("a-zA-Z_0-9");
BOOST_SPIRIT_DEBUG_NODES((line)(key));
}
private:
using Skipper = qi::rule<It>;
qi::rule<It, Script()> start;
qi::rule<It, Line(), Skipper> line;
Skipper skipper = qi::char_(" \t\b\f()");
qi::rule<It /*, Id()*/> key; // omit attribute for efficiency
struct cmdsym : qi::symbols<char, Command> {
cmdsym() { this->add("<", Command::left)
(">", Command::right)
(":action", Command::action);
}
} command;
};
int main() {
using It = std::string::const_iterator;
ExprGrammar<It> const p;
for (const std::string s : {
"a1 > :action\na1 (>) :action",
"(a1 > :action)\n(a1 (>) :action)",
"a1 (> :action)",
})
try {
It f(begin(s)), l(end(s));
Script parsed;
bool ok = qi::parse(f, l, p, parsed);
fmt::print("ok? {} {}\n", ok, parsed);
if (f != l)
fmt::print(" -- Remaining '{}'\n", std::string(f, l));
} catch (qi::expectation_failure<It> const& ef) {
fmt::print("{}\n", ef.what()); // TODO add more details :)
}
}
Prints
ok? true {{ID, RIGHT, ACTION}, {ID, RIGHT, ACTION}}
ok? true {{ID, RIGHT, ACTION}, {ID, RIGHT, ACTION}}
ok? true {{ID, RIGHT, ACTION}}
Note I very subtly changed +() to *() so it would accept empty lines as well. This may or may not be what you want.