Segmentation fault with recursive Spirit.Qi grammar

I'm trying to create a very simple parser for a very simplistic language that only contains numbers and mathematical expressions. Ultimately I plan to expand this, but not until I can get these basic versions working.

I've successfully parsed:

1
425
1 + 1
1 - 1
1 * 1
1 / 1

No problem. But when I tried to make it recursive, to parse input such as:

1 + 2 - 3

I began to get segmentation faults. I've searched for recursive grammars and segmentation faults, but I can't seem to apply anything I've found to this grammar to make it work. Either the answers don't fit my situation, or I don't correctly understand what is happening in my Qi grammar.

My grammar consists of the following structs (including Fusion adaptations):

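// Headers this excerpt relies on (not shown in the original post).
#include <vector>

#include <boost/fusion/include/adapt_struct.hpp>
#include <boost/phoenix.hpp>
#include <boost/spirit/include/qi.hpp>
#include <boost/variant.hpp>

namespace fun_lang {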
    namespace qi = boost::spirit::qi;
    namespace ascii = boost::spirit::ascii;
    namespace phoenix = boost::phoenix;
    namespace fusion = boost::fusion;

    struct number_node {
        long value;
    };

    struct operation_node;

    typedef boost::variant<
        boost::recursive_wrapper<operation_node>,
        number_node
    > node;

    struct operation_node {
        node left, right;
        char op;
    };

    struct program {
        std::vector<node> nodes;
    };
}

BOOST_FUSION_ADAPT_STRUCT(fun_lang::program, (std::vector<fun_lang::node>, nodes));
BOOST_FUSION_ADAPT_STRUCT(fun_lang::number_node, (long, value));
BOOST_FUSION_ADAPT_STRUCT(fun_lang::operation_node, (fun_lang::node, left) (char, op) (fun_lang::node, right));

namespace fun_lang {
    template <typename Iterator, typename Skipper>
    struct fun_grammar : qi::grammar<Iterator, program(), Skipper> {
        fun_grammar() : fun_grammar::base_type(start) {
            using ascii::char_;
            using qi::ulong_;
            using qi::_val;
            using qi::_1;

            using phoenix::push_back;
            using phoenix::at_c;

            expression = (integer | operation)[_val = _1];

            oper = (char_('+') | char_('-') | char_('*') | char_('/'))[_val = _1];
            integer = ulong_[at_c<0>(_val) = _1];

            operation = expression[at_c<0>(_val) = _1] >> oper[at_c<1>(_val) = _1] >> expression[at_c<2>(_val) = _1];

            start = *expression[push_back(at_c<0>(_val), _1)];
        }

        qi::rule<Iterator, program(), Skipper> start;
        qi::rule<Iterator, number_node(), Skipper> integer;
        qi::rule<Iterator, char(), Skipper> oper;
        qi::rule<Iterator, node(), Skipper> expression;
        qi::rule<Iterator, operation_node(), Skipper> operation;
    };
}

Some of the rule structures are based on a yacc grammar I wrote for another language, which I used as a reference for structuring these rules. I'm not sure what is causing the segmentation fault, but that is what I get when I run this. I've tried simplifying rules, removing some intermediate rules, and testing non-recursive variants. Anything that is not recursive seems to work, but I've seen many examples of Spirit grammars with recursive rules that work, so I feel like I'm just not quite understanding how to express them.

EDIT

To help in solving the problem, you can find a mostly exact copy on ideone. The only difference between the ideone version and what I have locally is that, instead of reading a file, it reads directly from standard input.

There are two sources of stack overflows (which end in segmentation faults). One is the constructors of operation_node and node. boost::variant, when default-constructed, is initialized with a default-constructed object of its first template argument. Here that is boost::recursive_wrapper<operation_node>, which constructs an operation_node, which constructs two nodes, each of which constructs a boost::recursive_wrapper<operation_node>, and so on until the stack is exhausted.

It is common to give variants in Spirit grammars a nil type such as struct nil { }; as the first argument, both to prevent this and to have a way to identify uninitialised variants, so

struct nil { };

typedef boost::variant<
    nil,
    boost::recursive_wrapper<operation_node>,
    number_node
> node;

will fix this. If you don't want to use a nil type,

typedef boost::variant<
    number_node,
    boost::recursive_wrapper<operation_node>
> node;

will also work in your case because number_node can be default-constructed without issue.
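
The default-construction rule described above can be checked with a small standalone sketch. It uses C++17 std::variant as a stand-in for boost::variant (an assumption made so the snippet builds without Boost; std::variant likewise default-constructs its first alternative), and op_placeholder is a hypothetical, non-recursive stand-in for operation_node.

```cpp
#include <cstddef>
#include <variant>

// Assumption: std::variant (C++17) stands in for boost::variant here.
struct nil {};
struct number_node { long value = 0; };
struct op_placeholder { char op = '+'; };  // hypothetical stand-in for operation_node

// Layout 1: nil first, so a default-constructed node holds nil.
using node_with_nil = std::variant<nil, op_placeholder, number_node>;

// Layout 2: number_node first, so a default-constructed node holds a number.
using node_number_first = std::variant<number_node, op_placeholder>;

// True when the node still holds the nil sentinel, i.e. nothing was assigned yet.
bool is_uninitialised(const node_with_nil& n) {
    return std::holds_alternative<nil>(n);
}

// Index of the alternative chosen by default construction (always the first).
std::size_t default_index() {
    return node_number_first{}.index();
}
```

In both layouts the first alternative is cheap to construct and does not contain another node, which is what breaks the construction cycle.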

The other stack overflow arises because Boost.Spirit generates LL(inf) parsers (as opposed to yacc, which generates LALR(1) parsers), which means that what you get is a recursive descent parser, and recursive descent cannot handle left-recursive rules. The rules

expression = (integer | operation)[_val = _1];
operation = expression[at_c<0>(_val) = _1] >> oper[at_c<1>(_val) = _1] >> expression[at_c<2>(_val) = _1];

generate a parser that descends from operation into expression and back into operation without consuming any input. This recurses until the stack overflows, and that is where you get your other segfault.
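
To make the runaway recursion visible without crashing, here is a hand-written sketch of the same two rules as plain recursive functions. It is not Spirit code: the left_recursion_probe type, its depth limit, and the omission of whitespace skipping are assumptions added for illustration, so that the recursion stops at a chosen depth instead of exhausting the stack.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical hand-written mirror of the original rules (not Spirit code):
//   expression = integer | operation
//   operation  = expression oper expression
struct left_recursion_probe {
    std::string input;
    std::size_t pos = 0;     // current read position
    int depth = 0;           // current nesting of expression() calls
    int limit = 0;           // artificial cap standing in for the stack size
    bool hit_limit = false;  // set once the cap is reached

    bool integer() {
        std::size_t p = pos;
        while (p < input.size() &&
               std::isdigit(static_cast<unsigned char>(input[p]))) ++p;
        if (p == pos) return false;  // no digits: fail without consuming
        pos = p;
        return true;
    }

    bool oper() {
        if (pos < input.size() &&
            std::string("+-*/").find(input[pos]) != std::string::npos) {
            ++pos;
            return true;
        }
        return false;
    }

    bool expression() {
        if (depth >= limit) { hit_limit = true; return false; }
        ++depth;
        bool ok = integer() || operation();
        --depth;
        return ok;
    }

    // Re-enters expression() at the same position: this is the left recursion.
    bool operation() {
        return expression() && oper() && expression();
    }
};

// True when parsing hit the depth cap while the read position never moved.
bool recurses_without_progress(const std::string& text, int limit) {
    left_recursion_probe p;
    p.input = text;
    p.limit = limit;
    p.expression();
    return p.hit_limit && p.pos == 0;
}
```

For any input that does not start with a digit, such as "+2", integer fails and the probe keeps calling expression at position 0 until it reaches the cap. Without the cap, that chain of calls would only end when the stack ran out.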

If you reformulate the operation rule as

operation = integer[at_c<0>(_val) = _1] >> oper[at_c<1>(_val) = _1] >> expression[at_c<2>(_val) = _1];

this problem goes away. Furthermore, you'll have to rewrite the expression rule as

expression = (operation | integer)[_val = _1];

for the match to work as I think you expect. Otherwise the integer alternative matches first, before operation has a chance to be tried, and the parser will not backtrack because it already has a successful partial match.
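
The effect of the alternative order can be sketched with a small hand-written recursive descent recogniser. This is not Spirit code, only an approximation of ordered choice under stated assumptions: the order_sketch namespace, the consumed helper, and the operation_first flag are all hypothetical names introduced for the illustration.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

namespace order_sketch {

// Advance pos past any whitespace (a crude stand-in for a skipper).
void skip_spaces(const std::string& in, std::size_t& pos) {
    while (pos < in.size() &&
           std::isspace(static_cast<unsigned char>(in[pos]))) ++pos;
}

// integer: one or more digits; pos is only updated on success.
bool integer(const std::string& in, std::size_t& pos) {
    std::size_t p = pos;
    skip_spaces(in, p);
    const std::size_t first = p;
    while (p < in.size() &&
           std::isdigit(static_cast<unsigned char>(in[p]))) ++p;
    if (p == first) return false;
    pos = p;
    return true;
}

// oper: one of + - * /
bool oper(const std::string& in, std::size_t& pos) {
    std::size_t p = pos;
    skip_spaces(in, p);
    if (p < in.size() && std::string("+-*/").find(in[p]) != std::string::npos) {
        pos = p + 1;
        return true;
    }
    return false;
}

bool expression(const std::string& in, std::size_t& pos, bool operation_first);

// operation = integer oper expression (right-recursive, so input is consumed
// before the rule recurses). On failure pos is left untouched.
bool operation(const std::string& in, std::size_t& pos, bool operation_first) {
    std::size_t p = pos;
    if (integer(in, p) && oper(in, p) && expression(in, p, operation_first)) {
        pos = p;
        return true;
    }
    return false;
}

// Ordered choice: the first alternative that matches wins.
bool expression(const std::string& in, std::size_t& pos, bool operation_first) {
    if (operation_first)
        return operation(in, pos, operation_first) || integer(in, pos);
    return integer(in, pos) || operation(in, pos, operation_first);
}

// Number of characters a single expression match consumes from the start.
std::size_t consumed(const std::string& in, bool operation_first) {
    std::size_t pos = 0;
    expression(in, pos, operation_first);
    return pos;
}

}  // namespace order_sketch
```

With integer tried first, consumed("1 + 2 - 3", false) stops after the leading 1, whereas with operation first the whole input is consumed. Because operation recurses on its right-hand side, the resulting nesting is 1 + (2 - 3).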

Also note that Spirit parsers are attributed; the semantic actions you use are largely unnecessary. It is possible to rewrite the bulk of your grammar like this:

expression = operation | integer;

oper = char_("-+*/");
integer = ulong_;

operation = integer >> oper >> expression;
