Parsing a grammar with Boost Spirit

I am trying to parse C-function-like tree expressions such as the following (using the Spirit Parser Framework):

F( A() , B( GREAT( SOME , NOT ) ) , C( YES ) )

For this I am trying to use the three rules in the following grammar:

template< typename Iterator , typename ExpressionAST >
struct InputGrammar : qi::grammar<Iterator, ExpressionAST(), space_type> {

    InputGrammar() : InputGrammar::base_type( ) {
       tag = ( qi::char_("a-zA-Z_")  >> *qi::char_("a-zA-Z_0-9") )[ push_back( at_c<0>(qi::_val) , qi::_1 ) ];
       command =  tag [ at_c<0>(qi::_val) = at_c<0>(qi::_1) ] >> "(" >> (*instruction >> ",")
                                        [ push_back( at_c<1>(qi::_val) , qi::_1 ) ]  >> ")";
       instruction = ( command | tag ) [qi::_val = qi::_1];
    }
    qi::rule< Iterator , ExpressionAST() , space_type > tag;
    qi::rule< Iterator , ExpressionAST() , space_type > command;
    qi::rule< Iterator , ExpressionAST() , space_type > instruction;
};

Notice that my tag rule just tries to capture the identifiers used in the expressions (the 'function' names). Also notice that the signature of the tag rule returns an ExpressionAST instead of a std::string, as in most examples. The reason I want to do it like this is actually pretty simple: I hate using variants and I will avoid them if possible. It would be great to have my cake and eat it too, I guess.

A command should start with a tag (the name of the current node, the first string field of the AST node) followed by a variable number of arguments enclosed in parentheses; each argument can be a tag itself or another command.

However, this example does not work at all. It compiles, but at run time it fails to parse all my test strings. What really annoys me is that I can't figure out how to fix it, since I can't really debug the above code, at least not in the traditional sense of the word. Basically, the only way I can fix the above code is by knowing what I am doing wrong.

So, the question is: what is wrong with the above code? How would you define the above grammar?

The ExpressionAST type I am using is:

struct MockExpressionNode {
    std::string name;
    std::vector< MockExpressionNode > operands;

    typedef std::vector< MockExpressionNode >::iterator iterator;
    typedef std::vector< MockExpressionNode >::const_iterator const_iterator;

    iterator begin() { return operands.begin(); }
    const_iterator begin() const { return operands.begin(); }
    iterator end() { return operands.end(); }
    const_iterator end() const { return operands.end(); }

    bool is_leaf() const {
        return ( operands.begin() == operands.end() );
    }
};

BOOST_FUSION_ADAPT_STRUCT(
    MockExpressionNode,
    (std::string, name)
    (std::vector<MockExpressionNode>, operands)
)

As far as debugging goes, it's possible to use a normal break-and-watch approach. This is made difficult by how you've formatted the rules, though. If you format them as in the Spirit examples (roughly one parser per line, one Phoenix statement per line), breakpoints will be much more informative.

Your data structure doesn't have a way to distinguish A() from SOME, in that they are both leaves (let me know if I'm missing something). From your comment about variants, I don't think this was your intention, so to distinguish these two cases I added a bool commandFlag member variable to MockExpressionNode (true for A() and false for SOME), with a corresponding Fusion adapter line.
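To make the ambiguity concrete, here is a minimal sketch in plain C++ (no Spirit involved). It uses a trimmed copy of MockExpressionNode without the iterator helpers, and the names leaf, call, sample_tree and demo are hypothetical helpers introduced only for this illustration. The tree for the sample input is built by hand, and both A() and SOME turn out to be leaves:

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Trimmed copy of the question's node type (iterator helpers omitted).
struct MockExpressionNode {
    std::string name;
    std::vector<MockExpressionNode> operands;

    bool is_leaf() const { return operands.empty(); }
};

// Hypothetical helpers, not part of the original code.
MockExpressionNode leaf(std::string name) {
    return MockExpressionNode{std::move(name), {}};
}

MockExpressionNode call(std::string name, std::vector<MockExpressionNode> args) {
    return MockExpressionNode{std::move(name), std::move(args)};
}

// Hand-built tree for: F( A() , B( GREAT( SOME , NOT ) ) , C( YES ) )
MockExpressionNode sample_tree() {
    return call("F", {
        call("A", {}),
        call("B", { call("GREAT", { leaf("SOME"), leaf("NOT") }) }),
        call("C", { leaf("YES") })
    });
}

// A() and SOME have the same shape: a name and no operands.
void demo() {
    const MockExpressionNode tree = sample_tree();
    const MockExpressionNode& a = tree.operands[0];                             // A()
    const MockExpressionNode& some = tree.operands[1].operands[0].operands[0];  // SOME
    assert(a.is_leaf() && some.is_leaf());
}
```

Once the tree is built, nothing in the node records whether the source text had parentheses, which is the gap the commandFlag member is meant to fill.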

For the code specifically, you need to pass the start rule to the base constructor, i.e.:

InputGrammar() : InputGrammar::base_type(instruction) {...}

This is the entry point of the grammar, and is why you were not getting any data parsed. I'm surprised it compiled without it; I thought the grammar type was required to match the type of the first rule. Even so, this is a convenient convention to follow.

For the tag rule, there are actually two parsers: qi::char_("a-zA-Z_"), which is _1 with type char, and *qi::char_("a-zA-Z_0-9"), which is _2 with type (basically) vector<char>. It's not possible to coerce these into a string without autorules, but it can be done by attaching a semantic action to each parsed char:

tag =   qi::char_("a-zA-Z_")
        [ at_c<0>(qi::_val) = qi::_1 ]
    >> *qi::char_("a-zA-Z_0-9")           //[] has precedence over *, so _1 is 
        [ at_c<0>(qi::_val) += qi::_1 ];  //  a char rather than a vector<char>

However, it's much cleaner to let Spirit do this conversion. So define a new rule:

qi::rule< Iterator , std::string(void) , ascii::space_type > identifier;
identifier %= qi::char_("a-zA-Z_") >> *qi::char_("a-zA-Z_0-9");

And don't worry about it ;). Then tag becomes:

tag = identifier
      [
          ph::at_c<0>(qi::_val) = qi::_1,
          ph::at_c<2>(qi::_val) = false //commandFlag
      ];

For command, the first part is fine, but there are a couple of problems with (*instruction >> ",")[ push_back( at_c<1>(qi::_val) , qi::_1 ) ]. This parses zero or more instruction rules followed by a single ",", rather than a comma-separated list. It also attempts to push_back a vector<MockExpressionNode> (not sure why this compiled either; maybe it was not instantiated because of the missing start rule?). I think you want the following (with the identifier modification):

command =
        identifier
        [
           ph::at_c<0>(qi::_val) = qi::_1, 
           ph::at_c<2>(qi::_val) = true    //commandFlag
        ]
    >>  "("
    >> -(instruction % ",")
        [
           ph::at_c<1>(qi::_val) = qi::_1
        ]
    >>  ")";

This uses the optional operator - and the list operator %; the latter is equivalent to instruction >> *("," >> instruction). The Phoenix expression then just assigns the vector directly to the structure member, but you could also attach the action directly to the instruction match and use push_back.

The instruction rule is fine; I'll just mention that it is equivalent to instruction %= (command|tag).

One last thing: if there actually is no distinction between A() and SOME (i.e. your original structure with no commandFlag), you can write this parser using only autorules:

template< typename Iterator , typename ExpressionAST >
struct InputGrammar : qi::grammar<Iterator, ExpressionAST(), ascii::space_type> {
   InputGrammar() : InputGrammar::base_type( command ) {
      identifier %=
             qi::char_("a-zA-Z_")
         >> *qi::char_("a-zA-Z_0-9");
      command %=
            identifier
         >> -(
            "("
         >> -(command % ",")
         >>  ")");
    }
    qi::rule< Iterator , std::string(void) , ascii::space_type > identifier;
    qi::rule< Iterator , ExpressionAST(void) , ascii::space_type > command;
};

This is the big benefit of using a Fusion-wrapped structure that models the input closely.
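If stepping through Spirit's templates in a debugger is too painful, it may help to compare against a hand-written recursive-descent parser for the same language. The sketch below is plain C++, not Spirit code and not part of the grammar above; Node, Parser and parse_expression are hypothetical names. It assumes whitespace is skipped between tokens but not inside an identifier, and that the whole input must match. Unlike a backtracking Spirit parser, it simply returns false on a malformed argument list instead of reporting a partial match.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Stand-in for the question's MockExpressionNode (name plus operands only).
struct Node {
    std::string name;
    std::vector<Node> operands;
};

// Hypothetical hand-written parser mirroring the autorule grammar.
class Parser {
public:
    explicit Parser(const std::string& text) : text_(text), pos_(0) {}

    // command = identifier >> -( "(" >> -(command % ",") >> ")" )
    bool command(Node& out) {
        if (!identifier(out.name)) return false;
        if (!literal('(')) return true;            // outer "-": bare identifier
        Node first;
        if (command(first)) {                      // inner "-": list may be empty
            out.operands.push_back(first);
            while (literal(',')) {                 // command % ","
                Node next;
                if (!command(next)) return false;  // dangling comma
                out.operands.push_back(next);
            }
        }
        return literal(')');
    }

    // True when only whitespace remains.
    bool at_end() {
        skip_space();
        return pos_ == text_.size();
    }

private:
    static bool is_start(char c) {
        return std::isalpha(static_cast<unsigned char>(c)) || c == '_';
    }
    static bool is_rest(char c) {
        return std::isalnum(static_cast<unsigned char>(c)) || c == '_';
    }

    void skip_space() {
        while (pos_ < text_.size() &&
               std::isspace(static_cast<unsigned char>(text_[pos_]))) {
            ++pos_;
        }
    }

    // Matches a single punctuation character after skipping whitespace.
    bool literal(char c) {
        skip_space();
        if (pos_ < text_.size() && text_[pos_] == c) {
            ++pos_;
            return true;
        }
        return false;
    }

    // identifier = char_("a-zA-Z_") >> *char_("a-zA-Z_0-9")
    bool identifier(std::string& out) {
        skip_space();
        const std::size_t start = pos_;
        if (pos_ < text_.size() && is_start(text_[pos_])) {
            ++pos_;
            while (pos_ < text_.size() && is_rest(text_[pos_])) ++pos_;
        }
        if (pos_ == start) return false;
        out.assign(text_, start, pos_ - start);
        return true;
    }

    std::string text_;
    std::size_t pos_;
};

// Parses the whole string; leaves `out` untouched on failure.
bool parse_expression(const std::string& text, Node& out) {
    Parser parser(text);
    Node result;
    if (!parser.command(result) || !parser.at_end()) return false;
    out = result;
    return true;
}
```

Each member function corresponds to one rule, so an ordinary breakpoint in command or identifier shows exactly where an input stops matching. The same test strings can then be run through the Spirit grammar for comparison.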
