
Understanding the context data structure in Antlr4

I'm trying to write a code translator in Java with the help of Antlr4 and have had great success with the grammar part so far. However, I'm now banging my head against a wall trying to wrap my mind around the parse tree data structure that I need to work on after my input has been parsed.

I'm trying to use the visitor pattern to walk my parse tree. I'll show you an example to illustrate the points of my confusion.

My grammar:

grammar pqlc;

// Lexer

// Keywords
EXISTS: 'exists';
REDUCE: 'reduce';
QUERY: 'query';
INT: 'int';
DOUBLE: 'double';
CONST: 'const';
STDVECTOR: 'std::vector';
STDMAP: 'std::map';
STDSET: 'std::set';
C_EXPR: 'c_expr';

INTEGER_LITERAL  : (DIGIT)+ ;
fragment DIGIT: '0'..'9';
DOUBLE_LITERAL : DIGIT '.' DIGIT+;

LPAREN          : '(';
RPAREN          : ')';
LBRACK          : '[';
RBRACK          : ']';
DOT             : '.';
EQUAL           : '==';
LE              : '<=';
GE              : '>=';
GT              : '>';
LT              : '<';
ADD             : '+';
MUL             : '*';
AND             : '&&';
COLON           : ':';

IDENTIFIER    :   JavaLetter JavaLetterOrDigit*;
fragment JavaLetter    :   [a-zA-Z$_]; // these are the "java letters" below 0xFF
fragment JavaLetterOrDigit    :   [a-zA-Z0-9$_]; // these are the "java letters or digits" below 0xFF
WS  
    :  [ \t\r\n\u000C]+ -> skip  
    ;
COMMENT
    :   '/*' .*? '*/' -> skip
    ;

LINE_COMMENT
    :   '//' ~[\r\n]* -> skip
    ;


// Parser

//start_rule: query;

query :
      quant_expr
      | qexpr+
      | IDENTIFIER // order IDENTIFIER and qexpr+?
      | numeral
      | c_expr //TODO

      ;

c_type : INT | DOUBLE | CONST;
bin_op: AND | ADD | MUL | EQUAL | LT | GT | LE| GE;


qexpr:
         LPAREN query RPAREN bin_op_query? 
         // query bin_op query
         | IDENTIFIER  bin_op_query? // copied from query to resolve left recursion problem
         | numeral bin_op_query?  // ^
         | quant_expr bin_op_query? // ^
           |c_expr bin_op_query?
           // query.find(query)
         | IDENTIFIER  find_query? // copied from query to resolve left recursion problem
         | numeral find_query?  // ^
         | quant_expr find_query?
           |c_expr find_query?
           // query[query]
          | IDENTIFIER  array_query? // copied from query to resolve left recursion problem
         | numeral array_query?  // ^
         | quant_expr array_query?
           |c_expr array_query?

     // | qexpr bin_op_query // bad, resolved by quexpr+ in query 
     ;

bin_op_query: bin_op query bin_op_query?; // resolve left recursion of query bin_op query

find_query: '.''find' LPAREN query RPAREN;
array_query: LBRACK query RBRACK;

quant_expr:
    quant id ':' query
          | QUERY LPAREN match RPAREN ':' query
          | REDUCE LPAREN IDENTIFIER RPAREN id ':' query
    ;

match:
         STDVECTOR LBRACK id RBRACK EQUAL cm
     | STDMAP '.''find' LPAREN cm RPAREN EQUAL cm
     | STDSET '.''find' LPAREN cm RPAREN
     ;

cm:
    IDENTIFIER
  | numeral
   | c_expr //TODO
  ;

quant :
          EXISTS;

id :
     c_type IDENTIFIER
     | IDENTIFIER // According to page 2, but not the overview. According to the overview id -> but then rule 1 would be without +
   ;

numeral :
            INTEGER_LITERAL
        | DOUBLE_LITERAL
        ;
c_expr:
          C_EXPR
      ;

Now let's parse the following string:

double x: x >= c_expr

Visually I'll get this tree: [image: parse tree]

Let's say my visitor is in the visitQexpr(@NotNull pqlcParser.QexprContext ctx) routine when it hits the branch Qexpr(x bin_op_query).

My question is: how can I tell that the left child ("x" in the tree) is a terminal node, or more specifically an IDENTIFIER? There are no visit methods for terminal nodes since they aren't rules. ctx.getChild(0) has no RuleIndex. I guess I could use that to check whether I'm at a terminal or not, but that still wouldn't tell me whether it is an IDENTIFIER or some other kind of terminal token. I need to be able to tell the difference somehow.
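For illustration, here is a minimal sketch of the kind of check being asked about. In the ANTLR 4 Java runtime a leaf of the parse tree is a TerminalNode, and the type of its token (getSymbol().getType()) can be compared against the constants the tool generates on the parser class, such as pqlcParser.IDENTIFIER. So that the snippet compiles on its own, the interfaces below are small stand-ins that only mirror the shape of the runtime types; the constant values and the names TokenTypes, ChildInspector, SimpleToken, Leaf and RuleNode are hypothetical.

```java
// Stand-ins mirroring the shape of the ANTLR 4 runtime types. With the real
// runtime you would import org.antlr.v4.runtime.Token and
// org.antlr.v4.runtime.tree.ParseTree / TerminalNode instead.
interface Token { int getType(); }
interface ParseTree { }
interface TerminalNode extends ParseTree { Token getSymbol(); }

// Hypothetical values; ANTLR generates the real constants on the parser
// class, e.g. pqlcParser.IDENTIFIER.
final class TokenTypes {
    static final int IDENTIFIER = 1;
    static final int INTEGER_LITERAL = 2;
}

final class ChildInspector {
    // True only when the child is a leaf whose token type is IDENTIFIER.
    static boolean isIdentifier(ParseTree child) {
        if (!(child instanceof TerminalNode)) {
            return false; // a rule context such as bin_op_query, not a token
        }
        Token symbol = ((TerminalNode) child).getSymbol();
        return symbol.getType() == TokenTypes.IDENTIFIER;
    }

    public static void main(String[] args) {
        ParseTree x = new Leaf(new SimpleToken(TokenTypes.IDENTIFIER));
        System.out.println(isIdentifier(x));
    }
}

// Minimal test doubles so the sketch runs without a generated parser.
final class SimpleToken implements Token {
    private final int type;
    SimpleToken(int type) { this.type = type; }
    public int getType() { return type; }
}

final class Leaf implements TerminalNode {
    private final Token symbol;
    Leaf(Token symbol) { this.symbol = symbol; }
    public Token getSymbol() { return symbol; }
}

final class RuleNode implements ParseTree { }
```

Inside visitQexpr the same test would be applied to ctx.getChild(0). The generated accessor ctx.IDENTIFIER() is another option, since it returns null when the matched alternative contains no IDENTIFIER token.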

I had more questions, but in the time it took me to write this explanation I forgot them :< Thanks in advance.

You can add labels to tokens and then access them, or check whether they exist, in the surrounding context:

id :
     c_type labelA = IDENTIFIER
     | labelB = IDENTIFIER 
   ;
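A rough sketch of how those labels could be used from Java follows. For each element label ANTLR adds a public Token field of the same name to the generated context class, and the field stays null when the alternative carrying that label did not match. The IdContext and Token types below are simplified stand-ins for the generated pqlcParser.IdContext and the runtime's org.antlr.v4.runtime.Token, and IdDescriber is a hypothetical helper, so treat this as an illustration rather than generated code.

```java
// Stand-in for org.antlr.v4.runtime.Token, reduced to what the sketch needs.
interface Token { String getText(); }

final class SimpleToken implements Token {
    private final String text;
    SimpleToken(String text) { this.text = text; }
    public String getText() { return text; }
}

// Stand-in for the generated pqlcParser.IdContext: one public Token field
// per label, left null when that alternative did not match.
final class IdContext {
    public Token labelA; // from: c_type labelA = IDENTIFIER
    public Token labelB; // from: labelB = IDENTIFIER
}

final class IdDescriber {
    // Hypothetical helper; the body is what a visitId override might contain.
    static String describe(IdContext ctx) {
        if (ctx.labelA != null) {
            return "typed identifier " + ctx.labelA.getText();
        }
        if (ctx.labelB != null) {
            return "plain identifier " + ctx.labelB.getText();
        }
        return "no identifier";
    }

    public static void main(String[] args) {
        IdContext ctx = new IdContext();
        ctx.labelA = new SimpleToken("x");
        System.out.println(describe(ctx));
    }
}
```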

You could also label the alternatives themselves to get a separate visit method for each:

id :
     c_type IDENTIFIER    #idType1 //choose more appropriate names!
     | IDENTIFIER         #idType2
   ;

This will create different visit methods for the two alternatives, and I suppose (i.e. I have not verified this) that the visit method for id will not be called.
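To make the effect of alternative labels concrete: for a rule whose alternatives are all labeled, ANTLR generates one context subclass and one visit method per label (here IdType1Context with visitIdType1 and IdType2Context with visitIdType2) and no visitId method for the rule itself. The sketch below imitates that dispatch with hand-written stand-ins so it can run without the generated parser. The accept signature is simplified compared to the runtime's, the matched c_type is reduced to a String, and IdTranslator with its output strings is hypothetical.

```java
// Stand-ins for what ANTLR would generate from the labeled alternatives:
// one visit method and one context subclass per label.
interface IdVisitor<T> {
    T visitIdType1(IdType1Context ctx);
    T visitIdType2(IdType2Context ctx);
}

abstract class IdContext {
    abstract <T> T accept(IdVisitor<T> visitor);
}

final class IdType1Context extends IdContext {
    final String type; // text matched by c_type (simplified to a String)
    final String name; // text of the IDENTIFIER token
    IdType1Context(String type, String name) {
        this.type = type;
        this.name = name;
    }
    @Override
    <T> T accept(IdVisitor<T> visitor) { return visitor.visitIdType1(this); }
}

final class IdType2Context extends IdContext {
    final String name; // text of the IDENTIFIER token
    IdType2Context(String name) { this.name = name; }
    @Override
    <T> T accept(IdVisitor<T> visitor) { return visitor.visitIdType2(this); }
}

// Hypothetical visitor: each alternative gets its own handling.
final class IdTranslator implements IdVisitor<String> {
    public String visitIdType1(IdType1Context ctx) {
        return "declare " + ctx.type + " " + ctx.name;
    }
    public String visitIdType2(IdType2Context ctx) {
        return "use " + ctx.name;
    }

    public static void main(String[] args) {
        IdContext node = new IdType1Context("double", "x");
        System.out.println(node.accept(new IdTranslator()));
    }
}
```

Because the dispatch happens on the context subclass, the visitor never has to inspect children to find out which alternative matched.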

I prefer the following approach though:

id :
        typeDef
     |  otherId
     ;
typeDef: c_type IDENTIFIER;
otherId : IDENTIFIER ;

This is a more heavily typed system, but it lets you visit nodes very specifically. Some rules of thumb I use:

  1. Use | only when all alternatives are parser rules.
  2. Wrap each token in a parser rule (like otherId) to give it "more meaning".
  3. It's OK to mix parser rules and tokens if the tokens are not really important (like ;) and therefore not needed in the parse tree.
