ANTLR, Trouble with expression grammar

I've recently started using ANTLR. I'm currently trying to encode an expression grammar with +, -, *, array[index] and a few more constructs.

This is the desired grammar:

Exp -> Exp (+ | - | * | < | &&) Exp
     | Exp [ Exp ]
     | -Exp
     | ( Exp )
     | Exp.length
     | true
     | false
     | Id
     | this
     | ! Exp

I first refactored this into AndExp, SumExp, ProdExp and so on to resolve precedence. Roughly like this:

Exp        -> AndExp
AndExp     -> CmpExp (&& CmpExp)*
CmpExp     -> SumExp (< SumExp)*
SumExp     -> ProdExp ((+|-) ProdExp)*
ProdExp    -> UnaryExp (Times UnaryExp)*
UnaryExp   -> Minus* PrimaryExp
PrimaryExp -> Exp.length
            | Exp [ Exp ]
            | ! Exp
            | true
            | false
            | Id
            | this

I then realized that this uses left recursion, which ANTLR 3 doesn't support. I went on to eliminate the left recursion and ended up with this grammar:

grammar test;

options {
    language=Java;
    output=AST;
    backtrack=true;
}

start      : expression;

expression : andExp;
andExp     : cmpExp (And^ cmpExp)*;
cmpExp     : sumExp (LessThan^ sumExp)*;
sumExp     : prodExp ((Plus | Minus)^ prodExp)*;
prodExp    : unaryExp (Times^ unaryExp)*;
unaryExp   : Minus* primaryExp;

primaryExp : INT                   primaryPrime
           | 'true'                primaryPrime
           | 'false'               primaryPrime
           | 'this'                primaryPrime
           | ID                    primaryPrime
           | '!' expression        primaryPrime
           | '('! expression ')'!  primaryPrime
           ;


primaryPrime
           : '[' expression ']'             primaryPrime
           | '.' ID '(' exprList ')'        primaryPrime
           | '.length'                      primaryPrime
           | 'new' 'int' '[' expression ']' primaryPrime
           | 'new' ID '(' ')'               primaryPrime
           |
           ;


exprList   :
           | expression (',' expression)*;

LessThan   : '<';
Plus       : '+';
Minus      : '-';
Times      : '*';
And        : '&&';
Not        : '!';
INT        :    '0' | ('1'..'9')('0'..'9')*;
ID         :    ('a'..'z'|'A'..'Z'|'_') ('a'..'z'|'A'..'Z'|'0'..'9'|'_')*;
WS         : ('\t' | ' ' | '\r' | '\n'| '\u000C') { $channel=HIDDEN; } ;
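
For readers less familiar with how the layered rules get rid of left recursion, here is a rough hand-written equivalent in Java. It is a sketch, not the code ANTLR generates: one method per precedence level, with each (op operand)* repetition turned into a loop that folds to the left. The class name PrecedenceParser, the tiny tokenizer and the LISP-style output strings are illustrative assumptions; only a subset of the grammar is covered (no [ ], .length or method calls), and ! is treated as a unary operator.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch: one method per precedence level, mirroring andExp/cmpExp/sumExp/prodExp.
// Each `(op operand)*` repetition becomes a while loop that folds to the left, so
// no method calls itself before consuming a token (no left recursion).
class PrecedenceParser {
    private final List<String> tokens;
    private int pos = 0;

    PrecedenceParser(String input) {
        tokens = tokenize(input);
    }

    // Tiny tokenizer: identifiers/integers, "&&", and single-character symbols.
    static List<String> tokenize(String s) {
        List<String> out = new ArrayList<>();
        int i = 0;
        while (i < s.length()) {
            char c = s.charAt(i);
            if (Character.isWhitespace(c)) {
                i++;
            } else if (Character.isLetterOrDigit(c) || c == '_') {
                int start = i;
                while (i < s.length() && (Character.isLetterOrDigit(s.charAt(i)) || s.charAt(i) == '_')) i++;
                out.add(s.substring(start, i));
            } else if (s.startsWith("&&", i)) {
                out.add("&&");
                i += 2;
            } else {
                out.add(String.valueOf(c));
                i++;
            }
        }
        return out;
    }

    private String peek() { return pos < tokens.size() ? tokens.get(pos) : "<EOF>"; }

    private String next() {
        String t = peek();
        pos++;
        return t;
    }

    String parse() {
        String tree = andExp();
        if (pos != tokens.size()) throw new IllegalStateException("unexpected " + peek());
        return tree;
    }

    // andExp : cmpExp ('&&' cmpExp)*   (operands are evaluated left to right)
    private String andExp() {
        String left = cmpExp();
        while (peek().equals("&&")) left = "(" + next() + " " + left + " " + cmpExp() + ")";
        return left;
    }

    // cmpExp : sumExp ('<' sumExp)*
    private String cmpExp() {
        String left = sumExp();
        while (peek().equals("<")) left = "(" + next() + " " + left + " " + sumExp() + ")";
        return left;
    }

    // sumExp : prodExp (('+' | '-') prodExp)*
    private String sumExp() {
        String left = prodExp();
        while (peek().equals("+") || peek().equals("-")) left = "(" + next() + " " + left + " " + prodExp() + ")";
        return left;
    }

    // prodExp : unaryExp ('*' unaryExp)*
    private String prodExp() {
        String left = unaryExp();
        while (peek().equals("*")) left = "(" + next() + " " + left + " " + unaryExp() + ")";
        return left;
    }

    // unaryExp : ('-' | '!') unaryExp | primaryExp
    private String unaryExp() {
        if (peek().equals("-") || peek().equals("!")) return "(" + next() + " " + unaryExp() + ")";
        return primaryExp();
    }

    // primaryExp : '(' andExp ')' | INT | ID   (true/false/this lex as IDs here)
    private String primaryExp() {
        if (peek().equals("(")) {
            next();
            String inner = andExp();
            if (!next().equals(")")) throw new IllegalStateException("expected )");
            return inner;
        }
        char first = peek().charAt(0);
        if (!(Character.isLetterOrDigit(first) || first == '_')) throw new IllegalStateException("unexpected " + peek());
        return next();
    }
}
```

For example, new PrecedenceParser("1 + 2 * 3").parse() returns "(+ 1 (* 2 3))", and "a - b - c" comes out as "(- (- a b) c)", i.e. left-associative, even though no method is left-recursive.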
  • Q1: Is backtracking "required" for this type of grammar (I can't get it through ANTLR unless I enable it), or am I missing something simple?

  • Q2: When adding a few ^ and -> ^(TOKEN ...) constructs to tidy up the tree, I ran into the following annoying situation (caused by primaryPrime, which is there because of the left factoring):

     primaryPrime : '[' expression ']' primaryPrime -> ^(ARRAY_EXPR expression) //... 

    This turns an array[index] into

     array ARRAY_EXPR index 

    while I really want

     ARRAY_EXPR array index 

    What is the best way to solve this? Am I on the right track here, or should I go with some other approach altogether?

A1

No, backtracking is not (yet) required. But if you do need some backtracking, it's advisable not to use backtrack=true right away, but to put (syntactic) predicates in front of the alternatives that actually need it. With backtrack=true you enable backtracking on all of your rules, while probably only one or two of them need it. That said, if your language is relatively small, backtrack=true is easier than mixing in predicates by hand and probably won't have a big impact on performance. But if you can avoid both, do so.
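
To make "backtracking" concrete: a syntactic predicate lets the parser try an alternative speculatively, then rewind the input and commit to whichever alternative matched. The Java sketch below imitates that by hand with a saved position (mark) and a restore (rewind). The grammar in the comments and the class name BacktrackingParser are hypothetical, chosen only because the two alternatives share a prefix of unbounded length; this is not how ANTLR's generated code is structured.

```java
import java.util.List;

// Hypothetical grammar (not from the question):
//   stat : list '=' list    // parallel assignment
//        | list ';'         // bare list
//   list : '[' ID (',' ID)* ']'
// Both alternatives begin with a list of unbounded length, so no fixed
// lookahead can pick one. stat() speculates on alternative 1, in the spirit
// of an ANTLR 3 syntactic predicate such as (list '=')=>.
class BacktrackingParser {
    private final List<String> tokens;
    private int pos = 0;

    BacktrackingParser(List<String> tokens) { this.tokens = tokens; }

    private String peek() { return pos < tokens.size() ? tokens.get(pos) : "<EOF>"; }

    private void match(String expected) {
        if (!peek().equals(expected)) throw new IllegalStateException("expected " + expected + " but got " + peek());
        pos++;
    }

    private void id() {
        if (!Character.isLetter(peek().charAt(0))) throw new IllegalStateException("expected ID but got " + peek());
        pos++;
    }

    private void list() {
        match("[");
        id();
        while (peek().equals(",")) {
            match(",");
            id();
        }
        match("]");
    }

    // Try `alt` without committing: remember the position (mark), run it,
    // and always restore the position (rewind). Returns whether it matched.
    private boolean speculate(Runnable alt) {
        int mark = pos;
        try {
            alt.run();
            return true;
        } catch (IllegalStateException e) {
            return false;
        } finally {
            pos = mark;
        }
    }

    String stat() {
        if (speculate(() -> { list(); match("="); })) {
            list(); match("="); list();  // re-parse for real
            return "assign";
        }
        list(); match(";");
        return "list";
    }
}
```

Note that the winning alternative is parsed twice, once speculatively and once for real. That repeated work is why switching backtracking on for every rule, rather than only where it's needed, can cost performance.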

You have a couple of parser rules that match empty strings, which is what's causing the problems. It's usually better to let every rule match something and make the rule optional where it's used. So rather than:

foo : bar baz ;
bar : 'bar' ;
baz : 'baz' | /* epsilon */ ;

do

foo : bar baz? ;
bar : 'bar' ;
baz : 'baz' ;

instead.

Also, for reserved keywords like true, false etc., don't just mix them into your parser rules: always define them explicitly at the top of your lexer rules. Lexer rules are matched from top to bottom, so it's safest to define them (at least) before rules like ID that could possibly match them as well. I usually put them as the first lexer rules.
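
The ordering rule can be modelled with a toy lexer: at each position it tries every rule, keeps the longest match, and lets the earlier rule win a tie. This is a simplified model for illustration only (OrderedLexer and its rule() method are made up, and ANTLR's generated lexers work differently internally), but it shows why True must come before ID: both match "true" with the same length.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Toy model of lexer rule ordering (hypothetical API, not ANTLR's):
// at each position every rule is tried, the longest match wins, and
// on equal length the rule that was added first wins.
class OrderedLexer {
    // LinkedHashMap keeps the rules in definition order.
    private final Map<String, Pattern> rules = new LinkedHashMap<>();

    OrderedLexer rule(String name, String regex) {
        rules.put(name, Pattern.compile(regex));
        return this;
    }

    // Returns tokens as "RuleName:text"; whitespace is skipped.
    List<String> tokenize(String input) {
        List<String> out = new ArrayList<>();
        int pos = 0;
        while (pos < input.length()) {
            if (Character.isWhitespace(input.charAt(pos))) {
                pos++;
                continue;
            }
            String bestRule = null;
            int bestEnd = pos;
            for (Map.Entry<String, Pattern> entry : rules.entrySet()) {
                Matcher m = entry.getValue().matcher(input).region(pos, input.length());
                // Only a strictly longer match replaces the current best,
                // so an earlier rule keeps a tie.
                if (m.lookingAt() && m.end() > bestEnd) {
                    bestRule = entry.getKey();
                    bestEnd = m.end();
                }
            }
            if (bestRule == null) throw new IllegalStateException("no rule matches at index " + pos);
            out.add(bestRule + ":" + input.substring(pos, bestEnd));
            pos = bestEnd;
        }
        return out;
    }
}
```

With True defined first, "true trueish" lexes as True:true followed by ID:trueish; with ID defined first, "true" comes out as an ID and the keyword rule is never used.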

A2

You could do that by passing parameters (here: the tree built so far) to your parser rules, although that makes your grammar (a bit) less readable.
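
The idea behind the parameter is that primaryPrime receives the tree built so far and makes it the first child of the postfix node, instead of emitting it as a sibling. Here is a hand-written Java sketch of that idea. AstNode, PostfixParser and the LENGTH node label are hypothetical, "." and "length" are separate tokens, only identifiers or integers are accepted as an index, and, unlike the grammar below (which passes null to nested primaryPrime calls), it threads the new node into each following suffix, so a[i][j] nests to the left.

```java
import java.util.Arrays;
import java.util.List;

// Minimal AST node, printed in LISP-style notation such as (ARRAY_EXPR array i).
class AstNode {
    final String label;
    final List<AstNode> children;

    AstNode(String label, AstNode... children) {
        this.label = label;
        this.children = Arrays.asList(children);
    }

    @Override
    public String toString() {
        if (children.isEmpty()) return label;
        StringBuilder sb = new StringBuilder("(").append(label);
        for (AstNode child : children) sb.append(' ').append(child);
        return sb.append(')').toString();
    }
}

// Sketch of the parameter-passing idea: the tree built so far is handed to
// primaryPrime, which makes it the first child of each postfix node.
class PostfixParser {
    private final List<String> tokens;
    private int pos = 0;

    PostfixParser(List<String> tokens) { this.tokens = tokens; }

    private String peek() { return pos < tokens.size() ? tokens.get(pos) : "<EOF>"; }

    private void match(String expected) {
        if (!peek().equals(expected)) throw new IllegalStateException("expected " + expected + " but got " + peek());
        pos++;
    }

    // primaryExp : ID primaryPrime[ID]   (ID or integer; other alternatives omitted)
    AstNode primaryExp() {
        String t = peek();
        if (!Character.isLetterOrDigit(t.charAt(0))) throw new IllegalStateException("unexpected " + t);
        pos++;
        return primaryPrime(new AstNode(t));
    }

    // primaryPrime[parent]
    //   : '[' index ']'  primaryPrime[ARRAY_EXPR(parent, index)]
    //   | '.' 'length'   primaryPrime[LENGTH(parent)]
    //   | (no suffix)    -> parent is returned unchanged
    private AstNode primaryPrime(AstNode parent) {
        if (peek().equals("[")) {
            match("[");
            AstNode index = primaryExp(); // a full expression in the real grammar
            match("]");
            return primaryPrime(new AstNode("ARRAY_EXPR", parent, index));
        }
        if (peek().equals(".")) {
            match(".");
            match("length");
            return primaryPrime(new AstNode("LENGTH", parent));
        }
        return parent;
    }
}
```

For example, the tokens array [ i ] produce (ARRAY_EXPR array i), which is the ARRAY_EXPR array index shape asked for in Q2, and a [ i ] [ j ] . length produces (LENGTH (ARRAY_EXPR (ARRAY_EXPR a i) j)).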

Your grammar with my comments:

grammar test;

options {
  output=AST;
}

tokens {
  ARRAY_EXPR;
}

start      : expression;

expression : andExp;
andExp     : cmpExp (And^ cmpExp)*;
cmpExp     : sumExp (LessThan^ sumExp)*;
sumExp     : prodExp ((Plus | Minus)^ prodExp)*;
prodExp    : unaryExp (Times^ unaryExp)*;
unaryExp   :  '-' primaryExp
           |  '!' primaryExp // negation is really a `unaryExp`
           |  primaryExp
           ;

primaryExp : INT                  primaryPrime[null]?
           | 'true'               primaryPrime[null]?
           | 'false'              primaryPrime[null]?
           | 'this'               primaryPrime[null]?
           | (ID -> ID)           (primaryPrime[new CommonTree($ID)] -> primaryPrime)?
           | '('! expression ')'! primaryPrime[null]?
           ;

// removed the matching of 'epsilon'
primaryPrime [CommonTree parent]
           : '[' expression ']'             primaryPrime[null]? -> ^(ARRAY_EXPR {parent} expression primaryPrime?)
           | '.' ID '(' exprList? ')'       primaryPrime[null]?
           | '.length'                      primaryPrime[null]?
           | 'new' 'int' '[' expression ']' primaryPrime[null]?
           | 'new' ID '(' ')'               primaryPrime[null]?
           ;

// removed the matching of 'epsilon' 
exprList   : expression (',' expression)*;

// be sure to create explicit tokens for keywords!
True       : 'true';
False      : 'false';
This       : 'this';
LessThan   : '<';
Plus       : '+';
Minus      : '-';
Times      : '*';
And        : '&&';
Not        : '!';
INT        : '0' | ('1'..'9')('0'..'9')*;
ID         : ('a'..'z'|'A'..'Z'|'_') ('a'..'z'|'A'..'Z'|'0'..'9'|'_')*;
WS         : ('\t' | ' ' | '\r' | '\n'| '\u000C') { $channel=HIDDEN; } ;

will parse the input "array[2*3]" into the following AST:

[Image: the AST produced for "array[2*3]"]

as you can see by running the following test class:

import org.antlr.runtime.*;
import org.antlr.runtime.tree.*;
import org.antlr.stringtemplate.*;

public class Main {
  public static void main(String[] args) throws Exception {
    String source = "array[2*3]";
    // Lex and parse the input with the classes ANTLR generates from grammar "test".
    testLexer lexer = new testLexer(new ANTLRStringStream(source));
    CommonTokenStream tokens = new CommonTokenStream(lexer);
    testParser parser = new testParser(tokens);
    testParser.start_return returnValue = parser.start();
    CommonTree tree = (CommonTree)returnValue.getTree();
    // Print the AST in Graphviz DOT format so it can be rendered as an image.
    DOTTreeGenerator gen = new DOTTreeGenerator();
    StringTemplate st = gen.toDOT(tree);
    System.out.println(st);
  }
}
