
ANTLRWorks grammar parser

I created a simple grammar in ANTLRWorks. Then I generated the code, and now I have two files: grammarLexer.java and grammarParser.java. My goal is to map my grammar to the Java language. What should I do next to achieve this?

Here is my grammar:

`grammar grammar;

prog : ((FOR | WHILE | IF | PRINT | DECLARATION | ENTER | (WS* FUNCTION) | VARIABLE) | FUNCTION_DEC)+;

FOR        :     WS* 'for' WS+ VARIABLE WS+ DIGIT+ WS+ DIGIT+ WS* ENTER  ( FOR | WHILE | IF | PRINT | DECLARATION | ENTER | (WS* FUNCTION) | INC_DEC )* WS* 'end' WS* ENTER;
WHILE        :     WS* 'while' WS+ (VARIABLE | DIGIT+) WS* EQ_OPERATOR WS* (VARIABLE | DIGIT+) WS* ENTER  (FOR | WHILE | IF | PRINT | DECLARATION | ENTER | (WS* FUNCTION) | (WS* INC_DEC))* WS* 'end' WS* ENTER;
IF        :         WS* 'if' WS+ ( FUNCTION | VARIABLE | DIGIT+) WS* EQ_OPERATOR WS* (VARIABLE | DIGIT+) WS* ENTER (FOR | WHILE | IF | PRINT | DECLARATION | ENTER | (WS* FUNCTION) | INC_DEC)* ( WS* 'else' ENTER (FOR | WHILE | IF | PRINT | DECLARATION | ENTER | (WS* FUNCTION) | (WS* INC_DEC))*)? WS* 'end' WS* ENTER;

CHAR        :     ('a'..'z'|'A'..'Z')+;
EQ_OPERATOR    :    ('<' | '>' | '==' | '>=' | '<=' | '!=');
DIGIT        :     '0'..'9'+;
ENTER        :     '\n';
WS        :     ' ' | '\t';

PRINT_TEMPLATE  :     WS+ (('"' (CHAR | DIGIT | WS)* '"') | VARIABLE | DIGIT+ | FUNCTION | INC_DEC);
PRINT             :     WS* 'print' PRINT_TEMPLATE (',' PRINT_TEMPLATE)*  WS* ENTER;

VARIABLE        :    CHAR(CHAR|DIGIT)*;
FUN_TEMPLATE    :    WS* (VARIABLE | DIGIT+ | '"' (CHAR | DIGIT | WS)* '"');
FUNCTION        :    VARIABLE '(' (FUN_TEMPLATE (WS* ',' FUN_TEMPLATE)*)? ')' WS* ENTER*;

DECLARATION     :    WS* VARIABLE WS* ('=' WS* (DIGIT+ | '"' (CHAR | DIGIT | WS)* '"' | VARIABLE)) WS* ENTER;
FUNCTION_DEC    :    WS*'def' WS* FUNCTION ( FOR | WHILE | IF | PRINT | DECLARATION | ENTER | (WS* FUNCTION) | INC_DEC )* WS* 'end' WS* ENTER*;

INC_DEC            :    VARIABLE ('--' | '++') WS* ENTER*;`

Here is my Main class for the parser:

`import org.antlr.runtime.ANTLRStringStream;
import org.antlr.runtime.CommonToken;
import org.antlr.runtime.CommonTokenStream;
import org.antlr.runtime.Parser;

public class Main {
    public static void main(String[] args) throws Exception {  
        // the input source  
        String source =   
            "for i 1 3\n " +
            "printHi()\n " +
            "end\n " +
            "if fun(y, z) == 0\n " +
            "end\n ";

         // create an instance of the lexer
         grammarLexer lexer = new grammarLexer(new ANTLRStringStream(source));  

         // wrap a token-stream around the lexer  
         CommonTokenStream tokens = new CommonTokenStream(lexer);  

         // traverse the tokens and print them to see if the correct tokens are created  
         int n = 1;  
         for(Object o : tokens.getTokens()) {  
           CommonToken token = (CommonToken)o;  
           System.out.println("token(" + n + ") = " + token.getText().replace("\n", "\\n"));  
           n++;  
         }
         grammarParser parser = new grammarParser(tokens);
         // call the entry rule; the grammar above defines prog, not file
         parser.prog();
    }
}
`

As I already mentioned in the comments: your overuse of lexer rules is wrong. Think of lexer rules as the fundamental building blocks of your language, much like the way you'd describe water in chemistry. You would not describe water like this:

WATER
 : 'HHO'
 ;

That is, as a single, indivisible unit. Water should instead be described as a sequence of three separate atoms:

water
 : Hydrogen Hydrogen Oxygen
 ;

Hydrogen : 'H';
Oxygen   : 'O';

where Hydrogen and Oxygen are the fundamental building blocks (lexer rules) and water is the compound (the parser rule).

A good rule of thumb: if you're creating lexer rules that consist of several other lexer rules, chances are there's something fishy in your grammar. This is not always the case, of course: a lexer rule may legitimately reuse small helper (fragment) rules such as a single digit or letter.
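To make the "building blocks" idea concrete, here is a minimal sketch of a hand-written tokenizer in plain Java. It is not the code ANTLR generates; the class name TokenSketch, the token type names, and the regular expression are assumptions chosen purely for illustration. The point is only that the lexer hands out small, atomic tokens and leaves structure such as "a for statement" to the parser.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration only: this is NOT what ANTLR generates.
class TokenSketch {

    // One named group per small building block; keywords are tried before ID.
    private static final Pattern TOKEN = Pattern.compile(
        "(?<FOR>for\\b)|(?<END>end\\b)|(?<IF>if\\b)"
        + "|(?<ID>[a-zA-Z]+)|(?<INT>[0-9]+)|(?<STR>'[^']*')"
        + "|(?<OP>==|!=|[(),])|(?<SP>\\s+)");

    private static final String[] TYPES =
        {"FOR", "END", "IF", "ID", "INT", "STR", "OP", "SP"};

    // Splits src into "TYPE:text" strings, dropping whitespace tokens.
    static List<String> tokenize(String src) {
        List<String> out = new ArrayList<>();
        Matcher m = TOKEN.matcher(src);
        int pos = 0;
        while (pos < src.length()) {
            m.region(pos, src.length());
            if (!m.lookingAt()) {
                throw new IllegalArgumentException("unexpected character at offset " + pos);
            }
            for (String type : TYPES) {
                if (m.group(type) != null) {
                    if (!type.equals("SP")) {   // whitespace is discarded, not passed on
                        out.add(type + ":" + m.group());
                    }
                    break;
                }
            }
            pos = m.end();
        }
        return out;
    }

    public static void main(String[] args) {
        // prints [FOR:for, ID:i, INT:1, INT:3, ID:print, OP:(, ID:i, OP:), END:end]
        System.out.println(tokenize("for i 1 3 print(i) end"));
    }
}
```

Note that no token here knows anything about loops or conditions. Recognizing that FOR ID INT INT ... END forms a loop is the job of a parser rule.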

Let's say you want to parse the following input:

for i 1 3
  print(i)
end

if fun(y, z) == 0
  print('foo')
end

A grammar could look like this:

grammar T;

options {
  output=AST;
}

tokens {
  BLOCK;
  CALL;
  PARAMS;
}

// parser rules
parse
 : block EOF!
 ;

block
 : stat* -> ^(BLOCK stat*)
 ;

stat
 : for_stat
 | if_stat
 | func_call
 ;

for_stat
 : FOR^ ID expr expr block END!
 ;

if_stat
 : IF^ expr block END!
 ;

expr
 : eq_expr
 ;

eq_expr
 : atom (('==' | '!=')^ atom)*
 ;

atom
 : func_call
 | INT
 | ID
 | STR
 ;

func_call
 : ID '(' params ')' -> ^(CALL ID params)
 ;

params
 : (expr (',' expr)*)? -> ^(PARAMS expr*)
 ;

// lexer rules
FOR : 'for';
END : 'end';
IF  : 'if';
ID  : ('a'..'z' | 'A'..'Z')+;
INT : '0'..'9'+;
STR : '\'' ~('\'')* '\'';
SP  : (' ' | '\t' | '\r' | '\n')+ {skip();};

And if you now run this test class:

import org.antlr.runtime.*;
import org.antlr.runtime.tree.*;
import org.antlr.stringtemplate.*;

public class Main {
  public static void main(String[] args) throws Exception {
    String src = 
        "for i 1 3          \n" + 
        "  print(i)         \n" + 
        "end                \n" + 
        "                   \n" + 
        "if fun(y, z) == 0  \n" + 
        "  print('foo')     \n" + 
        "end                \n";
    TLexer lexer = new TLexer(new ANTLRStringStream(src));
    TParser parser = new TParser(new CommonTokenStream(lexer));
    CommonTree tree = (CommonTree)parser.parse().getTree();
    DOTTreeGenerator gen = new DOTTreeGenerator();
    StringTemplate st = gen.toDOT(tree);
    System.out.println(st);
  }
}

you'll see DOT output printed to the console, which describes the following AST:

[Image: the resulting AST]
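One possible next step toward mapping the language to Java is to walk such a tree recursively and emit Java source text. The sketch below only illustrates that idea under several assumptions: Node is a hypothetical stand-in for ANTLR's CommonTree so the example runs without the ANTLR runtime, dispatch is done on node text (a real walker would more likely switch on the token type), and the for loop is assumed to include both bounds, which the source does not specify.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class AstToJavaSketch {

    // Hypothetical stand-in for org.antlr.runtime.tree.CommonTree: text plus children.
    static class Node {
        final String text;
        final List<Node> children;

        Node(String text, Node... children) {
            this.text = text;
            this.children = Arrays.asList(children);
        }
    }

    // Appends Java-like source for one statement node to out.
    static void stat(Node n, String indent, StringBuilder out) {
        switch (n.text) {
            case "BLOCK":                       // ^(BLOCK stat*)
                for (Node child : n.children) {
                    stat(child, indent, out);
                }
                break;
            case "for": {                       // ^(for ID expr expr BLOCK)
                String v = n.children.get(0).text;
                // Assumption: both bounds are inclusive.
                out.append(indent).append("for (int ").append(v).append(" = ")
                   .append(expr(n.children.get(1))).append("; ")
                   .append(v).append(" <= ").append(expr(n.children.get(2))).append("; ")
                   .append(v).append("++) {\n");
                stat(n.children.get(3), indent + "  ", out);
                out.append(indent).append("}\n");
                break;
            }
            case "if":                          // ^(if expr BLOCK)
                out.append(indent).append("if (").append(expr(n.children.get(0))).append(") {\n");
                stat(n.children.get(1), indent + "  ", out);
                out.append(indent).append("}\n");
                break;
            case "CALL":                        // a call used as a statement
                out.append(indent).append(expr(n)).append(";\n");
                break;
            default:
                throw new IllegalArgumentException("not a statement: " + n.text);
        }
    }

    // Returns Java-like source for one expression node.
    static String expr(Node n) {
        switch (n.text) {
            case "CALL": {                      // ^(CALL ID ^(PARAMS expr*))
                List<String> params = new ArrayList<>();
                for (Node p : n.children.get(1).children) {
                    params.add(expr(p));
                }
                return n.children.get(0).text + "(" + String.join(", ", params) + ")";
            }
            case "==":
            case "!=":
                return expr(n.children.get(0)) + " " + n.text + " " + expr(n.children.get(1));
            default:
                if (n.text.startsWith("'")) {   // STR keeps its quotes; escaping is not handled
                    return "\"" + n.text.substring(1, n.text.length() - 1) + "\"";
                }
                return n.text;                  // INT or ID
        }
    }

    static String translate(Node root) {
        StringBuilder out = new StringBuilder();
        stat(root, "", out);
        return out.toString();
    }

    public static void main(String[] args) {
        // Hand-built tree for: if fun(y, z) == 0  print('foo')  end
        Node call = new Node("CALL", new Node("fun"),
            new Node("PARAMS", new Node("y"), new Node("z")));
        Node body = new Node("BLOCK",
            new Node("CALL", new Node("print"), new Node("PARAMS", new Node("'foo'"))));
        Node root = new Node("BLOCK", new Node("if", new Node("==", call, new Node("0")), body));
        System.out.print(translate(root));
    }
}
```

With the real parser, the same recursion would read getText() (or getType()) and getChild(i) from the CommonTree returned by parser.parse().getTree() instead of using the hand-built Node objects. ANTLR 3 also offers tree grammars for this kind of walk, which may be a better fit once the language grows.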

