ANTLR 4: Parsing grammar

I want to parse some data from an AppleSoft Basic script. I chose ANTLR and downloaded this grammar: jvmBasic

I'm trying to extract the function name without its parameters:

return parser.prog().line(0).amprstmt(0).statement().getText();

but it returns PRINT"HELLO", i.e. the full expression except the line number. Here is the string I want to parse:

10 PRINT "Hello!"

I think this really depends on how your ANTLR program is implemented, but if you are using a tree walker/listener you probably want to target the rule for the specific tokens rather than the entire "statement" rule, which is circular and encompasses many types of statement:

//each line can have one to many amprstmt's
line
   : (linenumber ((amprstmt (COLON amprstmt?)*) | (COMMENT | REM)))
   ;

amprstmt
   : (amperoper? statement) //encounters a statement here
   | (COMMENT | REM)
   ;
//statements can be made of 1 to many sub statements
statement
   : (CLS | LOAD | SAVE | TRACE | NOTRACE | FLASH | INVERSE | GR | NORMAL | SHLOAD | CLEAR | RUN | STOP | TEXT | HOME | HGR | HGR2)
   | prstmt
   | printstmt1 //the print rule
   //MANY MANY OTHER RULES HERE TOO LONG TO PASTE........
   ;
//the example rule that occurs when the token's "print" is encountered
printstmt1
   : (PRINT | QUESTION) printlist?
   ;

printlist
   : expression (COMMA | SEMICOLON)? printlist*
   ;

As you can see from the BNF-style grammar above, the statement rule includes the rule for a print statement as well as every other type of statement. It therefore encompasses both PRINT and "Hello!", and calling getText() on it returns the text of everything it matched: in your case, everything but linenumber, which is a rule outside of the statement rule.
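To make the effect concrete, here is a minimal, self-contained sketch of how getText() behaves on nested rule nodes. It is a toy model, not ANTLR's own ParseTree classes: the Node class, its child() helper, and the hand-built tree are all hypothetical, and it assumes (as the output in the question suggests) that whitespace never reaches the tree.

```java
import java.util.ArrayList;
import java.util.List;

// Toy stand-in for a parse tree; NOT ANTLR's ParseTree/ParserRuleContext API.
class GetTextDemo {
    static final class Node {
        final String name;   // rule name, or token type for a leaf
        final String token;  // token text for a leaf, null for a rule node
        final List<Node> children = new ArrayList<>();

        Node(String name, String token) {
            this.name = name;
            this.token = token;
        }

        static Node rule(String name, Node... kids) {
            Node n = new Node(name, null);
            for (Node k : kids) n.children.add(k);
            return n;
        }

        static Node leaf(String type, String text) {
            return new Node(type, text);
        }

        // Modeled on getText(): concatenate the text of every leaf below this node.
        String getText() {
            if (token != null) return token;
            StringBuilder sb = new StringBuilder();
            for (Node c : children) sb.append(c.getText());
            return sb.toString();
        }

        // First direct child with the given name, or null if there is none.
        Node child(String childName) {
            for (Node c : children) if (c.name.equals(childName)) return c;
            return null;
        }
    }

    // Hand-built tree for: 10 PRINT "Hello!" (assumption: whitespace is skipped or hidden)
    static Node sampleLine() {
        Node expression = Node.rule("expression", Node.leaf("STRINGLITERAL", "\"Hello!\""));
        Node printlist = Node.rule("printlist", expression);
        Node printstmt1 = Node.rule("printstmt1", Node.leaf("PRINT", "PRINT"), printlist);
        Node statement = Node.rule("statement", printstmt1);
        Node amprstmt = Node.rule("amprstmt", statement);
        Node linenumber = Node.rule("linenumber", Node.leaf("NUMBER", "10"));
        return Node.rule("line", linenumber, amprstmt);
    }

    public static void main(String[] args) {
        Node line = sampleLine();
        Node statement = line.child("amprstmt").child("statement");
        System.out.println(statement.getText());                                    // PRINT"Hello!"
        System.out.println(statement.child("printstmt1").child("PRINT").getText()); // PRINT
        System.out.println(line.child("linenumber").getText());                     // 10
    }
}
```

The higher up the tree you call getText(), the more leaves it gathers, which is why the statement node yields the keyword and the string together while the narrower nodes yield one piece each.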

If you want to handle these specific rules when they are encountered, you most likely want to add functionality to the methods ANTLR generates for each rule by extending the generated jvmBasicListener (through its jvmBasicBaseListener implementation):

example:

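-jvmBasicListener.java (generated interface; jvmBasicBaseListener.java is its empty implementation)
-extended to jvmBasicCustomListener.java

public class jvmBasicCustomListener extends jvmBasicBaseListener {
    @Override
    public void enterPrintstmt1(jvmBasicParser.Printstmt1Context ctx) {
        // Runs each time the tree walker enters a printstmt1 node
        System.out.println(ctx.getText());
    }
}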
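A listener method like this only fires while the tree is being walked; in ANTLR that is typically done with ParseTreeWalker.DEFAULT.walk(listener, tree). The mechanism itself can be sketched without the ANTLR runtime. The following is a toy model only: Node, Listener, walk, and the sample tree are hypothetical names made up for illustration, not ANTLR classes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Toy model of the listener pattern; NOT the ANTLR runtime classes.
class ListenerDemo {
    static final class Node {
        final String rule;   // rule name, or token type for a leaf
        final String token;  // token text for a leaf, null for a rule node
        final List<Node> children;

        Node(String rule, String token, Node... children) {
            this.rule = rule;
            this.token = token;
            this.children = Arrays.asList(children);
        }

        // Concatenated text of all leaves below this node.
        String getText() {
            if (token != null) return token;
            StringBuilder sb = new StringBuilder();
            for (Node c : children) sb.append(c.getText());
            return sb.toString();
        }
    }

    // Single callback, invoked when the walker enters a node.
    interface Listener {
        void enter(Node node);
    }

    // Depth-first walk: notify the listener, then descend into the children.
    static void walk(Listener listener, Node node) {
        listener.enter(node);
        for (Node c : node.children) walk(listener, c);
    }

    // Collect the text of every printstmt1 node, ignoring all other rules.
    static List<String> printStatements(Node root) {
        List<String> found = new ArrayList<>();
        walk(node -> {
            if (node.rule.equals("printstmt1")) found.add(node.getText());
        }, root);
        return found;
    }

    // Hand-built tree for two lines: 10 PRINT "Hello!" and 20 HOME
    static Node sampleProgram() {
        Node print = new Node("printstmt1", null,
                new Node("PRINT", "PRINT"),
                new Node("printlist", null, new Node("STRINGLITERAL", "\"Hello!\"")));
        Node line10 = new Node("line", null,
                new Node("linenumber", null, new Node("NUMBER", "10")),
                new Node("statement", null, print));
        Node line20 = new Node("line", null,
                new Node("linenumber", null, new Node("NUMBER", "20")),
                new Node("statement", null, new Node("HOME", "HOME")));
        return new Node("prog", null, line10, line20);
    }

    public static void main(String[] args) {
        System.out.println(printStatements(sampleProgram())); // [PRINT"Hello!"]
    }
}
```

Because the callback filters on the rule name, the HOME statement on line 20 is visited but never reported, which is the point of targeting printstmt1 instead of statement.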

However, if all of this is already set up and you just want to return a string value from the single line you have, then accessing the methods at a lower level by addressing the child nodes of statement may work (amprstmt->statement->printstmt1->value):

 return parser.prog().line(0).amprstmt(0).statement().printstmt1().getText();

Just to narrow my answer slightly, the rules that specifically address your input 10 PRINT "HELLO" would be:

linenumber (contains Number), statement->printstmt1, and statement->datastmt->datum (contains STRINGLITERAL)

So, as shown above, the linenumber rule exists on its own and the other two rules that define your text are children of statement, which explains why getting the text of the statement rule outputs everything except the line number.

Addressing each of these rules and calling getText() on it, rather than on an encompassing rule such as statement, may give you the result you are looking for.


I will update to address your question, since the answer may be slightly longer. The easiest way, in my opinion, to handle specific rules rather than generating a listener or visitor is to implement actions within the rules of your grammar file, like this:

printstmt1
   : (PRINT | QUESTION) printlist? { System.out.println("Print"); /* your Java code */ }
   ;

This simply allows you to address each rule and perform whichever Java action you wish to carry out. You can then generate the parser from the grammar with something like:

java -jar antlr-4.5.3-complete.jar jvmBasic.g4 -visitor

After this you can run your code however you wish. Here is an example:

import JVM1.jvmBasicLexer;
import org.antlr.v4.runtime.ANTLRInputStream;
import org.antlr.v4.runtime.CommonTokenStream;
import org.antlr.v4.runtime.tree.ParseTree;


public class Jvm extends jvmBasicBaseVisitor<Object> {


    public static void main(String[] args) {
        jvmBasicLexer lexer = new jvmBasicLexer(new ANTLRInputStream("10 PRINT \"Hello!\""));
        jvmBasicParser parser = new jvmBasicParser(new CommonTokenStream(lexer));
        ParseTree tree = parser.prog();
    }

}

The output for this example would then be just:

Print

You could also call whatever Java methods you like from within the grammar to handle each rule encountered, either developing your own classes and methods for it or printing a result directly.


Update

Just to address the latest question: parser.line().linenumber().getText() - for the line number, as it sits on line and is not part of a statement

parser.prog().line(0).amprstmt(0).statement().printstmt1().PRINT().getText() - for PRINT, as it is isolated in printstmt1; however, the rule does not include CLR

parser.prog().line(0).amprstmt(0).statement().printstmt1().printlist().expression().getText() - to get the value "hello", as it is part of an expression contained within the printstmt1 rule.

:) Good luck
