
ANTLR4 ignoring tokens

I'm writing a compiler for a language as a university project, using ANTLR4. I wrote the compiler in Java and used the Visitor pattern. When I reached the testing phase, I noticed that ANTLR is ignoring parts of my code and generating errors that it shouldn't generate.

Grammar:

grammar smallJava;

/*-----------------
Parser Rules
*/

start:program;

program
    :imports* classcode EOF;

imports
    :'import' imported ';';

imported
    :classimported=('Small_Java.lang' | 'Small_Java.io');

classcode
    :modifier? 'class_SJ' classname '{' declaration* 'main_SJ' '{' statement* '}' '}';

modifier
    :'public'
    |'protected';

classname
    :IDF;

declaration
    :type variables=vars ';';

type
    :'int_SJ'
    |'float_SJ'
    |'string_SJ';

vars
    :IDF ',' follow=vars                                    #vars_follow
    |IDF                                                    #vars_end
    ;

statement
    :assign_statement
    ;

assign_statement
    :idf=IDF ':=' right=expression ';';

expression: expressiona; // axiom (entry rule) of the "expression" rules

// direct left recursion is kept on purpose: ANTLR4 rewrites it automatically, so the manual A -> beta A' && A' -> alpha A' / epsilon transformation is not needed
expressiona
    :left=expressiona operator=('+'|'-') right=expressionb  #expression_pm
    |expressionb                                            #expression_b
    ;

expressionb
    :left=expressionb operator=('*'|'/') right=expressionc  #expression_md
    |expressionc                                            #expression_c
    ;

expressionc
    :'(' expressiona ')'                                    #expression_parenthesis
    |value                                                  #expression_value
    ;

value
    :INT                                                    #integer
    |STRING                                                 #string
    |FLOAT                                                  #float
    |IDF                                                    #idf
    ;

/*-----------------
Lexer Rules
*/

fragment DIGIT0: [0-9];
fragment DIGIT1: [1-9];
fragment LETTER: ('A'..'Z')|('a'..'z');
fragment CHAR: LETTER|DIGIT0;


INT: '0'|DIGIT1 DIGIT0*;
FLOAT
    :'.' DIGIT0+
    |INT '.' DIGIT0*;

STRING: '"' (CHAR|' '|'('|')'|'\\"')*? '"'; //STRING: '"' ('\\"'|.)*? '"';

IDF:LETTER (LETTER|DIGIT0)*;

WS: [ \t\r\n]+ -> skip; // \r also covers Windows (CRLF) line endings

And here's my Main class:

import org.antlr.v4.runtime.ANTLRInputStream;
import org.antlr.v4.runtime.CommonTokenStream;
import org.antlr.v4.runtime.tree.ParseTree;

public class Main {
    public static void main(String[] args) {
        String test =
            "import Small_Java.lang;\n" +
            "public class_SJ Test{\n" +
                "\tint_SJ varTest;\n" +
                "\tmain_SJ{\n" +
                    "\t\tvarTest := 1+1;\n" +
                "\t}\n" +
            "}";

        ANTLRInputStream input = new ANTLRInputStream(test);
        smallJavaLexer lexer = new smallJavaLexer(input);
        CommonTokenStream tokens = new CommonTokenStream(lexer);
        smallJavaParser parser = new smallJavaParser(tokens);
        ParseTree tree = parser.expression();
        myVisitor v = new myVisitor();

        v.visit(tree);
    }
}

When I run Main, it prints:

line 1:0 mismatched input 'import' expecting {'(', INT, FLOAT, STRING, IDF}

Am I wrong somewhere in my grammar? If not, why is it doing that?

This line:

ParseTree tree = parser.expression();

tells the parser object to parse an expression (that is, the non-terminal expression defined by your grammar), so it correctly grumbles when it sees the token 'import'.

Presumably your intent was to parse a program, in which case you would need to call the program member function:

ParseTree tree = parser.program();
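To see why the choice of method matters, here is a minimal, hypothetical sketch in plain Java. It is not ANTLR-generated code and does not use the ANTLR runtime; it only mimics the shape of an ANTLR parser, where every grammar rule becomes a method you can call directly. It assumes a cut-down grammar, program : ('import' NAME ';')* expression EOF. Calling expression() on input that begins with import fails on the very first token, while program() consumes the import first. The class name EntryPointDemo and its error messages are invented for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical toy parser (NOT ANTLR output): it only mimics the shape of an
// ANTLR-generated parser, where every grammar rule becomes a callable method.
public class EntryPointDemo {
    // Tokens: names (dots allowed, e.g. Small_Java.lang), integers, single-char symbols.
    static List<String> tokenize(String s) {
        List<String> out = new ArrayList<>();
        Matcher m = Pattern.compile("[A-Za-z_][A-Za-z0-9_.]*|[0-9]+|[;+\\-()]").matcher(s);
        while (m.find()) out.add(m.group());
        out.add("<EOF>");
        return out;
    }

    static class Parser {
        private final List<String> toks;
        private int pos = 0;
        Parser(List<String> toks) { this.toks = toks; }

        private String peek() { return toks.get(pos); }

        private void match(String expected) {
            if (!peek().equals(expected))
                throw new IllegalStateException("mismatched input '" + peek() + "' expecting '" + expected + "'");
            pos++;
        }

        // program : ('import' NAME ';')* expression EOF
        void program() {
            while (peek().equals("import")) {
                match("import");
                pos++; // module name (not validated in this toy)
                match(";");
            }
            expression();
            match("<EOF>");
        }

        // expression : value (('+'|'-') value)*
        void expression() {
            value();
            while (peek().equals("+") || peek().equals("-")) { pos++; value(); }
        }

        // value : INT | IDF | '(' expression ')'
        void value() {
            String t = peek();
            if (t.equals("(")) { pos++; expression(); match(")"); }
            else if (t.matches("[0-9]+") || (t.matches("[A-Za-z_].*") && !t.equals("import"))) pos++;
            else throw new IllegalStateException("mismatched input '" + t + "' expecting {'(', INT, IDF}");
        }
    }

    // Parses the input starting from the chosen rule; returns "ok" or the error message.
    public static String tryParse(String input, boolean asProgram) {
        Parser p = new Parser(tokenize(input));
        try {
            if (asProgram) p.program(); else p.expression();
            return "ok";
        } catch (IllegalStateException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) {
        String src = "import Small_Java.lang;\n1+1";
        System.out.println("expression(): " + tryParse(src, false));
        System.out.println("program():    " + tryParse(src, true));
    }
}
```

Run on the same kind of input, expression() reports an error at 'import' (in the same spirit as the message in the question), while program() succeeds. With the real generated parser, the same switch is simply replacing parser.expression() with parser.program().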

Your start production is essentially pointless, since all it does is defer to program. Starting a grammar with a start rule is common because some other parser generators have the concept of a "start rule", meaning that the generated parser always attempts to parse the same non-terminal. But ANTLR really doesn't have this concept; you can accept a top-level match of any non-terminal in your grammar by calling the member function with that name.
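A second hypothetical sketch, again plain Java rather than ANTLR output, shows both points: start() merely forwards to program(), and any rule method can act as the entry point. It also illustrates why it matters that your program rule ends with EOF: a rule without EOF, such as expression here, can return successfully after matching only a prefix of the input, which is commonly how ANTLR behaves as well when the entry rule does not end in EOF. The class StartRuleDemo, its run helper and its messages are assumptions made up for this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical toy (NOT ANTLR output): each rule is a method, start() only
// forwards to program(), and only program() insists on reaching <EOF>.
public class StartRuleDemo {
    private final List<String> toks = new ArrayList<>();
    private int pos = 0;

    public StartRuleDemo(String input) {
        Matcher m = Pattern.compile("[A-Za-z_][A-Za-z0-9_]*|[0-9]+|[+\\-]").matcher(input);
        while (m.find()) toks.add(m.group());
        toks.add("<EOF>");
    }

    private String peek() { return toks.get(pos); }

    // start : program ;   (a pure pass-through, like the rule in the question)
    public void start() { program(); }

    // program : expression EOF ;
    public void program() {
        expression();
        if (!peek().equals("<EOF>"))
            throw new IllegalStateException("extraneous input '" + peek() + "' expecting <EOF>");
    }

    // expression : operand (('+'|'-') operand)* ;   no EOF, so it may stop early
    public void expression() {
        operand();
        while (peek().equals("+") || peek().equals("-")) { pos++; operand(); }
    }

    // operand : INT | IDF ;
    private void operand() {
        String t = peek();
        if (t.equals("<EOF>") || t.equals("+") || t.equals("-"))
            throw new IllegalStateException("mismatched input '" + t + "' expecting {INT, IDF}");
        pos++;
    }

    // Parses from the named rule; reports how many tokens were consumed, or the error.
    public static String run(String input, String rule) {
        StartRuleDemo p = new StartRuleDemo(input);
        try {
            switch (rule) {
                case "start": p.start(); break;
                case "program": p.program(); break;
                default: p.expression(); break;
            }
            return "ok, consumed " + p.pos + " of " + (p.toks.size() - 1) + " tokens";
        } catch (IllegalStateException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) {
        System.out.println(run("1+1 junk", "expression"));
        System.out.println(run("1+1 junk", "program"));
        System.out.println(run("1+1 junk", "start"));
    }
}
```

Here expression() silently stops after "1+1" and ignores "junk", while program() (and therefore start()) rejects the trailing token. So dropping start and calling program() directly loses nothing, because program already carries the EOF.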
