ANTLR4: Whitespace and Space lexical handling

In my (simplified) grammar:

    grammar test;


    prog: stat+;

    stat: 
              sourceDef ';'
    ;

    sourceDef: 
        SRC COLON ID 
    ;



    STRING : '"' ('""'|~'"')* '"' ; // quote-quote is an escaped quote

    LINE_COMMENT
        : '//' (~('\n'|'\r'))* -> skip;

    WS  : [ \t\n\r]+ -> skip;
    //SP : ' ' -> skip;


    COMMENT : '/*' .*? '*/' -> skip;
    LE: '<';
    MINUS: '-';
    GR: '>';  
    COLON: ':' ;
    HASH: '#';
    EQ: '=';
    SEMI: ';';
    COMMA: ','; 
    AND:  [Aa][Nn][Dd];
    SRC: [Ss][Rr][Cc];
    NUMBER: [0-9];
    ID: [a-zA-Z][a-zA-Z0-9]+;
    DAY: ('0'[1-9]|[12][0-9]|'3'[01]);
    MONTH: ('0' [1-9]|'1'[012]);
    YEAR: [0-2] [890] NUMBER NUMBER;
    DATE: DAY  [- /.] MONTH [- /.] YEAR;

the input

    src : xxx;

produces a syntax error:

    extraneous input ' ' expecting ':'

The input

    src:xxx;

parses fine.

The modified version with

    WS  : [\t\n\r]+ -> skip;
    SP : ' ' -> skip;

works fine with both inputs (with and without spaces). So spaces seem to be skipped only if they are defined in a separate rule.

Is something wrong with this

    WS  : [ \t\n\r]+ -> skip;

definition?

Or what else could cause this (to me) unexpected behavior?

I assume you have already found the solution, but for the record: your whitespace lexer rule should be

    WS  :   (' '|'\r'|'\n'|'\t') -> channel(HIDDEN);

Apparently the space character is just not specified in your grammar, that is all.
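For reference, here is a minimal sketch of the simplified grammar with that whitespace rule in place, under a few assumptions: it uses the answer's `channel(HIDDEN)` variant, and the trailing `+` on `WS` (so a run of blanks becomes a single hidden token), the `EOF` anchor in `prog`, the use of `SEMI` instead of the `';'` literal, and the `A-Z` range in `ID` are my additions, not part of the original post. With it, both `src : xxx;` and `src:xxx;` should tokenize as `SRC COLON ID SEMI`.

    grammar test;

    prog      : stat+ EOF ;
    stat      : sourceDef SEMI ;
    sourceDef : SRC COLON ID ;

    // SRC is listed before ID, so "src" (which both rules match with
    // the same length) becomes SRC rather than ID.
    SRC   : [Ss][Rr][Cc] ;
    COLON : ':' ;
    SEMI  : ';' ;
    ID    : [a-zA-Z][a-zA-Z0-9]+ ;

    // Space, tab, CR and LF go to the hidden channel: the parser never
    // sees them, but they stay in the token stream. Use "-> skip" instead
    // if you never need the whitespace tokens afterwards.
    WS    : (' ' | '\t' | '\r' | '\n')+ -> channel(HIDDEN) ;

Choosing `channel(HIDDEN)` over `skip` only matters if later code (for example a pretty-printer) wants to look at the original spacing; for plain parsing both behave the same.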
