
Escape characters for an antlr lexer and parser

I am new to ANTLR and am building a parser, part of which needs to match strings. I want to preserve the meaning of these escape characters:

\0, \b, \t, \n, \f, \r, \", \', \\

Some of these symbols are used in various positions within the grammar of my language, so I am trying to define an ESCAPE_CHAR token like this:

SINGLE_QUOTE: '\'' ;
DOUBLE_QUOTE: '"' ;
ESCAPE_ZERO : '\0' ;
ESCAPE_BACKSPACE : '\b' ;
ESCAPE_TAB : '\t' ;
ESCAPE_NEWLINE : '\n' ;
ESCAPE_FORMFEED : '\f' ;
ESCAPE_CARRIAGERETURN : '\r' ;
ESCAPE_BACKSLASH : '\\' ;
ESCAPE_CHAR: ESCAPE_ZERO | ESCAPE_BACKSPACE | ESCAPE_TAB | ESCAPE_NEWLINE | ESCAPE_FORMFEED | ESCAPE_CARRIAGERETURN | DOUBLE_QUOTE | SINGLE_QUOTE | ESCAPE_BACKSLASH ;

However, ESCAPE_ZERO is giving me the warning:

non-fragment lexer rule ESCAPE_CHAR can match the empty string

And when I make ESCAPE_ZERO a fragment, I see this warning instead:

invalid escape sequence \0

As I am new to ANTLR, I don't really know what changes I need to make. Any help would be greatly appreciated.

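You need to escape the \ inside a literal in ANTLR as well. If you don't, ANTLR reads the \0 in the lexer rule ESCAPE_ZERO: '\0'; as a single escape sequence of its own rather than as a backslash followed by the digit zero. It is not an escape that ANTLR recognises (hence the warning invalid escape sequence \0), so the literal effectively has no "width", which causes ANTLR to report [...] can match the empty string. The same applies to the other rules: '\t', '\n' and so on match the actual tab and newline characters, not a backslash followed by a letter.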

Instead of all your separate rules, try something like this:

STRING
 : '"' ( ~[\\"\r\n] | ESCAPE_CHAR )* '"'
 ;

fragment ESCAPE_CHAR
 : '\\' [0btnfr"'\\]
 ;
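The STRING rule above only recognises the escape sequences; turning them into the characters they stand for is normally done afterwards, on the token's text. The following is a minimal standalone sketch of that step in Python. It does not use the ANTLR runtime: STRING_RE, ESCAPES and unescape_string_token are hypothetical names made up for this illustration, the regular expression is only meant to mirror the STRING rule, and the mapping of each escape to a character (for example \0 to the null character) is an assumption about what the language intends.

```python
import re

# Assumed to mirror the STRING rule: '"' ( ~[\\"\r\n] | ESCAPE_CHAR )* '"'
STRING_RE = re.compile(r'"(?:[^\\"\r\n]|\\[0btnfr"\'\\])*"')

# Assumed meaning of each character that may follow the backslash.
ESCAPES = {
    "0": "\0",
    "b": "\b",
    "t": "\t",
    "n": "\n",
    "f": "\f",
    "r": "\r",
    '"': '"',
    "'": "'",
    "\\": "\\",
}


def unescape_string_token(token_text: str) -> str:
    """Turn the raw text of a STRING token into the value it denotes."""
    if not STRING_RE.fullmatch(token_text):
        raise ValueError(f"not a valid STRING token: {token_text!r}")
    body = token_text[1:-1]  # drop the surrounding double quotes
    # Replace each backslash + character pair with the character it stands for.
    return re.sub(r"\\(.)", lambda m: ESCAPES[m.group(1)], body)
```

With the ANTLR runtime, the same function could be applied to the text of each STRING token (for instance in a visitor or listener); where exactly to call it depends on how the rest of the parser is structured.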
