
Properly catching an unclosed string in ANTLR4

I have to define a string literal in ANTLR4 and catch UNCLOSE_STRING exceptions.

Strings are surrounded by a pair of "" and may contain the following supported escapes:

\b \f \r \n \t \' \\

The only way for " to appear inside a string is for it to be preceded by a ' (that is, '").

I have tried various ways to define a string literal, but they were all caught by UNCLOSE_STRING:
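As a rough cross-check of these rules outside ANTLR, the sketch below encodes them as a Python regular expression. It assumes that a lone ' and raw control characters are not allowed inside a string, as the Character fragment in the grammar suggests. The names ESCAPE, QUOTE_PAIR, PLAIN and is_string_literal are made up for illustration and are not part of the grammar.

```python
import re

# Assumption: these three alternatives mirror the rules described above.
ESCAPE = r"\\[bfrnt'\\]"             # backslash followed by a supported character
QUOTE_PAIR = "'\""                   # the '" pair, the only way to embed a double quote
PLAIN = r'''[^\b\f\r\n\t"'\\]'''     # anything except control chars, quotes, backslash

STRING_LITERAL = re.compile(f'"(?:{ESCAPE}|{QUOTE_PAIR}|{PLAIN})*"')


def is_string_literal(text):
    """Return True if the whole text is one well-formed, closed string literal."""
    return STRING_LITERAL.fullmatch(text) is not None
```

With this sketch, is_string_literal('"ab\'"c\\n def"') is expected to accept the closed string, while an input such as '"abc' with no closing quote is rejected.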

program: global_variable_part function_declaration_part EOF;
<!-- Shenanigans of statements ...-->
fragment Character: ~( [\b\f\r\n\t"\\] | '\'') | Escape | '\'"';
fragment Escape: '\\' ( 'b' | 'f' | 'r' | 'n' | 't' | '\'' | '\\');
fragment IllegalEscape: '\\' ~( 'b' | 'f' | 'r' | 'n' | 't' | '\'' | '\\') ;

STR_LIT: '"' Character* '"' {
    content = str(self.text)
    self.text = content[1:-1]
};

UNCLOSE_STRING: '"' Character* ([\b\f\r\n\t\\] | EOF) {
    esc = ['\b', '\t', '\n', '\f', '\r', '\\']
    content = str(self.text)
    raise UncloseString(content)
};

For example, "ab'"c\\\\n def" should match STR_LIT, but only Unclosed String: ab'"c\\n def" was produced.

This is quite close to the specification for strings in Java. Don't be afraid to "borrow" from other grammars. A slight modification to the Java lexer rules that (I think) matches your needs would be:

StringLiteral
    :   '"' StringCharacters? '"'
    ;
fragment
StringCharacters
    :   StringCharacter+
    ;
fragment
StringCharacter
    :   ~["\\\r\n]
    |   EscapeSequence
    ;

fragment
EscapeSequence
    :   '\\' [btnfr'\\]
    |   '\'"'  // <-- the '" escape match
    ;
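
To see how a closed literal differs from an unclosed one under rules like these, here is a small hand-written scanner in Python. It is only a sketch under stated assumptions: UncloseString is a hypothetical stand-in for the exception used in the question, scan_string and ESCAPABLE are illustrative names, and an unsupported escape is reported as an unclosed string for simplicity. Unlike an ANTLR lexer, it reads greedily without backtracking, so it always treats '" as the embedded-quote pair.

```python
class UncloseString(Exception):
    """Hypothetical stand-in for the error raised by the question's lexer action."""


ESCAPABLE = set("bfrnt'\\")  # characters allowed after a backslash


def scan_string(source, start=0):
    """Scan one string literal that begins at source[start].

    Returns (content_without_quotes, index_after_closing_quote), or raises
    UncloseString carrying the text read after the opening quote.
    """
    if source[start:start + 1] != '"':
        raise ValueError("expected an opening double quote")
    i = start + 1
    while i < len(source):
        ch = source[i]
        nxt = source[i + 1:i + 2]  # empty string at end of input
        if ch == '"':
            # Closing quote: strip the delimiters, as the STR_LIT action does.
            return source[start + 1:i], i + 1
        if ch == "'" and nxt == '"':
            i += 2  # the '" pair counts as an embedded double quote
        elif ch == "\\" and nxt in ESCAPABLE:
            i += 2  # a supported escape sequence
        elif ch in "\\\r\n":
            break   # raw line break or unsupported escape ends the scan
        else:
            i += 1  # ordinary character
    raise UncloseString(source[start + 1:i])
```

Calling scan_string on a closed literal returns its content and the position just past the closing quote; input that ends or hits a raw line break first raises UncloseString instead.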

If you know of another language that's a closer match, you can see how it was handled by looking up its grammar here ( ANTLR4 Grammars )
