antlr4 literal string handling
I have the following antlr4 grammar:
grammar squirrel;
program: globalstatement+;
globalstatement: globalvardef | classdef | functiondef;
globalvardef: IDENT '=' constantexpr ';';
classdef: CLASS IDENT '{' classstatement+ '}';
functiondef: FUNCTION IDENT '(' parameterlist ')' functionbody;
constructordef: CONSTRUCTOR '(' parameterlist ')' functionbody;
parameterlist: IDENT (',' IDENT)* | ;
functionbody: '{' statement* '}';
classstatement: globalvardef | functiondef | constructordef;
statement: expression ';';
expression:
IDENT # ident |
IDENT '=' expression # assignment |
IDENT ('.' IDENT)+ # lookupchain |
constantexpr # constant |
IDENT '(' expressionlist ')' # functioncall |
expression '+' expression # addition;
constantexpr: INTEGER | STRING;
expressionlist: expression (',' expression)* | ;
CONSTRUCTOR: 'constructor';
CLASS: 'class';
FUNCTION: 'function';
COMMENT: '//'.*[\n];
STRING: '"' CHAR* '"';
CHAR: [ a-zA-Z0-9];
INTEGER: [0-9]+;
IDENT: [a-zA-Z]+;
WS: [ \t\r\n]+ -> skip;
Now if I parse this file:
z = "global variable";
class Base
{
z = 10;
}
everything is fine:
@0,0:0='z',<16>,1:0
@1,2:2='=',<1>,1:2
@2,4:20='"global variable"',<14>,1:4
@3,21:21=';',<2>,1:21
@4,26:30='class',<11>,3:0
@5,32:35='Base',<16>,3:6
@6,38:38='{',<3>,4:0
@7,42:42='z',<16>,5:1
@8,44:44='=',<1>,5:3
@9,46:47='10',<15>,5:5
@10,48:48=';',<2>,5:7
@11,51:51='}',<4>,6:0
@12,56:55='<EOF>',<-1>,8:0
But with this file:
z = "global variable";
class Base
{
z = "10";
}
I get this:
@0,0:0='z',<16>,1:0
@1,2:2='=',<1>,1:2
@2,4:49='"global variable";\r\n\r\nclass Base\r\n{\r\n\tz = "10"',<14>,1:4
@3,50:50=';',<2>,5:9
@4,53:53='}',<4>,6:0
@5,58:57='<EOF>',<-1>,8:0
So it seems that everything between the first " and the last " in the file is matched as a single string literal.
How do I prevent this?
Note that the string matches from the first quote to the last possible quote.
By default, a Kleene operator (*) in ANTLR is greedy: the loop consumes as much input as it can while still allowing the rule to match. So, change
STRING: '"' CHAR* '"';
to
STRING: '"' CHAR*? '"';
to make it non-greedy.
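As an alternative to the non-greedy operator, a common idiom is to exclude the closing delimiter from the string body with a negated character set, so the loop cannot run past the first closing quote. The rules below are only a sketch and not part of the original grammar. They assume that string literals contain no escape sequences and cannot span lines. The `-> skip` on COMMENT is also an assumption, since the original rule passes COMMENT tokens to the parser.

```
// Sketch: the body may contain anything except a quote or a line break,
// so the match necessarily ends at the first closing quote.
STRING: '"' ~["\r\n]* '"';

// The original COMMENT rule ('//'.*[\n]) uses a greedy '.*' in the same way
// and can run to the last newline in the file. Stopping at the first line
// break avoids that. '-> skip' is an assumption; drop it if the parser
// needs COMMENT tokens.
COMMENT: '//' ~[\r\n]* -> skip;
```

With this form the CHAR rule is no longer referenced by STRING. If it is kept as a helper, it may be worth declaring it as a fragment so that it does not produce tokens of its own.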