I have a Hello.g4 grammar file with a grammar definition:
definition : wordsWithPunctuation ;
words : (WORD)+ ;
wordsWithPunctuation : word ( word | punctuation word | word punctuation | '(' wordsWithPunctuation ')' | '"' wordsWithPunctuation '"' )* ;
NUMBER : [0-9]+ ;
word : WORD ;
WORD : [A-Za-z-]+ ;
punctuation : PUNCTUATION ;
PUNCTUATION : (','|'!'|'?'|'\''|':'|'.') ;
WS : [ \t\r\n]+ -> skip ; // skip spaces, tabs, newlines
Now, when I try to build a parse tree from the following input:
a b c d of at of abc bcd of
a b c d at abc, bcd
a b c d of at of abc, bcd of
it returns errors:
Hello::definition:1:31: extraneous input 'of' expecting {<EOF>, '(', '"', WORD, PUNCTUATION}
whereas this input:
a b c d at: abc bcd!
works correctly.
What is wrong with the grammar or input or interpreter?
If I modify the wordsWithPunctuation rule by adding (... | 'of' | ',' word | ...), then it matches the input completely, but that looks suspicious to me: how is the word of different from the word a or abc? And why is , different from the other punctuation characters (i.e., why does the rule match : or !, but not ,)?
I am working with the ANTLR4 plugin for Eclipse, so the project build produces the following output:
ANTLR Tool v4.2.2 (/var/folders/.../antlr-4.2.2-complete.jar)
Hello.g4 -o /Users/.../eclipse_workspace/antlr_test_project/target/generated-sources/antlr4 -listener -no-visitor -encoding UTF-8
The grammar presented above is just a part of the full grammar:
grammar Hello;
text : (entry)+ ;
entry : blub 'abrr' '-' ('1')? '.' ('(' NUMBER ')')? sims '-' '(' definitionAndExamples ')' 'Hello' 'all' 'the' 'people' 'of' 'the' 'world';
blub : WORD ;
sims : sim (',' sim)* ;
sim : words ;
definitionAndExamples : definitions (';' examples)? ;
definitions : definition (';' definition )* ;
definition : wordsWithPunctuation ;
examples : example (';' example )* ;
example : '"' wordsWithPunctuation '"' ;
words : (WORD)+ ;
wordsWithPunctuation : word ( word | punctuation word | word punctuation | '(' wordsWithPunctuation ')' | '"' wordsWithPunctuation '"' )* ;
NUMBER : [0-9]+ ;
word : WORD ;
WORD : [A-Za-z-]+ ;
punctuation : PUNCTUATION ;
PUNCTUATION : (','|'!'|'?'|'\''|':'|'.') ;
WS : [ \t\r\n]+ -> skip ; // skip spaces, tabs, newlines
It now looks to me as if the literal words in the entry rule are somehow breaking the other rules. But why? Is this a kind of anti-pattern in the grammar?
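By including 'of' in a parser rule, you cause ANTLR to create an implicit anonymous token to represent that input. The word of will always have that special token type, so it will never have the type WORD. The only place it may appear in your parse tree is at a location where 'of' appears in a parser rule. The same applies to ',': it appears as a literal in the sims rule, so a comma is never a PUNCTUATION token, while : and ! are not used as literals anywhere and are still matched by PUNCTUATION.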
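To illustrate the point above: a combined grammar behaves roughly as if every string literal used in a parser rule had been declared as its own lexer rule ahead of the named ones. The fragment below is only a sketch of that behavior, not generated output. The rule names KW_OF, COMMA, and DOT are hypothetical (ANTLR assigns its own internal names to implicit tokens), and only three of the literals from the full grammar are shown.

```antlr
// Illustration only: approximately how the lexer of the combined grammar behaves.
// KW_OF, COMMA and DOT are hypothetical names standing in for ANTLR's implicit tokens.
KW_OF : 'of' ;   // from 'of' in the entry rule
COMMA : ',' ;    // from ',' in the sims rule
DOT   : '.' ;    // from '.' in the entry rule

// The named rules come afterwards. When two rules match the same text,
// the one listed first wins, so "of" and "," never reach the rules below.
WORD        : [A-Za-z-]+ ;
PUNCTUATION : (','|'!'|'?'|'\''|':'|'.') ;
```

A longer word such as often is still a WORD, because the lexer prefers the longest match and only falls back to rule order on a tie.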
You can prevent ANTLR from creating these anonymous token types by splitting your grammar into a separate lexer grammar HelloLexer in HelloLexer.g4 and a parser grammar HelloParser in HelloParser.g4. I highly recommend you always use this form: in a separate parser grammar, a string literal that has no matching lexer rule is reported as an error instead of silently becoming a new token type.
Once you have the grammar separated, you can update your word parser rule to allow the special token of to be treated as a word.
word
    : WORD
    | 'of'
    // | ... other keywords which are also "words"
    ;
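As a minimal sketch of the separated form, assuming only the definition part of the grammar is needed: the token names OF, THE, COMMA, LPAREN, RPAREN, and QUOTE are hypothetical choices made for this example, and the remaining keywords and literals of the full grammar would need lexer rules of their own in the same way. The lexer grammar goes in HelloLexer.g4:

```antlr
lexer grammar HelloLexer;

// Keywords are listed before WORD: on an equal-length match the earlier rule wins.
OF     : 'of' ;
THE    : 'the' ;
COMMA  : ',' ;
LPAREN : '(' ;
RPAREN : ')' ;
QUOTE  : '"' ;

NUMBER      : [0-9]+ ;
WORD        : [A-Za-z-]+ ;
PUNCTUATION : ('!'|'?'|'\''|':'|'.') ;
WS          : [ \t\r\n]+ -> skip ; // skip spaces, tabs, newlines
```

The parser grammar goes in HelloParser.g4 and imports the token types through the tokenVocab option:

```antlr
parser grammar HelloParser;

options { tokenVocab = HelloLexer; }

definition : wordsWithPunctuation ;

wordsWithPunctuation
    : word ( word
           | punctuation word
           | word punctuation
           | LPAREN wordsWithPunctuation RPAREN
           | QUOTE wordsWithPunctuation QUOTE
           )*
    ;

// Keywords that should also count as ordinary words are listed explicitly.
word        : WORD | OF | THE ;
punctuation : PUNCTUATION | COMMA ;
```

With this layout, each keyword is both a token in its own right (usable in rules such as entry) and an accepted alternative of word, so it is visible in the grammar which inputs are treated specially.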