We're using ANTLR to parse C, and a lot of our code uses dot notation for struct members. It's been a while since I've written C, but from what I remember, these two statements are synonymous:
void hello() {
    this->hello = "hello";
    this.hello = "hello";
}
ANTLR parses this->hello without any issues; the dot notation, however, throws the following error:
line 3:4 mismatched input 'this.hello' expecting '}'
If we switch the statements like this:
void hello() {
    this.hello = "hello";
    this->hello = "hello";
}
The errors are:
line 2:4 mismatched input 'this.hello' expecting {'__extension__', '__builtin_va_arg', '__builtin_offsetof', '__m128', '__m128d', '__m128i', '__typeof__', '__inline__', '__stdcall', '__declspec', '__asm', '__attribute__', '__asm__', 'auto', 'break', 'case', 'char', 'const', 'continue', 'default', 'do', 'double', 'enum', 'extern', 'float', 'for', 'goto', 'if', 'inline', 'int', 'long', 'register', 'restrict', 'return', 'short', 'signed', 'sizeof', 'static', 'struct', 'switch', 'typedef', 'union', 'unsigned', 'void', 'volatile', 'while', '_Alignas', '_Alignof', '_Atomic', '_Bool', '_Complex', '_Generic', '_Noreturn', '_Static_assert', '_Thread_local', '(', '{', '}', '+', '++', '-', '--', '*', '&', '&&', '!', '~', ';', Identifier, Constant, DigitSequence, StringLiteral}
line 3:8 no viable alternative at input 'this->'
line 4:0 extraneous input '}' expecting <EOF>
We're using the C grammar from the ANTLR Grammars repository. That said, we adjusted it to handle #include statements. What we've added are these two parser rules and these two lexer rules:
includeExpression
    : IncludeDirective includedLibExpression '"'
    | IncludeDirective includedLibExpression '>'
    ;

includedLibExpression
    : IncludedHeaderDirective
    ;

IncludeDirective
    : '#' Whitespace? 'include' Whitespace '"'
    | '#' Whitespace? 'include' Whitespace '<'
    ;

IncludedHeaderDirective
    : ('a'..'z' | 'A'..'Z' | '.' | '_' | '/')+
    ;
Then, to use the new parser rules, we added the line below to translationUnit. To make things more confusing, we still get the errors even if the includeExpression line in translationUnit is commented out.
translationUnit
    : externalDeclaration
    | translationUnit externalDeclaration
    | includeExpression+?
    ;
The parser rule that should be picking this up is:
postfixExpression
    : primaryExpression
    | postfixExpression '[' expression ']'
    | postfixExpression '(' argumentExpressionList? ')'
    | postfixExpression '.' Identifier
    | postfixExpression '->' Identifier
    | postfixExpression '++'
    | postfixExpression '--'
    | '(' typeName ')' '{' initializerList '}'
    | '(' typeName ')' '{' initializerList ',' '}'
    | '__extension__' '(' typeName ')' '{' initializerList '}'
    | '__extension__' '(' typeName ')' '{' initializerList ',' '}'
    ;
What really puzzles me is that the dot and arrow alternatives sit right next to each other, yet only the arrow notation is recognized.
You've added the following lexer rule to the grammar:
IncludedHeaderDirective
    : ('a'..'z' | 'A'..'Z' | '.' | '_' | '/')+
    ;
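This pattern matches the string this.hello. So when the lexer reaches line 2 of your input, it could either apply the Identifier rule to match this or the IncludedHeaderDirective rule to match this.hello. Since the latter is the longer match, it is chosen as per the maximal munch rule. Tokenization happens in the lexer, independently of which parser rules exist, which is why commenting out the includeExpression line in translationUnit makes no difference.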
Since an IncludedHeaderDirective is not a valid expression, you get the error you do. In order to match the postfixExpression '.' Identifier rule, this.hello would have had to be tokenized as Identifier, '.', Identifier, but the existence of the IncludedHeaderDirective rule prevents that.
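One possible way around this, offered as a sketch rather than a tested fix, is to lex the entire #include line as a single token, so that no free-standing lexer rule can match dotted names such as this.hello. The rule bodies below are an assumption on my part, not something taken from the ANTLR Grammars repository; they would replace the two lexer rules and two parser rules you added:

```antlr
// Sketch: replaces includeExpression, includedLibExpression,
// IncludeDirective and IncludedHeaderDirective from the question.

// The parser only ever sees one token per #include line.
includeExpression
    : IncludeDirective
    ;

// The whole directive, including the header name and its delimiters,
// is matched in one lexer rule. Every match must start with '#', so
// this rule cannot compete with Identifier for input like this.hello.
IncludeDirective
    : '#' [ \t]* 'include' [ \t]*
      ( '"' ~["\r\n]* '"'
      | '<' ~[>\r\n]* '>'
      )
    ;
```

With this shape the header name is part of the IncludeDirective token text, so a listener or visitor that needs it would have to strip the #include prefix and the delimiters itself. If you would rather keep the header name as a separate token, the usual alternative is a split lexer grammar with a lexer mode entered after #include, which is more work because the C grammar is currently a combined grammar.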