ANTLR C grammar not recognizing dot notation

We're using ANTLR to parse C, and a lot of our code uses dot notation for struct members. It's been a while since I've written C, but from what I remember, these two statements are synonymous:

void hello() {
    this->hello = "hello";
    this.hello = "hello";
}

ANTLR is able to parse this->hello without any issues. The dot notation, however, throws the following error:

line 3:4 mismatched input 'this.hello' expecting '}'

If we switch the statements like this:

void hello() {
    this.hello = "hello";
    this->hello = "hello";
}

The errors are:

line 2:4 mismatched input 'this.hello' expecting {'__extension__', '__builtin_va_arg', '__builtin_offsetof', '__m128', '__m128d', '__m128i', '__typeof__', '__inline__', '__stdcall', '__declspec', '__asm', '__attribute__', '__asm__', 'auto', 'break', 'case', 'char', 'const', 'continue', 'default', 'do', 'double', 'enum', 'extern', 'float', 'for', 'goto', 'if', 'inline', 'int', 'long', 'register', 'restrict', 'return', 'short', 'signed', 'sizeof', 'static', 'struct', 'switch', 'typedef', 'union', 'unsigned', 'void', 'volatile', 'while', '_Alignas', '_Alignof', '_Atomic', '_Bool', '_Complex', '_Generic', '_Noreturn', '_Static_assert', '_Thread_local', '(', '{', '}', '+', '++', '-', '--', '*', '&', '&&', '!', '~', ';', Identifier, Constant, DigitSequence, StringLiteral}
line 3:8 no viable alternative at input 'this->'
line 4:0 extraneous input '}' expecting <EOF>

We're using the C grammar from the ANTLR Grammars repository. That said, we adjusted it to handle #include statements. What we've added are these two parser rules and these two lexer rules:

includeExpression
    : IncludeDirective includedLibExpression '"'
    | IncludeDirective includedLibExpression '>'
    ;

includedLibExpression
    : IncludedHeaderDirective
    ;

IncludeDirective
    : '#' Whitespace? 'include' Whitespace '"'
    | '#' Whitespace? 'include' Whitespace '<'
    ;

IncludedHeaderDirective
    : ('a'..'z' | 'A'..'Z' | '.' | '_' | '/')+
    ;

Then, to use the new parser rules, we added includeExpression to translationUnit as shown below. To make things more confusing, we still get the errors even if the includeExpression line in translationUnit is commented out.

translationUnit
    :   externalDeclaration
    |   translationUnit externalDeclaration
    |   includeExpression+?
    ;

The specific parser rule that should be picking this up is this:

postfixExpression
    :   primaryExpression
    |   postfixExpression '[' expression ']'
    |   postfixExpression '(' argumentExpressionList? ')'
    |   postfixExpression '.' Identifier
    |   postfixExpression '->' Identifier
    |   postfixExpression '++'
    |   postfixExpression '--'
    |   '(' typeName ')' '{' initializerList '}'
    |   '(' typeName ')' '{' initializerList ',' '}'
    |   '__extension__' '(' typeName ')' '{' initializerList '}'
    |   '__extension__' '(' typeName ')' '{' initializerList ',' '}'
;

What really puzzles me is that the dot and arrow alternatives sit one right after the other, yet only the arrow notation is recognized.

You've added the following lexer rule to the grammar:

IncludedHeaderDirective
    : ('a'..'z' | 'A'..'Z' | '.' | '_' | '/')+
    ;

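This pattern matches the string this.hello. So when the lexer reaches this.hello in your input, it could either apply the Identifier rule to match this or the IncludedHeaderDirective rule to match this.hello. Since the latter is the longer match, it is chosen as per the maximal munch rule. This is also why commenting out includeExpression in translationUnit changes nothing: the lexer tokenizes the input independently of the parser, so a lexer rule stays active whether or not any parser rule refers to it.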

Since an IncludedHeaderDirective is not a valid expression, you get the error you do. In order to match the postfixExpression '.' Identifier alternative, this.hello would have had to be tokenized as Identifier, '.', Identifier, but the existence of the IncludedHeaderDirective rule prevents that.
