antlr4 javascript - clash between jsdoc start and regular expression literal

Question

I've converted the standard JavaScript ANTLR4 grammar/lexer to support JSDoc definitions. I now have to deal with JSDoc one-liners such as

/** @var {sometype} id */

The first rule in the lexer is

DocStart : '/**' -> pushMode(DOC_MODE);

and DOC_MODE then parses the JSDoc content until it meets the closing */

My problem is that ANTLR4 recognizes this as a RegularExpressionLiteral, defined as

RegularExpressionLiteral:       '/' RegularExpressionChar+ {IsRegexPossible()}? '/' IdentifierPart*;
fragment RegularExpressionChar
    : ~[\r\n\u2028\u2029\\/[]
    | RegularExpressionBackslashSequence
    | '[' RegularExpressionClassChar* ']'
    ;

Since /** is not actually a legal regular expression, I suppose I need to refine the regex definition so that it does not allow two consecutive asterisks (**), either in general or specifically in a leading /**. I suppose I could do this in IsRegexPossible(), but this is new ground for me and, of course, it happened just before a deadline.

Can anyone please give me a push in the right direction, preferably by a change in the lexer/grammar or, if there's no other choice, in IsRegexPossible()?

I tried adding the fragment RegularExpressionCharNoMultiplier (which disallows '*'), but the lexer still recognizes the above string as a regular expression literal:

RegularExpressionLiteral:       '/' ((RegularExpressionChar RegularExpressionCharNoMultiplier?)
                                   |  (RegularExpressionCharNoMultiplier RegularExpressionChar?))+
                                    {IsRegexPossible()}? '/' IdentifierPart*;
fragment RegularExpressionCharNoMultiplier
    : ~[*\r\n\u2028\u2029\\/[]
    | RegularExpressionBackslashSequence
    | '[' RegularExpressionClassChar* ']'
    ;

Thanks !

Answer 1

OK - the latest JS Lexer solved it as follows:

RegularExpressionLiteral:       '/' RegularExpressionFirstChar RegularExpressionChar* {this.IsRegexPossible()}? '/' IdentifierPart*;

Where

fragment RegularExpressionFirstChar
    : ~[*\r\n\u2028\u2029\\/[]
    | RegularExpressionBackslashSequence
    | '[' RegularExpressionClassChar* ']'
    ;

Unrelated to my problem: I believe that "+" can't be the first regex character either.
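
To see why putting DocStart first did not help, and why excluding '*' from the first character does, here is a rough JavaScript sketch. It relies on the ANTLR4 lexer preferring the longest match and using rule order only to break ties. The regular expressions below only approximate the grammar fragments: backslash sequences and character classes are simplified, IdentifierPart is reduced to ASCII, and IsRegexPossible() is assumed to return true. pickToken and the rule tables are hypothetical names for illustration, not part of the grammar or the ANTLR runtime.

```javascript
// Simplified stand-in for RegularExpressionChar: any character except a line
// terminator, '\', '/' or '[', or an escaped character, or a [...] class.
const CHAR = String.raw`(?:[^\r\n\u2028\u2029\\/\[]|\\.|\[[^\]\r\n]*\])`;
// Same as CHAR, but '*' is also excluded (stand-in for RegularExpressionFirstChar).
const FIRST_CHAR = String.raw`(?:[^*\r\n\u2028\u2029\\/\[]|\\.|\[[^\]\r\n]*\])`;
// ASCII-only approximation of IdentifierPart* (the regex flags).
const IDENT_PART = String.raw`[A-Za-z0-9_$]*`;

// Each rule is [name, pattern]; array order mimics rule order in the lexer.
const docStart = ["DocStart", /^\/\*\*/];
const oldRegexRule = ["RegularExpressionLiteral",
  new RegExp(`^/${CHAR}+/${IDENT_PART}`)];
const newRegexRule = ["RegularExpressionLiteral",
  new RegExp(`^/${FIRST_CHAR}${CHAR}*/${IDENT_PART}`)];

// Hypothetical helper: pick the rule with the longest match at the start of
// the input; on a tie, the rule listed first wins.
function pickToken(rules, input) {
  let best = null;
  for (const [name, pattern] of rules) {
    const m = pattern.exec(input);
    if (m && (best === null || m[0].length > best.text.length)) {
      best = { name, text: m[0] };
    }
  }
  return best;
}

const oneLiner = "/** @var {sometype} id */";

// Old rule: the whole one-liner fits '/' RegularExpressionChar+ '/', which is
// longer than the three characters of DocStart, so the regex rule wins.
console.log(pickToken([docStart, oldRegexRule], oneLiner).name);
// prints "RegularExpressionLiteral"

// New rule: '*' cannot follow the opening '/', so only DocStart matches.
console.log(pickToken([docStart, newRegexRule], oneLiner).name);
// prints "DocStart"

// Ordinary regex literals are still recognized by the new rule.
console.log(pickToken([docStart, newRegexRule], "/ab+c/g.test(s)").text);
// prints "/ab+c/g"
```

The same reasoning suggests why the RegularExpressionCharNoMultiplier attempt in the question had no effect: every loop iteration may still begin with RegularExpressionChar, which accepts '*', so "**" is matched as two iterations.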

solution1
0 2020-01-19 13:06:46
