
antlr4 javascript - clash between jsdoc start and regular expression literal

I've converted the standard JavaScript ANTLR4 grammar/lexer to support JSDoc definitions. I now have to deal with JSDoc one-liners such as

/** @var {sometype} id */

The first rule in the lexer is

DocStart : '/**' -> pushMode(DOC_MODE);

and the DOC_MODE then parses the JSDOC stuff till it meets the closing */

My problem is that Antlr4 recognizes this as a RegularExpressionLiteral defined as

RegularExpressionLiteral:       '/' RegularExpressionChar+ {IsRegexPossible()}? '/' IdentifierPart*;
fragment RegularExpressionChar
    : ~[\r\n\u2028\u2029\\/[]
    | RegularExpressionBackslashSequence
    | '[' RegularExpressionClassChar* ']'
    ;
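
For context on why this rule wins: an ANTLR4 lexer picks the rule that matches the longest stretch of input, and rule order only breaks ties, so putting DocStart first does not help once the regex rule can match more than three characters. Below is a rough Python emulation of that behaviour, not ANTLR itself. The shapes of RegularExpressionBackslashSequence and RegularExpressionClassChar, the simplified IdentifierPart, and treating IsRegexPossible() as always true are assumptions on my part, and first_token is a hypothetical helper.

```python
import re

# Line terminators that the grammar's fragments exclude.
NL = r"\r\n\u2028\u2029"
# Assumed shapes of the fragments the question does not show.
BACKSLASH_SEQ = rf"\\[^{NL}]"
CHAR_CLASS = rf"\[(?:[^{NL}\]\\]|{BACKSLASH_SEQ})*\]"
REGEX_CHAR = rf"(?:[^{NL}\\/\[]|{BACKSLASH_SEQ}|{CHAR_CLASS})"

# Rules in grammar order. IdentifierPart is simplified to [\w$] and the
# IsRegexPossible() predicate is treated as always true (both assumptions).
RULES = [
    ("DocStart", re.compile(r"/\*\*")),
    ("RegularExpressionLiteral", re.compile(rf"/{REGEX_CHAR}+/[\w$]*")),
]


def first_token(text):
    """Pick the rule with the longest match at position 0; earlier rules win ties."""
    best = None
    for name, pattern in RULES:
        m = pattern.match(text)
        if m and (best is None or len(m.group()) > len(best[1])):
            best = (name, m.group())
    return best


one_liner = "/** @var {sometype} id */"
multi_line = "/**\n * @var {sometype} id\n */"
print(first_token(one_liner))
print(first_token(multi_line))
```

In this emulation the regex rule consumes the whole one-liner, while a doc comment that continues onto the next line falls back to DocStart, because a line terminator cannot appear inside the literal.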

Since /** is not actually a legal regular expression, I suppose I need to finesse the RE definition so that it does not allow two consecutive '*' characters, either in general or specifically in the leading /**. I suppose I could do this in IsRegexPossible(), but this is new ground for me and, of course, this happened just before a deadline.

Can anyone please give me a push in the right direction, preferably by a change in the lexer/grammar or, if there's no choice, in IsRegexPossible()?

I tried adding the fragment RegularExpressionCharNoMultiplier (which disallows '*'), but the lexer still recognizes the above string as a regular expression literal:

RegularExpressionLiteral:       '/' ((RegularExpressionChar RegularExpressionCharNoMultiplier?)
                                   |  (RegularExpressionCharNoMultiplier RegularExpressionChar?))+
                                    {IsRegexPossible()}? '/' IdentifierPart*;
fragment RegularExpressionCharNoMultiplier
    : ~[*\r\n\u2028\u2029\\/[]
    | RegularExpressionBackslashSequence
    | '[' RegularExpressionClassChar* ']'
    ;
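
A likely reason this attempt changes nothing: every pass through the loop can start with RegularExpressionChar, which still accepts '*', so "**" is simply consumed as two passes of the first alternative. The Python sketch below emulates the attempted rule with a regular expression to check that reading. It is not ANTLR itself; the backslash-sequence and class fragments, the simplified IdentifierPart, and ignoring IsRegexPossible() are assumptions.

```python
import re

# Line terminators that the grammar's fragments exclude.
NL = r"\r\n\u2028\u2029"
# Assumed shapes of the fragments the question does not show.
BACKSLASH_SEQ = rf"\\[^{NL}]"
CHAR_CLASS = rf"\[(?:[^{NL}\]\\]|{BACKSLASH_SEQ})*\]"
REGEX_CHAR = rf"(?:[^{NL}\\/\[]|{BACKSLASH_SEQ}|{CHAR_CLASS})"
# Same as REGEX_CHAR, but '*' is excluded as well.
NO_MULTIPLIER = rf"(?:[^*{NL}\\/\[]|{BACKSLASH_SEQ}|{CHAR_CLASS})"

# '/' ((Char NoMult?) | (NoMult Char?))+ '/' IdentifierPart*
ATTEMPT = re.compile(
    rf"/(?:{REGEX_CHAR}{NO_MULTIPLIER}?|{NO_MULTIPLIER}{REGEX_CHAR}?)+/[\w$]*"
)

line = "/** @var {sometype} id */"
match = ATTEMPT.match(line)
# Prints the matched text, or None if the emulated rule rejects the line.
print(match.group() if match else None)
```

If that reading is right, the restriction has to apply to the first character after the opening '/' only, not to every other character in the loop.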

Thanks!

OK - the latest JS Lexer solved it as follows:

RegularExpressionLiteral:       '/' RegularExpressionFirstChar RegularExpressionChar* {this.IsRegexPossible()}? '/' IdentifierPart*;

Where

fragment RegularExpressionFirstChar
    : ~[*\r\n\u2028\u2029\\/[]
    | RegularExpressionBackslashSequence
    | '[' RegularExpressionClassChar* ']'
    ;

Actually, unrelated to my problem, I believe that "+" can't be the first RE char either.
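
A quick way to sanity-check the rule above without regenerating the lexer is to emulate it with a Python regular expression. This is only a sketch: the backslash-sequence and class fragments, the simplified IdentifierPart, and ignoring IsRegexPossible() are assumptions, and lexes_as_regex is a hypothetical helper. On the "+" remark: as far as I know, the ECMAScript lexical grammar excludes only '*', '\', '/' and '[' as the first character, so /+/ still forms a regex literal token and is rejected later, when the pattern itself is validated.

```python
import re

# Line terminators that the grammar's fragments exclude.
NL = r"\r\n\u2028\u2029"
# Assumed shapes of the fragments the answer does not show.
BACKSLASH_SEQ = rf"\\[^{NL}]"
CHAR_CLASS = rf"\[(?:[^{NL}\]\\]|{BACKSLASH_SEQ})*\]"
REGEX_CHAR = rf"(?:[^{NL}\\/\[]|{BACKSLASH_SEQ}|{CHAR_CLASS})"
# First character: like REGEX_CHAR, but '*' is not allowed.
FIRST_CHAR = rf"(?:[^*{NL}\\/\[]|{BACKSLASH_SEQ}|{CHAR_CLASS})"

# '/' FirstChar Char* '/' IdentifierPart*
FIXED = re.compile(rf"/{FIRST_CHAR}{REGEX_CHAR}*/[\w$]*")


def lexes_as_regex(text):
    """True if the emulated rule matches at the start of text."""
    return FIXED.match(text) is not None


for sample in ["/** @var {sometype} id */", "/ab*c/g", "/[*]+/", "/+/"]:
    print(repr(sample), lexes_as_regex(sample))
```

With the one-liner rejected by the regex rule, DocStart is the only rule left that matches at "/**", which is what lets pushMode(DOC_MODE) take over.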
