
Parsing JavaScript regex with ANTLR

I have an ANTLR JavaScript grammar (taken from the Internet), which seems to support everything except regex literals.

The problem with a regex literal is that two rules, essentially, compete for the same '/' character:

multiplicativeExpression
    : unaryExpression (LT!* ('*' | '/' | '%')^ LT!* unaryExpression)*
    ;

and

regexLiteral
    : '/' RegexLiteralChar* '/'
    ;

where the rule RegexLiteralChar follows different lexical rules than a normal expression (e.g., a double quote does not terminate it).

This means that I need to change some kind of lexer state from my parser. How can I do this? Is it even possible?

Looking at the grammar mentioned in Bart Kiers's comment, you can see this note:

The major challenges faced in defining this grammar were:

-1- Ambiguity surrounding the DIV sign in relation to the multiplicative expression and the regular expression literal. This is solved with some lexer driven magic: a gated semantical predicate turns the recognition of regular expressions on or off, based on the value of the RegularExpressionsEnabled property. When regular expressions are enabled they take precedence over division expressions. The decision whether regular expressions are enabled is based on the heuristics that the previous token can be considered as last token of a left-hand-side operand of a division.

...

The areRegularExpressionsEnabled() function is defined as:

private final boolean areRegularExpressionsEnabled()
{
    if (last == null)
    {
        return true;
    }
    switch (last.getType())
    {
        // identifier
        case Identifier:
        // literals
        case NULL:
        case TRUE:
        case FALSE:
        case THIS:
        case OctalIntegerLiteral:
        case DecimalLiteral:
        case HexIntegerLiteral:
        case StringLiteral:
        // member access ending
        case RBRACK:
        // function call or nested expression ending
        case RPAREN:
            return false;
        // otherwise OK
        default:
            return true;
    }
}

The function is then used as a gated predicate in the RegularExpressionLiteral lexer rule:

RegularExpressionLiteral
    : { areRegularExpressionsEnabled() }?=> DIV RegularExpressionFirstChar RegularExpressionChar* DIV IdentifierPart*
    ;
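
To make the heuristic concrete outside of ANTLR, here is a minimal hand-written sketch in Java. It is not the grammar's generated lexer: the names SlashLexerSketch, regexAllowed and scanRegex are hypothetical, and the token set is heavily simplified (no strings, no comments, and keywords such as `return` are lumped in with identifiers, which a real lexer must not do). It only illustrates the idea that the lexer remembers the last significant token and consults it when it meets a '/'.

```java
import java.util.ArrayList;
import java.util.List;

// Toy lexer, for illustration only: decides whether '/' starts a regex
// literal or is a division operator by looking at the previous token.
final class SlashLexerSketch {
    enum Kind { IDENT, NUMBER, REGEX, PUNCT }

    static final class Token {
        final Kind kind;
        final String text;
        Token(Kind kind, String text) { this.kind = kind; this.text = text; }
        @Override public String toString() { return kind + "(" + text + ")"; }
    }

    // Same idea as areRegularExpressionsEnabled(): a regex may start unless
    // the previous token could be the end of a division's left operand.
    static boolean regexAllowed(Token last) {
        if (last == null) return true;              // start of input
        switch (last.kind) {
            case IDENT:
            case NUMBER:
                return false;                       // operand just ended
            case PUNCT:
                return !(last.text.equals(")") || last.text.equals("]"));
            default:
                return true;
        }
    }

    // Returns the index just past the regex literal that starts at i.
    static int scanRegex(String src, int i) {
        int j = i + 1;
        boolean inClass = false;                    // inside [...]
        while (j < src.length()) {
            char c = src.charAt(j);
            if (c == '\\') { j += 2; continue; }    // skip escaped character
            if (c == '[') inClass = true;
            else if (c == ']') inClass = false;
            else if (c == '/' && !inClass) {
                j++;                                // closing slash
                while (j < src.length() && Character.isLetter(src.charAt(j))) j++; // flags
                return j;
            }
            j++;
        }
        throw new IllegalArgumentException("unterminated regex literal at " + i);
    }

    static List<Token> lex(String src) {
        List<Token> out = new ArrayList<>();
        Token last = null;                          // whitespace never updates this
        int i = 0;
        while (i < src.length()) {
            char c = src.charAt(i);
            if (Character.isWhitespace(c)) { i++; continue; }
            int start = i;
            Kind kind;
            if (Character.isLetter(c) || c == '_' || c == '$') {
                while (i < src.length() && (Character.isLetterOrDigit(src.charAt(i))
                        || src.charAt(i) == '_' || src.charAt(i) == '$')) i++;
                kind = Kind.IDENT;
            } else if (Character.isDigit(c)) {
                while (i < src.length() && Character.isDigit(src.charAt(i))) i++;
                kind = Kind.NUMBER;
            } else if (c == '/' && regexAllowed(last)) {
                i = scanRegex(src, i);
                kind = Kind.REGEX;
            } else {
                i++;
                kind = Kind.PUNCT;
            }
            last = new Token(kind, src.substring(start, i));
            out.add(last);
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(lex("a / b / c"));
        System.out.println(lex("x = /a\"b/g"));
    }
}
```

The point of the design is that the parser never has to reach into the lexer: the decision is made entirely from lexer-side state, which is what the gated predicate in the rule above does with the `last` token. Note that the double quote inside the regex body does not start a string, because the regex branch is chosen before any other rule gets a chance to look at the body.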
