Parsing JavaScript regular expressions with ANTLR

Question

I have an ANTLR JavaScript grammar (taken from the Internet), which seems to support everything except regex literals.

The problem with a regex literal is that you essentially have two rules that both start with a slash:

multiplicativeExpression
    : unaryExpression (LT!* ('*' | '/' | '%')^ LT!* unaryExpression)*

and

regexLiteral
    : '/' RegexLiteralChar* '/'

where the rule RegexLiteralChar follows different lexer rules than a normal expression (e.g. a double quote does not terminate it).

This means that I need to change some kind of lexer state from my parser. How can I do this? Is it even possible?

Answer 1

Looking at the grammar mentioned in the comment by Bart Kiers, you can see this note:

The major challenges faced in defining this grammar were: 定义这种语法面临的主要挑战是：

-1- Ambiguity surrounding the DIV sign in relation to the multiplicative expression and the regular expression literal. This is solved with some lexer driven magic: a gated semantical predicate turns the recognition of regular expressions on or off, based on the value of the RegularExpressionsEnabled property. When regular expressions are enabled they take precedence over division expressions. The decision whether regular expressions are enabled is based on the heuristics that the previous token can be considered as last token of a left-hand-side operand of a division.

...

The areRegularExpressionsEnabled() function is defined as:

private final boolean areRegularExpressionsEnabled()
{
    if (last == null)
    {
        return true;
    }
    switch (last.getType())
    {
        // identifier
        case Identifier:
        // literals
        case NULL:
        case TRUE:
        case FALSE:
        case THIS:
        case OctalIntegerLiteral:
        case DecimalLiteral:
        case HexIntegerLiteral:
        case StringLiteral:
        // member access ending
        case RBRACK:
        // function call or nested expression ending
        case RPAREN:
            return false;
        // otherwise OK
        default:
            return true;
    }
}

The function is then used as a gated semantic predicate in the RegularExpressionLiteral lexer rule:

RegularExpressionLiteral
    : { areRegularExpressionsEnabled() }?=> DIV RegularExpressionFirstChar RegularExpressionChar* DIV IdentifierPart*
    ;
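To see the heuristic in isolation, here is a minimal sketch in plain Java that does not use ANTLR at all. Everything in it (the RegexAwareLexerSketch class, the Kind enum, the tokenize method) is a hypothetical stand-in invented for illustration, not part of the grammar quoted above. It only shows the idea: the lexer remembers the last token it produced, and a slash starts a regex literal only when that token cannot end the left-hand operand of a division.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical toy lexer; names are illustrative and not taken from the ANTLR grammar.
public class RegexAwareLexerSketch {
    enum Kind { IDENTIFIER, NUMBER, RPAREN, RBRACK, PUNCTUATOR, DIV, REGEX }

    static final class Tok {
        final Kind kind;
        final String text;
        Tok(Kind kind, String text) { this.kind = kind; this.text = text; }
        @Override public String toString() { return kind + "(" + text + ")"; }
    }

    // Same idea as areRegularExpressionsEnabled(): a regex may start unless the
    // previous token can be the end of a division's left-hand operand.
    static boolean regexAllowedAfter(Tok last) {
        if (last == null) return true;
        switch (last.kind) {
            case IDENTIFIER: case NUMBER: case RPAREN: case RBRACK:
                return false;
            default:
                return true;
        }
    }

    static boolean isIdentPart(char c) {
        return Character.isLetterOrDigit(c) || c == '_' || c == '$';
    }

    // Simplifications: keywords are lexed as identifiers, strings are not handled,
    // and a '/' inside a character class such as [/] is not recognised.
    static List<Tok> tokenize(String src) {
        List<Tok> out = new ArrayList<>();
        Tok last = null;
        int i = 0;
        while (i < src.length()) {
            char c = src.charAt(i);
            if (Character.isWhitespace(c)) { i++; continue; }
            int start = i;
            Tok t;
            if (Character.isLetter(c) || c == '_' || c == '$') {
                while (i < src.length() && isIdentPart(src.charAt(i))) i++;
                t = new Tok(Kind.IDENTIFIER, src.substring(start, i));
            } else if (Character.isDigit(c)) {
                while (i < src.length() && Character.isDigit(src.charAt(i))) i++;
                t = new Tok(Kind.NUMBER, src.substring(start, i));
            } else if (c == '/' && regexAllowedAfter(last)) {
                i++; // opening slash
                while (i < src.length() && src.charAt(i) != '/') {
                    if (src.charAt(i) == '\\') i++; // skip the escaped character
                    i++;
                }
                i++; // closing slash
                while (i < src.length() && Character.isLetter(src.charAt(i))) i++; // flags
                i = Math.min(i, src.length()); // guard against unterminated input
                t = new Tok(Kind.REGEX, src.substring(start, i));
            } else {
                i++;
                Kind k = c == '/' ? Kind.DIV
                       : c == ')' ? Kind.RPAREN
                       : c == ']' ? Kind.RBRACK
                       : Kind.PUNCTUATOR;
                t = new Tok(k, String.valueOf(c));
            }
            out.add(t);
            last = t; // whitespace never reaches this point, so only real tokens are remembered
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("a / b"));
        System.out.println(tokenize("x = /ab+c/g"));
    }
}
```

In this sketch "a / b" yields a DIV token between two identifiers, while "x = /ab+c/g" yields a single REGEX token, because the preceding "=" cannot end an operand. In a real ANTLR lexer the equivalent of the `last` variable has to be updated somewhere as tokens are emitted, skipping whitespace and comments; how the quoted grammar does that is not shown in the answer.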

Accepted answer (5), 2012-09-03 05:28:34