How to implement JavaScript/ECMAScript "no LineTerminator here" rule in JavaCC?

I am continuing to work on my JavaCC grammar for ECMAScript 5.1. It is actually going quite well; I think I've covered most of the expressions now.

I now have two questions, both related to automatic semicolon insertion (§7.9.1). This is one of them.

The specification defines the following production:

PostfixExpression :
    LeftHandSideExpression
    LeftHandSideExpression [no LineTerminator here] ++
    LeftHandSideExpression [no LineTerminator here] --

How can I implement a reliable "no LineTerminator here" check?

For the record, my LINE_TERMINATOR is currently something like:

SPECIAL_TOKEN :
{
    <LINE_TERMINATOR: <LF> | <CR> | <LS> | <PS> >
|   < #LF: "\n" > /* Line Feed */
|   < #CR: "\r" > /* Carriage Return */
|   < #LS: "\u2028" > /* Line separator */
|   < #PS: "\u2029" > /* Paragraph separator */
}

I have read about lexical states, but I am not sure whether this is the right direction. I've checked a few other JavaScript grammars I found, but did not find any similar rules there. (I actually feel like a total cargo cultist when I try to take something over from these grammars.)

I'd be grateful for a pointer, a hint, or just a keyword to search for.

I think for the "restricted productions" you can do this:

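void PostfixExpression() :
{} {
     LeftHandSideExpression()
     (
         // Accept "++" only if it starts on the line where the previous token ended.
         // endLine (not beginLine) is compared so that a multi-line previous token is handled.
         LOOKAHEAD( "++", {getToken(0).endLine == getToken(1).beginLine})
         "++"
     |
         LOOKAHEAD( "--", {getToken(0).endLine == getToken(1).beginLine})
         "--"
     |
         {}
     )
}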

Update: As Gunther pointed out, my original solution was not correct because of this paragraph in §7.4 of the spec:

Comments behave like white space and are discarded except that, if a MultiLineComment contains a line terminator character, then the entire comment is considered to be a LineTerminator for purposes of parsing by the syntactic grammar.
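To make the rule quoted above concrete, here is a minimal sketch of the character-level test it implies, using only the JDK. The class name LineTerminators and both method names are hypothetical and are not part of JavaCC or of the grammar in this thread; the four characters are the ones listed in the LINE_TERMINATOR token definition in the question.

```java
/** Hypothetical helper: detects ECMAScript 5.1 line terminator characters in a token image. */
final class LineTerminators {
    private LineTerminators() {}

    /** True if c is LF, CR, LS (U+2028) or PS (U+2029). */
    static boolean isLineTerminator(char c) {
        return c == '\n' || c == '\r' || c == '\u2028' || c == '\u2029';
    }

    /** True if the text (for example a comment's image) contains any line terminator character. */
    static boolean containsLineTerminator(CharSequence text) {
        for (int i = 0; i < text.length(); i++) {
            if (isLineTerminator(text.charAt(i))) {
                return true;
            }
        }
        return false;
    }
}
```

With such a helper, a multi-line comment whose image contains one of these characters would count as a LineTerminator for the syntactic grammar, while a comment written on a single line would not.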

I'm posting a correction but leaving my original solution at the end of the question.

Corrected solution

The core idea, as proposed by Theodore Norvell, is to use semantic lookahead. However, I have decided to implement a safer check:

public static boolean precededByLineTerminator(Token token) {
    for (Token specialToken = token.specialToken; specialToken != null; specialToken = specialToken.specialToken) {
        if (specialToken.kind == EcmaScriptParserConstants.LINE_TERMINATOR) {
            return true;
        } else if (specialToken.kind == EcmaScriptParserConstants.MULTI_LINE_COMMENT) {
            final String image = specialToken.image;
            if (StringUtils.containsAny(image, (char)0x000A, (char)0x000D, (char)0x2028,
                    (char)0x2029)) {
                return true;
            }
        }
    }
    return false;
}

And the grammar is:

expression = LeftHandSideExpression()
(
    LOOKAHEAD ( <INCR>, { !TokenUtils.precededByLineTerminator(getToken(1))} )
    <INCR>
    {
        return expression.postIncr();
    }
|   LOOKAHEAD ( <DECR>, { !TokenUtils.precededByLineTerminator(getToken(1))} )
    <DECR>
    {
        return expression.postDecr();
    }
) ?
{
    return expression;
}

So ++ or -- is consumed here if and only if it is not preceded by a line terminator, where a multi-line comment containing a line terminator character also counts as one.
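The walk over the special-token chain can be tried outside a generated parser with a small stand-in. The sketch below is only an illustration under stated assumptions: the nested Token class mirrors just the three fields of JavaCC's generated Token that the check reads (kind, image, specialToken), the kind constants are made-up values (the real ones come from the generated constants interface), and the incrAfter helper and the WHITE_SPACE kind are hypothetical.

```java
/** Illustration only: a stand-in for JavaCC tokens to exercise the special-token walk. */
final class SpecialTokenDemo {
    // Hypothetical kind values; a generated parser defines its own constants.
    static final int LINE_TERMINATOR = 1;
    static final int MULTI_LINE_COMMENT = 2;
    static final int WHITE_SPACE = 3;
    static final int INCR = 4;

    /** Stand-in with the three fields of the generated Token class that the check uses. */
    static final class Token {
        final int kind;
        final String image;
        Token specialToken; // the special token immediately before this one, or null

        Token(int kind, String image) {
            this.kind = kind;
            this.image = image;
        }
    }

    static boolean hasLineTerminator(String s) {
        return s.indexOf('\n') >= 0 || s.indexOf('\r') >= 0
            || s.indexOf('\u2028') >= 0 || s.indexOf('\u2029') >= 0;
    }

    /** Walks backwards through the special tokens attached to the given token. */
    static boolean precededByLineTerminator(Token token) {
        for (Token st = token.specialToken; st != null; st = st.specialToken) {
            if (st.kind == LINE_TERMINATOR) {
                return true;
            }
            if (st.kind == MULTI_LINE_COMMENT && hasLineTerminator(st.image)) {
                return true;
            }
        }
        return false;
    }

    /** Builds a "++" token preceded by the given special tokens, listed in source order. */
    static Token incrAfter(Token... specialsInSourceOrder) {
        Token previous = null;
        for (Token st : specialsInSourceOrder) {
            st.specialToken = previous;
            previous = st;
        }
        Token incr = new Token(INCR, "++");
        incr.specialToken = previous;
        return incr;
    }
}
```

For instance, a "++" built with incrAfter from a space, a comment whose image is "/*\n*/", and another space is reported as preceded by a line terminator, whereas the same chain with the comment "/* x */" is not.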


Original solution

This is not how I finally solved it.

The core idea, as proposed by Theodore Norvell, is to use semantic lookahead. However, I have decided to implement a safer check:

public static boolean precededBySpecialTokenOfKind(Token token, int kind) {
    for (Token specialToken = token.specialToken; specialToken != null; specialToken = specialToken.specialToken) {
        if (specialToken.kind == kind) {
            return true;
        }
    }
    return false;
}

And the grammar is:

expression = LeftHandSideExpression()
(
    LOOKAHEAD ( <INCR>, { !TokenUtils.precededBySpecialTokenOfKind(getToken(1), LINE_TERMINATOR)} )
    <INCR>
    {
        return expression.postIncr();
    }
|   LOOKAHEAD ( <DECR>, { !TokenUtils.precededBySpecialTokenOfKind(getToken(1), LINE_TERMINATOR)} )
    <DECR>
    {
        return expression.postDecr();
    }
) ?
{
    return expression;
}

So ++ or -- is consumed here if and only if it is not preceded by a LINE_TERMINATOR special token.
