
How to implement JavaScript automatic semicolon insertion in JavaCC?

I am finishing my ECMAScript 5.1/JavaScript grammar for JavaCC. I've done all the tokens and productions according to the specification.

Now I'm facing a big question which I don't know how to solve.

JavaScript has this nice feature of automatic semicolon insertion:

What are the rules for JavaScript's automatic semicolon insertion (ASI)?

To quote the specification, the rules are:

There are three basic rules of semicolon insertion:

  1. When, as the program is parsed from left to right, a token (called the offending token) is encountered that is not allowed by any production of the grammar, then a semicolon is automatically inserted before the offending token if one or more of the following conditions is true:

    • The offending token is separated from the previous token by at least one LineTerminator.
    • The offending token is }.
  2. When, as the program is parsed from left to right, the end of the input stream of tokens is encountered and the parser is unable to parse the input token stream as a single complete ECMAScript Program, then a semicolon is automatically inserted at the end of the input stream.

  3. When, as the program is parsed from left to right, a token is encountered that is allowed by some production of the grammar, but the production is a restricted production and the token would be the first token for a terminal or nonterminal immediately following the annotation [no LineTerminator here] within the restricted production (and therefore such a token is called a restricted token), and the restricted token is separated from the previous token by at least one LineTerminator, then a semicolon is automatically inserted before the restricted token.

However, there is an additional overriding condition on the preceding rules: a semicolon is never inserted automatically if the semicolon would then be parsed as an empty statement or if that semicolon would become one of the two semicolons in the header of a for statement (see 12.6.3).

How could I implement this with JavaCC?

The closest thing to an answer I've found so far is this grammar from the Dojo toolkit, which has a JAVACODE part called insertSemiColon dedicated to the task. But I don't see this method being called anywhere (neither in the grammar nor in the whole jslinker code).

How could I approach this problem with JavaCC?

See also this question:

javascript grammar and automatic semocolon insertion

(No answer there.)

A question from the comments:

Is it correct to say that semicolons need only be inserted where semicolons are syntactically allowed?

I think it would be correct to say that semicolons need only be inserted where semicolons are syntactically required.

The relevant part here is §7.9:

7.9 Automatic Semicolon Insertion

Certain ECMAScript statements (empty statement, variable statement, expression statement, do-while statement, continue statement, break statement, return statement, and throw statement) must be terminated with semicolons. Such semicolons may always appear explicitly in the source text. For convenience, however, such semicolons may be omitted from the source text in certain situations. These situations are described by saying that semicolons are automatically inserted into the source code token stream in those situations.

Let's take the return statement for instance:

ReturnStatement :
    return ;
    return [no LineTerminator here] Expression ;

So (from my understanding) syntactically the semicolon is required, not just allowed (as in your question).

The three rules for semicolon insertion can be found in section 7.9.1 of the ECMAScript 5.1 standard.

I think rules 1 and 2 from the standard can be handled with semantic lookahead.

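void PossiblyInsertedSemicolon()
{}
{
    // Rules 1 and 2: consume nothing when a semicolon would be inserted automatically.
    LOOKAHEAD( {semicolonNeedsInserting()} ) {}
|
    // Otherwise an explicit semicolon is required.
    ";"
}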

So when does a semicolon need inserting? When one of these is true:

  • When the next token is not a semicolon and is on another line (getToken(1).kind != SEMICOLON && getToken(0).endLine < getToken(1).beginLine).
  • When the next token is a right brace.
  • When the next token is EOF.

So we need

boolean semicolonNeedsInserting() {
    // getToken(0) is the last consumed token, getToken(1) is the next one.
    return (getToken(1).kind != SEMICOLON && getToken(0).endLine < getToken(1).beginLine)
        || getToken(1).kind == RBRACE
        || getToken(1).kind == EOF;
}

That takes care of rules 1 and 2 of the standard.

For rule 3 (restricted productions), as mentioned in my answer to this question, you could do the following:
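To try the predicate outside a generated parser, here is a minimal plain-Java sketch of the same test. The Tok class and the kind constants are hypothetical stand-ins for the Token class and constants JavaCC would generate from the grammar; only the boolean expression is taken from the code above.

```java
// Sketch only: Tok and the kind constants are hypothetical stand-ins for the
// Token class and token-kind constants that JavaCC generates.
final class AsiSketch {
    static final int EOF = 0, SEMICOLON = 1, RBRACE = 2, IDENTIFIER = 3;

    static final class Tok {
        final int kind, beginLine, endLine;

        Tok(int kind, int beginLine, int endLine) {
            this.kind = kind;
            this.beginLine = beginLine;
            this.endLine = endLine;
        }
    }

    // previous plays the role of getToken(0), next the role of getToken(1).
    static boolean semicolonNeedsInserting(Tok previous, Tok next) {
        return (next.kind != SEMICOLON && previous.endLine < next.beginLine)
            || next.kind == RBRACE
            || next.kind == EOF;
    }

    public static void main(String[] args) {
        Tok a = new Tok(IDENTIFIER, 1, 1);
        // Next token starts on a later line.
        System.out.println(semicolonNeedsInserting(a, new Tok(IDENTIFIER, 2, 2)));
        // Next token is "}" on the same line.
        System.out.println(semicolonNeedsInserting(a, new Tok(RBRACE, 1, 1)));
        // Next token is the end of input.
        System.out.println(semicolonNeedsInserting(a, new Tok(EOF, 1, 1)));
        // Next token is an ordinary token on the same line.
        System.out.println(semicolonNeedsInserting(a, new Tok(IDENTIFIER, 1, 1)));
    }
}
```

Note that the sketch, like the predicate it copies, only looks at line positions and token kinds; it does not check whether the next token is actually disallowed by the grammar, nor the overriding conditions about empty statements and for headers.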

void returnStatement()
{}
{
    "return"
    [   // Parse an expression unless either the next token is a ";", "}" or EOF, or the next token is on another line.
        LOOKAHEAD( {   getToken(1).kind != SEMICOLON
                    && getToken(1).kind != RBRACE
                    && getToken(1).kind != EOF
                    && getToken(0).endLine == getToken(1).beginLine} )
        Expression()
    ]
    PossiblyInsertedSemicolon() 
}
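
The lookahead condition in returnStatement can be checked in isolation too. The following is a rough plain-Java sketch; the class name, the method expressionFollowsReturn and the kind constants are made up for illustration, and the condition itself mirrors the LOOKAHEAD above.

```java
// Sketch only: the kind constants and method name are hypothetical; in a real
// parser the values would come from getToken(0) and getToken(1).
final class ReturnRestrictionSketch {
    static final int EOF = 0, SEMICOLON = 1, RBRACE = 2, IDENTIFIER = 3;

    // True when the token after "return" should be parsed as the start of the
    // returned expression: it is not ";", "}" or EOF, and it begins on the
    // line where "return" ends.
    static boolean expressionFollowsReturn(int nextKind, int returnEndLine, int nextBeginLine) {
        return nextKind != SEMICOLON
            && nextKind != RBRACE
            && nextKind != EOF
            && returnEndLine == nextBeginLine;
    }

    public static void main(String[] args) {
        // "return a" on a single line: the identifier belongs to the return statement.
        System.out.println(expressionFollowsReturn(IDENTIFIER, 1, 1));
        // "return" followed by a line break and "a": no expression is parsed here.
        System.out.println(expressionFollowsReturn(IDENTIFIER, 1, 2));
        // "return }": no expression is parsed here either.
        System.out.println(expressionFollowsReturn(RBRACE, 1, 1));
    }
}
```

In the second case the parser skips the optional Expression() and falls through to PossiblyInsertedSemicolon(), which accepts the missing semicolon because the next token is on another line.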
