Using ANTLR to identify global variable declarations in a JavaScript file

I've been using the ANTLR-supplied ECMAScript grammar with the objective of identifying JavaScript global variables. An AST is produced, and I'm now wondering what the best way of filtering out the global variable declarations is.

I'm interested in finding all of the outermost "variableDeclaration" nodes in my AST; how to actually do this is eluding me, though. Here's my setup code so far:

String input = "var a, b; var c;";
CharStream cs = new ANTLRStringStream(input);

JavaScriptLexer lexer = new JavaScriptLexer(cs);

CommonTokenStream tokens = new CommonTokenStream();
tokens.setTokenSource(lexer);

JavaScriptParser parser = new JavaScriptParser(tokens);

program_return programReturn = parser.program();

I'm new to ANTLR; can anyone offer any pointers?

I guess you're using this grammar.

Although that grammar suggests a proper AST is created, this is not the case. It uses some inline operators to exclude certain tokens from the parse tree, but it never creates any roots for the tree, resulting in a completely flat parse tree. From this, you can't get all global vars in a reasonable way.

You'll need to adjust the grammar slightly:

Add the following under the options { ... } block at the top of the grammar file:

tokens
{
  VARIABLE;
  FUNCTION;
}

Now replace the rules functionDeclaration, functionExpression and variableDeclaration with these:

functionDeclaration
  :  'function' LT* Identifier LT* formalParameterList LT* functionBody 
     -> ^(FUNCTION Identifier formalParameterList functionBody)
  ;

functionExpression
  :  'function' LT* Identifier? LT* formalParameterList LT* functionBody 
     -> ^(FUNCTION Identifier? formalParameterList functionBody)
  ;

variableDeclaration
  :  Identifier LT* initialiser? 
     -> ^(VARIABLE Identifier initialiser?)
  ;

Now a more suitable tree is generated. If you now parse the source:

var a = 1; function foo() { var b = 2; } var c = 3;

the following tree is generated:

[image: diagram of the generated AST]

All you have to do now is iterate over the children of the root of your tree: whenever you come across a VARIABLE node, you know it's a "global", since all other variables will be under FUNCTION nodes.

Here's how to do that:

import org.antlr.runtime.*;
import org.antlr.runtime.tree.*;

public class Main {
    public static void main(String[] args) throws Exception {
        String source = "var a = 1; function foo() { var b = 2; } var c = 3;";
        ANTLRStringStream in = new ANTLRStringStream(source);
        JavaScriptLexer lexer = new JavaScriptLexer(in);
        CommonTokenStream tokens = new CommonTokenStream(lexer);
        JavaScriptParser parser = new JavaScriptParser(tokens);
        JavaScriptParser.program_return returnValue = parser.program();
        CommonTree tree = (CommonTree)returnValue.getTree();
        for(Object o : tree.getChildren()) {
            CommonTree child = (CommonTree)o;
            if(child.getType() == JavaScriptParser.VARIABLE) {
                System.out.println("Found a global var: "+child.getChild(0));
            }
        }
    }
}

which produces the following output:

Found a global var: a
Found a global var: c
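
If you'd rather collect the names than print them, the same walk can be wrapped in a small helper. The following is only a sketch: `Node`, `GlobalVarSketch`, `collectGlobals`, `sampleTree` and the token type constants are made up for illustration, with `Node` acting as a stand-in for ANTLR's `CommonTree` so the example runs without the generated lexer and parser. The hand-built tree is an approximation of the shape described above, not the exact tree ANTLR produces.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for org.antlr.runtime.tree.CommonTree, so the
// sketch runs without the generated lexer and parser.
class Node {
    final int type;
    final String text;
    final List<Node> children = new ArrayList<>();

    Node(int type, String text, Node... kids) {
        this.type = type;
        this.text = text;
        children.addAll(Arrays.asList(kids));
    }
}

class GlobalVarSketch {
    // Made-up token types; real code would use JavaScriptParser.VARIABLE etc.
    static final int ROOT = 0, VARIABLE = 1, FUNCTION = 2, OTHER = 3;

    // Returns the names of all VARIABLE nodes that are direct children of root.
    // VARIABLE nodes nested under FUNCTION nodes are never visited.
    static List<String> collectGlobals(Node root) {
        List<String> names = new ArrayList<>();
        for (Node child : root.children) {
            if (child.type == VARIABLE && !child.children.isEmpty()) {
                // The first child of a VARIABLE node is the identifier.
                names.add(child.children.get(0).text);
            }
        }
        return names;
    }

    // Hand-built approximation of the tree for:
    //   var a = 1; function foo() { var b = 2; } var c = 3;
    static Node sampleTree() {
        Node a = new Node(VARIABLE, "VARIABLE", new Node(OTHER, "a"), new Node(OTHER, "1"));
        Node b = new Node(VARIABLE, "VARIABLE", new Node(OTHER, "b"), new Node(OTHER, "2"));
        Node foo = new Node(FUNCTION, "FUNCTION", new Node(OTHER, "foo"), b);
        Node c = new Node(VARIABLE, "VARIABLE", new Node(OTHER, "c"), new Node(OTHER, "3"));
        return new Node(ROOT, "nil", a, foo, c);
    }

    public static void main(String[] args) {
        System.out.println(collectGlobals(sampleTree())); // prints [a, c]
    }
}
```

With the real tree, the same loop can be written with `getChildCount()`, `getChild(i)`, `getType()` and `getText()` on the ANTLR tree nodes.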
