
I want to extract all method names and variable names from a Java source file using ANTLR4

Basically, I want to extract all variable names, irrespective of their scope, and all function/method names in the source code.

For the given input,

    class temp {
        int a;

        public static void main(String args[]) {
            int b = 0;
            temp ob = new temp();
            ob.printob();  // printob() is an instance method, so call it on ob
        }

        void printob() {
            System.out.print("-");
        }
    }

The output should be something like:

variables = {"a","b","ob"}

methods = {"main","printob"}

One way to achieve this is to make small changes to the grammar specification of the language in question, in this case Java.

What we can do is create a global ArrayList and insert all identifiers into it.

In the grammar, for each rule where IDENTIFIER is used, for example,

methodcall : return_types IDENTIFIER LEFTPAREN params RIGHTPAREN;

we make the following change (where our global ArrayList is called all_identifiers):

methodcall : return_types IDENTIFIER LEFTPAREN params RIGHTPAREN {
    // store the matched name; assumes all_identifiers holds String values
    all_identifiers.add($IDENTIFIER.text);
};

Similarly, by adding the above code to each rule that has an Identifier, we will be able to extract all the method and variable names. (If you want them in different lists, create two ArrayLists: one for method calls and one for variable declarations.)

Additional Note: 附加说明:

When I originally posted this question, I wanted to find a way to change all method call names and all variable names to some pre-defined names in order to normalise the code. For example, I wanted to change int a,b,c; to something like int varbl,varbl,varbl; and similarly I wanted to change all method names to mthd.

So the best way I found to achieve this was:

1. Identify all rules where we want to change some Identifier.

2. In each of those rules, add a code section like the one below. (This step is needed because a Token object by itself is not editable, so we cast it to a CommonToken object, which lets us set the text using the setText() method.)

{
    // $IDENTIFIER is a Token, so an explicit cast is needed to reach setText()
    CommonToken tkn_tmp = (CommonToken) $IDENTIFIER;
    tkn_tmp.setText("varbl");
    // or, if it's a method rule:
    // tkn_tmp.setText("mthd");
}

3. Now all the tokens will be changed from their original form to the value we set.

4. After this, you need to parse the input code with the modified grammar, and the parse tree along with the parser text will be updated (along with the token start and token end pointers).

There is another way to achieve the same result: iterate through all the tokens sequentially, checking each token's type and adding the token to an ArrayList. If the token type is Identifier, change the text to whatever you want, and then append it to the list as usual.
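To make the token-by-token idea concrete, here is a minimal sketch in plain Java. It does not use ANTLR: a small regex tokenizer stands in for the generated lexer, and the class name TokenRenamer, the method normalise and the short keyword list are all hypothetical names made up for this illustration. With ANTLR, the tokens would instead come from the generated lexer (for instance through its getAllTokens() method) and the check would compare each token's type against the lexer's identifier constant.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TokenRenamer {
    // Hypothetical, deliberately incomplete keyword list; a real lexer knows them all.
    private static final Set<String> KEYWORDS = new HashSet<>(Arrays.asList(
            "class", "int", "void", "public", "static", "new", "return"));

    // A word (identifier or keyword), a run of digits, or any single other character.
    private static final Pattern TOKEN =
            Pattern.compile("[A-Za-z_$][A-Za-z0-9_$]*|\\d+|\\S");

    // Walks the tokens in order and replaces every identifier with `replacement`.
    static List<String> normalise(String source, String replacement) {
        List<String> out = new ArrayList<>();
        Matcher m = TOKEN.matcher(source);
        while (m.find()) {
            String text = m.group();
            boolean isIdentifier = Character.isJavaIdentifierStart(text.charAt(0))
                    && !KEYWORDS.contains(text);
            // Identifiers get the new text; everything else is appended unchanged.
            out.add(isIdentifier ? replacement : text);
        }
        return out;
    }

    public static void main(String[] args) {
        // prints: int varbl , varbl , varbl ;
        System.out.println(String.join(" ", normalise("int a, b, c;", "varbl")));
    }
}
```

Note that this stand-in tokenizer knows nothing about string literals or comments, so words inside them would be renamed too. That is one reason to prefer the token types of a real lexer.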

To separate method names from variable names, you might need to change the grammar so that variable Identifiers and method Identifiers are distinguishable.
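If changing the grammar is not an option, a rough approximation is to look at the neighbouring tokens instead. The sketch below is not the grammar-based approach described above but a token-lookahead heuristic: a name preceded by a type and followed by "(" is counted as a method declaration, and a name preceded by a type and followed by one of ; = , ) [ is counted as a variable. The class NameSplitter, its fields and the keyword lists are hypothetical and only cover the sample from the question.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class NameSplitter {
    // Hypothetical, incomplete keyword lists; just enough for the sample input.
    static final Set<String> PRIMITIVES = new HashSet<>(Arrays.asList(
            "int", "long", "double", "boolean", "char", "void"));
    static final Set<String> OTHER_KEYWORDS = new HashSet<>(Arrays.asList(
            "class", "public", "static", "new", "return"));
    // Tokens that may directly follow a declared variable name.
    static final Set<String> AFTER_VARIABLE = new HashSet<>(Arrays.asList(
            ";", "=", ",", ")", "["));
    static final Pattern TOKEN =
            Pattern.compile("[A-Za-z_$][A-Za-z0-9_$]*|\\d+|\\S");

    final List<String> variables = new ArrayList<>();
    final List<String> methods = new ArrayList<>();

    // A word that is not one of the known keywords.
    static boolean isName(String t) {
        return Character.isJavaIdentifierStart(t.charAt(0))
                && !PRIMITIVES.contains(t) && !OTHER_KEYWORDS.contains(t);
    }

    // Anything that can stand in type position: a primitive or a class name.
    static boolean isType(String t) {
        return PRIMITIVES.contains(t) || isName(t);
    }

    NameSplitter(String source) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(source);
        while (m.find()) {
            tokens.add(m.group());
        }
        for (int i = 1; i + 1 < tokens.size(); i++) {
            String prev = tokens.get(i - 1);
            String cur = tokens.get(i);
            String next = tokens.get(i + 1);
            if (!isName(cur) || !isType(prev)) {
                continue; // only "Type name ..." sequences are declarations
            }
            if (next.equals("(")) {
                methods.add(cur);      // Type name (  -> method declaration
            } else if (AFTER_VARIABLE.contains(next)) {
                variables.add(cur);    // Type name ;  -> variable declaration
            }
        }
    }
}
```

Run on the sample class from the question, this collects a, args, b and ob as variables (args is included because a parameter is also a variable) and main and printob as methods. It misses the second and later names in a declaration such as int a, b, c; which is the kind of case where distinguishing the rules in the grammar is more reliable.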
