
How can I identify references to Java classes using Perl?

I'm writing a Perl script and I've come to a point where I need to parse a Java source file line by line, checking for references to a fully qualified Java class name. I know the class I'm looking for up front, as well as the fully qualified name of the source file being searched (based on its path).

For example, find all valid references to foo.bar.Baz inside the com/bob/is/YourUncle.java file.

At the moment, the cases I can think of that it needs to account for are:

  1. The file being parsed is in the same package as the search class.

     find foo.bar.Baz references in foo/bar/Boing.java

  2. It should ignore comments.

     // this is a comment saying this method returns a foo.bar.Baz or Baz instance
     // it shouldn't count
     /* a multiline comment as well
        this shouldn't count
        if I put foo.bar.Baz or Baz in here either */

  3. In-line fully qualified references.

     foo.bar.Baz fb = new foo.bar.Baz();

  4. References based off an import statement.

     import foo.bar.Baz;
     ...
     Baz b = new Baz();

What would be the most efficient way to do this in Perl 5.8? Some fancy regex, perhaps?

open F, '<', $File::Find::name or die "Cannot open $File::Find::name: $!";
# these three things are already known
# $classToFind    looking for references of this class
# $pkgToFind      the package of the class you're finding references of
# $currentPkg     package name of the file being parsed
while(<F>){
  # ... do work here   
}
close F;
# the results are available here in some form

A regex is probably the best solution for this, although I did find the following module on CPAN that you might be able to use:

  • Java::JVM::Classfile - Parses compiled class files and returns info about them. You would have to compile the files before you could use this.

Also, remember that it can be tricky to catch all possible variants of a multi-line comment with a regex.
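To illustrate why this is tricky, here is a minimal sketch of one common way around it: a single substitution whose alternation matches string and character literals first, so that comment markers inside them are left alone. The sub name `strip_comments` and the sample source are made up for this example, and it assumes the whole file has already been read into one string.

```perl
use strict;
use warnings;

# Hypothetical helper: remove // and /* */ comments from Java source held in
# one string, leaving string and character literals untouched.
sub strip_comments {
    my ($src) = @_;
    $src =~ s{
        (                               # capture 1: literals to keep
            " (?: [^"\\]+ | \\. )* "    #   double-quoted string
          | ' (?: [^'\\]+ | \\. )* '    #   character literal
        )
      | /\* .*? \*/                     # block comment, possibly multi-line
      | // [^\n]*                       # line comment, up to end of line
    }{ defined $1 ? $1 : ' ' }gsex;     # keep literals, blank out comments
    return $src;
}

# Made-up sample input for illustration only.
my $java = <<'END_JAVA';
String url = "http://example.com/*not a comment*/"; // returns a foo.bar.Baz
/* foo.bar.Baz mentioned
   in a block comment */ foo.bar.Baz fb = new foo.bar.Baz();
END_JAVA

print strip_comments($java);
```

With this input the string literal survives intact, and the only foo.bar.Baz mentions left are the two in the declaration on the last line.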

You also need to skip quoted strings (you can't even skip comments correctly if you don't also deal with quoted strings).

I'd probably write a fairly simple, efficient, and incomplete tokenizer very similar to the one I wrote in node 566467.

Based on that code, I'd probably just dig through the non-comment/non-string chunks looking for \bimport\b and \b\Q$toFind\E\b matches. Perhaps similar to:

if( m[
        \G
        (?:
            [^'"/]+
          | /(?![/*])
        )+
    ]xgc
) {
    my $code = substr( $_, $-[0], $+[0] - $-[0] );
    my $imported = 0;
    while( $code =~ /\b(import\s+)?\Q$package\E\b/g ) {
        if( $1 ) {
            ... # Found importing of package
            while( $code =~ /\b\Q$class\E\b/g ) {
                ... # Found mention of imported class
            }
            last;
        }
        ... # Found a package reference
    }
} elsif( m[ \G ' (?: [^'\\]+ | \\. )* ' ]xgc
    ||   m[ \G " (?: [^"\\]+ | \\. )* " ]xgc
) {
    # skip quoted strings
} elsif(  m[\G//.*]gc  ) {
    # skip C++ comments
} elsif(  m[\G/\*.*?\*/]gcs  ) {
    # skip C comments (assumes the whole file is in $_)
}
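The fragment above does not show the loop that drives it. A self-contained sketch along the same lines might look like the following. The helper name `code_chunks`, the counters, and the sample source are assumptions made for illustration; this is not the code from the node mentioned above.

```perl
use strict;
use warnings;

# Hypothetical helper: walk the source with \G-anchored matches and return
# only the code chunks, skipping string literals and comments.
sub code_chunks {
    my ($src) = @_;
    my @chunks;
    pos($src) = 0;
    while ( pos($src) < length $src ) {
        if ( $src =~ m{ \G ( (?: [^'"/]+ | /(?![/*]) )+ ) }xgc ) {
            push @chunks, $1;                  # plain code
        } elsif ( $src =~ m{ \G ' (?: [^'\\]+ | \\. )* ' }xgcs
               || $src =~ m{ \G " (?: [^"\\]+ | \\. )* " }xgcs ) {
            # quoted literal: skip it
        } elsif ( $src =~ m{ \G // [^\n]* }xgc ) {
            # line comment: skip it
        } elsif ( $src =~ m{ \G /\* .*? \*/ }xgcs ) {
            # block comment: skip it
        } else {
            pos($src) = pos($src) + 1;         # unterminated quote or comment
        }
    }
    return @chunks;
}

my ( $package, $class ) = ( 'foo.bar', 'Baz' );    # names from the question

# Made-up sample input for illustration only.
my $java = <<'END_JAVA';
import foo.bar.Baz; // pulls in Baz
String s = "Baz in a string"; /* Baz in a comment */
Baz b = new Baz();
END_JAVA

my ( $imported, $qualified, $short ) = ( 0, 0, 0 );
for my $code ( code_chunks($java) ) {
    # fully qualified mentions, with or without a leading "import"
    while ( $code =~ /\b(import\s+)?\Q$package.$class\E\b/g ) {
        if ($1) { $imported = 1 } else { $qualified++ }
    }
    # bare class name, not preceded by a dot or another word character
    $short++ while $code =~ /(?<![\w.])\Q$class\E\b/g;
}

print "imported: $imported, qualified: $qualified, short: $short\n";
```

The bare-name count only means something when `$imported` is set or the file lives in the same package, so a caller would still have to combine the three values.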

This is really just a straight grep for Baz (or for /(foo.bar.| )Baz/ if you're concerned about false positives from some.other.Baz), but ignoring comments, isn't it?

If so, I'd knock together a state engine to track whether you're in a multiline comment or not. The regexes needed aren't anything special. Something along the lines of (untested code):

my $in_comment;
my %matches;
my $line_num = 0;
my $full_target = 'foo.bar.Baz';
my $short_target = (split /\./, $full_target)[-1];  # segment after last . (Baz)

while (my $line = <F>) {
    $line_num++;
    if ($in_comment) {
        next unless $line =~ m|\*/|;  # ignore line unless it ends the comment
        $line =~ s|.*\*/||;           # delete everything prior to end of comment
        $in_comment = 0;              # the comment has ended; resume scanning code
    } elsif ($line =~ m|/\*|) {
        if ($line =~ m|\*/|) {        # catch /* and */ on same line
            $line =~ s|/\*.*\*/||;
        } else {
            $in_comment = 1;
            $line =~ s|/\*.*||;       # clear from start of comment to end of line
        }
    }

    $line =~ s|//.*||;     # remove single-line comments
    # \Q...\E keeps the dots literal; the lookbehind rejects some.other.Baz
    $matches{$line_num} = $line
        if $line =~ /\Q$full_target\E|(?<![\w.])\Q$short_target\E\b/;
}

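for my $key (sort { $a <=> $b } keys %matches) {   # numeric, not string, order
    (my $text = $matches{$key}) =~ s/\s+\z//;      # drop the line's own newline
    print $key, ': ', $text, "\n";
}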

It's not perfect, and the in/out-of-comment state can be thrown off by nested multiline comments or by multiple multiline comments on the same line, but that's probably good enough for most real-world cases.

To do it without the state engine, you'd need to slurp the file into a single string, delete the /* ... */ comments, split it back into separate lines, and grep those for non-//-comment hits. But you wouldn't be able to include line numbers in the output that way.
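A minimal sketch of that slurp-based variant follows. The helper name `grep_without_comments` and the sample source are hypothetical. Like the state-engine version, it does not handle comment markers inside string literals.

```perl
use strict;
use warnings;

# Hypothetical helper: return the lines of $src that mention the target class
# outside of comments. String literals are NOT handled here.
sub grep_without_comments {
    my ( $src, $full_target ) = @_;
    my $short_target = ( split /\./, $full_target )[-1];   # e.g. Baz

    $src =~ s{/\*.*?\*/}{}gs;    # delete /* ... */ comments, even multi-line ones

    my @hits;
    for my $line ( split /\n/, $src ) {
        $line =~ s{//.*}{};      # drop the // comment, if any
        push @hits, $line
            if $line =~ /\Q$full_target\E|(?<![\w.])\Q$short_target\E\b/;
    }
    return @hits;
}

# In a real script the string would come from the file, for example:
#   my $src = do { local $/; <F> };
# Made-up sample input for illustration only.
my $java = <<'END_JAVA';
package com.bob;
/* Baz is described
   here */
import foo.bar.Baz;
// a Baz comment
Baz b = new Baz(); // trailing Baz
int x = some.other.Baz.COUNT;
END_JAVA

print "$_\n" for grep_without_comments( $java, 'foo.bar.Baz' );
```

Only the import line and the declaration line are reported here; the some.other.Baz line is rejected because the bare name is preceded by a dot.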

This is what I came up with; it works for all the different cases I've thrown at it. I'm still a Perl noob and it's probably not the fastest thing in the world, but it should work for what I need. Thanks for all the answers; they helped me look at it in different ways.

  my $className = 'Baz';
  my $searchPkg = 'foo.bar';
  my (@potentialRefs, @confirmedRefs);
  my $samePkg = 0;
  my $imported = 0;
  my $currentPkg = 'com.bob';
  $currentPkg =~ s/\//\./g;
  if($currentPkg eq $searchPkg){
    $samePkg = 1;  
  }
  my $inMultiLineComment = 0;
  open F, '<', $_ or die "Cannot open $_: $!";
  my $lineNum = 0;
  while(<F>){
    $lineNum++;
    if($inMultiLineComment){
      if(m|^.*?\*/|){
        s|^.*?\*/||; #get rid of the closing part of the multiline comment we're in
        $inMultiLineComment = 0;
      }else{
        next;
      }
    }
    if(length($_) > 0){
      s|"([^"\\]*(\\.[^"\\]*)*)"||g; #remove strings first since java cannot have multiline string literals
      s|/\*.*?\*/||g;  #remove any multiline comments that start and end on the same line
      s|//.*$||;  #remove the // comments from what's left
      if (m|/\*.*$|){
        $inMultiLineComment = 1; #any /* still left means at least some of the next line is inside the multiline comment
        s|/\*.*$||g;
      }
    }else{
      next; #no sense continuing to process a blank string
    }

    if (/^\s*(import )?(\Q$searchPkg\E)?(.*)?\b\Q$className\E\b/){ #\Q...\E keeps the dots in the package name literal
      if($imported || $samePkg){
        push(@confirmedRefs, $lineNum);
      }else {
        push(@potentialRefs, $lineNum);
      }
      if($1){
        $imported = 1;
      } elsif($2){
        push(@confirmedRefs, $lineNum);
      }
    }
  }
  close F;      
  if($imported){
    push(@confirmedRefs,@potentialRefs);
  }

  for (@confirmedRefs){
    print "$_\n";
  }

If you're feeling adventurous, you could take a look at Parse::RecDescent.
