
Adding unique elements to a Perl array determined by regex

I'm writing a Perl script to analyze error codes and determine whether or not they are unique. Whether an error is unique depends on which line it's on. A standard error message may be:

RT Warning: No condition matches in 'unique case' statement.
    "/user/foo/project", line 218, for ..

A lot of these error messages have multiple numbers in the strings that I'm grabbing. So, what I want to be able to do, is grab the first occurrence of a number after the word "line" and add it to an array ONLY if that value isn't present in the array. Here's what I've got so far:

my $path = RT Warning: No condition matches in 'unique case' statement.
    "/user/foo/project", line 218
$path =~ m/(\d+)/;
print("Error occurs on line $1\n"); 
if(grep(/^$1$/, @RTarray))
{
    print("Not unique.\n");
}
else
{
    push(@RTarray, $1); 
    print("Found a unique error!\n");
}

So, obviously I'm not checking whether the number comes after the keyword "line", because I'm not quite sure how to do that given how I'm currently using the regex. Additionally, I don't think I'm adding elements to my array correctly. Help, please!

You should use a hash for that. It has the uniqueness built in and you don't even have to check.

Here's an example:

my %seen;

while (my $line = <$fh>) {

  if ($line =~ m/line (\d+)/) {
    my $ln = $1;
    if ( ! $seen{$ln}++ ) { 
      # this will check first and then increment. If it was encountered before,
      # it will already contain a true value, and thus the block will be skipped.
      # if it has not been encountered before, it will go into the block and...

      # do various operations on the line number
    }
  }

}

Your %seen now contains every line number that had an error, and how many errors occurred on each (Dumper comes from the core Data::Dumper module):

print Dumper \%seen;

$VAR1 = {
  10 => 1,
  255 => 5,
  1337 => 1,
}

This tells us that there was one error in line 10 and one in line 1337. Those are unique according to your code. The five errors in line 255 are not unique because they appeared five times in the log.
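To make that distinction explicit, here is a minimal sketch that splits the counts into line numbers seen once and line numbers seen more than once. The helper name split_by_count is my own invention, not something from your script, and the counts are the sample values from the dump above.

```perl
use strict;
use warnings;

# Hypothetical helper: split a count hash into line numbers seen exactly
# once and line numbers seen more than once, each sorted numerically.
sub split_by_count {
    my ($seen) = @_;
    my @once     = sort { $a <=> $b } grep { $seen->{$_} == 1 } keys %$seen;
    my @repeated = sort { $a <=> $b } grep { $seen->{$_} >  1 } keys %$seen;
    return (\@once, \@repeated);
}

# Sample counts taken from the dump above.
my %seen = (10 => 1, 255 => 5, 1337 => 1);
my ($once, $repeated) = split_by_count(\%seen);

print "Seen once: @$once\n";            # Seen once: 10 1337
print "Seen repeatedly: @$repeated\n";  # Seen repeatedly: 255
```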


If you want to get rid of some of them, use delete $seen{$ln} to remove the whole key/value pair, $seen{$ln}-- to decrement the count, or something like delete $seen{$ln} unless --$seen{$ln} to decrement and remove the key once its count reaches zero, all in one line.
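As a rough sketch of that decrement-and-delete idiom, wrapped in a helper (forget_one is a hypothetical name, and the exists guard is my own addition so that unknown keys are not created with a negative count):

```perl
use strict;
use warnings;

# Hypothetical helper: forget one occurrence of a line number and remove
# the key entirely once its count drops to zero.
sub forget_one {
    my ($seen, $ln) = @_;
    return unless exists $seen->{$ln};          # unknown key: nothing to do
    delete $seen->{$ln} unless --$seen->{$ln};  # decrement, delete at zero
    return;
}

my %seen = (255 => 2, 10 => 1);
forget_one(\%seen, 255);   # count for 255 drops from 2 to 1
forget_one(\%seen, 10);    # count for 10 hits zero, so the key is deleted
forget_one(\%seen, 42);    # never seen, hash is left untouched

print join(", ", map { "$_ => $seen{$_}" } sort { $a <=> $b } keys %seen), "\n";
# 255 => 1
```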


Edit: I've looked at your code. Actually, the only thing missing is the regex and the quotes. Have you actually tried it? It works. :)

my @RTarray;

while (my $line = <DATA>) {
  next unless $line =~ m/line (\d+)/; # skip lines without a line number so $1 is never stale
  print("Error occurs on line $1\n"); 
  if( grep { $_ eq $1 } @RTarray ) { # this eq is the same as your regex, just faster
    print("Not unique.\n");
  } else {
    print "Found a unique error in line $1!\n";
    push @RTarray, $1; 
  }
}

__DATA__
RT Warning: No condition matches in 'unique case' statement. "/user/foo/project", line 218, for
RT Warning: No condition matches in 'unique case' statement. "/user/foo/project", line 3, for
RT Warning: No condition matches in 'unique case' statement. "/user/foo/project", line 44, for
RT Warning: No condition matches in 'unique case' statement. "/user/foo/project", line 218, for
RT Warning: No condition matches in 'unique case' statement. "/user/foo/project", line 7, for
RT Warning: No condition matches in 'unique case' statement. "/user/foo/project", line 7, for
RT Warning: No condition matches in 'unique case' statement. "/user/foo/project", line 7, for

This will print:

Error occurs on line 218
Found a unique error in line 218!
Error occurs on line 3
Found a unique error in line 3!
Error occurs on line 44
Found a unique error in line 44!
Error occurs on line 218
Not unique.
Error occurs on line 7
Found a unique error in line 7!
Error occurs on line 7
Not unique.
Error occurs on line 7
Not unique.

And I think this is correct. I had 218 double and 7 triple, and it found them both.
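If you want to keep the array (for example, to preserve the order in which line numbers first appear) but avoid scanning it with grep on every message, the two approaches can be combined. This is only a sketch: unique_line_numbers is a made-up helper name, and the shortened messages are stand-ins for your real log lines.

```perl
use strict;
use warnings;

# Hypothetical helper: return the distinct line numbers found in a list of
# messages, in first-seen order. The hash answers "seen before?" and the
# array remembers the order.
sub unique_line_numbers {
    my @messages = @_;
    my (%seen, @unique);
    for my $msg (@messages) {
        next unless $msg =~ /line (\d+)/;     # ignore messages with no line number
        push @unique, $1 unless $seen{$1}++;  # push only on first sighting
    }
    return @unique;
}

my @log = (
    q{"/user/foo/project", line 218, for},
    q{"/user/foo/project", line 3, for},
    q{"/user/foo/project", line 218, for},
    q{"/user/foo/project", line 7, for},
);
my @unique = unique_line_numbers(@log);
print "@unique\n";   # 218 3 7
```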

I only replaced your string, which was missing its quotes, with a filehandle loop so I could test it on multiple lines. I also fixed your regex, which was missing the word line, though that was not even needed for this particular error message.
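Anchoring on the word line does matter as soon as another number shows up earlier in the message. Here is a small sketch to illustrate; the path ending in project2 is invented for the demonstration, and line_number is a hypothetical helper name.

```perl
use strict;
use warnings;

# Hypothetical helper: return the first number that follows the word
# "line", or undef if the message has no such number.
sub line_number {
    my ($msg) = @_;
    return $msg =~ /\bline\s+(\d+)/ ? $1 : undef;
}

# Invented message whose path contains a digit before the line number.
my $msg = q{RT Warning: No condition matches in 'unique case' statement. "/user/foo/project2", line 218, for};

my ($first_number) = $msg =~ /(\d+)/;   # unanchored: grabs the 2 from "project2"
print "First number anywhere: $first_number\n";          # First number anywhere: 2
print "Number after 'line': ", line_number($msg), "\n";  # Number after 'line': 218
```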
