

Adding unique elements to a Perl array determined by regex

I'm writing a Perl script to analyze error codes and determine whether or not they are unique. An error counts as unique depending on which line it's on. A standard error message might be:

RT Warning: No condition matches in 'unique case' statement.
    "/user/foo/project", line 218, for ..

Many of these error messages contain multiple numbers in the strings I'm grabbing. So what I want to do is grab the first number that appears after the word "line" and add it to an array ONLY if that value isn't already in the array. Here's what I've got so far:

my $path = RT Warning: No condition matches in 'unique case' statement.
    "/user/foo/project", line 218
$path =~ m/(\d+)/;
print("Error occurs on line $1\n"); 
if(grep(/^$1$/, @RTarray))
{
    print("Not unique.\n");
}
else
{
    push(@RTarray, $1); 
    print("Found a unique error!\n");
}

So, obviously I'm not checking whether the number comes after the keyword "line", because I'm not quite sure how to do that with the way I'm currently handling the regex. Additionally, I don't think I'm adding elements to my array correctly. Help, please!

You should use a hash for that. Hash keys are unique by construction, so you don't have to search through an array to check.

Here's an example:

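use strict;
use warnings;

# Hypothetical: the log file path is passed on the command line
my $logfile = shift @ARGV or die "usage: $0 LOGFILE\n";
open my $fh, '<', $logfile or die "Cannot open $logfile: $!";

my %seen;

while (my $line = <$fh>) {
  if ($line =~ m/line (\d+)/) {
    my $ln = $1;
    if ( ! $seen{$ln}++ ) {
      # The post-increment returns the old value first, then increments.
      # If this line number was encountered before, the old value is true,
      # so the block is skipped. If it is new, the old value is 0 (false),
      # so we go into the block and...

      # do various operations on the line number
    }
  }
}

close $fh;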
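The same post-increment trick also works when the values are already sitting in a list. A minimal sketch, assuming the line numbers have already been extracted (the values below are made up for illustration); this is the common order-preserving grep-with-%seen idiom, which keeps only the first occurrence of each value:

```perl
use strict;
use warnings;

# Hypothetical line numbers already pulled out of a log
my @line_numbers = (218, 3, 44, 218, 7, 7, 7);

# !$seen{$_}++ is true only the first time a value is seen,
# so grep keeps the first occurrence and preserves the original order
my %seen;
my @unique = grep { !$seen{$_}++ } @line_numbers;

print "@unique\n";    # prints "218 3 44 7"
```

As a side effect, %seen also ends up holding how often each value occurred (here 7 maps to 3).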

Your %seen hash now contains every line number that has an error, along with how many errors occurred on each:

use Data::Dumper;
print Dumper \%seen;

$VAR1 = {
          '10' => 1,
          '255' => 5,
          '1337' => 1
        };

This tells us that there was one error on line 10 and one on line 1337. Those are unique according to your definition. The five errors on line 255 are not unique, because they appeared five times in the log.
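To split the unique errors from the repeated ones after the loop, you can filter the keys by their counts. A sketch, assuming the counts shown above are in %seen (hard-coded here for illustration); the keys are sorted numerically because Perl does not guarantee any hash ordering:

```perl
use strict;
use warnings;

# Hypothetical counts, shaped like the %seen hash built in the loop
my %seen = (10 => 1, 255 => 5, 1337 => 1);

# Line numbers whose error appeared exactly once
my @unique   = sort { $a <=> $b } grep { $seen{$_} == 1 } keys %seen;
# Line numbers whose error appeared more than once
my @repeated = sort { $a <=> $b } grep { $seen{$_} > 1 } keys %seen;

print "Unique: @unique\n";      # prints "Unique: 10 1337"
print "Repeated: @repeated\n";  # prints "Repeated: 255"
```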


If you want to get rid of some of them, use delete to remove the whole key/value pair, $seen{$ln}-- to decrement the count, or something like delete $seen{$ln} unless --$seen{$ln} to decrement and remove the entry once its count reaches zero, all in one line.
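Here is a sketch of those three options on a small hand-built %seen hash (the counts are hypothetical). The pre-decrement --$seen{$ln} returns the new count, so the delete only happens when that count reaches zero:

```perl
use strict;
use warnings;

# Hypothetical counts, shaped like the %seen hash built in the loop
my %seen = (10 => 1, 255 => 3, 1337 => 1);

# 1. Plain decrement: 255 goes from 3 to 2 and the key stays
$seen{255}--;

# 2. Decrement and remove in one statement
for my $ln (10, 255) {
    delete $seen{$ln} unless --$seen{$ln};
}
# 10 went from 1 to 0 and was deleted; 255 went from 2 to 1 and stays

# 3. Remove a whole key/value pair regardless of its count
delete $seen{1337};

print "$_ => $seen{$_}\n" for sort { $a <=> $b } keys %seen;   # prints "255 => 1"
```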


Edit: I've looked at your code. Actually, the only things missing are the word "line" in the regex and the quotes around the string. Have you actually tried it? It works. :)

use strict;
use warnings;

my @RTarray;

while (my $line = <DATA>) {
  # Skip lines without "line <number>"; otherwise $1 would be stale or undef
  next unless $line =~ m/line (\d+)/;
  my $ln = $1;
  print("Error occurs on line $ln\n");
  if( grep { $_ eq $ln } @RTarray ) { # this eq does the same as your regex, just faster
    print("Not unique.\n");
  } else {
    print "Found a unique error in line $ln!\n";
    push @RTarray, $ln;
  }
}

__DATA__
RT Warning: No condition matches in 'unique case' statement. "/user/foo/project", line 218, for
RT Warning: No condition matches in 'unique case' statement. "/user/foo/project", line 3, for
RT Warning: No condition matches in 'unique case' statement. "/user/foo/project", line 44, for
RT Warning: No condition matches in 'unique case' statement. "/user/foo/project", line 218, for
RT Warning: No condition matches in 'unique case' statement. "/user/foo/project", line 7, for
RT Warning: No condition matches in 'unique case' statement. "/user/foo/project", line 7, for
RT Warning: No condition matches in 'unique case' statement. "/user/foo/project", line 7, for

This will print:

Error occurs on line 218
Found a unique error in line 218!
Error occurs on line 3
Found a unique error in line 3!
Error occurs on line 44
Found a unique error in line 44!
Error occurs on line 218
Not unique.
Error occurs on line 7
Found a unique error in line 7!
Error occurs on line 7
Not unique.
Error occurs on line 7
Not unique.

And I think this is correct: line 218 appeared twice and line 7 three times, and it flagged the repeats of both.

I only replaced your string, which was missing its quotes, with a filehandle loop so I could test it on multiple lines. I also fixed your regex, which was missing the word "line", although that wasn't even needed for this particular error message.
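To see when anchoring on "line" does matter, here is a sketch with a hypothetical message whose path contains a digit (project2 is made up for illustration). The \b word boundary is an extra precaution not in the regex above, so that a word such as "baseline" is not mistaken for the keyword:

```perl
use strict;
use warnings;

# Hypothetical message: the path itself contains a number
my $msg = qq{RT Warning: No condition matches in 'unique case' statement. "/user/foo/project2", line 218, for};

# Without the keyword, the first run of digits wins: the "2" in "project2"
my ($first_number) = $msg =~ m/(\d+)/;

# Anchored on the whole word "line", followed by whitespace and the number
my ($line_number) = $msg =~ m/\bline\s+(\d+)/;

print "First number: $first_number\n";   # prints "First number: 2"
print "Line number: $line_number\n";     # prints "Line number: 218"
```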
