Perl: counting and regex matching

Question

I'm stuck on a problem in my Perl script. The script generates output consisting of the following:

...
2:Jun  9 16:17:14 localhost sshd[3042]: Accepted password for testuser1 from 192.168.0.105 port 56067 ssh2
10:Jun  9 16:31:33 localhost sshd[3176]: Accepted password for testuser1 from 192.168.0.105 port 56136 ssh2
16:Jun  9 16:32:06 localhost sshd[3244]: Accepted password for testuser1 from 192.168.0.105 port 56137 ssh2
24:Jun  9 16:35:26 localhost sshd[3355]: Accepted password for testuser1 from 192.168.0.105 port 56138 ssh2
67:Jun  9 19:46:07 localhost sshd[4982]: Accepted password for testuser1 from 192.168.0.105 port 58182 ssh2
73:Jun  9 19:47:02 localhost sshd[5047]: Accepted password for testuser4 from 192.168.0.105 port 58183 ssh2
79:Jun  9 19:57:26 localhost sshd[5160]: Accepted password for testuser2 from 192.168.0.105 port 58186 ssh2
86:Jun  9 19:58:34 localhost sshd[5231]: Accepted password for testuser1 from 192.168.0.105 port 58187 ssh2
...

The second half of my script has to read all those lines and create a table of how many successful logins each user had. My solution looks like this (header removed, including strict and warnings):

my %SuccessLogins;
my @LoginAttemptsSuccess;
while (my $array = <$fh>) {
    if ($array =~ /Accepted\s+password\s+for\s+(\S+)/) {
      my $counter = () = $array =~ /Accepted\s+password\s+for\s+(\S+)/gi;
      %SuccessLogins = (
        "User"  => $1,
        "Successful"    => $counter
      );
      push (@LoginAttemptsSuccess, \%SuccessLogins);
    }
}

The problem is that the script creates an AoH (array of hashes) consisting of one element, and in it I get just one row. The solution should be a table of all users with the corresponding number of successful logins:

User = testuser1
Successful = 6

Username = testuser2
Successful = 2

etc.

I have read a lot of regex examples here on SO, but I still don't get the logic behind counting matches using a regex and storing the results.

Answer 1

I'd do something like:

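use strict;
use warnings;
use feature 'say';
use Data::Dumper;

my %SuccessLogins;
while (my $array = <DATA>) {
    # Count one successful login per matching line, keyed by user name
    if ($array =~ /Accepted\s+password\s+for\s+(\S+)/) {
      $SuccessLogins{$1}++;
    }
}
say Dumper \%SuccessLogins;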


__DATA__
2:Jun  9 16:17:14 localhost sshd[3042]: Accepted password for testuser1 from 192.168.0.105 port 56067 ssh2
10:Jun  9 16:31:33 localhost sshd[3176]: Accepted password for testuser1 from 192.168.0.105 port 56136 ssh2
16:Jun  9 16:32:06 localhost sshd[3244]: Accepted password for testuser1 from 192.168.0.105 port 56137 ssh2
24:Jun  9 16:35:26 localhost sshd[3355]: Accepted password for testuser1 from 192.168.0.105 port 56138 ssh2
67:Jun  9 19:46:07 localhost sshd[4982]: Accepted password for testuser1 from 192.168.0.105 port 58182 ssh2
73:Jun  9 19:47:02 localhost sshd[5047]: Accepted password for testuser4 from 192.168.0.105 port 58183 ssh2
79:Jun  9 19:57:26 localhost sshd[5160]: Accepted password for testuser2 from 192.168.0.105 port 58186 ssh2
86:Jun  9 19:58:34 localhost sshd[5231]: Accepted password for testuser1 from 192.168.0.105 port 58187 ssh2

Output:

$VAR1 = {
  'testuser4' => 1,
  'testuser2' => 1,
  'testuser1' => 6
};
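
To get from this hash to the "User = ... / Successful = ..." layout asked for in the question, something along these lines might work. This is only a sketch: the helper name format_login_table is made up for illustration, and the sample counts are copied from the Dumper output above.

```perl
use strict;
use warnings;

# Hypothetical helper: format a user => count hash as
# "User = ... / Successful = ..." records, sorted by user name
# so that the output order is stable.
sub format_login_table {
    my ($counts) = @_;
    my @records;
    for my $user ( sort keys %$counts ) {
        push @records, "User = $user\nSuccessful = $counts->{$user}\n";
    }
    # A blank line separates consecutive records.
    return join "\n", @records;
}

# Counts taken from the Dumper output above.
my %SuccessLogins = ( testuser1 => 6, testuser2 => 1, testuser4 => 1 );
print format_login_table( \%SuccessLogins );
```

Sorting the keys is a deliberate choice here, since Perl does not guarantee any particular hash ordering.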

Answer 2

The 'trick' with regular expressions is that a capturing regular expression with the /g flag, evaluated in list context, returns a list of everything it captured, which you can store in an array.

You can then evaluate that array in a scalar context to figure out how many 'hits' there were.

So:

my $string = "fish fish fish fish fish";

my @array = $string =~ m/(fish)/g;

print "@array\n";

print scalar @array;

And that's really all it's doing. This works for multi-line stuff too.
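
The same count can be taken without a named array, which is what the `= () =` construct in the question does. A minimal sketch, with count_matches being a name invented here for illustration:

```perl
use strict;
use warnings;

# Hypothetical helper: count how many times $pattern matches in $text
# without keeping the matches in a temporary array.
sub count_matches {
    my ( $text, $pattern ) = @_;
    # Assigning to the empty list "()" forces the match into list context;
    # evaluating that list assignment in scalar context yields the number
    # of elements the match returned.
    my $count = () = $text =~ /$pattern/g;
    return $count;
}

print count_matches( "fish fish fish fish fish", qr/fish/ ), "\n";    # prints 5
```

Note that with more than one capture group in the pattern, each match contributes several elements, so the number would no longer be a count of matches.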

The reason this isn't working in your script, though, is that your while loop runs once per line. Each line contains the pattern only once, so the count for that line is only ever one. Likewise, your counter counts any match of the pattern, not matches for a particular user, so it isn't counting user logins the way you expect.

There are two ways to avoid this:

Continue to work one line at a time and amend the code accordingly.
Treat the contents of your file handle as a single 'chunk'.

(The latter is a bad idea for really big files.) So, an example for the first:

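use strict;
use warnings;
use Data::Dumper;

my %count_of;
while ( <DATA> ) {
   # List-context match: $login gets the first capture, or undef if there is no match
   my ( $login ) = m/Accepted password for (\w+)/;
   next unless defined $login;    # skip lines that are not successful logins
   print "$login\n";
   $count_of{$login}++;
}

print Dumper \%count_of;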


__DATA__
2:Jun  9 16:17:14 localhost sshd[3042]: Accepted password for testuser1 from 192.168.0.105 port 56067 ssh2
10:Jun  9 16:31:33 localhost sshd[3176]: Accepted password for testuser1 from 192.168.0.105 port 56136 ssh2
16:Jun  9 16:32:06 localhost sshd[3244]: Accepted password for testuser1 from 192.168.0.105 port 56137 ssh2
24:Jun  9 16:35:26 localhost sshd[3355]: Accepted password for testuser1 from 192.168.0.105 port 56138 ssh2
67:Jun  9 19:46:07 localhost sshd[4982]: Accepted password for testuser1 from 192.168.0.105 port 58182 ssh2
73:Jun  9 19:47:02 localhost sshd[5047]: Accepted password for testuser4 from 192.168.0.105 port 58183 ssh2
79:Jun  9 19:57:26 localhost sshd[5160]: Accepted password for testuser2 from 192.168.0.105 port 58186 ssh2
86:Jun  9 19:58:34 localhost sshd[5231]: Accepted password for testuser1 from 192.168.0.105 port 58187 ssh2

And an example for the second:

local $/;    # undefine the input record separator so <DATA> returns the whole input
my @logins = <DATA> =~ m/Accepted password for (\w+)/g;
print "@logins\n";

print scalar(@logins), "\n";

__DATA__
2:Jun  9 16:17:14 localhost sshd[3042]: Accepted password for testuser1 from 192.168.0.105 port 56067 ssh2
10:Jun  9 16:31:33 localhost sshd[3176]: Accepted password for testuser1 from 192.168.0.105 port 56136 ssh2
16:Jun  9 16:32:06 localhost sshd[3244]: Accepted password for testuser1 from 192.168.0.105 port 56137 ssh2
24:Jun  9 16:35:26 localhost sshd[3355]: Accepted password for testuser1 from 192.168.0.105 port 56138 ssh2
67:Jun  9 19:46:07 localhost sshd[4982]: Accepted password for testuser1 from 192.168.0.105 port 58182 ssh2
73:Jun  9 19:47:02 localhost sshd[5047]: Accepted password for testuser4 from 192.168.0.105 port 58183 ssh2

You'd then tally @logins into a hash, much as in the first example.
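
That tallying step could look roughly like this. It is a sketch only: tally_logins is a hypothetical helper name, and the sample list simply mirrors what @logins would hold for the six data lines above.

```perl
use strict;
use warnings;

# Hypothetical helper: tally a list of user names into a
# user => count hash reference.
sub tally_logins {
    my @names = @_;
    my %count_of;
    $count_of{$_}++ for @names;
    return \%count_of;
}

# Sample list in the shape @logins has after the slurp-and-match above.
my @logins = qw(testuser1 testuser1 testuser1 testuser1 testuser1 testuser4);
my $count_of = tally_logins(@logins);
printf "%s: %d\n", $_, $count_of->{$_} for sort keys %$count_of;
```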

But in either case, you can 'count' the elements in an array by evaluating it in a scalar context, which is why it's useful.

You also have $1, $2, etc. to draw upon when a pattern matches. Again, these can be used for extracting a specific user from the list, but I prefer a more direct assignment.
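
For comparison, here is a small sketch of the two styles side by side. Both function names are invented for illustration, and the sample line has the leading grep line number stripped.

```perl
use strict;
use warnings;

# Style 1: extract the user name through the capture variable $1.
sub login_via_capture_var {
    my ($line) = @_;
    # $1 is only meaningful after a successful match, so test the match first.
    return $line =~ /Accepted password for (\w+)/ ? $1 : undef;
}

# Style 2: extract the user name by assigning the match result directly.
sub login_via_assignment {
    my ($line) = @_;
    # In list context the match returns its captures, or an empty list on failure.
    my ($login) = $line =~ /Accepted password for (\w+)/;
    return $login;
}

my $line = 'Jun  9 16:17:14 localhost sshd[3042]: Accepted password for testuser1 from 192.168.0.105 port 56067 ssh2';
print login_via_capture_var($line), "\n";    # prints testuser1
print login_via_assignment($line),  "\n";    # prints testuser1
```

The direct assignment leaves the variable undefined when nothing matches, which makes a stale $1 from an earlier match harder to pick up by accident.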

Answer 3

Your script assumes that the regexp will pull out multiple values for the "testuser" string all at once. It will not.

The assignment to %SuccessLogins overwrites the same hash on every pass through the while loop, and every reference pushed onto the array points to that one hash, which I believe is not what you are aiming to do.

I put your test data in the file td1 and then used this one-liner:

perl -ne '@r=/Accepted password for (\w+)/gi; for $item (@r) {$total{$item}++;  } END{  use Data::Dumper; print Dumper(\%total);}' < td1

Then I realised that, since my test case reads in one line at a time, I might as well do this:

perl -ne '$total{$1}++ if /Accepted password for (\w+)/i;  END{  use Data::Dumper; print Dumper(\%total);}' < td1
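
If an array of hashes is still wanted, as in the question, it can be built from the finished counts with a fresh anonymous hash per user. A rough sketch, assuming the counts are already in a hash like %total; the name counts_to_aoh is hypothetical.

```perl
use strict;
use warnings;
use Data::Dumper;

# Hypothetical helper: turn a user => count hash into an array of hashes,
# one new anonymous hash per user, sorted by user name.
sub counts_to_aoh {
    my ($counts) = @_;
    my @rows;
    for my $user ( sort keys %$counts ) {
        # { ... } builds a new hash each time, so rows never share storage.
        push @rows, { User => $user, Successful => $counts->{$user} };
    }
    return \@rows;
}

my %total = ( testuser1 => 6, testuser2 => 1, testuser4 => 1 );
print Dumper( counts_to_aoh( \%total ) );
```

Pushing `{ ... }` rather than a reference to one named hash is what keeps each row independent.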

3 answers

Answer 1: 4 (accepted), 2015-06-16 14:51:38
Answer 2: 0, 2015-06-16 14:56:19
Answer 3: 0, 2015-06-16 15:02:13
