Perl: Counting and regex matching

Question

I have a problem in a Perl script. The script produces output containing the following:

...
2:Jun  9 16:17:14 localhost sshd[3042]: Accepted password for testuser1 from 192.168.0.105 port 56067 ssh2
10:Jun  9 16:31:33 localhost sshd[3176]: Accepted password for testuser1 from 192.168.0.105 port 56136 ssh2
16:Jun  9 16:32:06 localhost sshd[3244]: Accepted password for testuser1 from 192.168.0.105 port 56137 ssh2
24:Jun  9 16:35:26 localhost sshd[3355]: Accepted password for testuser1 from 192.168.0.105 port 56138 ssh2
67:Jun  9 19:46:07 localhost sshd[4982]: Accepted password for testuser1 from 192.168.0.105 port 58182 ssh2
73:Jun  9 19:47:02 localhost sshd[5047]: Accepted password for testuser4 from 192.168.0.105 port 58183 ssh2
79:Jun  9 19:57:26 localhost sshd[5160]: Accepted password for testuser2 from 192.168.0.105 port 58186 ssh2
86:Jun  9 19:58:34 localhost sshd[5231]: Accepted password for testuser1 from 192.168.0.105 port 58187 ssh2
...

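The second half of my script has to read all of these lines and build a table of how many successful logins each user had. My solution looks like this (headers removed, including strict and warnings):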

my %SuccessLogins;
my @LoginAttemptsSuccess;
while (my $array = <$fh>) {
    if ($array =~ /Accepted\s+password\s+for\s+(\S+)/) {
      my $counter = () = $array =~ /Accepted\s+password\s+for\s+(\S+)/gi;
      %SuccessLogins = (
        "User"  => $1,
        "Successful"    => $counter
      );
      push (@LoginAttemptsSuccess, \%SuccessLogins);
    }
}

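The problem is that the script creates an AoH (array of hashes) with only 1 element, containing just 1 row. The result should be a table of all users with their corresponding number of successful logins: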

User = testuser1
Successful = 6

User = testuser2
Successful = 2

And so on.

I have read plenty of regex examples here, but I still don't get the logic of counting matches with a regex and then storing those results.

Answer 1

I would do something like this:

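use strict;
use warnings;
use feature 'say';
use Data::Dumper;

my %SuccessLogins;
while (my $array = <DATA>) {
    if ($array =~ /Accepted\s+password\s+for\s+(\S+)/) {
      $SuccessLogins{$1}++;    # one increment per matching line, keyed by user
    }
}
say Dumper \%SuccessLogins;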


__DATA__
2:Jun  9 16:17:14 localhost sshd[3042]: Accepted password for testuser1 from 192.168.0.105 port 56067 ssh2
10:Jun  9 16:31:33 localhost sshd[3176]: Accepted password for testuser1 from 192.168.0.105 port 56136 ssh2
16:Jun  9 16:32:06 localhost sshd[3244]: Accepted password for testuser1 from 192.168.0.105 port 56137 ssh2
24:Jun  9 16:35:26 localhost sshd[3355]: Accepted password for testuser1 from 192.168.0.105 port 56138 ssh2
67:Jun  9 19:46:07 localhost sshd[4982]: Accepted password for testuser1 from 192.168.0.105 port 58182 ssh2
73:Jun  9 19:47:02 localhost sshd[5047]: Accepted password for testuser4 from 192.168.0.105 port 58183 ssh2
79:Jun  9 19:57:26 localhost sshd[5160]: Accepted password for testuser2 from 192.168.0.105 port 58186 ssh2
86:Jun  9 19:58:34 localhost sshd[5231]: Accepted password for testuser1 from 192.168.0.105 port 58187 ssh2

Output:

$VAR1 = {
  'testuser4' => 1,
  'testuser2' => 1,
  'testuser1' => 6
};
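To get the "User = ... / Successful = ..." layout asked for in the question, the count hash from this answer can be printed directly. This is a minimal sketch, assuming the counts are built the same way; the three sample lines are an in-memory stand-in for the real filehandle, and sorting by user name is just a choice for stable output.

```perl
use strict;
use warnings;

# A few sample lines from the log (stand-in for reading the real file)
my @lines = (
    '2:Jun  9 16:17:14 localhost sshd[3042]: Accepted password for testuser1 from 192.168.0.105 port 56067 ssh2',
    '73:Jun  9 19:47:02 localhost sshd[5047]: Accepted password for testuser4 from 192.168.0.105 port 58183 ssh2',
    '86:Jun  9 19:58:34 localhost sshd[5231]: Accepted password for testuser1 from 192.168.0.105 port 58187 ssh2',
);

# Tally successful logins per user, as in the answer above
my %SuccessLogins;
for my $line (@lines) {
    $SuccessLogins{$1}++ if $line =~ /Accepted\s+password\s+for\s+(\S+)/;
}

# Build the table in the format from the question, sorted by user name
my $report = '';
for my $user (sort keys %SuccessLogins) {
    $report .= "User = $user\nSuccessful = $SuccessLogins{$user}\n\n";
}
print $report;
```

With these three lines it prints testuser1 with 2 and testuser4 with 1.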

Answer 2

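The "trick" with regexes is that a pattern with a capture group, matched with /g in list context, returns a list of every captured match.

You can then evaluate that array in scalar context to find out how many "matches" it holds.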

So:

my $string = "fish fish fish fish fish";

my @array = $string =~ m/(fish)/g;

print "@array\n";

print scalar @array;

That's all there is to it. This also works on multi-line content.
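A short sketch of the same idea on multi-line text. It also shows the `() =` ("countof") idiom that the question's own code uses: assigning the match to an empty list and taking that assignment in scalar context gives the number of matches without keeping an array around. The sample text is made up for illustration.

```perl
use strict;
use warnings;

my $text = "fish fish\nfish\nfish fish\n";

# List context with /g: every captured match is returned
my @array      = $text =~ m/(fish)/g;
my $from_array = scalar @array;          # 5

# "countof" idiom: a list assignment in scalar context returns
# the number of elements on its right-hand side
my $count = () = $text =~ m/fish/g;      # 5

print "$from_array $count\n";
```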

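However, the reason this doesn't work in your script is that your while loop runs once per line. So you only ever match your pattern once per line, and your count will only ever be 1. Likewise, your $counter only counts matches within the current line, so it will not count logins per user the way you expect.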
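To illustrate the point above: when each line is matched on its own, the per-line count is 1 for every matching line, because each sshd line contains the phrase only once. Joining the lines first gives the total. The three sample lines are taken from the question's log; this is a demonstration, not a fix.

```perl
use strict;
use warnings;

my @lines = (
    '2:Jun  9 16:17:14 localhost sshd[3042]: Accepted password for testuser1 from 192.168.0.105 port 56067 ssh2',
    '10:Jun  9 16:31:33 localhost sshd[3176]: Accepted password for testuser1 from 192.168.0.105 port 56136 ssh2',
    '79:Jun  9 19:57:26 localhost sshd[5160]: Accepted password for testuser2 from 192.168.0.105 port 58186 ssh2',
);
my $re = qr/Accepted\s+password\s+for\s+(\S+)/;

# Counting inside the loop, one line at a time (as in the question)
my @per_line;
for my $line (@lines) {
    my $counter = () = $line =~ /$re/g;
    push @per_line, $counter;
}
print "@per_line\n";    # 1 1 1

# Counting over all of the text at once gives the total number of logins
my $all   = join "\n", @lines;
my $total = () = $all =~ /$re/g;
print "$total\n";       # 3
```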

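There are two ways around this:

1. Keep working one line at a time, and adapt the code accordingly.
2. Treat the filehandle as a single "chunk".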

(The latter is a bad idea for really large files.) So, the first example:

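use strict;
use warnings;
use Data::Dumper;

my %count_of;
while ( <DATA> ) {
   # List-context match: $login gets the captured user name, or undef if no match
   my ( $login ) = m/Accepted password for (\w+)/;
   next unless defined $login;    # skip lines that are not successful logins
   print "$login\n";
   $count_of{$login}++;
}

print Dumper \%count_of;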


__DATA__
2:Jun  9 16:17:14 localhost sshd[3042]: Accepted password for testuser1 from 192.168.0.105 port 56067 ssh2
10:Jun  9 16:31:33 localhost sshd[3176]: Accepted password for testuser1 from 192.168.0.105 port 56136 ssh2
16:Jun  9 16:32:06 localhost sshd[3244]: Accepted password for testuser1 from 192.168.0.105 port 56137 ssh2
24:Jun  9 16:35:26 localhost sshd[3355]: Accepted password for testuser1 from 192.168.0.105 port 56138 ssh2
67:Jun  9 19:46:07 localhost sshd[4982]: Accepted password for testuser1 from 192.168.0.105 port 58182 ssh2
73:Jun  9 19:47:02 localhost sshd[5047]: Accepted password for testuser4 from 192.168.0.105 port 58183 ssh2
79:Jun  9 19:57:26 localhost sshd[5160]: Accepted password for testuser2 from 192.168.0.105 port 58186 ssh2
86:Jun  9 19:58:34 localhost sshd[5231]: Accepted password for testuser1 from 192.168.0.105 port 58187 ssh2

And the second:

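use strict;
use warnings;

local $/;    # undefine the input record separator: <DATA> returns the whole file at once
my @logins = <DATA> =~ m/Accepted password for (\w+)/g;
print "@logins\n";

print scalar @logins, "\n";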

__DATA__
2:Jun  9 16:17:14 localhost sshd[3042]: Accepted password for testuser1 from 192.168.0.105 port 56067 ssh2
10:Jun  9 16:31:33 localhost sshd[3176]: Accepted password for testuser1 from 192.168.0.105 port 56136 ssh2
16:Jun  9 16:32:06 localhost sshd[3244]: Accepted password for testuser1 from 192.168.0.105 port 56137 ssh2
24:Jun  9 16:35:26 localhost sshd[3355]: Accepted password for testuser1 from 192.168.0.105 port 56138 ssh2
67:Jun  9 19:46:07 localhost sshd[4982]: Accepted password for testuser1 from 192.168.0.105 port 58182 ssh2
73:Jun  9 19:47:02 localhost sshd[5047]: Accepted password for testuser4 from 192.168.0.105 port 58183 ssh2

You would then tally @logins into per-user counts, as in the first example.
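Tallying @logins is the same hash increment as in the first example. A minimal sketch, with @logins filled in by hand to stand in for what the slurp-and-match step returns for the sample data above (five testuser1 lines and one testuser4 line).

```perl
use strict;
use warnings;

# Stand-in for the list returned by the /g match over the slurped file
my @logins = qw(testuser1 testuser1 testuser1 testuser1 testuser1 testuser4);

my %count_of;
$count_of{$_}++ for @logins;    # one increment per occurrence

for my $user (sort keys %count_of) {
    print "$user: $count_of{$user}\n";
}
```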

In either case, you can "count" the elements of an array by evaluating it in scalar context, which is why this is useful.

When the pattern matches, you can also use $1, $2, etc. Again, these could be used to extract specific users, but I prefer direct assignment.
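For comparison, here is a sketch of both styles on one log line: reading the capture variables $1 and $2 after a successful match, versus assigning the captures directly in list context. Capturing the source IP as a second group is an addition for illustration only.

```perl
use strict;
use warnings;

my $line = '79:Jun  9 19:57:26 localhost sshd[5160]: Accepted password for testuser2 from 192.168.0.105 port 58186 ssh2';
my $re   = qr/Accepted password for (\w+) from (\S+)/;

# Style 1: $1, $2 are only meaningful after a match that succeeded
my ($user_a, $ip_a);
if ($line =~ $re) {
    ($user_a, $ip_a) = ($1, $2);
}

# Style 2: direct list assignment; both variables stay undef if there is no match
my ($user_b, $ip_b) = $line =~ $re;

print "$user_a $ip_a\n";
print "$user_b $ip_b\n";
```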

Answer 3

Your script assumes that the regexp will extract several "testuser" values at once from a single line. It won't.

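Each time through the while loop, the assignment %SuccessLogins = (...) wipes and refills the same hash, and you push a reference to that one hash every time. I believe that is not your goal.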
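A small sketch of why the question's AoH looks wrong: %SuccessLogins is declared outside the loop, so every push adds a reference to the same hash, and each reassignment overwrites its contents. Counting first and then pushing an anonymous hash per user (the `+{ ... }` forces Perl to read the braces as a hash constructor) gives the table the question asks for. The user list here is sample data.

```perl
use strict;
use warnings;

my @users = qw(testuser1 testuser1 testuser2);

# The question's pattern: one named hash, reassigned and pushed each time
my %SuccessLogins;
my @same_ref;
for my $u (@users) {
    %SuccessLogins = (User => $u, Successful => 1);
    push @same_ref, \%SuccessLogins;
}
# All entries point at one hash, which holds only the last assignment
print scalar(@same_ref), " refs, first entry's user: $same_ref[0]{User}\n";

# Fix: count per user first, then build one anonymous hash per user
my %count_of;
$count_of{$_}++ for @users;
my @LoginAttemptsSuccess =
    map { +{ User => $_, Successful => $count_of{$_} } } sort keys %count_of;

for my $rec (@LoginAttemptsSuccess) {
    print "User = $rec->{User}\nSuccessful = $rec->{Successful}\n\n";
}
```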

I put your test data into a file td1 and then used this one-liner:

perl -ne '@r=/Accepted password for (\w+)/gi; for $item (@r) {$total{$item}++;  } END{  use Data::Dumper; print Dumper(\%total);}' < td1

Then I realized that, for my test case, reading one line at a time works fine:

perl -ne '/Accepted password for (\w+)/i and $total{$1}++;  END{  use Data::Dumper; print Dumper(\%total);}' < td1

3 answers

Answer 1: 4 votes, accepted, 2015-06-16 14:51:38
Answer 2: 0 votes, 2015-06-16 14:56:19
Answer 3: 0 votes, 2015-06-16 15:02:13
