Perl: Count and regex matches
I have run into a problem in a Perl script. The script produces output containing the following:
...
2:Jun 9 16:17:14 localhost sshd[3042]: Accepted password for testuser1 from 192.168.0.105 port 56067 ssh2
10:Jun 9 16:31:33 localhost sshd[3176]: Accepted password for testuser1 from 192.168.0.105 port 56136 ssh2
16:Jun 9 16:32:06 localhost sshd[3244]: Accepted password for testuser1 from 192.168.0.105 port 56137 ssh2
24:Jun 9 16:35:26 localhost sshd[3355]: Accepted password for testuser1 from 192.168.0.105 port 56138 ssh2
67:Jun 9 19:46:07 localhost sshd[4982]: Accepted password for testuser1 from 192.168.0.105 port 58182 ssh2
73:Jun 9 19:47:02 localhost sshd[5047]: Accepted password for testuser4 from 192.168.0.105 port 58183 ssh2
79:Jun 9 19:57:26 localhost sshd[5160]: Accepted password for testuser2 from 192.168.0.105 port 58186 ssh2
86:Jun 9 19:58:34 localhost sshd[5231]: Accepted password for testuser1 from 192.168.0.105 port 58187 ssh2
...
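The second half of my script has to read all of these lines and build a table showing how many successful logins each user had. My solution looks like this (the header, including strict and warnings, has been removed):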
my %SuccessLogins;
my @LoginAttemptsSuccess;
while (my $array = <$fh>) {
    if ($array =~ /Accepted\s+password\s+for\s+(\S+)/) {
        my $counter = () = $array =~ /Accepted\s+password\s+for\s+(\S+)/gi;
        %SuccessLogins = (
            "User" => $1,
            "Successful" => $counter
        );
        push (@LoginAttemptsSuccess, \%SuccessLogins);
    }
}
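The problem is that the script creates an AoH (array of hashes) consisting of a single element, with only one row in it. The result should be a table of all users with their respective number of successful logins: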
User = testuser1
Successful = 6
Username = testuser2
Successful = 2
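and so on.
I have read a lot of regex examples here, but I still do not get the logic of counting matches with a regex and storing the results.
I would do something like this: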
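use strict;
use warnings;
use feature 'say';
use Data::Dumper;

my %SuccessLogins;
while (my $line = <DATA>) {
    # Each matching line is one successful login for the captured user.
    if ($line =~ /Accepted\s+password\s+for\s+(\S+)/) {
        $SuccessLogins{$1}++;
    }
}
say Dumper \%SuccessLogins;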
__DATA__
2:Jun 9 16:17:14 localhost sshd[3042]: Accepted password for testuser1 from 192.168.0.105 port 56067 ssh2
10:Jun 9 16:31:33 localhost sshd[3176]: Accepted password for testuser1 from 192.168.0.105 port 56136 ssh2
16:Jun 9 16:32:06 localhost sshd[3244]: Accepted password for testuser1 from 192.168.0.105 port 56137 ssh2
24:Jun 9 16:35:26 localhost sshd[3355]: Accepted password for testuser1 from 192.168.0.105 port 56138 ssh2
67:Jun 9 19:46:07 localhost sshd[4982]: Accepted password for testuser1 from 192.168.0.105 port 58182 ssh2
73:Jun 9 19:47:02 localhost sshd[5047]: Accepted password for testuser4 from 192.168.0.105 port 58183 ssh2
79:Jun 9 19:57:26 localhost sshd[5160]: Accepted password for testuser2 from 192.168.0.105 port 58186 ssh2
86:Jun 9 19:58:34 localhost sshd[5231]: Accepted password for testuser1 from 192.168.0.105 port 58187 ssh2
Output:
$VAR1 = {
'testuser4' => 1,
'testuser2' => 1,
'testuser1' => 6
};
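To get from that hash to the table layout asked for in the question, one option is to walk the keys in sorted order. This is only a minimal sketch: the hash literal is copied from the Dumper output above so that the snippet runs on its own, and the $report variable is introduced here purely for illustration.

```perl
use strict;
use warnings;

# Illustrative counts, copied from the Dumper output above.
my %SuccessLogins = (
    testuser1 => 6,
    testuser2 => 1,
    testuser4 => 1,
);

# Build the report in a string first, then print it in one go.
my $report = '';
for my $user (sort keys %SuccessLogins) {
    $report .= "User = $user\n";
    $report .= "Successful = $SuccessLogins{$user}\n";
}
print $report;
```

Sorting the keys gives a stable order, since Perl does not guarantee any particular order for hash keys.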
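The "trick" with regular expressions is that a capturing regex, matched with /g in list context, returns a list of everything it captured.
You can then evaluate that array in scalar context to find out how many "hits" it contains.
So: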
my $string = "fish fish fish fish fish";
my @array = $string =~ m/(fish)/g;
print "@array\n";
print scalar @array;
That is all it does. It also works on multi-line content.
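If only the number of matches is needed, the intermediate array can be skipped by assigning the match to an empty list, which is the same idiom the question uses for its counter. A small sketch with a made-up multi-line string (the input text is an assumption for illustration only):

```perl
use strict;
use warnings;

# Multi-line input: /g keeps scanning across the newlines.
my $string = "fish fish\nfish cat\nfish fish";

# Assigning to the empty list forces list context on the match;
# evaluating that assignment in scalar context yields the number of matches.
my $count = () = $string =~ m/(fish)/g;

print "$count\n";
```

This counts occurrences of the whole pattern in one string, which is not the same thing as counting logins per user.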
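However, the reason this does not work in your script is that you are running a while loop that processes one line at a time. You therefore only ever match your pattern once per line, so your count will always be 1. Also, your counter counts matches of the pattern as a whole, not per user, so it will not count a user's logins the way you expect.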
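The way to avoid this is either to keep the counts across iterations (for example in a hash), or to read the whole file in one go and match against that.
(The latter is a bad idea for really big files.) So, the first example: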
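use strict;
use warnings;
use Data::Dumper;

my %count_of;
while ( <DATA> ) {
    # List assignment puts the first capture straight into $login.
    my ($login) = m/Accepted password for (\w+)/;
    next unless defined $login;    # skip lines that do not match
    print "$login\n";
    $count_of{$login}++;
}
print Dumper \%count_of;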
__DATA__
2:Jun 9 16:17:14 localhost sshd[3042]: Accepted password for testuser1 from 192.168.0.105 port 56067 ssh2
10:Jun 9 16:31:33 localhost sshd[3176]: Accepted password for testuser1 from 192.168.0.105 port 56136 ssh2
16:Jun 9 16:32:06 localhost sshd[3244]: Accepted password for testuser1 from 192.168.0.105 port 56137 ssh2
24:Jun 9 16:35:26 localhost sshd[3355]: Accepted password for testuser1 from 192.168.0.105 port 56138 ssh2
67:Jun 9 19:46:07 localhost sshd[4982]: Accepted password for testuser1 from 192.168.0.105 port 58182 ssh2
73:Jun 9 19:47:02 localhost sshd[5047]: Accepted password for testuser4 from 192.168.0.105 port 58183 ssh2
79:Jun 9 19:57:26 localhost sshd[5160]: Accepted password for testuser2 from 192.168.0.105 port 58186 ssh2
86:Jun 9 19:58:34 localhost sshd[5231]: Accepted password for testuser1 from 192.168.0.105 port 58187 ssh2
And the second:
local $/;    # undefine the input record separator so <DATA> reads everything at once
my @logins = <DATA> =~ m/Accepted password for (\w+)/g;
print "@logins\n";
print scalar(@logins), "\n";
__DATA__
2:Jun 9 16:17:14 localhost sshd[3042]: Accepted password for testuser1 from 192.168.0.105 port 56067 ssh2
10:Jun 9 16:31:33 localhost sshd[3176]: Accepted password for testuser1 from 192.168.0.105 port 56136 ssh2
16:Jun 9 16:32:06 localhost sshd[3244]: Accepted password for testuser1 from 192.168.0.105 port 56137 ssh2
24:Jun 9 16:35:26 localhost sshd[3355]: Accepted password for testuser1 from 192.168.0.105 port 56138 ssh2
67:Jun 9 19:46:07 localhost sshd[4982]: Accepted password for testuser1 from 192.168.0.105 port 58182 ssh2
73:Jun 9 19:47:02 localhost sshd[5047]: Accepted password for testuser4 from 192.168.0.105 port 58183 ssh2
You would then reduce @logins to per-user counts, just as in the first example.
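That reduction step might look like the following sketch. The @logins array is filled in by hand here as a stand-in for the result of the slurp-and-match example, so that the snippet runs on its own.

```perl
use strict;
use warnings;

# Stand-in for the @logins array produced by matching against the whole file.
my @logins = qw(testuser1 testuser1 testuser1 testuser1 testuser1 testuser4);

# Tally one increment per captured user name.
my %count_of;
$count_of{$_}++ for @logins;

for my $user (sort keys %count_of) {
    print "$user: $count_of{$user}\n";
}
```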
In either case, you can "count" the elements of an array by evaluating it in scalar context, which is why this is useful.
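When a pattern matches you can also use $1, $2 and so on. Again, this can be used to extract a particular user from the list, but I prefer direct assignment.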
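To make the difference concrete, here is a short sketch of both styles on one line of the log. The second capture group for the IP address and the variable names are additions for illustration, not something taken from the original script.

```perl
use strict;
use warnings;

my $line = 'Jun 9 19:47:02 localhost sshd[5047]: Accepted password for testuser4 from 192.168.0.105 port 58183 ssh2';

# Direct assignment: a match in list context returns the captures.
my ($user, $ip) = $line =~ m/Accepted password for (\w+) from (\S+)/;

# The same thing via $1 and $2, which should only be read after a successful match.
my ($user_via_1, $ip_via_2);
if ($line =~ m/Accepted password for (\w+) from (\S+)/) {
    ($user_via_1, $ip_via_2) = ($1, $2);
}

print "$user $ip\n";
print "$user_via_1 $ip_via_2\n";
```

With direct assignment the variables are simply undefined when the line does not match, whereas $1 and $2 keep their values from the last successful match, which is one reason to guard them with an if.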
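Your script assumes that the regexp will pull out multiple values for the "testuser" string at once. It will not.
The assignment %SuccessLogins = (...) replaces the entire contents of the hash every time it runs inside the while loop. I believe that is not what you are after.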
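The effect of reusing one hash can be seen in a small sketch. The user list and variable names below are made up for illustration; the point is only that pushing \%hash repeatedly stores references to the same hash, while an anonymous hash per iteration keeps the records separate.

```perl
use strict;
use warnings;

my @users = qw(testuser1 testuser4);

# Reusing one hash: both array elements end up pointing at the same data.
my (%shared, @same_ref);
for my $user (@users) {
    %shared = ( User => $user );
    push @same_ref, \%shared;
}

# A fresh anonymous hash per iteration keeps the records separate.
my @separate;
for my $user (@users) {
    push @separate, { User => $user };
}

print "shared:   $same_ref[0]{User}, $same_ref[1]{User}\n";
print "separate: $separate[0]{User}, $separate[1]{User}\n";
```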
I put your test data into a file named td1 and then used this one-liner:
perl -ne '@r=/Accepted password for (\w+)/gi; for $item (@r) {$total{$item}++; } END{ use Data::Dumper; print Dumper(\%total);}' < td1
Then I realised that, for my test case, reading just one line at a time works fine:
perl -ne '$total{$1}++ if /Accepted password for (\w+)/i; END{ use Data::Dumper; print Dumper(\%total);}' < td1