Perl: Count and regex matches

I'm stuck on a problem in my Perl script. The script generates output like the following:

...
2:Jun  9 16:17:14 localhost sshd[3042]: Accepted password for testuser1 from 192.168.0.105 port 56067 ssh2
10:Jun  9 16:31:33 localhost sshd[3176]: Accepted password for testuser1 from 192.168.0.105 port 56136 ssh2
16:Jun  9 16:32:06 localhost sshd[3244]: Accepted password for testuser1 from 192.168.0.105 port 56137 ssh2
24:Jun  9 16:35:26 localhost sshd[3355]: Accepted password for testuser1 from 192.168.0.105 port 56138 ssh2
67:Jun  9 19:46:07 localhost sshd[4982]: Accepted password for testuser1 from 192.168.0.105 port 58182 ssh2
73:Jun  9 19:47:02 localhost sshd[5047]: Accepted password for testuser4 from 192.168.0.105 port 58183 ssh2
79:Jun  9 19:57:26 localhost sshd[5160]: Accepted password for testuser2 from 192.168.0.105 port 58186 ssh2
86:Jun  9 19:58:34 localhost sshd[5231]: Accepted password for testuser1 from 192.168.0.105 port 58187 ssh2
...

The second half of my script has to read all those lines and build a table of how many successful logins each user had. My solution looks like this (header with strict and warnings removed):

my %SuccessLogins;
my @LoginAttemptsSuccess;
while (my $array = <$fh>) {
    if ($array =~ /Accepted\s+password\s+for\s+(\S+)/) {
      my $counter = () = $array =~ /Accepted\s+password\s+for\s+(\S+)/gi;
      %SuccessLogins = (
        "User"  => $1,
        "Successful"    => $counter
      );
      push (@LoginAttemptsSuccess, \%SuccessLogins);
    }
}

The problem is that the script creates an AoH (array of hashes) with one element, and I get just one row in it. The result should be a table of all users with their corresponding number of successful logins:

User = testuser1
Successful = 6

User = testuser2
Successful = 2

etc.

I have read a lot of regex examples here on SO, but I still don't get the logic behind counting matches with a regex and storing the results.

I'd do something like:

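use strict;
use warnings;
use feature 'say';
use Data::Dumper;

my %SuccessLogins;
while (my $line = <DATA>) {
    # Count one login per matching line, keyed by the captured user name
    if ($line =~ /Accepted\s+password\s+for\s+(\S+)/) {
        $SuccessLogins{$1}++;
    }
}
say Dumper \%SuccessLogins;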


__DATA__
2:Jun  9 16:17:14 localhost sshd[3042]: Accepted password for testuser1 from 192.168.0.105 port 56067 ssh2
10:Jun  9 16:31:33 localhost sshd[3176]: Accepted password for testuser1 from 192.168.0.105 port 56136 ssh2
16:Jun  9 16:32:06 localhost sshd[3244]: Accepted password for testuser1 from 192.168.0.105 port 56137 ssh2
24:Jun  9 16:35:26 localhost sshd[3355]: Accepted password for testuser1 from 192.168.0.105 port 56138 ssh2
67:Jun  9 19:46:07 localhost sshd[4982]: Accepted password for testuser1 from 192.168.0.105 port 58182 ssh2
73:Jun  9 19:47:02 localhost sshd[5047]: Accepted password for testuser4 from 192.168.0.105 port 58183 ssh2
79:Jun  9 19:57:26 localhost sshd[5160]: Accepted password for testuser2 from 192.168.0.105 port 58186 ssh2
86:Jun  9 19:58:34 localhost sshd[5231]: Accepted password for testuser1 from 192.168.0.105 port 58187 ssh2

Output:

$VAR1 = {
  'testuser4' => 1,
  'testuser2' => 1,
  'testuser1' => 6
};
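
If the goal is the User/Successful layout from the question rather than a Dumper dump, the counting hash can be printed directly. This is only a minimal sketch: the counts are copied from the output above, and format_report is a made-up helper name, not anything from the original script.

```perl
use strict;
use warnings;

# Counts as produced by the loop above (copied from the Dumper output).
my %SuccessLogins = ( testuser1 => 6, testuser2 => 1, testuser4 => 1 );

# format_report is a hypothetical helper: it returns one block per user,
# most logins first, ties broken by user name.
sub format_report {
    my ($counts) = @_;
    my @users = sort { $counts->{$b} <=> $counts->{$a} or $a cmp $b } keys %$counts;
    return join "\n", map { "User = $_\nSuccessful = $counts->{$_}\n" } @users;
}

print format_report( \%SuccessLogins );
```

Sorting inside the helper keeps the output stable, since Perl does not guarantee any hash key order.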

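The 'trick' with regular expressions is that a capturing regular expression with the /g flag returns a list of all its captures when matched in list context, and you can assign that list to an array.

You can then evaluate that array in scalar context to find out how many 'hits' there were.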

So:

my $string = "fish fish fish fish fish";

my @array = $string =~ m/(fish)/g;

print "@array\n";

print scalar @array;

And that's really all it's doing. This works for multi-line strings too.

The reason this isn't working in your script, though, is that your while loop runs once per line. Each line matches your pattern at most once, so your count will only ever be one. Likewise, your counter counts every match of the pattern regardless of which user matched, so it isn't counting logins per user as you expect.

The way you avoid this is either:

  • continue to work one line at a time and amend code accordingly.
  • treat your file handle as a single 'chunk'.

(The latter is a bad idea for really big files.) So an example for the first:

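use strict;
use warnings;
use Data::Dumper;

my %count_of;
while ( <DATA> ) {
   # Capture the user name; skip lines that are not successful logins
   my ($login) = m/Accepted password for (\w+)/ or next;
   print "$login\n";
   $count_of{$login}++;
}

print Dumper \%count_of;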


__DATA__
2:Jun  9 16:17:14 localhost sshd[3042]: Accepted password for testuser1 from 192.168.0.105 port 56067 ssh2
10:Jun  9 16:31:33 localhost sshd[3176]: Accepted password for testuser1 from 192.168.0.105 port 56136 ssh2
16:Jun  9 16:32:06 localhost sshd[3244]: Accepted password for testuser1 from 192.168.0.105 port 56137 ssh2
24:Jun  9 16:35:26 localhost sshd[3355]: Accepted password for testuser1 from 192.168.0.105 port 56138 ssh2
67:Jun  9 19:46:07 localhost sshd[4982]: Accepted password for testuser1 from 192.168.0.105 port 58182 ssh2
73:Jun  9 19:47:02 localhost sshd[5047]: Accepted password for testuser4 from 192.168.0.105 port 58183 ssh2
79:Jun  9 19:57:26 localhost sshd[5160]: Accepted password for testuser2 from 192.168.0.105 port 58186 ssh2
86:Jun  9 19:58:34 localhost sshd[5231]: Accepted password for testuser1 from 192.168.0.105 port 58187 ssh2

And an example for the second:

local $/;    # undefine the record separator so <DATA> reads everything at once
my @logins = <DATA> =~ m/Accepted password for (\w+)/g;
print "@logins\n";

print scalar(@logins), "\n";

__DATA__
2:Jun  9 16:17:14 localhost sshd[3042]: Accepted password for testuser1 from 192.168.0.105 port 56067 ssh2
10:Jun  9 16:31:33 localhost sshd[3176]: Accepted password for testuser1 from 192.168.0.105 port 56136 ssh2
16:Jun  9 16:32:06 localhost sshd[3244]: Accepted password for testuser1 from 192.168.0.105 port 56137 ssh2
24:Jun  9 16:35:26 localhost sshd[3355]: Accepted password for testuser1 from 192.168.0.105 port 56138 ssh2
67:Jun  9 19:46:07 localhost sshd[4982]: Accepted password for testuser1 from 192.168.0.105 port 58182 ssh2
73:Jun  9 19:47:02 localhost sshd[5047]: Accepted password for testuser4 from 192.168.0.105 port 58183 ssh2

You'd then reduce @logins to per-user counts much like in the first example.
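
As a rough sketch of that reduction step: the @logins list below is typed in by hand to stand in for what the slurp-and-match returns on the six sample lines, so it is an assumption rather than real captured output.

```perl
use strict;
use warnings;

# Stand-in for the list captured by the slurp-and-match above.
my @logins = qw(testuser1 testuser1 testuser1 testuser1 testuser1 testuser4);

# Tally each name; the hash key is the user, the value is the count.
my %count_of;
$count_of{$_}++ for @logins;

print "$_: $count_of{$_}\n" for sort keys %count_of;
```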

But in either case, you can 'count' the elements in an array by evaluating it in scalar context, which is why this is useful.

You also have $1, $2, etc. to draw upon when a pattern matches. Again, these can be used to extract a specific user from the line, but I prefer a more direct assignment.
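
The `= () =` form used in the question gets the same count without a named array. A small sketch, reusing the fish string from earlier; $count is just an illustrative variable name.

```perl
use strict;
use warnings;

my $string = "fish fish fish fish fish";

# Assigning the match to an empty list forces list context; the outer
# scalar assignment then yields the number of elements on the right.
my $count = () = $string =~ m/(fish)/g;

print "$count\n";   # prints 5
```

This gives a total for the whole string only, so it does not replace a per-user hash.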

Your script assumes that the regexp will pull out multiple values for the "testuser" string all at once. It will not, because each line contains only one login.

The assignment to %SuccessLogins overwrites the same hash on every pass through the while loop, and every reference pushed onto @LoginAttemptsSuccess points to that one hash, which I believe is not what you are aiming to do.
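
To illustrate the point about the hash: \%SuccessLogins is always a reference to the same hash, so every element pushed onto the array shows whatever was assigned last. A sketch, assuming the array-of-hashes shape from the question is still wanted: tally per user first, then build one anonymous hash per user. The sample lines are shortened copies of the log excerpt.

```perl
use strict;
use warnings;

# Sample lines in the same shape as the log excerpt (shortened).
my @lines = (
    'sshd[3042]: Accepted password for testuser1 from 192.168.0.105 port 56067 ssh2',
    'sshd[5047]: Accepted password for testuser4 from 192.168.0.105 port 58183 ssh2',
    'sshd[5231]: Accepted password for testuser1 from 192.168.0.105 port 58187 ssh2',
);

# Step 1: tally per user.
my %count_of;
for my $line (@lines) {
    $count_of{$1}++ if $line =~ /Accepted\s+password\s+for\s+(\S+)/;
}

# Step 2: one fresh anonymous hash per user, so each array element is a
# distinct reference. The leading + tells Perl this is a hash constructor.
my @LoginAttemptsSuccess =
    map { +{ User => $_, Successful => $count_of{$_} } } sort keys %count_of;

print "User = $_->{User}\nSuccessful = $_->{Successful}\n\n" for @LoginAttemptsSuccess;
```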

I put your test data in the file td1 and then used this one-liner:

perl -ne '@r=/Accepted password for (\w+)/gi; for $item (@r) {$total{$item}++;  } END{  use Data::Dumper; print Dumper(\%total);}' < td1

Then I realised that, since my test reads one line at a time, I might as well do this (incrementing only when the line actually matches, so non-matching lines are not counted):

perl -ne '$total{$1}++ if /Accepted password for (\w+)/i;  END{  use Data::Dumper; print Dumper(\%total);}' < td1
