How to count the number of matches in a regex capture group - Perl

Question

I need a way to count the number of matches in a regex capture group using either Perl or Bash. I can do this in PowerShell but not in either of these languages. You guys have helped me get my regex working, but every example I see just prints the capture groups. Printing the match results doesn't help me; I need to count the number of matches in each group.

Here is example data to match against (this is the output of a command, so it is not static data, nor is it from a file):

   JobID           Type State Status               Policy Schedule     Client Dest Media Svr Active PID
   41735         Backup  Done      0     Policy_name_here    daily hostname001 MediaSvr1       8100
   41734         Backup  Done      0     Policy_name_here    daily hostname002 MediaSvr1       7803
   41733         Backup  Done      0     Policy_name_here    daily hostname004 MediaSvr1       7785
   41732         Backup  Done      0     Policy_name_here    daily hostname005 MediaSvr1       27697
   41731         Backup  Done      0     Policy_name_here    daily hostname006 MediaSvr1       27523
   41730         Backup  Done      0     Policy_name_here    daily hostname007 MediaSvr1       27834
   41729         Backup  Done      0     Policy_name_here        - hostname008 MediaSvr1       27681
   41728         Backup  Done      0     Policy_name_here        - hostname009 MediaSvr1       27496
   41727 Catalog Backup  Done      0              catalog     full hostname010 MediaSvr1       27347
   41712 Catalog Backup  Done      0              catalog        - hostname004                 30564

I can't use named capture groups as I am using Perl 5.8.5.

my regex

Each capture group corresponds to a column and I need to pull the results of the capture group into a variable, so I can count using some kind of where {$var -eq '0'}.count code. Assuming Status -eq '0' denotes a successful backup, I need to count the number of successful backups in the Status capture group.

Final output is something like

Statistic.SUCCESSFUL: 20

I've already accomplished this using PowerShell, but Perl is completely different and Bash seems limited. If anyone knows how to accomplish this in either of these languages, I'd appreciate some help.

Kind Regards,

DJ

Answer 1

<>;  # Skip header

my $successes = 0;
while (<>) {
   chomp;
   my @row = /.../
      or do {
         warn("Line $. doesn't match pattern\n");
         next;
      };

   ++$successes if $row[3] eq '0';
}
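
For a concrete, runnable version of the loop above, here is a minimal sketch. The pattern is a hypothetical stand-in for the elided /.../ (it is not the asker's regex), the input lines are abbreviated from the question, and the last row with Status 1 is made up purely to show a non-match. It sticks to constructs that work on Perl 5.8.

```perl
use strict;
use warnings;

# Sample input: abbreviated rows from the question, plus one invented
# row (JobID 41700, Status 1) to show a job that is not counted.
my @lines = (
    '   JobID           Type State Status               Policy Schedule',
    '   41735         Backup  Done      0     Policy_name_here    daily',
    '   41727 Catalog Backup  Done      0              catalog     full',
    '   41700         Backup  Done      1     Policy_name_here    daily',
);

shift @lines;    # Skip header

# Hypothetical pattern: JobID, Type (may contain spaces), State, Status, rest.
my $re = qr/^\s*(\d+)\s+(.+?)\s+(\S+)\s+(\d+)\s+(.*)$/;

my $successes = 0;
for my $line (@lines) {
    # In list context the match returns the captures, one per column.
    my @row = $line =~ $re
        or do {
            warn("Line doesn't match pattern: $line\n");
            next;
        };

    ++$successes if $row[3] eq '0';    # Status is the fourth capture
}

print "Statistic.SUCCESSFUL: $successes\n";
```

To feed it real data, the @lines array could be replaced by reading the command's output from STDIN, as the loop above does with <>.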

You could also name the columns.

<>;  # Skip header

my $successes = 0;
while (<>) {
   chomp;
   my %row;
   @row{qw( JobID Type State Status ... )} = /.../
      or do {
         warn("Line $. doesn't match pattern\n");
         next;
      };

   ++$successes if $row{Status} eq '0';
}

Finally, if you want to store the data in a data structure for later analysis, that's possible too.

<>;  # Skip header

my @rows;
while (<>) {
   chomp;
   my %row;
   @row{qw( JobID Type State Status ... )} = /.../
      or do {
         warn("Line $. doesn't match pattern\n");
         next;
      };

   push @rows, \%row;
}

my $successes = grep { $_->{Status} eq '0' } @rows;
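
Since grep returns a count in scalar context, the stored rows can answer other counting questions as well. A minimal sketch, with hand-built rows standing in for parsed input (the values are illustrative, and the %count layout is just one possible choice), that tallies how often each value occurs in each column:

```perl
use strict;
use warnings;

# Hand-built rows standing in for the parsed @rows (illustrative values).
my @rows = (
    { JobID => 41735, Type => 'Backup',         Status => '0' },
    { JobID => 41727, Type => 'Catalog Backup', Status => '0' },
    { JobID => 41700, Type => 'Backup',         Status => '1' },
);

# grep in scalar context returns the number of matching elements.
my $successes = grep { $_->{Status} eq '0' } @rows;

# Tally every distinct value per column: $count{column}{value}.
my %count;
for my $row (@rows) {
    ++$count{$_}{ $row->{$_} } for keys %$row;
}

print "Statistic.SUCCESSFUL: $successes\n";
print "Backup jobs: $count{Type}{Backup}\n";
```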

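That said, that regex pattern is... awful. Since the output is fixed-width, I'd derive the column widths from the header line and split each row with unpack, something like this: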

sub trim(_) { $_[0] =~ s/^\s++|\s++\z//rg }

my $pattern;
my @headers;
{
   my $header_line = <>;
   chomp($header_line);
   $header_line =~ s/\bDest Media Svr\b/Dest_Media_Svr/;
   $header_line =~ s/\bActive PID\b/Active_PID/;
   $pattern = join '', map { "A".length($_) } $header_line =~ /\s*\S+/g;
   @headers = map trim, unpack $pattern, $header_line;
}

my @rows;
while (<>) {
   chomp;
   my %row; @row{@headers} = map trim, unpack $pattern, $_;
   push @rows, \%row;
}

my $successes = grep { $_->{Status} eq '0' } @rows;
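
Note that the trim helper above relies on the _ prototype, possessive quantifiers, and s///r, which need Perl 5.14 or newer, so it won't run on the 5.8.5 mentioned in the question. Here is a rough 5.8-compatible sketch of the same unpack idea. The rewritten trim, the sprintf widths, and the abbreviated sample lines are assumptions made for illustration, not part of the code above.

```perl
use strict;
use warnings;

# Plain-Perl trim: copy the argument, strip both ends, return the copy.
sub trim {
    my ($s) = @_;
    $s =~ s/^\s+//;
    $s =~ s/\s+\z//;
    return $s;
}

# Build right-aligned sample lines (first four columns only); the widths
# are assumed here just to mimic the fixed-width layout in the question.
my $fmt   = '%8s%15s%6s%7s';
my @lines = (
    sprintf( $fmt, 'JobID', 'Type',           'State', 'Status' ),
    sprintf( $fmt, '41735', 'Backup',         'Done',  '0' ),
    sprintf( $fmt, '41727', 'Catalog Backup', 'Done',  '0' ),
);

my $header_line = shift @lines;

# Each column spans its leading whitespace plus the header word,
# which yields an unpack template such as "A8A15A6A7".
my $pattern = join '', map { 'A' . length($_) } $header_line =~ /\s*\S+/g;
my @headers = map { trim($_) } unpack $pattern, $header_line;

my @rows;
for my $line (@lines) {
    my %row;
    @row{@headers} = map { trim($_) } unpack $pattern, $line;
    push @rows, \%row;
}

my $successes = grep { $_->{Status} eq '0' } @rows;
print "Statistic.SUCCESSFUL: $successes\n";
```

Headers that contain spaces (Dest Media Svr, Active PID) would still need the underscore substitutions shown above before the widths are computed.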

solution1
1 2018-12-05 11:58:29