How can I handle a variable number of input lines in Perl?

Question

I am working on a Perl script that processes MTA logs. This is the shell pipeline I am starting from:

 sh-3.2# cat /var/log/pmta/File_name-2017-03-23*|egrep 'email.domain.com'|cut -d, -f6|cut -d- -f1|sort|uniq -c

The output of this pipeline is stored in $case8Q1:

   310 blk
  1279 hrd
    87 sft
144056 success
    18 unk

As you can see, the pipeline above returns 5 rows, but this is not always the case. The number of rows varies from run to run (2, 3, 4, or at most 5), so the output can also look like this:

   310 blk
144056 success
    18 unk

Below is the sample code, which gives the wrong result:

sub get_stats {

    $case8Q1 =~ s/^\s+//;

    @case8Q1_split = split( '\n', $case8Q1 );

    @first_part    = split( ' ',  $case8Q1_split[0] );
    @second_part   = split( ' ',  $case8Q1_split[1] );
    @third_part    = split( ' ',  $case8Q1_split[2] );
    @fourth_part   = split( ' ',  $case8Q1_split[3] );
    @fifth_part    = split( ' ',  $case8Q1_split[4] );

    if ( $first_part[1] eq 'blk' ) {
        $report{Block} = $first_part[0];
    }
    elsif ( $first_part[1] eq 'hrd' ) {
        $report{Hard} = $first_part[0];
    }
    elsif ( $first_part[1] eq 'sft' ) {
        $report{Soft} = $first_part[0];
    }
    elsif ( $first_part[1] eq 'success' ) {
        $report{Success} = $first_part[0];
    }
    elsif ( $first_part[1] eq 'unk' ) {
        $report{Unknown} = $first_part[0];
    }

    # rest ifelse blocks so on........!
}

Here %report is a hash.

Can someone please help me with how to proceed from here?

I have all the values, but if I go with a plain if/elsif chain like the one above, it will take at least 25 blocks.

Please let me know if this is not clear.

Source log sample:

b,email@aol.com,206.1.1.8,2017-03-23 00:01:11-0700,<14901.eb201.TCR2.338351.18567117907MSOSI1.152OSIMS@email.domain.com>,sft-routing-errors,4.4.4 (unable to route: dns lookup failure),
b,email@gmail.com,206.9.1.8,2017-03-23 00:02:13-0700,<149019.eb201.TCR2.338351.18567119237MSOSI1.152OSIMS@email.domain.com>,sft-no-answer-from-host,4.4.1 (no answer from host),
b,email@gmail.com,206.1.1.5,2017-03-23 03:43:36-0700,<149020.eb201.TCR2.338656.18570260933MSOSI1.152OSIMS@email.domain.com>,sft-server-related,4.3.2 (system not accepting network messages),smtp;421 Too many concurrent SMTP connections
b,email@yahoo.com,,2017-03-23 03:54:44-0700,<149019.eb201.TCR2.338351.18567013352MSOSI1.152OSIMS@email.domain.com>,sft-message-expired,4.4.7 (delivery time expired),
b,email@msn.com,206.1.1.1,2017-03-23 05:04:20-0700,<14902666.eb201.TCR2.3831.2620484MSOSI6374125.102OSIMS@email.domain.com>,hrd-invalid-mailbox,5.0.0 (undefined status),smtp;550 Requested action not taken: mailbox unavailable
b,email@msn.com,206.1.1.1,2017-03-23 05:04:20-0700,<14902666.eb201.TCR2.3831.2620484MSOSI6374125.102OSIMS@email.domain.com>,hrd-invalid-domain,5.0.0 (undefined status),smtp;550 Requested action not taken: mailbox unavailable
b,email@aol.com.com,66.1.1.1,2017-03-23 05:08:44-0700,<149021.eb201.KCR2.021089.566131285MSOSI1.89OSIMS@email.domain.com>,unk-other,4.0.0 (undefined status),smtp;451 Your domain is not configured to use this MX host.
b,email@gmail.com,206.1.1.1,2017-03-23 05:13:22-0700,<1490206.eb201.KCR2.6637.56206428MSOSI1.102OSIMS@email.domain.com>,blk-bad-connection,4.4.2 (bad connection),
b,email@qq.com.com,206.1.1.1,2017-03-23 05:13:22-0700,<1490206.eb201.KCR2.6637.56206428MSOSI1.102OSIMS@email.domain.com>,blk-spam-related,4.4.2 (bad connection),

The requirement goes further: I also need the counts broken down by recipient domain, for example:

Date          Domain        Success  Block  Soft  Hard  Unknown
2017-03-23    gmail         1        1      1     1     1
2017-03-23    yahoo         1        1      1     1     1
2017-03-23    msn           1        1      1     1     1
2017-03-23    aol           1        1      1     1     1
2017-03-23    other domain  1        1      1     1     1

My problem is with the "other domain" row, which should cover every domain except gmail, yahoo, msn, hotmail and aol. The count of 1 is just an example; it can also be 0.

Answer 1

OK, so - you've started off doing this a really hard way, because ... perl can natively do everything that cut/sort/uniq do anyway.

I can't rewrite that for you without some sample input, but ... I think you should consider that.

You should also avoid global variables and use lexical ones declared with my.

And - as you've noticed - if you're numbering your variable names, you really should be considering an array.

So something like this:

use Data::Dumper;
my @stuff = map { [split] } split( "\n", $case8Q1 );
print Dumper \@stuff;

This gives you an array of [count, status] pairs, one per input line.

But you can go one step further, because you don't actually need to parse this into a data structure at all:

   my %data =  reverse $case8Q1 =~ m/(\d+) (\w+)/g;
   print Dumper \%data;

Which then gives you:

$VAR1 = {
          'hrd' => '1279',
          'sft' => '87',
          'blk' => '310',
          'unk' => '18',
          'success' => '144056'
        };

You can then translate that into your 'report' by using, again, a key-value lookup:

my %keyword_for = ( 
    "blk" => "Block",
    "hrd" => "Hard",
    "sft" => "Soft",
    "success" => "Success",
    "unk" => "Unknown",
    );

foreach my $key ( keys %data ) { 
   $report{$keyword_for{$key}} = $data{$key}; 
}

And that gives you:

$VAR1 = {
          'Soft' => '87',
          'Unknown' => '18',
          'Success' => '144056',
          'Block' => '310',
          'Hard' => '1279'
        };

Or take it a step further still and inline the transformation using map :

my %report =   map { m/(\d+) (\w+)/ 
                 && $keyword_for{$2} // $2 => $1 } split "\n", $case8Q1;
print Dumper \%report;

And as you say you want all the values to be populated.... actually I'd suggest not doing that, and handling 'undefined' properly when generating output with something like:

my @field_order = qw ( Block Hard Soft Success Unknown this_field_missing ); 
# parenthesise the join so "\n" is not joined in as an extra (tab-prefixed) field
print join( "\t", @field_order ), "\n";
print join( "\t", map { $report{$_} // 0 } @field_order ), "\n";

This way the columns come out in an order you control, which a hash on its own cannot give you because hash keys have no guaranteed order. This gives:

Block   Hard    Soft    Success Unknown this_field_missing  
310     1279    87      144056  18      0

But if you really want to backfill your empty hash with zero values:

$report{$_} //= 0 for values %keyword_for;
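To see how the lookup and the backfill cope with a short result, here is a minimal sketch run against the three-row output from the question. The sub name build_report is hypothetical, and the input is hard-coded rather than read from the shell pipeline.

```perl
use strict;
use warnings;

# Short status codes from the log mapped to report labels.
my %keyword_for = (
    blk     => 'Block',
    hrd     => 'Hard',
    sft     => 'Soft',
    success => 'Success',
    unk     => 'Unknown',
);

# build_report is a hypothetical helper: it accepts the raw 'uniq -c'
# text, however many rows it has, and returns a fully populated hash.
sub build_report {
    my ($text) = @_;
    my %report;

    # Each row looks like "   310 blk": a count, whitespace, a status.
    while ( $text =~ m/(\d+)\s+(\w+)/g ) {
        my ( $count, $status ) = ( $1, $2 );

        # Fall back to the raw status if it is not in the lookup table.
        $report{ $keyword_for{$status} // $status } = $count;
    }

    # Backfill every label that did not appear in the input.
    $report{$_} //= 0 for values %keyword_for;
    return %report;
}

# Only three of the five possible rows are present here.
my %report = build_report("   310 blk\n144056 success\n    18 unk\n");
print join( "\t", map { "$_=$report{$_}" } sort keys %report ), "\n";
```

Because the loop runs once per matched row, nothing in it depends on how many rows the pipeline returned; the missing Hard and Soft entries are filled in by the backfill line.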

However, now you've posted some logs to go with your question - the problem's much simpler:

#!/usr/bin/env perl
use strict;
use warnings;

#configure it:
my %keyword_for = (
   "blk"     => "Block",
   "hrd"     => "Hard",
   "sft"     => "Soft",
   "success" => "Success",
   "unk"     => "Unknown",
);
#set output order - last field is for illustration purposes
my @field_order = qw ( Block Hard Soft Success Unknown this_field_missing );

my %count_of;
#iterate 'STDIN' or files specified to command line.
#So you can 'thisscript.pl /var/log/pmta/File_name-2017-03-23*'
while (<>) {
   #split the line on commas
   my ( $id, $em_addr, $ip, $timestamp, $msg_id, $code, $desc ) = split /,/;
   #skip short or malformed lines that have no status field.
   next unless defined $code;
   #require msg_id contains '@email.domain.com'. 
   next unless $msg_id =~ m/\@email\.domain\.com/;
   #split the status field on dash, extracting first word. 
   my ($status) = $code =~ m/^(\w+)-/;
   #skip lines whose status field does not have the 'word-' shape.
   next unless defined $status;
   #update the count - reference the 'keyword for' hash first, 
   #but insert 'raw' if it's something new. 
   $count_of{ $keyword_for{$status} // $status }++;
}

#print a header row (tab sep)
#join is parenthesised so "\n" is not joined in as an extra field.
print join( "\t", @field_order ), "\n";
#print the rest of the values. 
#map is so 'missing' fields get zeros, not 'undefined'. 
print join( "\t", map { $count_of{$_} // 0 } @field_order ), "\n";

And given the small sample you've posted, this outputs:

Block   Hard    Soft    Success Unknown this_field_missing  
2       2       4       0       1       0
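The question's follow-up requirement, counts per date and recipient domain with everything else pooled into "other", can be handled the same way with a nested hash. What follows is only a sketch under some assumptions: the domain is taken as the first label after the @ in the recipient address, the date is the first ten characters of the timestamp, and count_by_domain and %named_domain are hypothetical names. The status is taken as the leading word of the sixth field, so a field with no dash would also be counted.

```perl
use strict;
use warnings;

my %keyword_for = (
    blk => 'Block', hrd => 'Hard', sft => 'Soft',
    success => 'Success', unk => 'Unknown',
);
my @field_order  = qw( Success Block Soft Hard Unknown );
my @domain_order = qw( gmail yahoo msn hotmail aol other );

# Domains reported on their own row; anything else is pooled as 'other'.
my %named_domain = map { $_ => 1 } qw( gmail yahoo msn hotmail aol );

# count_by_domain is a hypothetical helper: it takes log lines and
# returns a reference to a hash of {date}{domain}{label} => count.
sub count_by_domain {
    my @lines = @_;
    my %count_of;
    for my $line (@lines) {
        my ( undef, $em_addr, undef, $timestamp, $msg_id, $code ) = split /,/, $line;
        next unless defined $code && $msg_id =~ m/\@email\.domain\.com/;

        # Leading word of the status field, e.g. 'sft' from 'sft-routing-errors'.
        my ($status) = $code =~ m/^(\w+)/ or next;

        # Assumption: first label after the '@', e.g. 'gmail' from 'x@gmail.com'.
        my ($domain) = $em_addr =~ m/\@([^.]+)\./ or next;
        $domain = lc $domain;
        $domain = 'other' unless $named_domain{$domain};

        my ($date) = $timestamp =~ m/^(\d{4}-\d\d-\d\d)/ or next;
        $count_of{$date}{$domain}{ $keyword_for{$status} // $status }++;
    }
    return \%count_of;
}

# Three lines taken from the sample log in the question.
my @sample = (
    'b,email@gmail.com,206.9.1.8,2017-03-23 00:02:13-0700,<149019.eb201.TCR2.338351.18567119237MSOSI1.152OSIMS@email.domain.com>,sft-no-answer-from-host,4.4.1 (no answer from host),',
    'b,email@gmail.com,206.1.1.1,2017-03-23 05:13:22-0700,<1490206.eb201.KCR2.6637.56206428MSOSI1.102OSIMS@email.domain.com>,blk-bad-connection,4.4.2 (bad connection),',
    'b,email@qq.com.com,206.1.1.1,2017-03-23 05:13:22-0700,<1490206.eb201.KCR2.6637.56206428MSOSI1.102OSIMS@email.domain.com>,blk-spam-related,4.4.2 (bad connection),',
);

my $counts = count_by_domain(@sample);
print join( "\t", 'Date', 'Domain', @field_order ), "\n";
for my $date ( sort keys %$counts ) {
    for my $domain (@domain_order) {
        # Domains with no traffic on that date still get a row of zeros.
        my $row = $counts->{$date}{$domain} // {};
        print join( "\t", $date, $domain, map { $row->{$_} // 0 } @field_order ), "\n";
    }
}
```

Driving the output from @domain_order and @field_order rather than from the hash keys means the "other" row and any zero counts always appear, whatever the log happened to contain.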

Answer 2

It's hard to know what result you want, but I would run the backticks in list context so that the lines are already separated, and replace the chain of if / elsif with a simple hash lookup.

This sample code builds a hash %report the same as yours, and returns a reference to it. I've had to assume that you're using backticks, as that seems the most likely. Sobrique is correct that your shell code should also be done in Perl.

my %map = (
    blk     => 'Block',
    hrd     => 'Hard',
    sft     => 'Soft',
    success => 'Success',
    unk     => 'Unknown',
);

my $cmd = q{cat /var/log/pmta/File_name-2017-03-23*|egrep 'email.domain.com'|cut -d, -f6|cut -d- -f1|sort|uniq -c};

sub get_stats {

    my %report;

    for ( `$cmd` ) {

        my ($val, $type) = split;

        $report{$map{$type}} = $val;
    }

    \%report;
}
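The list-context point can be tried without the real log files. In this minimal sketch, printf stands in for the pipeline (an assumption made only so the example runs anywhere with a POSIX shell), and stats_from is a hypothetical variant of get_stats that takes the command as an argument and falls back to the raw status when it is not in the lookup table.

```perl
use strict;
use warnings;

my %map = (
    blk     => 'Block',
    hrd     => 'Hard',
    sft     => 'Soft',
    success => 'Success',
    unk     => 'Unknown',
);

# stats_from is a hypothetical variant of get_stats that takes the shell
# command as a parameter instead of reading a file-level variable.
sub stats_from {
    my ($cmd) = @_;
    my %report;

    # Backticks in list context return one element per output line,
    # so the loop body runs once per row, however many rows there are.
    for my $line (`$cmd`) {
        my ( $val, $type ) = split ' ', $line;
        next unless defined $type;    # skip blank lines
        $report{ $map{$type} // $type } = $val;
    }
    return \%report;
}

# printf stands in for the real pipeline and emits only three rows.
my $report = stats_from(q{printf '   310 blk\n144056 success\n    18 unk\n'});
print "$_ => $report->{$_}\n" for sort keys %$report;
```

Statuses that are absent from the command's output are simply absent from the returned hash, so the caller decides whether to print them as zero.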

2 answers

solution1
2 ACCEPTED 2017-03-23 15:50:52

solution2
1 2017-03-23 15:56:08
