
How can I handle a variable number of input lines in Perl?

I'm working on a Perl script that needs to use an MTA's logs. Below is the query I'm running.

 sh-3.2# cat /var/log/pmta/File_name-2017-03-23*|egrep 'email.domain.com'|cut -d, -f6|cut -d- -f1|sort|uniq -c 

The output of this query is stored in $case8Q1:

   310 blk
  1279 hrd
    87 sft
144056 success
    18 unk

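As you can see, the query above returns five values, but that is not always the case. It can also return output like the following, so the number of lines can differ on each run (2, 3, 4, or at most 5):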

   310 blk
144056 success
    18 unk

Below is the sample code, which gives wrong results:

sub get_stats {

    $case8Q1 =~ s/^\s+//;

    @case8Q1_split = split( '\n', $case8Q1 );

    @first_part    = split( ' ',  $case8Q1_split[0] );
    @second_part   = split( ' ',  $case8Q1_split[1] );
    @third_part    = split( ' ',  $case8Q1_split[2] );
    @fourth_part   = split( ' ',  $case8Q1_split[3] );
    @fifth_part    = split( ' ',  $case8Q1_split[4] );

    if ( $first_part[1] eq 'blk' ) {
        $report{Block} = $first_part[0];
    }
    elsif ( $first_part[1] eq 'hrd' ) {
        $report{Hard} = $first_part[0];
    }
    elsif ( $first_part[1] eq 'sft' ) {
        $report{Soft} = $first_part[0];
    }
    elsif ( $first_part[1] eq 'success' ) {
        $report{Success} = $first_part[0];
    }
    elsif ( $first_part[1] eq 'unk' ) {
        $report{Unknown} = $first_part[0];
    }

    # rest ifelse blocks so on........!
}

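Here the report is the hash %report.

Can someone help me with how to proceed from here?

I have all the values, but if I go with a plain if/else chain like the one above, it will take at least 25 blocks.

Let me know if anything is unclear.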

Sample of the source log:

b,email@aol.com,206.1.1.8,2017-03-23 00:01:11-0700,<14901.eb201.TCR2.338351.18567117907MSOSI1.152OSIMS@email.domain.com>,sft-routing-errors,4.4.4 (unable to route: dns lookup failure),
b,email@gmail.com,206.9.1.8,2017-03-23 00:02:13-0700,<149019.eb201.TCR2.338351.18567119237MSOSI1.152OSIMS@email.domain.com>,sft-no-answer-from-host,4.4.1 (no answer from host),
b,email@gmail.com,206.1.1.5,2017-03-23 03:43:36-0700,<149020.eb201.TCR2.338656.18570260933MSOSI1.152OSIMS@email.domain.com>,sft-server-related,4.3.2 (system not accepting network messages),smtp;421 Too many concurrent SMTP connections
b,email@yahoo.com,,2017-03-23 03:54:44-0700,<149019.eb201.TCR2.338351.18567013352MSOSI1.152OSIMS@email.domain.com>,sft-message-expired,4.4.7 (delivery time expired),
b,email@msn.com,206.1.1.1,2017-03-23 05:04:20-0700,<14902666.eb201.TCR2.3831.2620484MSOSI6374125.102OSIMS@email.domain.com>,hrd-invalid-mailbox,5.0.0 (undefined status),smtp;550 Requested action not taken: mailbox unavailable
b,email@msn.com,206.1.1.1,2017-03-23 05:04:20-0700,<14902666.eb201.TCR2.3831.2620484MSOSI6374125.102OSIMS@email.domain.com>,hrd-invalid-domain,5.0.0 (undefined status),smtp;550 Requested action not taken: mailbox unavailable
b,email@aol.com.com,66.1.1.1,2017-03-23 05:08:44-0700,<149021.eb201.KCR2.021089.566131285MSOSI1.89OSIMS@email.domain.com>,unk-other,4.0.0 (undefined status),smtp;451 Your domain is not configured to use this MX host.
b,email@gmail.com,206.1.1.1,2017-03-23 05:13:22-0700,<1490206.eb201.KCR2.6637.56206428MSOSI1.102OSIMS@email.domain.com>,blk-bad-connection,4.4.2 (bad connection),
b,email@qq.com.com,206.1.1.1,2017-03-23 05:13:22-0700,<1490206.eb201.KCR2.6637.56206428MSOSI1.102OSIMS@email.domain.com>,blk-spam-related,4.4.2 (bad connection),

The requirement goes one step further. I need per-domain counts, for example:

Date          Domain        Success  Block  Soft  Hard  Unknown
2017-03-23    gmail         1        1      1     1     1
2017-03-23    yahoo         1        1      1     1     1
2017-03-23    msn           1        1      1     1     1
2017-03-23    aol           1        1      1     1     1
2017-03-23    other domain  1        1      1     1     1

My problem is that "other domain" covers every domain except gmail, yahoo, msn, hotmail and aol. The count of 1 is only an example; it can be 0.

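OK, so you've started out doing this the hard way, because Perl can natively do everything that cut / sort / uniq do.

Without some sample input I can't rewrite it for you, but I think you should consider it.

You also shouldn't be using global variables; use lexical variables declared with my instead.

And, as you've noticed, if you find yourself numbering variable names, you really should be using an array.

So something like this: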

use Data::Dumper;
my @stuff = map { [split] } split( "\n", $case8Q1 );
print Dumper \@stuff;

Gives you:

$VAR1 = [
          [
            '310',
            'blk'
          ],
          [
            '1279',
            'hrd'
          ],
          [
            '87',
            'sft'
          ],
          [
            '144056',
            'success'
          ],
          [
            '18',
            'unk'
          ]
        ];

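But you can go a step further, because you don't really need to parse it into an intermediate data structure at all. A global match in list context returns every (count, tag) pair, and reverse turns that list into tag => count pairs: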

   my %data =  reverse $case8Q1 =~ m/(\d+) (\w+)/g;
   print Dumper \%data;

Which then gives you:

$VAR1 = {
          'hrd' => '1279',
          'sft' => '87',
          'blk' => '310',
          'unk' => '18',
          'success' => '144056'
        };

You can then turn that into your 'report' with another key-value lookup:

my %keyword_for = ( 
    "blk" => "Block",
    "hrd" => "Hard",
    "sft" => "Soft",
    "success" => "Success",
    "unk" => "Unknown",
    );

foreach my $key ( keys %data ) { 
   $report{$keyword_for{$key}} = $data{$key}; 
}

That gives you:

$VAR1 = {
          'Soft' => '87',
          'Unknown' => '18',
          'Success' => '144056',
          'Block' => '310',
          'Hard' => '1279'
        };

Or go one step further and do the transformation inline with map:

my %report =   map { m/(\d+) (\w+)/ 
                 && $keyword_for{$2} // $2 => $1 } split "\n", $case8Q1;
print Dumper \%report;

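As you said, you want all the values populated. I'd actually suggest not doing that, and instead handling 'undefined' properly when you generate the output, with something like this: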

my @field_order = qw ( Block Hard Soft Success Unknown this_field_missing ); 
# Parenthesise join so "\n" is not joined as an extra field (which left a trailing tab).
print join( "\t", @field_order ), "\n";
print join( "\t", map { $report{$_} // 0 } @field_order ), "\n";

That way you get output in a defined order, which a hash on its own does not give you. This gives:

Block   Hard    Soft    Success Unknown this_field_missing  
310     1279    87      144056  18      0   

But if you really want to backfill the missing entries in your hash with zeros, do this:

$report{$_} //= 0 for values %keyword_for;
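
Putting the pieces above together, here is a minimal self-contained sketch run against the three-line case from the question. The sub name build_report and the $sample string are made up for illustration; everything else follows the snippets above.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

my %keyword_for = (
    blk     => 'Block',
    hrd     => 'Hard',
    sft     => 'Soft',
    success => 'Success',
    unk     => 'Unknown',
);

# build_report is a hypothetical helper name, not from the original post.
sub build_report {
    my ($text) = @_;

    # Each match yields (count, tag); reverse turns the list into tag => count pairs.
    my %data = reverse $text =~ m/(\d+) (\w+)/g;

    my %report;
    for my $tag ( keys %data ) {
        # Fall back to the raw tag if it is not in the lookup table.
        my $name = $keyword_for{$tag} // $tag;
        $report{$name} = $data{$tag};
    }

    # Backfill categories that did not appear in the input.
    $report{$_} //= 0 for values %keyword_for;

    return \%report;
}

# Illustrative input: the three-line output shown in the question.
my $sample = "   310 blk\n144056 success\n    18 unk\n";
my $report = build_report($sample);

my @field_order = qw( Block Hard Soft Success Unknown );
print join( "\t", @field_order ), "\n";
print join( "\t", map { $report->{$_} } @field_order ), "\n";
```

Because the hash is built from whatever lines are present, it does not matter whether the input has two lines or five; the absent categories simply end up as 0.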

However, now that you've posted some logs to go with your question, the problem is much simpler:

#!/usr/bin/env perl
use strict;
use warnings;

#configure it:
my %keyword_for = (
   "blk"     => "Block",
   "hrd"     => "Hard",
   "sft"     => "Soft",
   "success" => "Success",
   "unk"     => "Unknown",
);
#set output order - last field is for illustration purposes
my @field_order = qw ( Block Hard Soft Success Unknown this_field_missing );

my %count_of;
#iterate 'STDIN' or files specified to command line.
#So you can 'thisscript.pl /var/log/pmta/File_name-2017-03-23*'
while (<>) {
   #split the line on commas
   my ( $id, $em_addr, $ip, $timestamp, $msg_id, $code, $desc ) = split /,/;
   #require msg_id contains '@email.domain.com'. 
   next unless $msg_id =~ m/\@email\.domain\.com/;
   #split the status field on dash, extracting first word. 
   my ($status) = $code =~ m/^(\w+)-/;
   #update the count - reference the 'keyword for' hash first, 
   #but insert 'raw' if it's something new. 
   $count_of{ $keyword_for{$status} // $status }++;
}

#print a header row (tab sep)
print join( "\t", @field_order ), "\n";
#print the rest of the values. 
#map is so 'missing' fields get zeros, not 'undefined'. 
print join( "\t", map { $count_of{$_} // 0 } @field_order ), "\n";

Given the small sample you posted, the output is:

Block   Hard    Soft    Success Unknown this_field_missing  
2       2       4       0       1       0   
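
For the per-domain table mentioned in the question, the same loop could be extended to key the counts by date and domain. This is only a rough sketch under some assumptions: the domain label is taken as the part of the recipient address between '@' and the first dot, hotmail gets its own row alongside gmail, yahoo, msn and aol, and everything else is lumped under 'other'. The names %known_domain and tally_line are illustrative, and four of the sample lines are embedded so the sketch runs without the log files.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

my %keyword_for = (
    blk     => 'Block',
    hrd     => 'Hard',
    sft     => 'Soft',
    success => 'Success',
    unk     => 'Unknown',
);
my @field_order = qw( Success Block Soft Hard Unknown );

# Assumption: these domains are reported individually; all others become 'other'.
my %known_domain = map { $_ => 1 } qw( gmail yahoo msn hotmail aol );

my %count_of;    # $count_of{$date}{$domain}{$category}

# tally_line is a hypothetical helper: parse one log line and update %count_of.
sub tally_line {
    my ($line) = @_;
    my ( undef, $em_addr, undef, $timestamp, $msg_id, $code ) = split /,/, $line;
    return unless defined($code) && $msg_id =~ m/\@email\.domain\.com/;

    my ($status) = $code      =~ m/^(\w+)-/;
    my ($date)   = $timestamp =~ m/^(\d{4}-\d{2}-\d{2})/;
    my ($domain) = $em_addr   =~ m/\@([^.]+)\./;
    return unless defined($status) && defined($date) && defined($domain);

    $domain = lc $domain;
    $domain = 'other' unless $known_domain{$domain};

    $count_of{$date}{$domain}{ $keyword_for{$status} // $status }++;
    return;
}

# Four lines taken from the sample log in the question.
my @sample = split /\n/, <<'END_LOG';
b,email@aol.com,206.1.1.8,2017-03-23 00:01:11-0700,<14901.eb201.TCR2.338351.18567117907MSOSI1.152OSIMS@email.domain.com>,sft-routing-errors,4.4.4 (unable to route: dns lookup failure),
b,email@gmail.com,206.9.1.8,2017-03-23 00:02:13-0700,<149019.eb201.TCR2.338351.18567119237MSOSI1.152OSIMS@email.domain.com>,sft-no-answer-from-host,4.4.1 (no answer from host),
b,email@gmail.com,206.1.1.1,2017-03-23 05:13:22-0700,<1490206.eb201.KCR2.6637.56206428MSOSI1.102OSIMS@email.domain.com>,blk-bad-connection,4.4.2 (bad connection),
b,email@qq.com.com,206.1.1.1,2017-03-23 05:13:22-0700,<1490206.eb201.KCR2.6637.56206428MSOSI1.102OSIMS@email.domain.com>,blk-spam-related,4.4.2 (bad connection),
END_LOG

tally_line($_) for @sample;

print join( "\t", 'Date', 'Domain', @field_order ), "\n";
for my $date ( sort keys %count_of ) {
    for my $domain ( sort keys %{ $count_of{$date} } ) {
        my $row = $count_of{$date}{$domain};
        print join( "\t", $date, $domain, map { $row->{$_} // 0 } @field_order ), "\n";
    }
}
```

To run it against real logs, the embedded sample could be swapped for a `while (<>)` loop that calls tally_line on each line. Domains with no matching lines produce no row here; printing a fixed list of rows with zeros would need an explicit row order, much like @field_order does for columns.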

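It's hard to know what result you want, but I would execute the backticks in list context, so that the lines are already separated, and replace the if / elsif chain with a simple hash lookup.

This sample code builds the same %report hash as yours and returns a reference to it. I've had to assume that you're using backticks, as that seems most likely. Sobrique is right that your shell code should also be done in Perl.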

my %map = (
    blk     => 'Block',
    hrd     => 'Hard',
    sft     => 'Soft',
    success => 'Success',
    unk     => 'Unknown',
);

my $cmd = q{cat /var/log/pmta/File_name-2017-03-23*|egrep 'email.domain.com'|cut -d, -f6|cut -d- -f1|sort|uniq -c};

sub get_stats {

    my %report;

    for ( `$cmd` ) {

        my ($val, $type) = split;

        $report{$map{$type}} = $val;
    }

    \%report;
}
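
To try the lookup without the log files, the same loop can be fed a list of lines directly. In this sketch the sub name stats_from_lines and the canned input are illustrative and not part of the code above; skipping types that are missing from the map is an assumption about the desired behaviour.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

my %map = (
    blk     => 'Block',
    hrd     => 'Hard',
    sft     => 'Soft',
    success => 'Success',
    unk     => 'Unknown',
);

# stats_from_lines is a hypothetical stand-in for get_stats that takes
# the lines as arguments instead of running the shell pipeline.
sub stats_from_lines {
    my @lines = @_;
    my %report;

    for (@lines) {
        # split with no arguments splits $_ on whitespace and drops leading blanks.
        my ( $val, $type ) = split;

        # Assumption: ignore blank lines and types we have no label for.
        next unless defined($type) && exists $map{$type};

        $report{ $map{$type} } = $val;
    }

    return \%report;
}

# Canned input: the three-line output shown in the question.
my $report = stats_from_lines( "   310 blk\n", "144056 success\n", "    18 unk\n" );

printf "%-8s %s\n", $_, $report->{$_} for sort keys %$report;
```

The number of input lines no longer matters: each line is looked up on its own, so two lines or five go through the same code path.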
