How can I handle a variable number of input lines in Perl?
I'm working on a Perl script that needs to use an MTA's logs. Below is the query I use.
sh-3.2# cat /var/log/pmta/File_name-2017-03-23*|egrep 'email.domain.com'|cut -d, -f6|cut -d- -f1|sort|uniq -c
The output of this query is stored in $case8Q1.
310 blk
1279 hrd
87 sft
144056 success
18 unk
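As you can see, the query above returns five values, but that is not always the case. It can also return something like the following, so the number of lines can differ from run to run (2, 3, 4, or at most 5).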
310 blk
144056 success
18 unk
Below is the sample code, which gives the wrong result.
sub get_stats {
$case8Q1 =~ s/^\s+//;
@case8Q1_split = split( '\n', $case8Q1 );
@first_part = split( ' ', $case8Q1_split[0] );
@second_part = split( ' ', $case8Q1_split[1] );
@third_part = split( ' ', $case8Q1_split[2] );
@fourth_part = split( ' ', $case8Q1_split[3] );
@fifth_part = split( ' ', $case8Q1_split[4] );
if ( $first_part[1] eq 'blk' ) {
$report{Block} = $first_part[0];
}
elsif ( $first_part[1] eq 'hrd' ) {
$report{Hard} = $first_part[0];
}
elsif ( $first_part[1] eq 'sft' ) {
$report{Soft} = $first_part[0];
}
elsif ( $first_part[1] eq 'success' ) {
$report{Success} = $first_part[0];
}
elsif ( $first_part[1] eq 'unk' ) {
$report{Unknown} = $first_part[0];
}
# rest ifelse blocks so on........!
}
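where report is the hash %report.
Can someone help me with how to proceed from here?
I need all of the values, but if I go with a normal if-else chain like the one above, it will require at least 25 blocks.
Please let me know if anything is unclear.
Sample source log: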
b,email@aol.com,206.1.1.8,2017-03-23 00:01:11-0700,<14901.eb201.TCR2.338351.18567117907MSOSI1.152OSIMS@email.domain.com>,sft-routing-errors,4.4.4 (unable to route: dns lookup failure),
b,email@gmail.com,206.9.1.8,2017-03-23 00:02:13-0700,<149019.eb201.TCR2.338351.18567119237MSOSI1.152OSIMS@email.domain.com>,sft-no-answer-from-host,4.4.1 (no answer from host),
b,email@gmail.com,206.1.1.5,2017-03-23 03:43:36-0700,<149020.eb201.TCR2.338656.18570260933MSOSI1.152OSIMS@email.domain.com>,sft-server-related,4.3.2 (system not accepting network messages),smtp;421 Too many concurrent SMTP connections
b,email@yahoo.com,,2017-03-23 03:54:44-0700,<149019.eb201.TCR2.338351.18567013352MSOSI1.152OSIMS@email.domain.com>,sft-message-expired,4.4.7 (delivery time expired),
b,email@msn.com,206.1.1.1,2017-03-23 05:04:20-0700,<14902666.eb201.TCR2.3831.2620484MSOSI6374125.102OSIMS@email.domain.com>,hrd-invalid-mailbox,5.0.0 (undefined status),smtp;550 Requested action not taken: mailbox unavailable
b,email@msn.com,206.1.1.1,2017-03-23 05:04:20-0700,<14902666.eb201.TCR2.3831.2620484MSOSI6374125.102OSIMS@email.domain.com>,hrd-invalid-domain,5.0.0 (undefined status),smtp;550 Requested action not taken: mailbox unavailable
b,email@aol.com.com,66.1.1.1,2017-03-23 05:08:44-0700,<149021.eb201.KCR2.021089.566131285MSOSI1.89OSIMS@email.domain.com>,unk-other,4.0.0 (undefined status),smtp;451 Your domain is not configured to use this MX host.
b,email@gmail.com,206.1.1.1,2017-03-23 05:13:22-0700,<1490206.eb201.KCR2.6637.56206428MSOSI1.102OSIMS@email.domain.com>,blk-bad-connection,4.4.2 (bad connection),
b,email@qq.com.com,206.1.1.1,2017-03-23 05:13:22-0700,<1490206.eb201.KCR2.6637.56206428MSOSI1.102OSIMS@email.domain.com>,blk-spam-related,4.4.2 (bad connection),
The requirement goes a step further. I need counts per domain, for example:
Date Domain Success Block Soft Hard Unknown
2017-03-23 gmail 1 1 1 1 1 1
2017-03-23 yahoo 1 1 1 1 1 1
2017-03-23 msn 1 1 1 1 1 1
2017-03-23 aol 1 1 1 1 1 1
2017-03-23 other domain 1 1 1 1 1 1
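My issue is that "other domain" covers every domain except gmail, yahoo, msn, hotmail and aol. The count of 1 is only an example; it can be 0.
OK, so you've started out doing this the hard way, because Perl can quite naturally do everything that cut, sort and uniq do.
Without some sample input I can't rewrite it for you, but I think you should consider it.
You also shouldn't be using global variables; use lexical variables declared with my instead.
And, as you've noticed, if you find yourself numbering your variable names, you really should be thinking about using an array.
So something like this: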
use Data::Dumper;
my @stuff = map { [split] } split( "\n", $case8Q1 );
print Dumper \@stuff;
gives you:
$VAR1 = [
[
'310',
'blk'
],
[
'1279',
'hrd'
],
[
'87',
'sft'
],
[
'144056',
'success'
],
[
'18',
'unk'
]
];
But you can go a step further, because you don't really need to parse it into an intermediate data structure at all:
my %data = reverse $case8Q1 =~ m/(\d+) (\w+)/g;
print Dumper \%data;
Which then gives you:
$VAR1 = {
'hrd' => '1279',
'sft' => '87',
'blk' => '310',
'unk' => '18',
'success' => '144056'
};
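To see why the reverse trick works, it may help to look at the flat list the match returns before it is reversed. A minimal sketch, with the sample output from the question pasted in as a string (the @pairs name is only illustrative and not part of the answer above):

```perl
use strict;
use warnings;

# Sample text in the same shape as the `uniq -c` output from the question.
my $case8Q1 = " 310 blk\n1279 hrd\n  87 sft\n144056 success\n  18 unk\n";

# In list context, a /g match with two capture groups returns one flat list:
# (310, 'blk', 1279, 'hrd', ...), i.e. count first, label second.
my @pairs = $case8Q1 =~ m/(\d+) (\w+)/g;

# Reversing the whole list puts each label ahead of its count, so assigning
# it to a hash yields label => count. The order of the pairs is reversed as
# well, but that does not matter for a hash.
my %data = reverse @pairs;

print "$_ => $data{$_}\n" for sort keys %data;
```

Note that this relies on each label appearing only once, which holds for `uniq -c` output; with repeated labels one count would silently overwrite another.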
You can then turn that into your "report" with another key-value lookup:
my %keyword_for = (
"blk" => "Block",
"hrd" => "Hard",
"sft" => "Soft",
"success" => "Success",
"unk" => "Unknown",
);
foreach my $key ( keys %data ) {
$report{$keyword_for{$key}} = $data{$key};
}
That gives you:
$VAR1 = {
'Soft' => '87',
'Unknown' => '18',
'Success' => '144056',
'Block' => '310',
'Hard' => '1279'
};
Or go one step further and use map to do the transformation inline:
my %report = map { m/(\d+) (\w+)/
&& $keyword_for{$2} // $2 => $1 } split "\n", $case8Q1;
print Dumper \%report;
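You say you want all of the values populated. I'd actually suggest not doing that, and instead handling "undefined" properly when you generate the output, like this: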
my @field_order = qw ( Block Hard Soft Success Unknown this_field_missing );
print join( "\t", @field_order ), "\n";
print join( "\t", map { $report{$_} // 0 } @field_order ), "\n";
That way you get output in a defined order, which a hash on its own does not give you. This prints:
Block Hard Soft Success Unknown this_field_missing
310 1279 87 144056 18 0
But if you really do want to backfill the missing entries of your hash with zeros:
$report{$_} //= 0 for values %keyword_for;
However, now that you've posted some log lines to go with your question, the problem becomes much simpler:
#!/usr/bin/env perl
use strict;
use warnings;
#configure it:
my %keyword_for = (
"blk" => "Block",
"hrd" => "Hard",
"sft" => "Soft",
"success" => "Success",
"unk" => "Unknown",
);
#set output order - last field is for illustration purposes
my @field_order = qw ( Block Hard Soft Success Unknown this_field_missing );
my %count_of;
#iterate 'STDIN' or files specified to command line.
#So you can 'thisscript.pl /var/log/pmta/File_name-2017-03-23*'
while (<>) {
#split the line on commas
my ( $id, $em_addr, $ip, $timestamp, $msg_id, $code, $desc ) = split /,/;
#require msg_id contains '@email.domain.com'.
next unless $msg_id =~ m/\@email\.domain\.com/;
#take the first word of the status field (the part before any dash),
#so a bare 'success' with no dash is handled as well.
my ($status) = $code =~ m/^(\w+)/;
#update the count - reference the 'keyword for' hash first,
#but insert 'raw' if it's something new.
$count_of{ $keyword_for{$status} // $status }++;
}
#print a header row (tab sep)
print join( "\t", @field_order ), "\n";
#print the rest of the values.
#map is so 'missing' fields get zeros, not 'undefined'.
print join( "\t", map { $count_of{$_} // 0 } @field_order ), "\n";
Given the small sample you posted, the output is:
Block Hard Soft Success Unknown this_field_missing
2 2 4 0 1 0
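For the per-domain breakdown asked for at the end of the question, the same idea can be extended with a nested hash keyed by date, then domain, then status. The following is only a sketch under some assumptions: the count_by_domain name and the %named hash are made up for illustration, the domain is taken to be the first label after the @ in the recipient address, and the date is taken from the leading YYYY-MM-DD of the timestamp field. The three sample lines are copied from the log excerpt in the question.

```perl
use strict;
use warnings;

my %keyword_for = ( blk => 'Block', hrd => 'Hard', sft => 'Soft',
                    success => 'Success', unk => 'Unknown' );
my @field_order = qw( Success Block Soft Hard Unknown );

# Domains reported by name; anything else is counted as 'other'.
my %named = map { $_ => 1 } qw( gmail yahoo msn hotmail aol );

# Returns a reference to a hash shaped like $count{date}{domain}{field}.
sub count_by_domain {
    my @lines = @_;
    my %count;
    for (@lines) {
        my ( $id, $em_addr, $ip, $timestamp, $msg_id, $code ) = split /,/;
        next unless defined $code && $msg_id =~ m/\@email\.domain\.com/;
        my ($date)   = $timestamp =~ m/^(\d{4}-\d\d-\d\d)/ or next;
        my ($domain) = lc($em_addr) =~ m/\@([^.]+)\./      or next;
        my ($status) = $code =~ m/^(\w+)/                  or next;
        $domain = 'other' unless $named{$domain};
        $count{$date}{$domain}{ $keyword_for{$status} // $status }++;
    }
    return \%count;
}

# Three lines taken from the sample log in the question.
my @sample = split /\n/, <<'END_LOG';
b,email@gmail.com,206.9.1.8,2017-03-23 00:02:13-0700,<149019.eb201.TCR2.338351.18567119237MSOSI1.152OSIMS@email.domain.com>,sft-no-answer-from-host,4.4.1 (no answer from host),
b,email@msn.com,206.1.1.1,2017-03-23 05:04:20-0700,<14902666.eb201.TCR2.3831.2620484MSOSI6374125.102OSIMS@email.domain.com>,hrd-invalid-mailbox,5.0.0 (undefined status),smtp;550 Requested action not taken: mailbox unavailable
b,email@qq.com.com,206.1.1.1,2017-03-23 05:13:22-0700,<1490206.eb201.KCR2.6637.56206428MSOSI1.102OSIMS@email.domain.com>,blk-spam-related,4.4.2 (bad connection),
END_LOG

my $count = count_by_domain(@sample);

# Tab-separated report; missing combinations are printed as 0.
print join( "\t", 'Date', 'Domain', @field_order ), "\n";
for my $date ( sort keys %$count ) {
    for my $domain ( sort keys %{ $count->{$date} } ) {
        print join( "\t", $date, $domain,
            map { $count->{$date}{$domain}{$_} // 0 } @field_order ), "\n";
    }
}
```

To run it over the real logs, the sample could be replaced with `count_by_domain(<>)` and the files named on the command line. That reads every line into memory first, so for very large logs a while (<>) loop around the same body would probably be the better fit.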
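It's hard to know what result you want, but I would execute the backticks in list context, so that the lines are already separated, and replace the if / elsif chain with a simple hash lookup.
This sample code builds a hash %report identical to yours and returns a reference to it. I've had to assume that you're using backticks, as that seems most likely. Sobrique is right that your shell code should also be done in Perl.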
my %map = (
blk => 'Block',
hrd => 'Hard',
sft => 'Soft',
success => 'Success',
unk => 'Unknown',
);
my $cmd = q{cat /var/log/pmta/File_name-2017-03-23*|egrep 'email.domain.com'|cut -d, -f6|cut -d- -f1|sort|uniq -c};
sub get_stats {
my %report;
for ( `$cmd` ) {
my ($val, $type) = split;
$report{$map{$type}} = $val;
}
\%report;
}
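One thing to watch with a plain lookup like this is a type that is not in %map, which would end up stored under an undefined key. Below is a hedged variant that takes the lines as arguments, so it can be tried without the log files, and falls back to the raw label for unmapped types. The get_stats_from name is made up for illustration.

```perl
use strict;
use warnings;

my %map = (
    blk     => 'Block',
    hrd     => 'Hard',
    sft     => 'Soft',
    success => 'Success',
    unk     => 'Unknown',
);

# Builds the report from a list of "count label" lines, however many there are.
sub get_stats_from {
    my %report;
    for (@_) {
        my ( $val, $type ) = split;    # splits $_ on whitespace
        next unless defined $type;     # skip blank or malformed lines
        $report{ $map{$type} // $type } = $val;
    }
    return \%report;
}

# Only three of the five possible lines are present here.
my $report = get_stats_from( " 310 blk\n", "144056 success\n", "  18 unk\n" );
printf "%-8s %s\n", $_, $report->{$_} for sort keys %$report;
```

With the real command this would be called as get_stats_from(`$cmd`), since backticks in list context return one element per output line.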