How to parse multiple files in Perl
I have some sample data that I want to parse, and there are more than 10 files like it. How can I parse them? I need the second line of the data, and from that line only the code, date and message.
foreach my $dir (@not_proc_dir) {
chomp ($dir);
print "$dir\n";
opendir (DIR, $dir) or die "Couldn't open directory, $!";
while ( my $file = readdir DIR) {
next if $file =~ /^\.\.?$/;
# next if (-d $file);
next if -d "$dir/$file";
#print "\t$file\n";
$file = "${dir}/${file}";
if ($file =~ /\.err/) {
parse_err($file);
}
elsif ($file =~ /\.xml$/) {
parse_xml($file);
}
elsif ($file =~ /\.enrich/){
parse_enrich($file);
}
}
closedir DIR;
}
sub parse_err {
my $xml = shift;
my @array = open(DATA, $xml) or die "Couldn't open file $xml, $!";
my $secLine;
foreach(@array) {
my $secLine = $_;
last;
}
close DATA;
}
open doesn't return the lines of the file. You need to read them from the filehandle with readline.
open my $in, '<', $xml or die "Can't open $xml: $!";
<$in>; # Ignore the first line.
my $second_line = <$in>;
The diamond operator <$in> is a shorter way of writing readline $in.
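To make the equivalence concrete, here is a minimal sketch that reads the second line of a file twice, once with the diamond operator and once with readline. The temporary file and its contents are made up for the example; only open, readline and File::Temp from the core distribution are used.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Create a small throwaway file so the example is self-contained.
my ($tmp, $path) = tempfile(UNLINK => 1);
print {$tmp} "first line\nsecond line\nthird line\n";
close $tmp or die "Can't close $path: $!";

# Read the second line with the diamond operator.
open my $in, '<', $path or die "Can't open $path: $!";
<$in>;                              # discard line 1
my $with_diamond = <$in>;           # line 2
close $in;

# The same thing spelled with readline.
open $in, '<', $path or die "Can't open $path: $!";
readline $in;                       # discard line 1
my $with_readline = readline $in;   # line 2
close $in;

chomp($with_diamond, $with_readline);
print "$with_diamond\n";
print "$with_readline\n";
```

Both variables should end up holding the same text, since the two spellings perform the same read.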
This subroutine is very strange.
sub parse_err {
my $xml = shift;
my @array = open(DATA, $xml) or die "Couldn't open file $xml, $!";
my $secLine;
foreach (@array) {
my $secLine = $_;
last;
}
close DATA;
}
open() simply returns a true or false value, indicating whether the file was opened successfully. Storing that return value in an array makes no sense.
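A short sketch of that point, assuming a throwaway file created just for the demonstration: the return value of open() is only a success flag, and the lines come from reading the filehandle afterwards. The name of the missing file is hypothetical and assumed not to exist.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Throwaway two-line file for the demonstration.
my ($tmp, $path) = tempfile(UNLINK => 1);
print {$tmp} "line one\nline two\n";
close $tmp or die "Can't close $path: $!";

# open() reports success or failure; it does not hand back file contents.
my $ok = open my $fh, '<', $path;
print "open returned a ", ($ok ? "true" : "false"), " value\n";

# The contents come from reading the filehandle, here in list context.
my @lines = <$fh>;
close $fh;
print "read ", scalar(@lines), " lines\n";

# Opening a file that does not exist yields a false value and sets $!.
my $missing = "$path.does-not-exist";   # hypothetical name, assumed absent
my $failed  = open my $fh2, '<', $missing;
print "second open returned a ", ($failed ? "true" : "false"), " value\n";
```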
You then declare a variable called $secLine that you never use.
You then iterate across the contents of @array (which only has one element in it, so the loop only executes once).
In the loop body, you declare another variable called $secLine and copy the value from the array into that variable. You then exit the loop, so your second variable called $secLine goes out of scope and ceases to exist. This effectively means that your loop has no effect whatsoever.
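The scoping problem can be reproduced in isolation. In this small sketch (the variable names are illustrative, not taken from the question), the first loop repeats the mistake with an inner my, and the second loop assigns to the variable declared outside.

```perl
use strict;
use warnings;

my @values = ('from the array');

my $outer;                  # declared outside the loop
for (@values) {
    my $outer = $_;         # a NEW variable that hides the one above
    last;
}
# The inner variable is gone; the outer one was never assigned.
print defined $outer ? "outer: $outer\n" : "outer is still undef\n";

my $kept;
for (@values) {
    $kept = $_;             # no 'my': assigns to the variable declared above
    last;
}
print "kept: $kept\n";
```

Dropping the second my is enough to keep the value after the loop ends.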
All in all, you seem very confused. If this is coursework, then I recommend you go back through your class notes and have a closer look at the section about reading data from files.
I think you want something like this:
sub parse_err {
my ($filename) = @_;
open my $fh, '<', $filename or die "Couldn't open file '$filename': $!\n";
<$fh>; # Read and ignore first line.
my $line = <$fh>; # Read second line
my (undef, $code, undef, $date, $time, $message) = split /\s+/, $line, 6;
$date = "$date $time";
return ($code, $date, $message);
}
This subroutine returns three values: $code, $date and $message. You'll need to assign those to variables as you call the subroutine and then do something useful with them.
my ($code, $date, $message) = parse_err($file);
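Since the question is about many files, here is one possible way to call such a subroutine for every .err file in a directory. It is a sketch under assumptions: the temporary directory and the file names a.err and b.err are invented for the example, the two data lines mirror the sample input format, and this copy of parse_err adds a chomp and an early return for short files that the version above does not have.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Variant of parse_err with chomp and a guard for files under two lines.
sub parse_err {
    my ($filename) = @_;
    open my $fh, '<', $filename or die "Couldn't open file '$filename': $!\n";
    <$fh>;                                  # skip the first line
    my $line = <$fh>;                       # second line
    return unless defined $line;            # nothing to parse
    chomp $line;
    my (undef, $code, undef, $date, $time, $message) = split /\s+/, $line, 6;
    return ($code, "$date $time", $message);
}

# Hypothetical test data: two small .err files in a temporary directory.
my $dir = tempdir(CLEANUP => 1);
for my $name ('a.err', 'b.err') {
    open my $out, '>', "$dir/$name" or die "Can't write $dir/$name: $!";
    print {$out} "4 0\n";
    print {$out} "1 9001 E 10-17-2019 23:15:39 ORA-01400: cannot insert NULL into\n";
    close $out or die "Can't close $dir/$name: $!";
}

# Collect the .err files, then parse each one in turn.
opendir my $dh, $dir or die "Can't open directory $dir: $!";
my @names = sort grep { /\.err$/ } readdir $dh;
closedir $dh;

my @results;
for my $name (@names) {
    my @fields = parse_err("$dir/$name");
    next unless @fields;                    # skip files with no second line
    my ($code, $date, $message) = @fields;
    push @results, [$code, $date, $message];
    print "$name: code=$code date=$date message=$message\n";
}
```

Returning a list keeps the subroutine free of printing, so the caller decides whether to print the fields, store them, or write them elsewhere.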
Attn: OP
In future, please provide a sample of the input data in text format (copy and paste from the terminal window).
This is simple to implement as a Perl script:
define regex of interest
look for '*.err' files
open file
look for pattern
extract data
print out found data
use strict;
use warnings;
use feature 'say';
my $re = qr/\d\s+(\d{4})\s+E\s+(\d{1,2}-\d{1,2}-\d{4})\s+(\d{1,2}:\d{1,2}:\d{1,2})\s+(.*)/;
for my $filename ( glob("*.err") ) {
say '------------------';
say $filename;
say '------------------';
open my $fh, '<', $filename
or die "Couldn't open $filename : $!";
while( <$fh> ) {
chomp;
next unless /$re/;
my($code,$date,$time,$msg) = ($1,$2,$3,$4);
say 'Code: ' . $code;
say 'Date: ' . $date;
say 'Time: ' . $time;
say 'Message: ' . $msg;
say '------------------';
}
close $fh;
}
Input data
4 0
1 9001 E 10-17-2019 23:15:39 ORA-01400: cannot insert NULL into
Error at character 139 of the following SQL;
insert into lot (lot_key, lot_id, part_cnt,
.....
1 9001 E 10-17-2019 23:15:39 Error Executing lot_put_row 2
1 9001 E 10-17-2019 23:15:39 DBASCII: Exit called from file ora/dbreader.pc at line 10666
Version 2.6.2 - April 26, 2017 (DB schema 10.6rl)
1 9001 E 10-17-2019 23:15:39
ROLLBACK was successfill
Output
------------------
ora-err-01400.err
------------------
Code: 9001
Date: 10-17-2019
Time: 23:15:39
Message: ORA-01400: cannot insert NULL into
------------------
Code: 9001
Date: 10-17-2019
Time: 23:15:39
Message: Error Executing lot_put_row 2
------------------
Code: 9001
Date: 10-17-2019
Time: 23:15:39
Message: DBASCII: Exit called from file ora/dbreader.pc at line 10666
------------------