How to parse multiple files in Perl

Question

I have this sample data that I want to parse, and there are more than 10 files like it. How can I parse them? I need the second line of each file, and from that line only the code, date and message.

foreach my $dir (@not_proc_dir) {
    chomp ($dir);
    print "$dir\n";

    opendir (DIR, $dir) or die "Couldn't open directory, $!";
    while ( my $file = readdir DIR) {
            next if $file =~ /^\.\.?$/;
    #       next if (-d $file);
            next if -d "$dir/$file";
            #print "\t$file\n";
            $file = "${dir}/${file}";
            if ($file =~ /\.err/) {
                    parse_err($file);
            }
            elsif ($file =~ /\.xml$/) {
                    parse_xml($file);
            }
            elsif ($file =~ /\.enrich/){
                    parse_enrich($file);
            }
    }
    closedir DIR;
}

sub parse_err {
         my $xml = shift;
        my @array = open(DATA, $xml) or die "Couldn't open file $xml, $!";
        my $secLine;
        foreach(@array) {
                my $secLine = $_;
                last;
        }
        close DATA;
}

Answer 1

open doesn't return the lines of the file. You need to use readline.

open my $in, '<', $xml or die "Can't open $xml: $!";
<$in>;  # Ignore the first line.
my $second_line = <$in>;

The diamond operator <$in> is a shorter way of writing readline $in.
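As a minimal sketch of the same idea with readline written out explicitly: the helper name second_line is hypothetical, and the sample text is opened as an in-memory file (a reference to a string) only so that the example runs without any files on disk.

```perl
use strict;
use warnings;

# Hypothetical helper: return the second line of a file, or undef if the
# file has fewer than two lines.
sub second_line {
    my ($path) = @_;

    open my $in, '<', $path or die "Can't open $path: $!";
    readline $in;                  # read and discard the first line
    my $line = readline $in;       # scalar context: exactly one line
    close $in;

    chomp $line if defined $line;  # strip the trailing newline
    return $line;
}

# Demonstration on an in-memory "file": passing a reference to a string
# to open makes Perl read from the string instead of from disk.
my $sample = "4   0\n"
           . "1   9001    E   10-17-2019  23:15:39    Error Executing lot_put_row 2\n";
print second_line(\$sample), "\n";
```

In real use, second_line would be called with a file name rather than a string reference.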

Answer 2

This subroutine is very strange.

sub parse_err {
    my $xml = shift;
    my @array = open(DATA, $xml) or die "Couldn't open file $xml, $!";
    my $secLine;
    foreach (@array) {
        my $secLine = $_;
        last;
    }
    close DATA;
}

open() simply returns a true or false value indicating whether the file was opened successfully. Storing that return value in an array makes no sense.

You then declare a variable called $secLine that you never use.

You then iterate across the contents of @array (which only has one element in it, so the loop only executes once).

In the loop body, you declare another variable called $secLine and copy the value from the array into it. You then exit the loop, so this second $secLine goes out of scope and ceases to exist. In effect, your loop does nothing at all.
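Both points can be checked with a small sketch. The helper names open_return_values and shadow_demo are hypothetical, and the file is an in-memory string so that nothing on disk is needed.

```perl
use strict;
use warnings;

# Hypothetical helper: capture what open() itself returns.
sub open_return_values {
    my $text = "first line\nsecond line\n";
    # Opening a reference to a string gives an in-memory filehandle.
    my @array = open(my $fh, '<', \$text);
    close $fh;
    return @array;                # one true value, not the lines of the file
}

# Hypothetical helper: show that an inner "my" creates a separate variable.
sub shadow_demo {
    my @values = @_;
    my $sec_line;                 # outer variable, never assigned
    foreach my $value (@values) {
        my $sec_line = $value;    # new variable, visible only inside the loop
        last;
    }
    return $sec_line;             # still undef
}

my @array = open_return_values();
print "elements in \@array: ", scalar(@array), "\n";
print "outer \$sec_line is ",
    ( defined shadow_demo('x') ? 'defined' : 'undefined' ), "\n";
```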

All in all, you seem very confused. If this is coursework, then I recommend you go back through your class notes and have a closer look at the section about reading data from files.

I think you want something like this:

sub parse_err {
  my ($filename) = @_;

  open my $fh, '<', $filename or die "Couldn't open file '$filename': $!\n";

  <$fh>; # Read and ignore first line.
  my $line = <$fh>; # Read second line

  my (undef, $code, undef, $date, $time, $message) = split /\s+/, $line, 6;

  $date = "$date $time";

  return ($code, $date, $message);
}

This subroutine returns three values: $code, $date and $message. You'll need to assign those to variables when you call the subroutine and then do something useful with them.

my ($code, $date, $message) = parse_err($file);
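Since the question involves more than 10 files, the call would normally sit inside a loop over a directory. The following is only a sketch under assumptions: parse_err_dir is a hypothetical helper, the directories are taken from the command line, and the copy of parse_err adds two things that are not in the version above (a chomp, and an empty return for files with fewer than two lines).

```perl
use strict;
use warnings;

# parse_err as suggested above, repeated here so the sketch is self-contained.
# Added assumption: a file with fewer than two lines yields an empty list.
sub parse_err {
    my ($filename) = @_;

    open my $fh, '<', $filename or die "Couldn't open file '$filename': $!\n";
    <$fh>;                        # read and ignore the first line
    my $line = <$fh>;             # read the second line
    close $fh;
    return unless defined $line;
    chomp $line;

    my (undef, $code, undef, $date, $time, $message) = split /\s+/, $line, 6;
    return ($code, "$date $time", $message);
}

# Hypothetical helper: parse every *.err file directly inside $dir and
# return a hash of file name => [ code, date, message ].
sub parse_err_dir {
    my ($dir) = @_;
    my %result;

    opendir my $dh, $dir or die "Couldn't open directory '$dir': $!\n";
    for my $name (sort readdir $dh) {
        my $path = "$dir/$name";
        next unless -f $path && $name =~ /\.err$/;   # plain *.err files only
        my @fields = parse_err($path);
        $result{$name} = \@fields if @fields;
    }
    closedir $dh;

    return %result;
}

# Assumption: the directories to scan are given on the command line.
for my $dir (@ARGV) {
    my %parsed = parse_err_dir($dir);
    for my $name (sort keys %parsed) {
        my ($code, $date, $message) = @{ $parsed{$name} };
        print "$name: code=$code date=$date message=", $message // '', "\n";
    }
}
```

Returning a hash keyed by file name keeps the parsing separate from whatever is done with the results afterwards.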

Answer 3

Attn: OP

In future, please provide a sample of the input data in text format (copy and paste from the terminal window).

This is simple to implement as a Perl script:

1. define the regex of interest
2. look for '*.err' files
3. open each file
4. look for the pattern
5. extract the data
6. print out the data found

use strict;
use warnings;
use feature 'say';

my $re = qr/\d\s+(\d{4})\s+E\s+(\d{1,2}-\d{1,2}-\d{4})\s+(\d{1,2}:\d{1,2}:\d{1,2})\s+(.*)/;

for my $filename ( glob("*.err") ) {
    say '------------------';
    say $filename;
    say '------------------';
    open my $fh, '<', $filename
        or die "Couldn't open $filename : $!";
    
    while( <$fh> ) {
        chomp;
        next unless /$re/;
        my($code,$date,$time,$msg) = ($1,$2,$3,$4);
        say 'Code: '    . $code;
        say 'Date: '    . $date;
        say 'Time: '    . $time;
        say 'Message: ' . $msg;
        say '------------------';
    }
    
    close $fh;
}

Input data

4   0
1   9001    E   10-17-2019  23:15:39    ORA-01400: cannot insert NULL into
Error at character 139 of the following SQL;
insert into lot (lot_key, lot_id, part_cnt,
.....
1   9001    E   10-17-2019  23:15:39    Error Executing lot_put_row 2
1   9001    E   10-17-2019  23:15:39    DBASCII: Exit called from file ora/dbreader.pc at line 10666
Version 2.6.2 - April 26, 2017 (DB schema 10.6rl)
1   9001    E   10-17-2019  23:15:39
ROLLBACK was successfill

Output

------------------
ora-err-01400.err
------------------
Code: 9001
Date: 10-17-2019
Time: 23:15:39
Message: ORA-01400: cannot insert NULL into
------------------
Code: 9001
Date: 10-17-2019
Time: 23:15:39
Message: Error Executing lot_put_row 2
------------------
Code: 9001
Date: 10-17-2019
Time: 23:15:39
Message: DBASCII: Exit called from file ora/dbreader.pc at line 10666
------------------
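The script above prints every matching line, whereas the question asks only for the second line of each file. One possible variation, sketched here with a hypothetical helper named second_line_fields, uses the same regex together with Perl's input line counter $. to look at line 2 only. The input is an in-memory copy of the first lines of the sample data.

```perl
use strict;
use warnings;

# Same pattern as in the script above.
my $re = qr/\d\s+(\d{4})\s+E\s+(\d{1,2}-\d{1,2}-\d{4})\s+(\d{1,2}:\d{1,2}:\d{1,2})\s+(.*)/;

# Hypothetical helper: return (code, date, time, message) taken from the
# second line read from $fh, or an empty list if that line doesn't match.
sub second_line_fields {
    my ($fh) = @_;

    while ( my $line = <$fh> ) {
        next unless $. == 2;      # $. is the current input line number
        chomp $line;
        my @fields = $line =~ $re;
        return @fields;
    }
    return;                       # fewer than two lines
}

# In-memory copy of the start of the sample input.
my $input = "4   0\n"
          . "1   9001    E   10-17-2019  23:15:39    ORA-01400: cannot insert NULL into\n"
          . "Error at character 139 of the following SQL;\n";

open my $fh, '<', \$input or die "Couldn't open in-memory input: $!";
my ($code, $date, $time, $msg) = second_line_fields($fh);
close $fh;

print "Code: $code\nDate: $date\nTime: $time\nMessage: $msg\n" if defined $code;
```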

3 answers

Answer 1: 3, accepted, 2020-03-17 14:07:45

Answer 2: 2, 2020-03-17 15:04:27

Answer 3: 1, 2020-03-17 20:05:50