
How to parse multiple files in Perl

I have sample data like the one shown below that I want to parse, and there are more than 10 files like it. How can I parse them? I need the second line of each file, and from that line I want to extract only the code, date and message.

[Image: sample data]

foreach my $dir (@not_proc_dir) {
    chomp ($dir);
    print "$dir\n";

    opendir (DIR, $dir) or die "Couldn't open directory, $!";
    while ( my $file = readdir DIR) {
            next if $file =~ /^\.\.?$/;
    #       next if (-d $file);
            next if -d "$dir/$file";
            #print "\t$file\n";
            $file = "${dir}/${file}";
            if ($file =~ /\.err/) {
                    parse_err($file);
            }
            elsif ($file =~ /\.xml$/) {
                    parse_xml($file);
            }
            elsif ($file =~ /\.enrich/){
                    parse_enrich($file);
            }
    }
    close DIR;

sub parse_err {
         my $xml = shift;
        my @array = open(DATA, $xml) or die "Couldn't open file $xml, $!";
        my $secLine;
        foreach(@array) {
                my $secLine = $_;
                last;
        }
        close DATA;
}

open doesn't return the lines of the file. You need to use readline.

open my $in, '<', $xml or die "Can't open $xml: $!";
<$in>;  # Ignore the first line.
my $second_line = <$in>;

The diamond operator <$in> is a shorter way of writing readline($in).
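To make the equivalence concrete, here is a minimal sketch that uses both spellings on an in-memory filehandle, so it runs without any real file. The sample text and the helper name second_line are made up for illustration and are not part of the original code.

```perl
use strict;
use warnings;

# Hypothetical helper: return the second line of an already-open filehandle,
# or undef if the input has fewer than two lines.
sub second_line {
    my ($in) = @_;
    my $ignored = readline($in);   # long form: read (and discard) the first line
    my $line    = <$in>;           # short form: read the second line
    chomp $line if defined $line;
    return $line;
}

# Made-up sample text, opened as an in-memory file so the sketch is self-contained.
my $text = "first line\nsecond line\nthird line\n";
open my $in, '<', \$text or die "Can't open in-memory file: $!";
print second_line($in), "\n";
close $in;
```

In scalar context both forms read one line per call, which is why two reads are enough to reach the second line.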

This subroutine is very strange.

sub parse_err {
    my $xml = shift;
    my @array = open(DATA, $xml) or die "Couldn't open file $xml, $!";
    my $secLine;
    foreach (@array) {
        my $secLine = $_;
        last;
    }
    close DATA;
}

open() simply returns a true or false value, indicating whether the file was opened successfully. Storing that return value in an array makes no sense.

You then declare a variable called $secLine that you never use.

You then iterate across the contents of @array (which only has one element in it, so the loop only executes once).

In the loop body, you declare another variable called $secLine and copy the value from the array into that variable. You then exit the loop - so your second variable called $secLine goes out of scope and ceases to exist. This effectively means that your loop has no effect whatsoever.
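The scoping problem can be reproduced in isolation. The following is a small sketch with made-up data (the array @lines and the variable names are illustrative only) that contrasts declaring a new variable inside the loop with assigning to the outer one.

```perl
use strict;
use warnings;

my @lines = ("first\n", "second\n");

# Shadowing: the inner `my` creates a new variable that vanishes at the end of the block.
my $shadowed;
foreach (@lines) {
    my $shadowed = $_;   # a different variable from the outer $shadowed
    last;
}

# Assigning: no inner `my`, so the outer variable keeps the value after the loop.
my $assigned;
foreach (@lines) {
    $assigned = $_;
    last;
}

print defined($shadowed) ? "shadowed is set\n" : "shadowed is still undef\n";
print "assigned is $assigned";
```

Only the second loop leaves a usable value behind, because it writes to the variable declared outside the loop.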

All in all, you seem very confused. If this is coursework, then I recommend you go back through your class notes and have a closer look at the section about reading data from files.

I think you want something like this:

sub parse_err {
  my ($filename) = @_;

  open my $fh, '<', $filename or die "Couldn't open file '$filename': $!\n";

  <$fh>; # Read and ignore first line.
  my $line = <$fh>; # Read second line

  my (undef, $code, undef, $date, $time, $message) = split /\s+/, $line, 6;

  $date = "$date $time";

  return ($code, $date, $message);
}

This subroutine returns three values: $code, $date and $message. You'll need to assign those to variables when you call the subroutine and then do something useful with them.

my ($code, $date, $message) = parse_err($file);
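Since the question is about more than 10 files, one possible way to put this together is sketched below. It repeats parse_err with two small additions (a chomp, and a guard for files with fewer than two lines) and adds a helper, parse_all, that is purely hypothetical. The '*.err' glob pattern and the output format are also assumptions.

```perl
use strict;
use warnings;

# parse_err as above, plus chomp and a guard for files with fewer than two lines.
sub parse_err {
    my ($filename) = @_;

    open my $fh, '<', $filename or die "Couldn't open file '$filename': $!\n";

    <$fh>;                         # read and ignore the first line
    my $line = <$fh>;              # read the second line
    close $fh;
    return unless defined $line;   # fewer than two lines: return an empty list
    chomp $line;

    my (undef, $code, undef, $date, $time, $message) = split /\s+/, $line, 6;
    return ($code, "$date $time", $message);
}

# Hypothetical helper: parse every file given and key the results by filename.
sub parse_all {
    my @files = @_;
    my %result;
    for my $file (@files) {
        # The list assignment yields 0 in scalar context when nothing was returned.
        my ($code, $date, $message) = parse_err($file) or next;
        $result{$file} = { code => $code, date => $date, message => $message };
    }
    return \%result;
}

# Assumed usage: every .err file in the current directory.
my $results = parse_all(glob '*.err');
for my $file (sort keys %$results) {
    my $r = $results->{$file};
    print "$file: $r->{code} | $r->{date} | $r->{message}\n";
}
```

Keeping the per-file parsing in parse_err and the looping in a separate helper means the same subroutine can be called from the directory-walking loop in the question.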

Attn: OP

In future, please provide a sample of the input data in text format (copied and pasted from the terminal window).

This is straightforward to implement as a Perl script:

Define the regex of interest.

Look for '*.err' files and, for each one:

  • open the file
  • look for the pattern
  • extract the data
  • print the data found

use strict;
use warnings;
use feature 'say';

my $re = qr/\d\s+(\d{4})\s+E\s+(\d{1,2}-\d{1,2}-\d{4})\s+(\d{1,2}:\d{1,2}:\d{1,2})\s+(.*)/;

for my $filename ( glob("*.err") ) {
    say '------------------';
    say $filename;
    say '------------------';
    open my $fh, '<', $filename
        or die "Couldn't open $filename : $!";
    
    while( <$fh> ) {
        chomp;
        next unless /$re/;
        my($code,$date,$time,$msg) = ($1,$2,$3,$4);
        say 'Code: '    . $code;
        say 'Date: '    . $date;
        say 'Time: '    . $time;
        say 'Message: ' . $msg;
        say '------------------';
    }
    
    close $fh;
}

Input data

4   0
1   9001    E   10-17-2019  23:15:39    ORA-01400: cannot insert NULL into
Error at character 139 of the following SQL;
insert into lot (lot_key, lot_id, part_cnt,
.....
1   9001    E   10-17-2019  23:15:39    Error Executing lot_put_row 2
1   9001    E   10-17-2019  23:15:39    DBASCII: Exit called from file ora/dbreader.pc at line 10666
Version 2.6.2 - April 26, 2017 (DB schema 10.6rl)
1   9001    E   10-17-2019  23:15:39
ROLLBACK was successfill

Output

------------------
ora-err-01400.err
------------------
Code: 9001
Date: 10-17-2019
Time: 23:15:39
Message: ORA-01400: cannot insert NULL into
------------------
Code: 9001
Date: 10-17-2019
Time: 23:15:39
Message: Error Executing lot_put_row 2
------------------
Code: 9001
Date: 10-17-2019
Time: 23:15:39
Message: DBASCII: Exit called from file ora/dbreader.pc at line 10666
------------------
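Note that the input contains four lines starting with "1   9001    E", but only three records appear in the output. The sketch below applies the same pattern to individual lines to show why. The helper name parse_line is hypothetical, and the sample lines are copied from the input data above.

```perl
use strict;
use warnings;

# Same pattern as in the script above.
my $re = qr/\d\s+(\d{4})\s+E\s+(\d{1,2}-\d{1,2}-\d{4})\s+(\d{1,2}:\d{1,2}:\d{1,2})\s+(.*)/;

# Hypothetical helper: return the captured fields, or an empty list if no match.
sub parse_line {
    my ($line) = @_;
    chomp $line;
    return $line =~ $re ? ($1, $2, $3, $4) : ();
}

my @samples = (
    "1   9001    E   10-17-2019  23:15:39    Error Executing lot_put_row 2\n",
    "1   9001    E   10-17-2019  23:15:39\n",   # no message after the time
    "ROLLBACK was successfill\n",               # continuation line, copied as-is
);

for my $line (@samples) {
    my @fields = parse_line($line);
    print @fields ? "match: @fields\n" : "no match\n";
}
```

The second sample is skipped because, after chomp, nothing follows the time, so the \s+ in front of the message capture has no whitespace to match. If such records should be kept, one option would be to make that last part optional, for example (?:\s+(.*))?, and handle an undefined message.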
