
Advice on Perl regular expression script

I am trying to write a script that will read from a text file, and output to another file the lines that do not match a regular expression.

I have a file with two columns - in the first column are library Dewey numbers and in the second column are barcodes. A Dewey number should be something like 150 ADD, or 150.40 ADD. I am looking for lines where the 3 character author initials are missing. My example file looks like this:

100.20 SAD 350694345
250 ADD 369803434
300 360349320
300.1534234 ZOO 353000303
210 3633400340

I have written a script to output all lines where there are no author initials after the Dewey number. The regex is looking for three digits before an optional dot, then zero or more optional digits, then a space, and then the three letters of the author initials.

$filename = 'call.txt';
$output = 'result.txt';
open(FILE, $filename) or die 'Could not open $filename';
foreach $line (<FILE>) {
    if ($line !~ /^\d{3}\.*\d* [a-zA-Z]{3}/) {

        open (CALL, '>', $output) or die $!;
        print CALL $line;
    }
}

When I run the script it only outputs the fifth line:

210 3633400340

Why isn't it also picking up line 3 as that doesn't match the regex? The output should be all Dewey numbers without the author initials. So the desired output is:

300 360349320
210 3633400340

Problems:

  • You should ALWAYS use use strict; use warnings qw( all ); (Since it should always be used, we don't bother showing it in our snippets.) This detects numerous problems at no cost.
  • You needlessly use global variables. ( use strict; will help you with that, except for the file handles.)
  • By creating the file repeatedly in the loop, you are clobbering all but your last line of output. (This is the problem you were asking about.)
  • Your pattern doesn't check what follows the three letters, so a run of more than three letters is accepted as if it were exactly three.
  • By using <> in list context, you are loading the entire file into memory when it would have been just as easy to read from the file line by line.
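  • Your error messages aren't very useful. (The single-quoted string prints $filename literally instead of the file name, and that message omits $!, the reason the open failed.)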
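To make the point about the pattern concrete, here is a small sketch that runs one valid line and one made-up line with four letters through both the pattern from the question and the stricter pattern used in the fixed script. The helper name accepted_by and the "ADDX" line are invented for illustration only.

```perl
use strict;
use warnings;

# Pattern from the question: nothing is required after the three letters.
my $question_re = qr/^\d{3}\.*\d* [a-zA-Z]{3}/;
# Pattern from the fixed script: whitespace must follow exactly three characters.
my $fixed_re    = qr/^\S+\s+\S{3}\s/;

# Hypothetical helper: report which of the two patterns accept a line.
sub accepted_by {
    my ($line) = @_;
    return {
        question => ($line =~ $question_re ? 1 : 0),
        fixed    => ($line =~ $fixed_re    ? 1 : 0),
    };
}

# The second line is an invented example with four letters instead of three.
for my $line ("250 ADD 369803434\n", "250 ADDX 369803434\n") {
    my $result = accepted_by($line);
    chomp(my $shown = $line);
    print "$shown: question=$result->{question} fixed=$result->{fixed}\n";
}
```

Both patterns accept the first line. Only the pattern from the question accepts the four-letter line, so that line would never be reported by the original script.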

Fixed:

#!/usr/bin/perl

use strict;
use warnings qw( all );

my $in_qfn  = 'call.txt';
my $out_qfn = 'result.txt';

open(my $fh_in, '<', $in_qfn)
   or die("Can't open \"$in_qfn\": $!\n");
open(my $fh_out, '>', $out_qfn)
   or die("Can't create \"$out_qfn\": $!\n");

while (<$fh_in>) {
   print $fh_out $_ if !/^\S+\s+\S{3}\s/;
}
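As a quick sanity check, the same pattern can be tried against the sample rows from the question without touching any files. This is only a sketch: the helper name missing_initials is invented here, and the rows are held in an array rather than read from call.txt.

```perl
use strict;
use warnings;

# Sample rows copied from the question.
my @sample = (
    "100.20 SAD 350694345\n",
    "250 ADD 369803434\n",
    "300 360349320\n",
    "300.1534234 ZOO 353000303\n",
    "210 3633400340\n",
);

# Hypothetical helper: keep only the lines that the pattern rejects,
# i.e. the lines without three characters between the two columns.
sub missing_initials {
    return grep { !/^\S+\s+\S{3}\s/ } @_;
}

# Prints the "300 360349320" and "210 3633400340" rows.
print missing_initials(@sample);
```

Those two rows are the desired output given in the question.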

The program is far more useful if you don't hardcode the file names.

#!/usr/bin/perl

use strict;
use warnings qw( all );

while (<>) {
   print if !/^\S+\s+\S{3}\s/;
}

Usage:

script call.txt >result.txt

or

script <call.txt >result.txt

You're opening the file with truncation (">") every time a line fails to match the regex, so each write wipes out the previous one and only the last line survives. Move the open(CALL, ...) before the for loop.
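A minimal sketch of that change, keeping the regex from the question untouched. It is wrapped in a sub named copy_unmatched, a name invented here, so the file names are passed in rather than fixed.

```perl
use strict;
use warnings;

# Hypothetical helper: copy every line of $in that does not match the
# question's pattern to $out.
sub copy_unmatched {
    my ($in, $out) = @_;

    open(my $fh_in, '<', $in)
        or die "Can't open \"$in\": $!\n";
    # Opened once, before the loop, so the file is truncated only once.
    open(my $fh_out, '>', $out)
        or die "Can't create \"$out\": $!\n";

    while (my $line = <$fh_in>) {
        # Same regex as in the question, unchanged.
        print {$fh_out} $line if $line !~ /^\d{3}\.*\d* [a-zA-Z]{3}/;
    }

    close($fh_out) or die "Can't close \"$out\": $!\n";
    close($fh_in);
    return;
}
```

Calling copy_unmatched('call.txt', 'result.txt'); would then reproduce the original script with the open moved out of the loop.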
