Using <> and regex to search and replace elements in text files

I'm working my way through Learning Perl, Chapter 9, "Processing Text with Regular Expressions."

Here are two of the end-of-chapter exercises:

  1. Write a program to add a copyright line to all of your exercise answers so far, placing a line like ## Copyright (c) 20XX by Yours Truly in the file immediately after the 'shebang' line. Presume that the program will be invoked with the filenames to edit already on the command line.

  2. Modify the previous program so that it doesn't edit the files that already contain the copyright line. As a hint on that, you might need to know that the name of the file being read by the diamond operator is in $ARGV.

This was my attempted solution:

#!/usr/bin/env perl

use 5.014;
use warnings;

my $shebang     = '(#!/usr/bin/env perl|#!/usr/bin/perl)'; 
my $copyright   = '# Copyright (c) 20XX Yours Truly'; 

$^I = ".bak";

while (<>) {
    unless (/$copyright/mi) {
        s/($shebang)/$1\n$copyright/mig;
    }
    print;
}

Run on the command line with perl ch9.pl sample_perl_script.pl.

My goals were:

  • Keep the original shebang intact, regardless of path.
  • Loop through <> just once.
  • Check to see if the copyright notice existed.
  • If it didn't, add it (hence the attempt with unless { ... }).

This works for the first part of the problem (adding a copyright line) but not the second (check to make sure the copyright doesn't already exist).

My questions are: Why? And why is the unless totally ignored when I run the program?

I peeked at the appendix, and the book's proposed solution was to create a hash to track filenames from $ARGV, and pass over the files twice: first to eliminate files that already had the copyright notice, then to perform the search/replace. Like so:

my %do_these;
foreach (@ARGV) {
    $do_these{$_} = 1;
}

while (<>) { 
    if (/\A## Copyright/) {
        delete $do_these{$ARGV};
    }
}

@ARGV = sort keys %do_these; 
$^I = ".bak";
while (<>) {
    if (/\A#!/) {
        $_ .= "## Copyright (c) 20XX by Yours Truly\n";
    }
    print;
}

This works, of course, but it seems like twice the work. I'm trying to see if there's a way to do this within a single while (<>) { ... } loop, with my approach, and come away with a better understanding of how the diamond operator works.

If my approach is totally off-base, please explain why and don't spare my feelings. I'm more interested in a full understanding than my ego.

Your book's approach is stupid. Actually, I think perl is barfing because your copyright notice has special characters like ( .

What you want is the quotemeta function.

I'd change your program like so:

while (<>) {
    my $copyright2 = quotemeta $copyright;
    unless (/$copyright2/mi) {
        s/($shebang)/$1\n$copyright/mig;
    }
    print;
}

Apologies if that doesn't work. It's been a while since I wrote perl.

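Your unless does not work because the copyright is not on the same line as the shebang. The diamond operator reads one record at a time, up to and including the next occurrence of $/ (the input record separator), which by default is a newline. So each pass through the loop sees a single line, and the shebang line itself never contains the copyright. Your program therefore performs the substitution on every line that does not match the copyright pattern, and that always includes the shebang line.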
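To make the record-by-record behaviour concrete, here is a minimal sketch. It reads a made-up three-line sample from an in-memory filehandle instead of <>, and the helper name read_records is only for illustration; neither is part of the original program.

```perl
use strict;
use warnings;

# Hypothetical sample file contents, held in memory for the demo.
my $sample = "#!/usr/bin/perl\n# Copyright (c) 20XX Yours Truly\nprint 1;\n";

# Illustrative helper: return the records read from $text when the
# input record separator $/ is set to $sep (undef means slurp mode).
sub read_records {
    my ($text, $sep) = @_;
    local $/ = $sep;
    open my $fh, '<', \$text or die "Cannot open in-memory handle: $!";
    my @records = <$fh>;
    close $fh;
    return @records;
}

my @lines = read_records($sample, "\n");   # default: one record per line
my @whole = read_records($sample, undef);  # slurp: the whole text at once

printf "line mode:  %d records\n", scalar @lines;   # prints 3
printf "slurp mode: %d record\n",  scalar @whole;   # prints 1

# In line mode the shebang record never contains the notice, so a
# per-line "unless (/Copyright/)" test is always true for that line.
print "shebang line mentions Copyright? ",
    ($lines[0] =~ /Copyright/ ? "yes" : "no"), "\n";  # prints no
```

In line mode the check and the substitution only ever see one line at a time, which is why the unless never protects the shebang line.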

Since this is Perl, there are many ways to fix it. The most straightforward way is perhaps to undefine $/ and slurp the file (read it all into one string). That way you can check right away whether there is a copyright notice on the second line of the file. E.g.:

local $/;                                       # slurp mode: <> returns each file whole
while (<>) {
    # After the first line, insert the notice unless it is already there
    s/^.*\n\K(?!\Q$copyright\E)/$copyright\n/;  # negative lookahead assertion
    print;
}

You can also check line number 2 in your files directly, without slurping the file:

while (<>) {
    if ($. == 2) {
        unless (/\Q$copyright/) {
            print "$copyright\n";
        }
    }
    print;
    close ARGV if eof;                # this will reset the line counter $.
}

Note that Nick ODell is correct that your copyright string contains meta characters (namely parentheses) which must be escaped. I used \Q ... \E escape sequences above.
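A small sketch of the difference escaping makes, using the copyright string from the question. The variable names $line, $raw_match and $quoted_match are only for this demo.

```perl
use strict;
use warnings;

my $copyright = '# Copyright (c) 20XX Yours Truly';
my $line      = "# Copyright (c) 20XX Yours Truly\n";

# Unescaped: "(c)" is a capture group that matches a bare "c", so the
# literal parentheses in $line prevent the pattern from matching.
my $raw_match = $line =~ /$copyright/ ? 1 : 0;

# Escaped: \Q...\E (the inline form of quotemeta) makes every
# character in $copyright match literally.
my $quoted_match = $line =~ /\Q$copyright\E/ ? 1 : 0;

print "unescaped pattern matches: $raw_match\n";     # prints 0
print "escaped pattern matches:   $quoted_match\n";  # prints 1
```

So even on the copyright line itself, the original unescaped test never succeeds.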

Note also that you do not need to be very specific in checking for the shebang. Spelling out the full path is more likely to trip you up on slightly different shebang lines; testing for a leading #! is enough.
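Putting these points together, one possible sketch is a function that takes a whole (slurped) file as a string. The name add_copyright is made up for illustration, and unlike the loops above it looks for the notice anywhere in the file rather than only on line 2, which is an assumption about what "already contains the copyright line" should mean.

```perl
use strict;
use warnings;

my $copyright = '# Copyright (c) 20XX Yours Truly';

# Hypothetical helper: takes a whole file as one string and returns it
# with the copyright line inserted after the shebang, unless the file
# has no shebang or already contains the notice on a line of its own.
sub add_copyright {
    my ($text) = @_;
    return $text unless $text =~ /\A#!/;            # loose shebang test
    return $text if $text =~ /^\Q$copyright\E$/m;   # already present
    $text =~ s/\A(.*\n)/$1$copyright\n/;            # insert after line 1
    return $text;
}

# The loose test accepts any shebang path, not just two hard-coded ones.
for my $shebang ('#!/usr/bin/perl', '#!/usr/bin/env perl', '#!/opt/perl/bin/perl -w') {
    print add_copyright("$shebang\nprint 1;\n"), "\n";
}
```

Inside a slurping while (<>) loop with $^I set, this could be used as $_ = add_copyright($_); print; so each file is still read only once.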
