
Matching numbers for substitution in Perl

I have this little script:

my @list = ('R3_05_foo.txt','T3_12_foo_bar.txt','01.txt');

foreach (@list) {
    s/(\d{2}).*\.txt$/$1.txt/;
    s/^0+//;
    print $_ . "\n";
}

The expected output would be

5.txt
12.txt
1.txt

But instead, I get

R3_05.txt
T3_12.txt
1.txt

The last one is fine, but I cannot fathom why the start of the string ends up in the output in the first two cases, as if it were part of $1.

Try this pattern:

foreach (@list) {
    s/^.*?_?(?|0(\d)|(\d{2})).*\.txt$/$1.txt/;
    print $_ . "\n";
}


Explanations:

Here I use the branch reset feature, i.e. (?|...()...|...()...), which lets the capturing groups in each alternative share the same number ($1 here). That way you avoid a second substitution to trim a leading zero from the capture.
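To see what the branch reset changes, here is a small sketch comparing it with an ordinary non-capturing group. The helper name `captures` and the sample strings are made up for illustration; the branch reset syntax needs Perl 5.10 or later.

```perl
use strict;
use warnings;

# Hypothetical helper: return the capture groups of a match as a list,
# with undefined groups shown as the string 'undef'.
sub captures {
    my ($string, $regex) = @_;
    my @groups = $string =~ $regex;
    return map { defined $_ ? $_ : 'undef' } @groups;
}

# Ordinary alternation: each branch has its own group number ($1 and $2).
print join(',', captures('12', qr/(?:0(\d)|(\d{2}))/)), "\n";   # undef,12

# Branch reset: both branches write to group 1.
print join(',', captures('12', qr/(?|0(\d)|(\d{2}))/)), "\n";   # 12
print join(',', captures('05', qr/(?|0(\d)|(\d{2}))/)), "\n";   # 5
```

With the plain group, the replacement side would have to pick between $1 and $2; with the branch reset, $1 always holds the digits you want.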

To remove everything from the beginning of the string up to the number, I use:

.*?     # any characters, zero or more times
        # ( ? -> makes the * quantifier lazy, so it matches as little as possible)
_?      # an optional underscore



Note that you can require that the captured digits are not followed by another digit by adding a negative lookahead:

s/^.*?_?(?|0(\d)|(\d{2}))(?!\d).*\.txt$/$1.txt/;

(?!\d) means "not followed by a digit".

The problem here is that your substitution regex does not cover the whole string, so only part of the string is substituted. But you are using a rather complex solution for a simple problem.
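To make this visible, the following sketch splits each file name into the part the original regex never touches and the part it actually matches. The helper name `match_parts` is hypothetical; @- and @+ are Perl's built-in arrays of match start and end offsets.

```perl
use strict;
use warnings;

# Hypothetical helper: split a string into the part before the match
# and the matched part, using the match offsets in @- and @+.
sub match_parts {
    my ($string) = @_;
    return unless $string =~ /(\d{2}).*\.txt$/;
    my $before  = substr($string, 0, $-[0]);
    my $matched = substr($string, $-[0], $+[0] - $-[0]);
    return ($before, $matched);
}

for my $name ('R3_05_foo.txt', 'T3_12_foo_bar.txt', '01.txt') {
    my ($before, $matched) = match_parts($name);
    print "kept: '$before'  replaced: '$matched'\n";
}
# kept: 'R3_'  replaced: '05_foo.txt'
# kept: 'T3_'  replaced: '12_foo_bar.txt'
# kept: ''  replaced: '01.txt'
```

Only the "replaced" part is swapped for "$1.txt", which is why R3_ and T3_ survive in the output.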

It seems that what you want is to read two digits from the string, and then add .txt to the end of it. So why not just do that?

my @list = ('R3_05_foo.txt','T3_12_foo_bar.txt','01.txt');

for (@list) {
    if (/(\d{2})/) {
        $_ = "$1.txt";
    }
}

To get rid of the leading zero, you can force a conversion to a number by adding zero to the capture:

$_ = 0+$1 . ".txt";
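Putting the two pieces together, a minimal sketch of the whole loop might look like this. The sub name `short_name` is made up for illustration, and returning the input unchanged when no two digits are found is an assumption, not something the question specifies.

```perl
use strict;
use warnings;

# Hypothetical helper: take the first run of two digits and return
# "<number>.txt". The input is returned unchanged if there is no such run.
sub short_name {
    my ($name) = @_;
    return $name unless $name =~ /(\d{2})/;
    my $number = 0 + $1;    # numeric context drops the leading zero
    return "$number.txt";
}

print short_name($_), "\n" for ('R3_05_foo.txt', 'T3_12_foo_bar.txt', '01.txt');
# 5.txt
# 12.txt
# 1.txt
```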

I would modify your regular expression. Try using this code:

my @list = ('R3_05_foo.txt','T3_12_foo_bar.txt','01.txt');

foreach (@list) {
    s/.*(\d{2}).*\.txt$/$1.txt/;
    s/^0+//;
    print $_ . "\n";
}

The problem is that the first part of your s/// matches what you think it does, but the second part isn't replacing what you think it should. s/// only replaces the text that was matched. So to replace something like T3_, you have to match that too.

s/.*(\d{2}).*\.txt$/$1.txt/;
