
Unexpected result of substitution regex in Perl

I have a script and a file.

[evelden@vatasu4435 perl]$ cat file
06:35:42,734
foo 06:35:42 bar

[evelden@vatasu4435 perl]$ cat script
#!/usr/bin/perl
while(<>){
    if(s/(\d\d:\d\d).*/\1/){
        print;
    }
}

So the regex has .* at the end, but not at the front.

Running it:

[evelden@vatasu4435 perl]$ ./script file
06:35
foo 06:35

Apparently .* at the end takes as much as possible, which is OK.

But what I do not understand is where 'foo' comes from in the answer. This is my question.

If I change the regex to s/.*(\d\d:\d\d).*/\1/ , so that there is also a .* at the front, then the answer is what I expected:

[evelden@vatasu4435 perl]$ script file
35:42
35:42

Now it is greedy at the front, but that is OK.

The content of the current line is placed in $_. Your s/// operates on that $_, substituting the text matched by the complete pattern with the content of $1 (or \1, as you've put it; $1 is the preferred spelling on the replacement side). That's the content of the first capture group in the pattern. But your pattern is not anchored, so it will start matching somewhere in the string, and replace from there. It's doing exactly what you have told it.

If you wanted to get rid of everything in the front, your second pattern is correct. If you wanted to only change lines that start with the pattern, use a ^ anchor at the front.
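To make the difference concrete, here is a minimal sketch that runs the three variants discussed above (unanchored, leading .*, and ^-anchored) against the two sample lines from the question. The apply() helper and the pattern labels are hypothetical names used only for this illustration.

```perl
use strict;
use warnings;

# apply() is a hypothetical helper for this illustration: it runs one
# substitution on a copy of the line and returns the result, leaving
# the original string untouched.
sub apply {
    my ($line, $re) = @_;
    (my $copy = $line) =~ s/$re/$1/;
    return $copy;
}

# The two sample lines from the question (without trailing newlines).
my @lines = ('06:35:42,734', 'foo 06:35:42 bar');

# Label => pattern pairs, kept in an array so the order is stable.
my @patterns = (
    [ 'unanchored' => qr/(\d\d:\d\d).*/   ],
    [ 'leading .*' => qr/.*(\d\d:\d\d).*/ ],
    [ 'anchored ^' => qr/^(\d\d:\d\d).*/  ],
);

for my $line (@lines) {
    for my $p (@patterns) {
        my ($name, $re) = @$p;
        print "$name: '$line' -> '", apply($line, $re), "'\n";
    }
}
```

With these inputs, the unanchored pattern keeps foo and yields 'foo 06:35' for the second line, the leading .* variant yields '35:42' for both lines, and the anchored variant changes only the first line, leaving 'foo 06:35:42 bar' as it was.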

Only the part of the line that matches the regular expression is replaced by s///. Since the regexp isn't anchored on the left, it matches the part of the line beginning with the time, and replaces that part. The part before the match is left unchanged, so foo remains in the line.
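One way to see this for yourself is to ask Perl which part of the line the pattern actually matched. The sketch below uses the /p modifier together with ${^PREMATCH}, ${^MATCH} and ${^POSTMATCH} (available since Perl 5.10); split_match() is a hypothetical helper name, not something from the question.

```perl
use strict;
use warnings;

# split_match() is a hypothetical helper: it returns the text before
# the match, the matched text, and the text after the match, or an
# empty list if the pattern does not match at all.
sub split_match {
    my ($line) = @_;
    # /p makes ${^PREMATCH}, ${^MATCH} and ${^POSTMATCH} available.
    return unless $line =~ /(\d\d:\d\d).*/p;
    return (${^PREMATCH}, ${^MATCH}, ${^POSTMATCH});
}

for my $line ('06:35:42,734', 'foo 06:35:42 bar') {
    my ($pre, $match, $post) = split_match($line);
    print "before='$pre' matched='$match' after='$post'\n";
}
```

For 'foo 06:35:42 bar' the text before the match is 'foo ' and the matched text is '06:35:42 bar'. Only the matched text is replaced by $1, which is why foo survives the substitution.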

OP's original regex is not specific about where the match should start or end.

s/(\d\d:\d\d).*/\1/ looks in the string for \d{2}:\d{2} and anything after it. It substitutes the matched text (\d{2}:\d{2}.*, that is, the digits plus everything following them) with the captured \d{2}:\d{2}. Nothing in the pattern refers to what comes before \d{2}:\d{2}, so no replacement is applied to that part and foo is not touched.

Perhaps OP intended to write the following code:

use strict;
use warnings;

s/.*?(\d{2}:\d{2}):.*/$1/ && print for <>;

A few simple solutions to the problem:

use strict;
use warnings;
use feature 'say';

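while(<DATA>) {
    # Print the capture only when the line matched; otherwise $1 would
    # still hold the value from an earlier successful match.
    say $1 if /(\d{2}:\d{2}):/;
}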

__DATA__
06:35:42,734
foo 06:35:42 bar

Or another variation:

use strict;
use warnings;
use feature 'say';

while(<DATA>) {
    # \b requires the time to start at a word boundary; print only on a match.
    say $1 if /\b(\d{2}:\d{2})/;
}

__DATA__
06:35:42,734
foo 06:35:42 bar

Or maybe as follows:

use strict;
use warnings;
use feature 'say';

my $data = do { local $/; <DATA> };
my @time = $data =~ /\b(\d{2}:\d{2})/g;

say for @time;

__DATA__
06:35:42,734
foo 06:35:42 bar

Output

06:35
06:35
