
Unexpected result of substitution regex in Perl

I have a script and a file.

[evelden@vatasu4435 perl]$ cat file
06:35:42,734
foo 06:35:42 bar

[evelden@vatasu4435 perl]$ cat script
#!/usr/bin/perl
while(<>){
    if(s/(\d\d:\d\d).*/\1/){
        print;
    }
}

So the regex has .* at the end, but not at the front.

When I run it:

[evelden@vatasu4435 perl]$ ./script file
06:35
foo 06:35

Apparently .* at the end matches as much as possible, which is OK.

But what I do not understand is where 'foo' comes from in the output. This is my question.

If I change the regex to s/.*(\d\d:\d\d).*/\1/ , so with .* at the front as well, then the output is what I expected:

[evelden@vatasu4435 perl]$ script file
35:42
35:42

Now it is greedy at the front, but that is OK.

The content of the current line is placed in $_ . Your s/// operates on that $_ , replacing the text matched by the complete pattern with the content of $1 (or \1 , as you've written it), which is the content of the first capture group in the pattern. But your pattern is not anchored, so it matches at the leftmost position in the string where it can, and replaces from there. It's doing exactly what you have told it to do.

If you wanted to get rid of everything in front of the time, your second pattern is correct. If you wanted to change only lines that start with the pattern, use a ^ anchor at the front.
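As a rough sketch of what the ^ anchor changes, the snippet below applies the unanchored and the anchored substitution to the two sample lines from the question. The helper name trim_after_time is made up for illustration, and the /r modifier (which returns the modified copy instead of changing the variable) assumes Perl 5.14 or newer.

```perl
use strict;
use warnings;

# Hypothetical helper: cut off everything after the first HH:MM,
# optionally requiring the time to sit at the very start of the line.
sub trim_after_time {
    my ($line, $anchored) = @_;
    return $anchored
        ? ($line =~ s/^(\d\d:\d\d).*/$1/r)   # can only match at position 0
        : ($line =~ s/(\d\d:\d\d).*/$1/r);   # matches at the leftmost time
}

for my $line ("06:35:42,734", "foo 06:35:42 bar") {
    printf "%-20s unanchored: %-10s anchored: %s\n",
        "'$line'", trim_after_time($line, 0), trim_after_time($line, 1);
}
```

With the anchor, the second line does not match at all, so it comes back unchanged instead of keeping foo and losing the rest.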

Only the part of the line that matches the regular expression is replaced by s/// . Since the regexp isn't anchored on the left, it matches the part of the line beginning with the time, and replaces that part. The part before the match is left unchanged, so foo remains in the line.
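One way to see this for yourself is to print where the match starts and ends. The sketch below uses Perl's built-in @- and @+ offset arrays; the helper name match_parts is hypothetical and only there to make the three pieces visible.

```perl
use strict;
use warnings;

# Hypothetical helper: split a line into the text before the match,
# the matched text, and the text after the match.
sub match_parts {
    my ($line) = @_;
    return unless $line =~ /(\d\d:\d\d).*/;
    # $-[0] and $+[0] hold the start and end offsets of the whole match.
    my ($start, $end) = ($-[0], $+[0]);
    return (
        substr($line, 0, $start),               # left alone by s///
        substr($line, $start, $end - $start),   # the part s/// replaces
        substr($line, $end),                    # left alone by s///
    );
}

for my $line ("06:35:42,734", "foo 06:35:42 bar") {
    my ($before, $matched, $after) = match_parts($line);
    print "before='$before' matched='$matched' after='$after'\n";
}
```

For the second line the "before" piece is 'foo ', which is exactly the text that survives the substitution.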

OP's original regex is not specific about where the match should start or end.

s/(\d\d:\d\d).*/\1/ looks in the string for \d{2}:\d{2} and anything after it. It substitutes the matched text ( \d{2}:\d{2}.* , that is, the digits with anything following them) with the captured \d{2}:\d{2} . Nothing in the pattern relates to what comes before \d{2}:\d{2} , so no replacement is applied to that part, and foo is not touched.

Perhaps OP intended to write the following code:

use strict;
use warnings;

s/.*?(\d{2}:\d{2}):.*/$1/ && print for <>;
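The leading .*? in that one-liner is non-greedy, and that choice affects which time ends up in the capture. A small sketch comparing it with the greedy .* from the question's second pattern (the variable names are illustrative; /r assumes Perl 5.14 or newer):

```perl
use strict;
use warnings;

my $line = "foo 06:35:42 bar";

# Greedy .* grabs as much as it can and backtracks only as far as needed,
# so the capture lands on the LAST place that looks like dd:dd.
my $greedy = $line =~ s/.*(\d\d:\d\d).*/$1/r;

# Non-greedy .*? consumes as little as possible,
# so the capture lands on the FIRST place that looks like dd:dd.
my $lazy = $line =~ s/.*?(\d\d:\d\d).*/$1/r;

print "greedy: $greedy\n";   # greedy: 35:42
print "lazy:   $lazy\n";     # lazy:   06:35
```

This is why the question's second pattern printed 35:42 rather than 06:35.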

Two simple solutions to the problem:

use strict;
use warnings;
use feature 'say';

while(<DATA>) {
    # Print only on a successful match; otherwise $1 would keep a stale value.
    say $1 if /(\d{2}:\d{2}):/;
}

__DATA__
06:35:42,734
foo 06:35:42 bar

Or another variation:

use strict;
use warnings;
use feature 'say';

while(<DATA>) {
    # Print only on a successful match; otherwise $1 would keep a stale value.
    say $1 if /\b(\d{2}:\d{2})/;
}

__DATA__
06:35:42,734
foo 06:35:42 bar

Or perhaps as follows:

use strict;
use warnings;
use feature 'say';

my $data = do { local $/; <DATA> };
my @time = $data =~ /\b(\d{2}:\d{2})/g;

say for @time;

__DATA__
06:35:42,734
foo 06:35:42 bar

Output:

06:35
06:35
