Why is this Perl regex not working?

I have this array:

my @input = ("He walk+V3SG very fast.", "He study+V3SG hard.");

I want to replace 'walk+V3SG' and 'study+V3SG' with 'walks' and 'studies'.

Below is the script I wrote. I thought it should work, but for some reason it does not.

foreach my $sent(@input){
    if ($sent =~ m/\Q+V3SG/){
        if ($sent =~ m/\Q[dlr]y+V3SG/){
            $sent =~ s/\Q[dlr]y+V3SG/ies/g;
        }
        if ($sent =~ m/\Q[s|x|sh|ch|o]+V3SG/){
            $sent =~ s/\Q[s|x|sh|ch|o]+V3SG/es/g;
        }
        else {$sent =~ s/\Q+V3SG/s/g}
    }
}

foreach my $sent(@input){
    print $sent;
    print "\n";
}

Can anyone tell me what is wrong with the script?

The \Q makes the rest of the regex match the literal text [dlr]y+V3SG, so [dlr] no longer acts as a character class. Moving \Q lets the character class work as intended:

s/[dlr]\Qy+V3SG/ies/g

or just escape the +:

s/[dlr]y\+V3SG/ies/g
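
To see why the position of \Q matters, here is a minimal sketch that compares the three patterns on a single word. The variable name $word is only for illustration; quotemeta is the built-in function that applies the same quoting as \Q.

```perl
use strict;
use warnings;

my $word = "study+V3SG";

# \Q at the start quotes everything after it, so [dlr] is matched as the
# literal characters "[dlr]" and never acts as a character class.
print "quoted pattern:    ", quotemeta('[dlr]y+V3SG'), "\n";
print "leading \\Q matches: ", ($word =~ /\Q[dlr]y+V3SG/ ? "yes" : "no"), "\n";

# With \Q moved after the class, or with only the + escaped, the class works.
print "moved \\Q matches:   ", ($word =~ /[dlr]\Qy+V3SG/ ? "yes" : "no"), "\n";
print "escaped + matches:  ", ($word =~ /[dlr]y\+V3SG/ ? "yes" : "no"), "\n";
```

The first pattern reports "no" and the other two report "yes", which is why the original substitutions never fired.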

After this change you get, e.g.:

He stuies hard.

The d was replaced along with the rest of the match. To make sure the letter before the y is retained, you can use a capture or \K (available since Perl 5.10):

s/[dlr]\K\Qy+V3SG/ies/g
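
As a small sketch of the two options, the following applies both the capture form and the \K form to the same sentence. The variable names are illustrative only.

```perl
use strict;
use warnings;

my $with_capture = "He study+V3SG hard.";
my $with_keep    = "He study+V3SG hard.";

# Capture the consonant and put it back in the replacement.
$with_capture =~ s/([dlr])y\+V3SG/${1}ies/g;

# \K (Perl 5.10+) keeps everything matched so far out of the replaced text.
$with_keep =~ s/[dlr]\Ky\+V3SG/ies/g;

print "$with_capture\n";   # He studies hard.
print "$with_keep\n";      # He studies hard.
```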

For the second regex, you're using the wrong brackets. Alternation needs parentheses, not square brackets:

s/(s|x|sh|ch|o)\Q+V3SG/$1es/g

You should put \Q just before the literal part. You are placing it before the whole regex, so the whole regex is treated as a literal and is not interpreted.

Second, use \K to control what gets substituted. Put it right after the part you don't want to replace. For example, s/[dlr]\Ky\Q+V3SG/ies/g turns study+V3SG into studies without removing the d, l, or r from the result.

Third, [s|x|sh|ch|o] will not do what you think. It is a character class, so it matches any single character from s, x, h, |, c, o. The correct form is (?:s|x|sh|ch|o), where (?:...) is a non-capturing group.
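
The difference between the class and the group can be checked with a short sketch like this one. The list of test strings is made up for illustration.

```perl
use strict;
use warnings;

my @candidates = ("s", "h", "|", "sh", "k");

# Inside [...] every character stands for itself, so "|" is just another member
# and "sh" cannot match as a unit.
my @class_hits = grep { /^[s|x|sh|ch|o]$/ } @candidates;
print "class matches: @class_hits\n";     # class matches: s h |

# A group with alternation matches whole alternatives such as "sh".
my @group_hits = grep { /^(?:s|x|sh|ch|o)$/ } @candidates;
print "group matches: @group_hits\n";     # group matches: s sh
```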

Finally, that shouldn't be an if/elsif/else at all. A single sentence could contain all three forms, so each substitution should run unconditionally.

Putting it all together gives us:

#!/usr/bin/perl
use strict;
use warnings;

my @input = ("He walk+V3SG very fast.", "He study+V3SG hard.","He crush+V3SG hard.");

foreach (@input){
    if (m/\Q+V3SG/){
        s/[dlr]\Ky\Q+V3SG/ies/g;
        s/(?:s|x|sh|ch|o)\K\Q+V3SG/es/g;
        s/\Q+V3SG/s/g;
    }
}

foreach my $sent(@input){
    print $sent;
    print "\n";
}
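
To check the claim that one sentence can contain all three forms, the same three substitutions can be wrapped in a function and run on a mixed sentence. This is only a sketch: conjugate_3sg is a hypothetical helper name and the test sentence is invented, neither comes from the original script.

```perl
use strict;
use warnings;

# conjugate_3sg is a hypothetical helper, not part of the original script.
sub conjugate_3sg {
    my ($sentence) = @_;
    $sentence =~ s/[dlr]\Ky\Q+V3SG/ies/g;            # study+V3SG -> studies
    $sentence =~ s/(?:s|x|sh|ch|o)\K\Q+V3SG/es/g;    # crush+V3SG -> crushes
    $sentence =~ s/\Q+V3SG/s/g;                      # walk+V3SG  -> walks
    return $sentence;
}

print conjugate_3sg("He study+V3SG, crush+V3SG and walk+V3SG."), "\n";
# He studies, crushes and walks.
```

The order matters: the general s/\Q+V3SG/s/g rule runs last, so it only sees the tags that the two more specific rules left behind.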
