
Why is this Perl regex not working?

I have this array.

my @input = ("He walk+V3SG very fast.", "He study+V3SG hard.");

and I want to replace 'walk+V3SG' and 'study+V3SG' with 'walks' and 'studies'.

Below is the script I wrote. I thought it should work, but for some reason it does not.

foreach my $sent(@input){
    if ($sent =~ m/\Q+V3SG/){
        if ($sent =~ m/\Q[dlr]y+V3SG/){
            $sent =~ s/\Q[dlr]y+V3SG/ies/g;
        }
        if ($sent =~ m/\Q[s|x|sh|ch|o]+V3SG/){
            $sent =~ s/\Q[s|x|sh|ch|o]+V3SG/es/g;
        }
        else {$sent =~ s/\Q+V3SG/s/g}
    }
}

foreach my $sent(@input){
    print $sent;
    print "\n";
}

Can anyone tell me what is wrong with the script?

The \Q quotes everything that follows it, so the rest of the regex matches the literal text [dlr]y+V3SG. Moving it after the character class lets the class work properly:

s/[dlr]\Qy+V3SG/ies/g

or just escape the + :

s/[dlr]y\+V3SG/ies/g

After this change, you get, for example:

He stuies hard.

To make sure the first letter is retained, you can use a capture group or \K (available since Perl 5.10):

s/[dlr]\K\Qy+V3SG/ies/g
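
As a minimal sketch, the three variants discussed so far can be run side by side on one of the sample sentences. The variable names here are illustrative only and do not come from the question.

```perl
use strict;
use warnings;

my $sentence = "He study+V3SG hard.";

# \Q at the very start quotes everything after it,
# so [dlr] is treated as literal text and nothing matches.
(my $quoted_all = $sentence) =~ s/\Q[dlr]y+V3SG/ies/g;

# \Q moved after the class: the class works,
# but the matched d/l/r is replaced along with the rest.
(my $class_only = $sentence) =~ s/[dlr]\Qy+V3SG/ies/g;

# \K keeps whatever was matched before it out of the replacement.
(my $with_keep = $sentence) =~ s/[dlr]\K\Qy+V3SG/ies/g;

print "$quoted_all\n";   # He study+V3SG hard.
print "$class_only\n";   # He stuies hard.
print "$with_keep\n";    # He studies hard.
```

The (my $copy = $original) =~ s/.../.../ form is used so that each variant starts from the same untouched input.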

For the second regex, you're using the wrong brackets. Alternation needs parentheses, not square brackets:

s/(s|x|sh|ch|o)\Q+V3SG/$1es/g

You should put \Q just before the literal part. You are placing it before the whole regex, so the whole regex is treated as a literal string and its metacharacters are not interpreted.

Second, use \K to control what gets substituted: put it right after the part you want to keep. For example, s/[dlr]\Ky\Q+V3SG/ies/g turns study+V3SG into studies without removing the d, l, or r from the result.

Third, [s|x|sh|ch|o] does not do what you think. It is a character class, so it matches any single one of the characters s, x, h, |, c, o. The correct form is (?:s|x|sh|ch|o), where (?:...) is a non-capturing group.
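
A small sketch of that difference, assuming a few made-up test words (only crush+V3SG resembles the data in the question). The bracketed class accepts a lone h or a literal |, while the group accepts only the listed endings:

```perl
use strict;
use warnings;

# A bracketed class matches exactly one character from the set s, |, x, h, c, o.
my $class = qr/[s|x|sh|ch|o]\+V3SG/;
# A non-capturing group matches one of the listed alternatives as a whole.
my $group = qr/(?:s|x|sh|ch|o)\+V3SG/;

my %result;
for my $word ("crush+V3SG", "laugh+V3SG", "a|+V3SG") {
    # Record whether each pattern matches this word (1 = match, 0 = no match).
    $result{$word} = [ ($word =~ $class ? 1 : 0), ($word =~ $group ? 1 : 0) ];
    print "$word: class=$result{$word}[0] group=$result{$word}[1]\n";
}
```

Both patterns match crush+V3SG, but only the class matches laugh+V3SG and a|+V3SG. With the class, laugh+V3SG would wrongly get the "es" ending.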

Finally, this shouldn't be an if/elsif/else chain at all. A sentence could contain all three forms, so apply every substitution in turn.

Putting it all together:

#!/usr/bin/perl
use strict;
use warnings;

my @input = ("He walk+V3SG very fast.", "He study+V3SG hard.","He crush+V3SG hard.");

foreach (@input){
    if (m/\Q+V3SG/){
        s/[dlr]\Ky\Q+V3SG/ies/g;
        s/(?:s|x|sh|ch|o)\K\Q+V3SG/es/g;
        s/\Q+V3SG/s/g;
    }
}

foreach my $sent(@input){
    print $sent;
    print "\n";
}
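
To illustrate why the substitutions are applied unconditionally, here is a sketch that wraps them in a helper and feeds it a sentence containing all three forms. The name inflect_3sg and the test sentence are hypothetical and not part of the original answers.

```perl
use strict;
use warnings;

# Hypothetical helper: applies the three substitutions in order
# and returns the rewritten sentence.
sub inflect_3sg {
    my ($sentence) = @_;
    $sentence =~ s/[dlr]\Ky\Q+V3SG/ies/g;           # study+V3SG -> studies
    $sentence =~ s/(?:s|x|sh|ch|o)\K\Q+V3SG/es/g;   # crush+V3SG -> crushes
    $sentence =~ s/\Q+V3SG/s/g;                     # walk+V3SG  -> walks
    return $sentence;
}

my $mixed = "He walk+V3SG, study+V3SG and crush+V3SG.";
my $fixed = inflect_3sg($mixed);
print "$fixed\n";   # He walks, studies and crushes.
```

The order matters: the general rule comes last, so it only sees markers that the two more specific rules left behind.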
