
How does regex substitution work in Perl?

I am trying to remove duplicates from the string "a","b","b","a","c". After the removal the result is "a","b","c", as intended. This works, but I do not understand how the regex substitution achieves it.

use warnings;
use strict;
my $s = q+"a","b","b","a","c"+;

 $s=~s/ ("\w"),? / ($s=~s|($1)||g)?"$1,":"" /xge;
#^                   ^
#|                   Consider this as s2
#Consider this as s1

print "\n$s\n\n";

s1 initially contains the string "a","b","b","a","c".

Step 1

After the first substitution, which of the following does s1 contain: "a","b","b","c", or "a","b","b","a","c", or ,"b","b",,"c"?

I ran the regex with an embedded code block, (?{ ... }), to print $s at each match:

$s=~s/ ("\w"),? (?{print "$s\n"})/ ($s=~s|($1)||g)?"$1,":"" /xge;

The result is

"a","b","b","a","c"
,"b","b",,"c"  #This is from after substitution
,,,,"c"
,,,,"c"
,,,,"c"

My doubt is this: s2 is also $s, so why is the replacement text not combined with s1? I expected that after the first step the result would be "a","b","b","c" (every "a" is replaced with the empty string, and one "a" is added back to $s).


Edited

The output from the embedded code block (?{print $s}) is:

"a","b","b","a","c"
,"b","b",,"c" 
,,,,"c"
,,,,"c"
,,,,"c"

When I print $s after the substitution line, it gives "a","b","c", . How is this output produced?
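
One way to picture what is going on, offered as a sketch rather than a definitive account of Perl's internals: the outer s///ge appears to match against its own copy of the original string, collect the replacement pieces, and assign the assembled result to $s only once it has finished. The inner substitution edits $s in the meantime (which is what the (?{ ... }) output shows), but those edits are overwritten by the final assignment. The snippet below makes the two roles explicit by using two separate variables; $work, $result and $elem are hypothetical names introduced for illustration.

```perl
use strict;
use warnings;

my $s = q+"a","b","b","a","c"+;

# $work plays the role of s2: the string that the inner substitution edits.
my $work = $s;

# $result plays the role of s1: the outer substitution scans the original
# text and keeps an element only if the inner substitution still finds it
# in $work, which is true only the first time that element is seen.
# $1 is saved in $elem first, because a successful inner match resets $1.
my $result = $s;
$result =~ s{ ("\w"),? }{
    my $elem = $1;
    ($work =~ s/\Q$elem\E//g) ? "$elem," : "";
}xge;

print "work:   $work\n";      # prints: work:   ,,,,
print "result: $result\n";    # prints: result: "a","b","c",
```

In the original one-liner both roles are played by $s, so the intermediate values (,"b","b",,"c" and so on) are visible while the substitution runs, and the assembled result replaces them at the end.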

A regex is (in my opinion) the wrong tool to use here. I would

  • split the string on commas
  • remove duplicates from the list returned by split
  • join the list back into a string

Like this:

#!/usr/bin/perl

use strict;
use warnings;
use feature 'say';

my $str = q["a","b","b","a","c"];

my %seen;

$str = join ',',
       grep { ! $seen{$_}++ }
       split /,/, $str;

say $str;
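
For comparison, here is a minimal sketch of the same split, filter, rejoin idea using uniq from List::Util, which keeps the first occurrence of each value. It assumes List::Util 1.45 or newer (the release that, as far as I know, introduced uniq); $deduped is just an illustrative name.

```perl
use strict;
use warnings;
use feature 'say';
use List::Util qw(uniq);    # assumption: List::Util 1.45 or newer

my $str = q["a","b","b","a","c"];

# split on commas, drop repeated elements, join back with commas
my $deduped = join ',', uniq( split /,/, $str );

say $deduped;    # prints: "a","b","c"
```

The %seen version above does the same job and works on older installations, so which one to use is mostly a matter of taste.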

The proper solution to this is split, filter, rejoin as @Dave Cross has already demonstrated.

...

However, the following regex solution does work, and hopefully demonstrates why Dave's solution is superior:

#!/usr/bin/env perl

use v5.10;
use strict;
use warnings;

my $str = q{"a","b","b","a","c"};

1 while $str =~ s{
    \A
    (?: (?&element) , )*
    ( (?&element) )           # Capture in \1
    (?: , (?&element) )*
    \K
    ,
    \1                        # Remove the duplicate along with preceding comma
    (?= \z | , )

    (?(DEFINE)
        (?<element>
            "
            \w
            "
        )
    )
}{}xg;

say $str;

Outputs:

"a","b","c"
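
The "1 while" loop is needed because the pattern is anchored with \A, so each pass can match at most once and therefore removes a single duplicate. The sketch below counts the successful passes. It uses a simplified pattern with the element written inline as "\w" instead of the (?(DEFINE)...) block, and $passes is a hypothetical counter added for illustration.

```perl
use v5.10;
use strict;
use warnings;

my $str    = q{"a","b","b","a","c"};
my $passes = 0;

# Each successful substitution removes one duplicate; loop until none is left.
$passes++ while $str =~ s{
    \A
    (?: "\w" , )*
    ( "\w" )          # candidate element, captured in \1
    (?: , "\w" )*
    \K                # keep everything matched so far
    , \1              # a later copy of the candidate, with the comma before it
    (?= \z | , )
}{}x;

say "$str after $passes passes";    # prints: "a","b","c" after 2 passes
```

The input contains two duplicates, so two passes succeed and the third attempt fails, ending the loop.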
