
How does regex substitution work in Perl?

I am trying to remove duplicates from the string "a","b","b","a","c". After removal the result is "a","b","c", and that part works, but I have a doubt about how the regex substitution works.

use warnings;
use strict;
my $s = q+"a","b","b","a","c"+;

 $s=~s/ ("\w"),? / ($s=~s|($1)||g)?"$1,":"" /xge;
#^                   ^
#|                   Consider this as s2
#Consider this as s1

print "\n$s\n\n";

s1 contains the string "a","b","b","a","c".

Step 1

After substitution:

Guess which value s1 holds at this point: "a","b","b","c", or "a","b","b","a","c", or ,"b","b",,"c"?

I ran the regex with an embedded code block, (?{...}), to print $s at each match:

$s=~s/ ("\w"),? (?{print "$s\n"})/ ($s=~s|($1)||g)?"$1,":"" /xge;

The result is:

"a","b","b","a","c"
,"b","b",,"c"  #This is from after substitution
,,,,"c"
,,,,"c"
,,,,"c"

Now my doubt: s2 is also $s, so why is its result not combined with s1? I expected that at the second step the value would be "a","b","b","c" (every "a" replaced with the empty string, and one "a" added back to $s).


Edited

The output printed by the embedded code block (?{print $s}) is:

"a","b","b","a","c"
,"b","b",,"c" 
,,,,"c"
,,,,"c"
,,,,"c"

After the substitution line I printed $s, and it gives "a","b","c". How is this output produced?
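One reading that fits the trace above, stated here as an assumption rather than documented behaviour: the outer s///g keeps scanning a snapshot of the original string and builds its result separately, while the inner substitution edits the live $s; the finished result is stored in $s only at the end. A minimal sketch that emulates this with explicit variables ($copy, $out and $item are illustrative names, not part of the original code):

```perl
use strict;
use warnings;

my $s    = q+"a","b","b","a","c"+;
my $copy = $s;   # assumption: the outer s///g scans a snapshot like this
my $out  = '';   # the replacement text accumulates separately

while ( $copy =~ /("\w"),?/g ) {
    my $item = $1;                             # save before the inner match touches $1
    my $removed = ( $s =~ s/\Q$item\E//g );    # inner substitution edits the live $s
    print "live \$s: $s\n";
    $out .= "$item," if $removed;              # true only the first time $item is seen
}

$s = $out;       # the outer substitution stores its result at the end
print "$s\n";
```

Under this reading, the inner substitution acts only as a "have I seen this element before?" test: it succeeds the first time an element is met and fails afterwards, because every copy was already deleted from the live $s.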

A regex is (in my opinion) the wrong tool to use here. I would

  • split the string on commas
  • remove duplicates from the list returned by split
  • join the list back into a string

Like this:

#!/usr/bin/perl

use strict;
use warnings;
use feature 'say';

my $str = q["a","b","b","a","c"];

my %seen;

$str = join ',',
       grep { ! $seen{$_}++ }
       split /,/, $str;

say $str;
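The same split, filter, join steps can also be sketched with uniq from List::Util (available from List::Util 1.45, which ships with Perl 5.26 and later), which keeps the first occurrence of each element. The helper name dedupe_list is hypothetical, and the sketch assumes, like the code above, that no element contains a comma:

```perl
use strict;
use warnings;
use List::Util 1.45 qw(uniq);

# Hypothetical helper: remove repeated comma-separated elements,
# keeping the first occurrence of each in its original position.
sub dedupe_list {
    my ($str) = @_;
    return join ',', uniq( split /,/, $str );
}

my $result = dedupe_list(q["a","b","b","a","c"]);
print "$result\n";
```

Wrapping the logic in a function also avoids reusing a %seen hash that still holds entries from an earlier string.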

The proper solution to this is split, filter, rejoin, as @Dave Cross has already demonstrated.

...

However, the following regex solution does work, and hopefully demonstrates why Dave's solution is superior:

#!/usr/bin/env perl

use v5.10;
use strict;
use warnings;

my $str = q{"a","b","b","a","c"};

1 while $str =~ s{
    \A
    (?: (?&element) , )*
    ( (?&element) )           # Capture in \1
    (?: , (?&element) )*
    \K
    ,
    \1                        # Remove the duplicate along with preceding comma
    (?= \z | , )

    (?(DEFINE)
        (?<element>
            "
            \w
            "
        )
    )
}{}xg;

say $str;

Outputs:

"a","b","c"
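For comparison, a single-pass substitution can track the elements it has already met in a hash instead of editing the string from inside its own replacement. This is only a sketch: the %seen hash and the pattern are illustrative choices, and it assumes every element has the form "\w" as in the example string.

```perl
use strict;
use warnings;

my $str = q{"a","b","b","a","c"};
my %seen;

# Match each element together with the comma before it (if any).
# Keep the match the first time the element appears, drop it afterwards.
$str =~ s{ (,?) ("\w") }{ $seen{$2}++ ? '' : "$1$2" }xge;

print "$str\n";
```

Because the first element is never a duplicate, the separator captured with each later element is removed along with it, so no stray comma is left behind.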
