
Perl regex capture each character as one group

I have:

Aenean placerat >> /example/alpha.txt
est et rutrum ultrices >> /example/beta.txt
dolor nibh ultricies nulla >> /example/gamma delta.txt

I need:

Aenean placerat >> \/\e\x\a\m\p\l\e\/\a\l\p\h\a\.\t\x\t
est et rutrum ultrices >> \/\e\x\a\m\p\l\e\/\b\e\t\a\.\t\x\t
dolor nibh ultricies nulla >> \/\e\x\a\m\p\l\e\/\g\a\m\m\a\ \d\e\l\t\a\.\t\x\t

Obviously this does not work, but I can't find a way to achieve this:

's/(.*) >> (.)*/$1 >> \\$2/gm'
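
For reference, a quantified capture group such as (.)* keeps only the text from its last repetition, so $2 ends up holding just the final character of the path. A minimal sketch of what the attempt produces, using the first sample line (the variable names are illustrative):

```perl
use strict;
use warnings;

my $line = "Aenean placerat >> /example/alpha.txt";

# (.)* repeats the group, but $2 only remembers the last repetition ("t").
(my $attempt = $line) =~ s/(.*) >> (.)*/$1 >> \\$2/;

print "$attempt\n";   # prints: Aenean placerat >> \t
```

Only the last character receives a backslash, which is why each character has to be matched (or replaced) individually.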

A quick and simple way is to apply a regex substitution inside your regex substitution.

use strict;
use warnings;

while (<DATA>) {
    s/>> \K(.+)/ $1 =~ s#(.)#\\$1#gr /e;
    #            ^^^^^^^^^^^^^^^^^^^ inner substitution
    print;
}

__DATA__
Aenean placerat >> /example/alpha.txt
est et rutrum ultrices >> /example/beta.txt
dolor nibh ultricies nulla >> /example/gamma delta.txt

The /e (eval) modifier tells Perl to evaluate the right-hand side as code. Note the alternative delimiters on the inner substitution operator, s###, and the /r modifier, which returns the modified string instead of changing its target (we can't modify $1 anyway, since it is read-only). The \K escape "keeps" everything matched to its left, so only the part after >> is replaced.

This can be used as a simple one-liner:

perl -pe's/>> \K(.+)/ $1 =~ s#(.)#\\$1#gr /e' yourfile.txt
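
The same idea can also be written without nesting the two substitutions, which may be easier to read. A sketch, assuming Perl 5.14 or later for /r; escape_chars is a hypothetical helper name, not something from the answer above:

```perl
use strict;
use warnings;

# Hypothetical helper: return a copy of the string with a backslash in
# front of every character except newline (. does not match "\n").
sub escape_chars {
    my ($text) = @_;                  # copy the argument before any regex runs
    return $text =~ s/(.)/\\$1/gr;    # /r returns the result, $text is untouched
}

my $line = "est et rutrum ultrices >> /example/beta.txt\n";

# \K drops ">> " from the match, so only the captured path is replaced.
$line =~ s/>> \K(.+)/escape_chars($1)/e;

print $line;
```

The helper copies its argument first because @_ aliases $1, which the inner substitution would otherwise overwrite.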

One approach is to split the line in two, then apply the substitution only to the right side:

use warnings;
use strict;

while (<DATA>) {
    my ($x, $y) = split /\s+>>\s+/;
    $y =~ s/(.)/\\$1/g;
    print "$x >> $y";
}

__DATA__
Aenean placerat >> /example/alpha.txt
est et rutrum ultrices >> /example/beta.txt
dolor nibh ultricies nulla >> /example/gamma delta.txt

Output:

Aenean placerat >> \/\e\x\a\m\p\l\e\/\a\l\p\h\a\.\t\x\t
est et rutrum ultrices >> \/\e\x\a\m\p\l\e\/\b\e\t\a\.\t\x\t
dolor nibh ultricies nulla >> \/\e\x\a\m\p\l\e\/\g\a\m\m\a\ \d\e\l\t\a\.\t\x\t
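
One caveat with the split approach: a line that contains no >> leaves $y undefined and triggers a warning, and a path that itself contains >> is cut off at the second separator. A hedged variant that handles both cases by passing a limit of 2 to split and leaving lines without a separator alone (the sample lines in @lines are made up for illustration):

```perl
use strict;
use warnings;

my @lines = (
    "Aenean placerat >> /example/alpha.txt\n",
    "no separator on this line\n",
    "dolor nibh >> /example/a >> b.txt\n",
);

for my $line (@lines) {
    # Limit of 2: everything after the first separator stays in $y.
    my ($x, $y) = split /\s+>>\s+/, $line, 2;

    if (defined $y) {
        $y =~ s/(.)/\\$1/g;     # . skips the trailing newline
        $line = "$x >> $y";
    }
    print $line;                # lines without ">>" pass through unchanged
}
```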

You can match the part of the line up to the first occurrence of >>, use \K to discard what has been matched so far, and then use \G to match each following character one at a time.

(?:^.*?>>\h*|\G(?!^))\K.

Explanation

  • (?: Non-capturing group for the alternation
    • ^.*?>>\h* Match up to the first occurrence of >>, followed by optional horizontal whitespace
    • | Or
    • \G(?!^) Assert the position at the end of the previous match, but not at the start of the string
  • ) Close the non-capturing group
  • \K Forget what has been matched so far
  • . Match a single character (any character except a newline)

See a regex demo or a Perl demo.

In the replacement, use the full match ($&) preceded by a backslash.

Example

use strict;
use warnings;

while (<DATA>) {
    s/(?:^.*?>>\h*|\G(?!^))\K./\\$&/g;
    print;
}

__DATA__
Aenean placerat >> /example/alpha.txt
est et rutrum ultrices >> /example/beta.txt
dolor nibh ultricies nulla >> /example/gamma delta.txt

Output

Aenean placerat >> \/\e\x\a\m\p\l\e\/\a\l\p\h\a\.\t\x\t
est et rutrum ultrices >> \/\e\x\a\m\p\l\e\/\b\e\t\a\.\t\x\t
dolor nibh ultricies nulla >> \/\e\x\a\m\p\l\e\/\g\a\m\m\a\ \d\e\l\t\a\.\t\x\t
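
The (?!^) lookahead matters for lines that contain no >>. At the start of a string \G matches at position 0, so without the lookahead the second alternative would succeed there and every character of such a line would be escaped. A small sketch comparing the two patterns (the input line is made up for illustration):

```perl
use strict;
use warnings;

my $plain = "no separator here";

# With (?!^): \G cannot match at the start of the line, so nothing changes.
(my $guarded = $plain) =~ s/(?:^.*?>>\h*|\G(?!^))\K./\\$&/g;

# Without it: \G matches at position 0 and the whole line gets escaped.
(my $unguarded = $plain) =~ s/(?:^.*?>>\h*|\G)\K./\\$&/g;

print "$guarded\n";
print "$unguarded\n";
```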
