Perl: how to use string variables as search pattern and replacement in regex

I want to use string variables for both the search pattern and the replacement in a regex. The expected output is like this:

$ perl -e '$a="abcdeabCde"; $a=~s/b(.)d/_$1$1_/g; print "$a\n"'
a_cc_ea_CC_e

But when I moved the pattern and replacement into variables, $1 was not evaluated.

$ perl -e '$a="abcdeabCde"; $p="b(.)d"; $r="_\$1\$1_"; $a=~s/$p/$r/g; print "$a\n"'
a_$1$1_ea_$1$1_e

When I use the "ee" modifier, it gives errors.

$ perl -e '$a="abcdeabCde"; $p="b(.)d"; $r="_\$1\$1_"; $a=~s/$p/$r/gee; print "$a\n"'
Scalar found where operator expected at (eval 1) line 1, near "$1$1"
    (Missing operator before $1?)
Bareword found where operator expected at (eval 1) line 1, near "$1_"
    (Missing operator before _?)
Scalar found where operator expected at (eval 2) line 1, near "$1$1"
    (Missing operator before $1?)
Bareword found where operator expected at (eval 2) line 1, near "$1_"
    (Missing operator before _?)
aeae

What am I missing here?


Edit

Both $p and $r are written by me. What I need is to run multiple similar regex replacements without touching the Perl code, so $p and $r have to be in a separate data file. I hope this file can be used with C++/Python code later. Here are some examples of $p and $r.

^(.*\D)?((19|18|20)\d\d)年   $1$2<digits>年
^(.*\D)?(0\d)年  $1$2<digits>年
([TKZGD])(\d+)/(\d+)([^\d/])    $1$2<digits>$3<digits>$4
([^/TKZGD\d])(\d+)/(\d+)([^/\d])    $1$3分之$2$4

With $p="b(.)d"; you are getting a string with the literal characters b(.)d. In general, regex syntax is not guaranteed to survive in a quoted string (a double-quoted string processes backslash escapes and interpolates variables first), so it may not have its expected meaning once used in a regex. However, see the Note at the end.

This is what the qr operator is for: $p = qr/b(.)d/; compiles the pattern as a regular expression.
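As a minimal sketch of qr in this setting, assume the pattern arrives as plain text, the way it would from a data file. The variable names and the /i flag below are illustrative assumptions, not part of the question.

```perl
use strict;
use warnings;

# Pattern text as it might be read from a data file (hypothetical value).
my $pattern_text = 'b(.)d';

# Compile it once; modifiers such as /i can be attached at this point.
my $p = qr/$pattern_text/i;

my $s = 'aBcdeabCde';

# In list context with /g, a pattern with one group returns one capture per match.
my @captures = $s =~ /$p/g;
print "@captures\n";   # c C
```

The compiled object can then be interpolated into s/// like any other pattern.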

As for the replacement part and /ee, the problem is that $r is first evaluated, to yield _$1$1_, which is then evaluated as code. Alas, that is not valid Perl code. The _ are barewords and even $1$1 itself isn't valid (for example, $1 . $1 would be).
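To make the two evaluation steps concrete, here is a small sketch of one common way to hand /ee something it can evaluate: store the replacement with its own double quotes, so that the second evaluation sees a quoted Perl string. This is a different route from the one developed in the rest of this answer, and it assumes the replacement text contains no double quotes or other characters that are special inside a double-quoted string.

```perl
use strict;
use warnings;

my $s = 'abcdeabCde';
my $p = qr/b(.)d/;

# The double quotes are part of the replacement text itself.
# First /e:  the expression $r yields the text  "_$1$1_"  (quotes included).
# Second /e: that text is evaluated as Perl, i.e. as an interpolating string.
my $r = '"_$1$1_"';

(my $out = $s) =~ s/$p/$r/gee;
print "$out\n";   # a_cc_ea_CC_e
```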

The provided examples of $r have $N's mixed with text in various ways. One way to parse this is to extract all $N and everything else into a list that keeps their order from the string. Then that can be processed into a string that will be valid code. For example, we need

'$1_$2$3other'  -->  $1 . '_' . $2 . $3 . 'other'

which is valid Perl code that can be evaluated.

Breaking the string up is helped by split's capturing in the separator pattern.

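sub repl {
    my ($r) = @_;

    # Split into $N terms and literal text, dropping empty strings
    # (but keeping a literal "0", which a plain truth test would drop)
    my @terms = grep { length } split /(\$\d)/, $r;

    # Keep each $N as code, single-quote everything else, and concatenate
    return join '.', map { /^\$\d\z/ ? $_ : q(') . $_ . q(') } @terms;
}

$var =~ s/$p/repl($r)/gee;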

With capturing /(...)/ in split's pattern, the separators are returned as a part of the list. Thus this extracts from $r an array of terms which are either $N or other text, in their original order and with everything kept (split drops only trailing empty fields). This includes possible (leading) empty strings, so those need to be filtered out.

Then every term other than the $N's is wrapped in '...', so when they are all joined by . we get a valid Perl expression, as in the example above.

Then /ee has this function return the string (such as the one above), and evaluates it as valid code.
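Putting the pieces together, the following is a self-contained sketch of the approach described above. The helper apply_rule and the second rule are hypothetical, added only for illustration. The sketch assumes single-digit $N references and replacement text without single quotes or backslashes, since the literal parts are wrapped in '...' unescaped.

```perl
use strict;
use warnings;

# Turn a replacement template such as '_$1$1_' into Perl source text
# such as  '_' . $1 . $1 . '_'  for the second /e to evaluate.
sub repl {
    my ($r) = @_;
    my @terms = grep { length } split /(\$\d)/, $r;
    return join ' . ', map { /^\$\d\z/ ? $_ : q(') . $_ . q(') } @terms;
}

# Hypothetical helper: apply one (pattern, template) pair to a copy of the text.
sub apply_rule {
    my ($text, $pattern, $template) = @_;
    my $re = qr/$pattern/;
    # First /e calls repl() to build the code string, second /e evaluates it.
    $text =~ s/$re/repl($template)/gee;
    return $text;
}

print repl('$1_$2$3other'), "\n";
# $1 . '_' . $2 . $3 . 'other'

print apply_rule('abcdeabCde', 'b(.)d', '_$1$1_'), "\n";
# a_cc_ea_CC_e

print apply_rule('from 2019-03 to 2020-11', '(\d{4})-(\d\d)', '$2/$1'), "\n";
# from 03/2019 to 11/2020
```

Since the template is the same for every match, the code string could also be built once before the substitution and passed in as a variable.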

We are told that the safety of using /ee on external input is not a concern here. Still, this is something to keep in mind. See this post, provided by Håkon Hægland in a comment. Along with the discussion it also directs us to String::Substitution. Its use is demonstrated in this post. Another way to approach this is with replace from Data::Munge.
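If evaluating text from a data file is a concern, the $N references can also be expanded by hand, with no eval at all. The sketch below is not the API of either module mentioned above. It is a plain-Perl illustration with hypothetical function names, and it assumes single-digit references $1 to $9.

```perl
use strict;
use warnings;

# Expand $1..$9 in a template from a list of captured groups.
# $groups->[0] holds $1, $groups->[1] holds $2, and so on.
sub expand_template {
    my ($template, $groups) = @_;
    (my $out = $template) =~ s{\$([1-9])}{ $groups->[$1 - 1] // '' }ge;
    return $out;
}

# Replace every match of $pattern in $text, treating the template purely as data.
sub substitute_all {
    my ($text, $pattern, $template) = @_;
    my $re = qr/$pattern/;
    my ($out, $last) = ('', 0);

    while ($text =~ /$re/g) {
        # Record offsets and captures before any other regex runs.
        my ($start, $end) = ($-[0], $+[0]);
        my @groups = map {
            defined $-[$_] ? substr($text, $-[$_], $+[$_] - $-[$_]) : undef
        } 1 .. $#+;

        # Copy the unmatched text since the previous match, then the expansion.
        $out .= substr($text, $last, $start - $last)
              . expand_template($template, \@groups);
        $last = $end;
    }
    return $out . substr($text, $last);
}

print substitute_all('abcdeabCde', 'b(.)d', '_$1$1_'), "\n";
# a_cc_ea_CC_e
```

Anything in the template other than $N stays literal text, so a replacement string cannot run code.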

For more discussion of /ee see this post, with several useful answers.


Note on using "b(.)d" for a regex pattern

In this case, with parens and a dot, their special meaning is maintained. Thanks to kangshiyin for an early mention of this, and to Håkon Hægland for asserting it. However, this is a special case. Double-quoted strings directly defeat many patterns since escape processing and interpolation are done first: for example, "\w" is just an escaped w (an unrecognized escape), so the backslash is lost. Single quotes should work, as there is no interpolation. Still, strings intended for use as regex patterns are best formed using qr, as we are getting a true regex. Then all modifiers may be used as well.
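The difference between the quoting forms can be seen in a short sketch. The sample text and variable names are made up for illustration. The no warnings line only silences the "Unrecognized escape" warning that the double-quoted form would otherwise trigger.

```perl
use strict;
use warnings;

# Double quotes: \w is not a known string escape, so the backslash is dropped.
my $dq = do { no warnings 'misc'; "\w+" };   # the string is now  w+
# Single quotes: the backslash is kept.
my $sq = '\w+';
# qr: a compiled regex.
my $re = qr/\w+/;

my $text = 'foo www';

my ($m_dq) = $text =~ /($dq)/;   # matches literal w's only
my ($m_sq) = $text =~ /($sq)/;   # matches word characters
my ($m_re) = $text =~ /($re)/;   # matches word characters

print "dq: $m_dq\n";   # dq: www
print "sq: $m_sq\n";   # sq: foo
print "qr: $m_re\n";   # qr: foo
```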
