Perl regex capture groups and reshuffle pattern

I use Perl regex capture groups to rewrite a pattern across a large number of files.

File example 1:

title="alpha" lorem ipsum lorem ipsum name="beta"

File example 2:

title="omega" Morbi posuere metus purus name="delta"

The desired output is:

title="beta" lorem ipsum lorem ipsum
title="delta" Morbi posuere metus purus

The command I am using is:

find . -type f -exec perl -pi -w -e 's/title="(?'one'.*?)"(?'three'.*?)name="(?'two'.*?)"/title="\g{two}"\g{three}/g;' \{\} \;

(Note that (1) the attribute values of title and name are unknown, and (2) the content between title="alpha" and name="beta" differs from file to file.)

I am still learning Perl regex. What am I doing wrong?

This perl command line should work:

perl -pe 's/(title=)"?[^"\s]*"?(.*) name="?([^"\s]+)"?/$1"$3"$2/' file

title="beta" lorem ipsum lorem ipsum
title="delta" Morbi posuere metus purus

Explanation:

  • (title=) : Match title= and capture it in group #1
  • "?[^"\s]*"? : Match an optionally quoted run of non-space, non-quote characters (the old title value, which is not captured)
  • (.*) : Match 0 or more of any characters and capture them in group #2
  • name="? : Match the text name= (preceded by a space in the pattern) followed by an optional "
  • ([^"\s]+) : Match a run of non-quote, non-space characters and capture it in group #3
  • "? : Match an optional closing "
  • $1"$3"$2 : The replacement part


A bit of syntax: capture with (?<name>pattern) and then use $+{name} outside of the pattern (the delimiters may vary, see perlre). The whole regex:

s{ title="(?<t>[^"]+)" (?<text>.*?) name="(?<n>[^"]+)" }
 {title="$+{n}"$+{text}}x

The \g{name} syntax attempted in the question is meant for use inside the pattern itself, when a capture is needed again later in the same pattern that captured it. After the matching side, that is in the replacement side or after the regex, the matches are retrieved from the %+ variable instead.

The [^"] is a negated character class, matching any character other than ". The /x modifier at the end makes the regex ignore literal whitespace inside the pattern, so we can use it for readability.

A full example, with the above regex, to run on the command line:

echo title=\"alpha\" lorem ipsum lorem ipsum name=\"beta\" | perl -wpe \
's{title="(?<t>[^"]+)"(?<text>.*?)name="(?<n>[^"]+)"}{title="$+{n}"$+{text}}'

(broken into two lines for readability). It prints

title="beta" lorem ipsum lorem ipsum 

I am not sure why the first value needs to be captured, as it is in the question, but perhaps there is more to the problem than shown, so it is captured here as well, into $+{t}.

Also, the question uses quotes rather loosely. One can string together several '-delimited strings to form one command-line program, but I'd suggest not doing so (if that was the intent).

1st solution: Since you are already using the shell's find command, in case you are OK with awk, here is a solution written and tested in GNU awk.


awk -v s1="\"" '
match($0,/(title=)"[^"]*" (.*)name="([^"]*)"/,arr){
  print arr[1] s1 arr[3] s1,arr[2]
}
'  Input_file

Explanation: This uses GNU awk's match function, which takes a regex and, given a third argument, stores the capture groups in an array. The regex (title=)"[^"]*" (.*)name="([^"]*)" creates 3 capturing groups, whose values are stored in the array named arr at indexes 1, 2 and 3. The values are then printed in the order the OP requires.



2nd solution: With sed, using the same regex and the -E (ERE) option, please try the following code.

sed -E 's/^(title=)"[^"]*" (.*)name="([^"]*)"/\1"\3" \2/' Input_file
