I use Perl regex capture groups to replace a pattern in a large number of files.
File example 1:
title="alpha" lorem ipsum lorem ipsum name="beta"
File example 2:
title="omega" Morbi posuere metus purus name="delta"
The desired result is
title="beta" lorem ipsum lorem ipsum
title="delta" Morbi posuere metus purus
I am using
find . -type f -exec perl -pi -w -e 's/title="(?'one'.*?)"(?'three'.*?)name="(?'two'.*?)"/title="\g{two}"\g{three}/g;' \{\} \;
(Note that (1) the attribute values of title and name are unknown and (2) the content between title="alpha" and name="beta" differs from file to file.)
I am still learning Perl regex. What am I doing wrong?
This perl command line should work:
perl -pe 's/(title=)"?[^"\s]*"?(.*) name="?([^"\s]+)"?/$1"$3"$2/' file
title="beta" lorem ipsum lorem ipsum
title="delta" Morbi posuere metus purus
Explanation:
(title=) : Match title= and capture it in group #1
"?[^"\s]*"? : Match an optionally quoted string containing no spaces or quotes (the old title value)
(.*) : Match 0 or more of any characters and capture them in group #2
 name="? : Match a space and the text name=, followed by an optional "
([^"\s]+) : Match a string containing no spaces or quotes and capture it in group #3
"? : Optional closing "
$1"$3"$2 : Replacement part
A bit of syntax: capture with (?<name>pattern) and then use the capture outside of the pattern with $+{name} (delimiters may be varied, see perlre). The whole regex
s{ title="(?<t>[^"]+)" (?<text>.*?) name="(?<n>[^"]+)" }
{title="$+{n}"$+{text}}x
The \g{name} syntax attempted in the question is a backreference, used inside the pattern itself when a capture is needed again later in the same pattern. After the matching side, that is in the replacement part or after the regex, the captures are retrieved from the %+ variable.
The [^"] is a negated character class, matching any character other than ". The modifier /x at the end makes the regex ignore literal spaces inside the pattern, so we can use them for readability.
A full example, with the above regex, to run on the command line
echo title=\"alpha\" lorem ipsum lorem ipsum name=\"beta\" | perl -wpe \
's{title="(?<t>[^"]+)"(?<text>.*?)name="(?<n>[^"]+)"}{title="$+{n}"$+{text}}'
(broken into two lines for readability). It prints
title="beta" lorem ipsum lorem ipsum
I am not sure why the first value needs to be captured, as it is in the question, but perhaps there is more to it than shown, so it is captured here as well, into $+{t}.
Also, the question uses those quotes rather loosely. One can string together '-delimited strings for one command-line program but I'd suggest not to (if that was the intent).
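To see what this means in practice, here is a small sketch. The variable name arg is made up, and the pattern is shortened from the one in the question. Inside a '-delimited shell string an inner ' ends the string, so the shell glues the adjacent pieces together and the quote characters never reach perl.

```shell
# The shell reads this as three adjacent pieces: 's/title="(?'  one  '.*?)"/X/'
arg='s/title="(?'one'.*?)"/X/'
printf '%s\n' "$arg"
# prints: s/title="(?one.*?)"/X/
```

So perl receives (?one.*?) rather than (?'one'.*?). Writing the named captures as (?<one>...) avoids the clash with the shell's single quotes.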
1st solution: Since you are already using the find command from the shell, in case you are OK with awk, here is a solution written and tested in GNU awk.
awk -v s1="\"" '
match($0,/(title=)"[^"]*" (.*)name="([^"]*)"/,arr){
print arr[1] s1 arr[3] s1,arr[2]
}
' Input_file
Explanation: This uses GNU awk's match function, which accepts a regex and, as its third argument, an array that receives the captured groups. The regex (title=)"[^"]*" (.*)name="([^"]*)" creates 3 capturing groups, whose values are stored in the array arr at indexes 1, 2 and 3. The print statement then outputs them in the order required by the OP.
2nd solution: In sed, with the same regex and the -E (ERE) option enabled, please try the following code.
sed -E 's/^(title=)"[^"]*" (.*)name="([^"]*)"/\1"\3" \2/' Input_file
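As a quick check, the sed command can be run on both sample lines from the question. This is only a sketch, and the variable name result is made up. Note that group 2 also captures the space before name=, so each output line ends with a trailing space.

```shell
# Feed both sample lines from the question through the sed command above.
result=$(printf '%s\n' \
  'title="alpha" lorem ipsum lorem ipsum name="beta"' \
  'title="omega" Morbi posuere metus purus name="delta"' |
  sed -E 's/^(title=)"[^"]*" (.*)name="([^"]*)"/\1"\3" \2/')
printf '%s\n' "$result"
```

If the trailing space matters, moving the space out of the group, as in (.*) name=, should drop it.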