Linux sed regex replace with capture groups

I have a file containing directory entries in the following format:

<item><ln></ln><fn>Some person</fn><ct>07123456789</ct><sd>37</sd><rt>1</rt><bw>1</bw></item>

I would like to use sed to find every entry where <ct> is an 11-digit number and <bw> is 1. I would like to change the line above like so:

<item><ln></ln><fn>Some person</fn><ct>07123456789</ct><sd>37</sd><rt>1</rt><bw>0</bw></item>

(In case it isn't obvious, I have changed <bw> to 0.)

I have tried the following in sed but it does not match:

sed -E 's/(.+<ct>\d{11}.+<bw>)1(<\/bw><\/item>)/\10\2/g' test-directory.xml

What am I doing wrong?

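sed uses POSIX regular expressions, which do not define \d as a digit class, so your pattern never matches the digits; use [0-9] (or [[:digit:]]) instead. You may use this sed with 2 capture groups: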

sed -E 's~(.*<ct>[0-9]{11}</ct>.*<bw>)1(</bw>.*)~\10\2~' file

<item><ln></ln><fn>Some person</fn><ct>07123456789</ct><sd>37</sd><rt>1</rt><bw>0</bw></item>

More Info:

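  • (.*<ct>[0-9]{11}</ct>.*<bw>) : Match any text, followed by <ct>, exactly 11 digits and </ct>, followed by any text and <bw>, and capture all of it in group #1
  • 1 : Match the literal 1, the value to be replaced
  • (</bw>.*) : Match </bw> followed by the rest of the line and capture it in group #2
  • \10\2 : The replacement: group #1, a literal 0, then group #2 (sed back-references are a single digit, so \10 is \1 followed by 0)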

PS: This assumes the <ct> tag appears before the <bw> tag on the same line. For more refined control over XML, it is better to use an XML parser than shell utilities.
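
To see the difference on sample data, here is a minimal sketch that runs the original attempt and the corrected command side by side. The function names original_attempt and fixed_attempt and the second, 7-digit entry are made up for illustration. The behavior noted for \d is what GNU sed is expected to do and may differ in other sed implementations.

```shell
# Hypothetical helpers; both read directory entries from stdin.
original_attempt() {
  # The command from the question: \d is not a digit class in sed.
  sed -E 's/(.+<ct>\d{11}.+<bw>)1(<\/bw><\/item>)/\10\2/g'
}

fixed_attempt() {
  # Same idea with [0-9] and ~ as the delimiter, so / needs no escaping.
  sed -E 's~(.*<ct>[0-9]{11}</ct>.*<bw>)1(</bw>.*)~\10\2~'
}

# First entry is the sample from the question; the second (7 digits) is invented.
sample='<item><ln></ln><fn>Some person</fn><ct>07123456789</ct><sd>37</sd><rt>1</rt><bw>1</bw></item>'
short='<item><ln></ln><fn>Other person</fn><ct>0712345</ct><sd>38</sd><rt>1</rt><bw>1</bw></item>'

# Expected with GNU sed: neither line is changed.
printf '%s\n%s\n' "$sample" "$short" | original_attempt

# Only the entry with an 11-digit <ct> gets <bw>0</bw>.
printf '%s\n%s\n' "$sample" "$short" | fixed_attempt
```

Once the output looks right, GNU sed's -i option can apply the same substitution to the file in place.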


If the position of the <bw> tag is not fixed, you may use this sed solution, which applies the substitution only to lines that contain an 11-digit <ct>:

sed -E '\~<ct>[0-9]{11}</ct>~ s~(.*<bw>)1(</bw>.*)~\10\2~' file
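
As a rough illustration of why the address form helps, the sketch below feeds both commands an entry where <bw> comes before <ct>. The entry and the function names fix_ordered and fix_any_position are hypothetical, chosen only for this example.

```shell
# Hypothetical entry with <bw> placed before <ct>.
entry='<item><bw>1</bw><fn>Some person</fn><ct>07123456789</ct></item>'

# Single-pattern version: requires <ct> to come before <bw>,
# so this entry does not match and is left alone.
fix_ordered() {
  sed -E 's~(.*<ct>[0-9]{11}</ct>.*<bw>)1(</bw>.*)~\10\2~'
}

# Address version: \~...~ selects lines containing an 11-digit <ct>,
# then s~...~ rewrites <bw> wherever it sits on the selected line.
fix_any_position() {
  sed -E '\~<ct>[0-9]{11}</ct>~ s~(.*<bw>)1(</bw>.*)~\10\2~'
}

printf '%s\n' "$entry" | fix_ordered        # entry is printed unchanged
printf '%s\n' "$entry" | fix_any_position   # <bw>1</bw> becomes <bw>0</bw>
```

The leading backslash in \~...~ tells sed that ~ is the delimiter of the address regex, mirroring the ~ delimiter used in the s command.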

With awk (if you are OK with it), you could try the following solution, written and tested in GNU awk with the samples shown. It uses awk's match function with the regex (.*<ct>[0-9]{11}<\/ct>.*<bw>)([0-9]+)(<\/bw>.*), which defines 3 capture groups. GNU awk stores the text matched by each group in the array arr, indexed by group number. The script then prints group 1, a literal 0 in place of whatever digits came before </bw>, and group 3. Note that the three-argument form of match is a GNU awk extension, and that lines which do not match are not printed.

awk '
match($0,/(.*<ct>[0-9]{11}<\/ct>.*<bw>)([0-9]+)(<\/bw>.*)/,arr){
  print arr[1]"0"arr[3]
}
' Input_file
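
For awk implementations that lack the three-argument match, a portable variant can be sketched with the standard two-argument match and RLENGTH. This is an alternative under stated assumptions, not the method above: the function name fix_bw_awk is hypothetical, only the first <ct>...</ct> run on each line is checked, and non-matching lines are printed unchanged rather than dropped.

```shell
# Hypothetical helper; reads directory entries from stdin.
fix_bw_awk() {
  awk '
  {
    # <ct> (4 chars) + 11 digits + </ct> (5 chars) = 20 characters.
    if (match($0, /<ct>[0-9]+<\/ct>/) && RLENGTH == 20)
      sub(/<bw>[0-9]+<\/bw>/, "<bw>0</bw>")   # reset any numeric <bw> to 0
    print                                      # keep every line in the output
  }'
}

printf '%s\n' '<item><ln></ln><fn>Some person</fn><ct>07123456789</ct><sd>37</sd><rt>1</rt><bw>1</bw></item>' | fix_bw_awk
```

Checking RLENGTH avoids the {11} interval expression, which some older awk builds do not support.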

Here is the online demo for the regex shown above.


 