How to use sed to search and replace a pattern that appears multiple times on the same line?

Because the question can be misleading, here is a little example. I have this kind of file:

some text
some text @@some-text-KEY-some-other-text@@
text again @@some-text-KEY-some-other-text@@ @@some-text-KEY-some-other-text@@
again @@some-text-KEY-some-other-text-KEY-text@@
some text with KEY @@KEY-some-text@@
blabla @@KEY@@

In this example, I want to replace each occurrence of KEY- inside a pair of @@ with VALUE-. I started with this sed command:

sed -i 's/\(@@[^@]*\)KEY-\([^@]*@@\)/\1VALUE-\2/g'

Here is how it works:

  1. \(@@[^@]*\) : create a first group composed of two @ and any characters except @ ...
  2. KEY- : ... until the last occurrence of KEY- on that line
  3. \([^@]*@@\) : and create a second group with all the characters except @ until the next pair of @ .

The problem is that my command cannot correctly handle the following line, because there are multiple KEY- inside a single pair of @@:

again @@some-text-KEY-some-other-text-KEY-text@@

Indeed, I get this result:

again @@some-text-KEY-some-other-text-VALUE-text@@

If I want to replace all the occurrences of KEY- in that line, I have to run my command multiple times and I prefer to avoid that. I also tried with lazy operators but the problem is the same.
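The single-pass behavior can be reproduced on one line. This is a minimal sketch, assuming any POSIX sed; the variable names line, script, once and twice are made up for the illustration. Because [^@]* is greedy, the first group swallows everything up to the last KEY-, and the g flag only resumes searching after the end of that match:

```shell
line='again @@some-text-KEY-some-other-text-KEY-text@@'
script='s/\(@@[^@]*\)KEY-\([^@]*@@\)/\1VALUE-\2/g'

# One pass: only the last KEY- inside the block is replaced.
once=$(printf '%s\n' "$line" | sed "$script")
printf '%s\n' "$once"

# A second pass over the result picks up the remaining KEY-.
twice=$(printf '%s\n' "$once" | sed "$script")
printf '%s\n' "$twice"
```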

How can I create a regex and a sed command that can correctly handle my whole file?

The problem is rather complex: you need to replace all occurrences of some multicharacter text inside blocks of text between identical multicharacter delimiters.

The easiest and safest way to solve the task is using Perl:

perl -i -pe 's/(@@)(.*?)(@@)/$end_delim=$3; "$1" . $2=~s|KEY-|VALUE-|gr . "$end_delim"/ge' file

See the online demo .

The (@@)(.*?)(@@) pattern matches the text between two adjacent @@ substrings, capturing the start delimiter into Group 1, all text in between into Group 2, and the end delimiter into Group 3. Since the nested substitution resets all capture placeholders, a temporary variable keeps the value of the end delimiter ( $end_delim=$3 ). Then "$1" . $2=~s|KEY-|VALUE-|gr . "$end_delim" builds the replacement from the Group 1 value (the opening @@ ), the Group 2 value with every KEY- replaced by VALUE- , and the end delimiter.

If no KEY- appears between two @@...@@ blocks on the same line, you can use a sed branch by wrapping your command in :A and tA :

sed -i ':A; s/\(@@[^@]*\)KEY-\([^@]*@@\)/\1VALUE-\2/g; tA' file

Note you missed the first placeholder in \VALUE-\2 , it should be \1VALUE-\2 .

See the online demo :

s="some KEY- text
some text @@some-text-KEY-some-other-text@@
text again @@some-text-KEY-some-other-text@@ @@some-text-KEY-some-other-text@@
again @@some-text-KEY-some-other-text-KEY-text@@
some text with KEY @@KEY-some-text@@
blabla @@KEY@@"

sed ':A; s/\(@@[^@]*\)KEY-\([^@]*@@\)/\1VALUE-\2/g; tA' <<< "$s"

Output:

some KEY- text
some text @@some-text-VALUE-some-other-text@@
text again @@some-text-VALUE-some-other-text@@ @@some-text-VALUE-some-other-text@@
again @@some-text-VALUE-some-other-text-VALUE-text@@
some text with KEY @@VALUE-some-text@@
blabla @@KEY@@
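The restriction on this approach (no KEY- between two blocks on the same line) can be checked with a made-up input. In the sketch below, the closing @@ of the first block and the opening @@ of the second are themselves a valid pair for the pattern, so the text between the blocks gets rewritten as well:

```shell
# Hypothetical input: KEY- sits between two blocks, not inside one.
line='@@a@@ KEY- @@b@@'

result=$(printf '%s\n' "$line" |
  sed -e ':A' -e 's/\(@@[^@]*\)KEY-\([^@]*@@\)/\1VALUE-\2/g' -e 'tA')
printf '%s\n' "$result"   # prints: @@a@@ VALUE- @@b@@
```

For input like this, an approach that tracks which @@ opens a block and which one closes it is needed.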

More details :

sed allows the usage of loops and branches . The :A in the code above is a label , a location marker that can be jumped to with the appropriate command. t creates a conditional branch: this " command jumps to the label only if the previous substitute command was successful ". So, once the pattern has matched and the replacement has occurred, sed goes back to the label and retries the substitution on the modified line. When the substitution no longer matches, sed falls through to the end of the script and moves on to the next input line. In short, tA means go back to the location marked with A if there was a successful search-and-replace operation .
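The loop can also be written as a multi-line script with one command per line, which makes the label, the substitution and the branch easier to see. This is a sketch of the same command, with the g flag dropped since the loop already repeats the substitution; the variable names are only for illustration:

```shell
line='again @@some-text-KEY-some-other-text-KEY-text@@'

result=$(printf '%s\n' "$line" | sed '
# A is a label: a named position in the script
:A
# replace the last KEY- inside a @@...@@ block; [^@]* is greedy
s/\(@@[^@]*\)KEY-\([^@]*@@\)/\1VALUE-\2/
# if the s command changed the line, jump back to A and try again
tA
')
printf '%s\n' "$result"
```

Putting the label on its own line also sidesteps a portability issue: some non-GNU sed implementations read a label up to the end of the line, so the one-line form with ; after :A is best treated as GNU-specific.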

This might work for you (GNU sed):

sed -E 's/@@/\n/g;:a;s/^([^\n]*(\n[^\n]*\n[^\n]*)*\n[^\n]*)KEY-/\1VALUE-/;ta;s/\n/@@/g' file

Convert each @@ to a newline. Using a loop, replace KEY- with VALUE- wherever it sits between a matched pair of newlines. When all done, convert the newlines back to @@ .
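The three stages can be separated into individual -e expressions to see what each one does. This is a sketch assuming GNU sed (it relies on \n in the pattern, in bracket expressions and in the replacement); the input line is made up and has a KEY- between two blocks:

```shell
line='@@a@@ KEY- @@b-KEY-c-KEY-d@@'

# 1. turn every @@ into a newline, which cannot otherwise occur in the pattern space
# 2. loop: replace a KEY- that follows an odd number of newlines (inside a block)
# 3. turn the newlines back into @@
result=$(printf '%s\n' "$line" | sed -E \
  -e 's/@@/\n/g' \
  -e ':a' \
  -e 's/^([^\n]*(\n[^\n]*\n[^\n]*)*\n[^\n]*)KEY-/\1VALUE-/' \
  -e 'ta' \
  -e 's/\n/@@/g')
printf '%s\n' "$result"
```

The repeated group (\n[^\n]*\n[^\n]*)* skips over complete blocks together with the text that follows them, so the KEY- between the two blocks is never in a matching position and stays unchanged.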
