
How to use sed to search and replace a pattern that appears multiple times in the same line?

Because the question can be misleading, here is a little example. I have this kind of file:

some text
some text @@some-text-KEY-some-other-text@@
text again @@some-text-KEY-some-other-text@@ @@some-text-KEY-some-other-text@@
again @@some-text-KEY-some-other-text-KEY-text@@
some text with KEY @@KEY-some-text@@
blabla @@KEY@@

In this example, I want to replace each occurrence of KEY- inside a pair of @@ with VALUE-. I started with this sed command:

sed -i 's/\(@@[^@]*\)KEY-\([^@]*@@\)/\1VALUE-\2/g'

Here is how it works:

  1. \(@@[^@]*\) : create a first group composed of two @ followed by any characters except @ ...
  2. KEY- : ... up to the last occurrence of KEY- before the next @ (the preceding [^@]* is greedy)
  3. \([^@]*@@\) : and create a second group with all the characters except @ up to the next pair of @.

The problem is that my command cannot correctly handle the following line, because there are multiple KEY- inside my pair of @@:

again @@some-text-KEY-some-other-text-KEY-text@@

Indeed, I get this result:

again @@some-text-KEY-some-other-text-VALUE-text@@

To replace all the occurrences of KEY- in that line, I would have to run my command multiple times, and I would prefer to avoid that. I also tried lazy operators, but the problem is the same.

How can I write a regex and a sed command that correctly handle my whole file?

The problem is rather complex: you need to replace all occurrences of a multicharacter text inside blocks of text between identical multicharacter delimiters.

The easiest and safest way to solve the task is to use Perl:

perl -i -pe 's/(@@)(.*?)(@@)/$end_delim=$3; "$1" . $2=~s|KEY-|VALUE-|gr . "$end_delim"/ge' file

See the online demo.

The (@@)(.*?)(@@) pattern matches the text between two adjacent @@ substrings, capturing the start delimiter into Group 1, the text in between into Group 2, and the end delimiter into Group 3. Since a nested substitution resets all the placeholders, a temporary variable keeps the value of the end delimiter ( $end_delim=$3 ). Then "$1" . $2=~s|KEY-|VALUE-|gr . "$end_delim" builds the replacement from the Group 1 value (the first @@ ), the Group 2 value with every KEY- replaced with VALUE- , and the end delimiter.

If no KEY- occurs between two @@...@@ blocks on the same line, you can use a branch in sed by enclosing your command between :A and tA :

sed -i ':A; s/\(@@[^@]*\)KEY-\([^@]*@@\)/\1VALUE-\2/g; tA' file

Note that the replacement must begin with the first placeholder: \VALUE-\2 should be \1VALUE-\2 .

See the online demo:

s="some KEY- text
some text @@some-text-KEY-some-other-text@@
text again @@some-text-KEY-some-other-text@@ @@some-text-KEY-some-other-text@@
again @@some-text-KEY-some-other-text-KEY-text@@
some text with KEY @@KEY-some-text@@
blabla @@KEY@@"

sed ':A; s/\(@@[^@]*\)KEY-\([^@]*@@\)/\1VALUE-\2/g; tA' <<< "$s"

Output:

some KEY- text
some text @@some-text-VALUE-some-other-text@@
text again @@some-text-VALUE-some-other-text@@ @@some-text-VALUE-some-other-text@@
again @@some-text-VALUE-some-other-text-VALUE-text@@
some text with KEY @@VALUE-some-text@@
blabla @@KEY@@
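The restriction on KEY- appearing between two @@...@@ blocks matters because the pattern cannot tell an opening @@ from a closing one. A small sketch of the failure, using a made-up input line (line and loop_out are illustrative names); the loop is split into separate -e expressions, which should behave the same as the one-line form:

```shell
# Hypothetical line where KEY- sits BETWEEN two blocks, not inside one.
line='@@a@@ KEY-x @@b@@'

# Same substitution loop as above. The closing @@ of the first block and the
# opening @@ of the second block are mistaken for a delimiter pair.
loop_out=$(printf '%s\n' "$line" |
  sed -e ':A' -e 's/\(@@[^@]*\)KEY-\([^@]*@@\)/\1VALUE-\2/' -e 'tA')

printf '%s\n' "$loop_out"
# prints: @@a@@ VALUE-x @@b@@   (the KEY- outside the blocks was replaced too)
```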

More details:

sed allows the usage of loops and branches. The :A in the code above is a label, a special location marker that can be jumped to with the appropriate command. t creates a conditional branch: this "command jumps to the label only if the previous substitute command was successful". So, once the pattern has matched and the replacement has occurred, sed goes back to the label and retries the substitution on the modified line. When the substitution no longer succeeds, no jump occurs and sed continues with the rest of the script, which here means printing the line and moving on to the next one. So, tA means: go back to the location marked with A if there was a successful search-and-replace operation.
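To see what the label and the t command add, compare one pass of the substitution with the looping version on the problematic line from the question. This is a small sketch; the variable names line, single and looped are chosen for illustration only.

```shell
line='again @@some-text-KEY-some-other-text-KEY-text@@'

# One pass: the greedy [^@]* runs up to the LAST KEY- in the block, so only
# that occurrence is replaced, even with the g flag.
single=$(printf '%s\n' "$line" |
  sed 's/\(@@[^@]*\)KEY-\([^@]*@@\)/\1VALUE-\2/g')

# Loop: after each successful substitution, t jumps back to the label A and
# the substitution is retried until no KEY- is left inside the block.
looped=$(printf '%s\n' "$line" |
  sed -e ':A' -e 's/\(@@[^@]*\)KEY-\([^@]*@@\)/\1VALUE-\2/g' -e 'tA')

printf '%s\n%s\n' "$single" "$looped"
# prints:
# again @@some-text-KEY-some-other-text-VALUE-text@@
# again @@some-text-VALUE-some-other-text-VALUE-text@@
```

Each trip through the loop replaces one more KEY- , working from the last occurrence towards the first.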

This might work for you (GNU sed):

sed -E 's/@@/\n/g;:a;s/^([^\n]*(\n[^\n]*\n[^\n]*)*\n[^\n]*)KEY-/\1VALUE-/;ta;s/\n/@@/g' file

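Convert each @@ to a newline. Using a loop, replace each KEY- that sits between a matched pair of newlines with VALUE- . When all done, replace the newlines with @@ . A newline is a safe placeholder because sed reads its input line by line, so the pattern space cannot contain one to begin with.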
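As a quick check of this approach, here is a sketch that runs the same command on a made-up line mixing KEY- inside and outside the @@...@@ blocks. It assumes GNU sed, since it relies on \n being understood as a newline inside a bracket expression; line and nl_out are illustrative names.

```shell
# Hypothetical line: KEY- appears before, inside, between and inside the blocks.
line='some KEY- text @@a-KEY-b-KEY-c@@ KEY- @@KEY-d@@'

# 1. turn every @@ into a newline
# 2. loop: replace a KEY- that is preceded by an odd number of newlines,
#    which means it lies inside a block
# 3. turn the newlines back into @@
nl_out=$(printf '%s\n' "$line" |
  sed -E 's/@@/\n/g;:a;s/^([^\n]*(\n[^\n]*\n[^\n]*)*\n[^\n]*)KEY-/\1VALUE-/;ta;s/\n/@@/g')

printf '%s\n' "$nl_out"
# prints: some KEY- text @@a-VALUE-b-VALUE-c@@ KEY- @@VALUE-d@@
```

Both occurrences of KEY- outside the blocks survive, because an even number of newlines in front of them marks them as lying outside any pair.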


 