
Use sed to replace patterns that are not at the start or end of lines

Let's say I have this input:

/a/b/c/d/e/
/a/b/c/d/e
a/b/c/d/e/
a/b/c/d/e

I'd like to replace every / that is not at an edge with +, so that the output is:

/a+b+c+d+e/
/a+b+c+d+e
a+b+c+d+e/
a+b+c+d+e

I've tried this command:

sed -e "s#\(.\)/\(.\)#\1+\2#g"

which is close but not quite:

/a+b/c+d/e/
/a+b/c+d/e
a+b/c+d/e/
a+b/c+d/e

presumably because the \(.\) matches overlap between successive / characters.

I don't believe sed has a null-match operator for the beginning or end of a line. So how is this done?

You can translate all slashes to + and then replace a + at the beginning or at the end of the line with a slash:

sed 'y/\//+/;s/^+\|+$/\//g;'

or, if the OR operator \| isn't available (it is a GNU extension in basic regular expressions):

sed 'y/\//+/;s/^+/\//;s/+$/\//;'

It is better to change the delimiter so that you don't have to escape the literal slashes:

sed 'y~/~+~;s~^+\|+$~/~g;'

or, if the OR operator isn't available:

sed 'y~/~+~;s~^+~/~;s~+$~/~;'

(where ^ anchors the start of the line and $ anchors the end)
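As a quick check of the translate-then-restore idea, here is a minimal sketch that runs it over the four sample lines from the question. The function name inner_slashes_to_plus is made up for illustration, and the sketch uses three separate expressions on the assumption that \| may not be available in your sed.

```shell
#!/bin/sh
# Hypothetical helper name, used only for this illustration.
inner_slashes_to_plus() {
    # y|/|+|   : translate every "/" into "+"
    # s|^+|/|  : turn a leading "+" back into "/"
    # s|+$|/|  : turn a trailing "+" back into "/"
    sed -e 'y|/|+|' -e 's|^+|/|' -e 's|+$|/|'
}

# Sample input from the question.
printf '%s\n' '/a/b/c/d/e/' '/a/b/c/d/e' 'a/b/c/d/e/' 'a/b/c/d/e' |
    inner_slashes_to_plus
```

Note that this sketch assumes no line starts or ends with a literal +, since such a + would also be turned into /.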


Another way: you can protect the slashes you want to preserve with a placeholder:

sed 's~^/~{`%{~;s~/$~{`%{~;y~/~+~;s~{`%{~/~g;'
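To make the four steps of the placeholder command easier to follow, here is the same command split into one expression per step. This is only a sketch: the function name protect_edges is hypothetical, and it assumes the placeholder string never occurs in the input.

```shell
#!/bin/sh
# Hypothetical helper name. The placeholder {`%{ is the one from the command
# above and is assumed not to appear anywhere in the input.
protect_edges() {
    # 1. replace a leading "/" with the placeholder
    # 2. replace a trailing "/" with the placeholder
    # 3. translate every remaining "/" into "+"
    # 4. turn each placeholder back into "/"
    sed -e 's~^/~{`%{~' \
        -e 's~/$~{`%{~' \
        -e 'y~/~+~' \
        -e 's~{`%{~/~g'
}

# The first line starts with a literal "+", which this variant leaves alone.
printf '%s\n' '+a/b/' '/a/b/c/d/e/' | protect_edges
```

Unlike the translate-then-restore commands, this one never rewrites a + that was already at the start or end of a line.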

If you have perl, you can use lookarounds for this:

perl -pe 's~(?<!^)/(?!$)~+~g' file

Output:

/a+b+c+d+e/
/a+b+c+d+e
a+b+c+d+e/
a+b+c+d+e

Otherwise you can use this sed with two substitutions:

sed -r 's~(.)/(.)~\1+\2~g; s~(.)/(.)~\1+\2~g' file

Or this sed with a label and a loop:

sed -r ':a;s|(.)/(.)|\1+\2|g;ta' file
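The following sketch contrasts a single pass with the looped version, to show why one global substitution is not enough. The function names one_pass and looped are made up for illustration; the script is split into separate -e expressions and uses -E instead of -r, on the assumption that this is accepted by more sed implementations.

```shell
#!/bin/sh
# Hypothetical helper names, for illustration only.

# One global pass: each match consumes the character after the "/",
# so that character cannot start the next match.
one_pass() {
    sed -E -e 's~(.)/(.)~\1+\2~g'
}

# Same substitution, repeated: "ta" jumps back to label "a" for as long
# as the substitution changed something.
looped() {
    sed -E -e ':a' -e 's~(.)/(.)~\1+\2~g' -e 'ta'
}

printf '%s\n' '/a/b/c/d/e/' | one_pass   # every other inner "/" is left over
printf '%s\n' '/a/b/c/d/e/' | looped     # all inner "/" are replaced
```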

Here is a sed command that gives your output:

sed -r 's=(.)/\b=\1+=g;' file
  • usually / is used as the separator for the s command, but here we use =
  • the / is matched where there is some character ( . ) before it and a word boundary ( \b ) after it
  • initially I tried (.)/(.) but that did not work:
    • the second dot was consumed, so the next match could only start after it,
    • i.e. in x/y/z the second match would only see /z and not y/z
    • with \b the first match does not consume the y, so the second match sees y/
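One thing worth checking with the \b approach is what happens when the character after a / is not a word character. The sketch below assumes GNU sed (\b is a GNU extension), uses a hypothetical function name, and feeds it a made-up line, a/-b/c, alongside one line from the question.

```shell
#!/bin/sh
# Hypothetical helper name; assumes GNU sed, where \b matches a word boundary.
boundary_swap() {
    sed -E 's=(.)/\b=\1+=g'
}

# Sample line from the question: every inner "/" is followed by a letter.
printf '%s\n' '/a/b/c/d/e/' | boundary_swap

# Made-up line: the first "/" is followed by "-", so there is no word
# boundary after it and it is left unchanged.
printf '%s\n' 'a/-b/c' | boundary_swap
```

So this command relies on every inner / being followed by a letter, digit, or underscore, which holds for the sample input.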

This is the common and extremely useful sed idiom for doing jobs like this:

$ sed 's:a:aA:g; s:^/\|/$:aB:g; s:/:+:g; s:aB:/:g; s:aA:a:g' file
/a+b+c+d+e/
/a+b+c+d+e
a+b+c+d+e/
a+b+c+d+e

The 1st sub changes every a to aA. At that point there is no a in the input that is not followed by A. (We need to do this first to ensure that, after the 2nd sub, the only occurrences of aB in the input are the ones that sub created.)

The 2nd sub changes any / at the start or end of a line to aB. At that point the only occurrences of aB in the input are where there was originally a / at the start or end of the line.

The 3rd sub changes all remaining / characters (i.e. those that were not at the start or end of the line) to +.

The 4th sub restores each aB to the original leading or trailing /.

The 5th sub restores each aA to the original a.
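To see why the 1st and 5th subs matter, the sketch below compares the full idiom with a naive version that skips them, using a made-up line that already contains the text aB. The function names are hypothetical, and the alternation ^/\|/$ is written as two separate substitutions on the assumption that \| may not be available.

```shell
#!/bin/sh
# Hypothetical helper names, for illustration only.

# Full idiom: escape "a" first so that "aB" can serve as a safe marker.
with_escape() {
    sed -e 's:a:aA:g' \
        -e 's:^/:aB:' -e 's:/$:aB:' \
        -e 's:/:+:g' \
        -e 's:aB:/:g' \
        -e 's:aA:a:g'
}

# Naive version: same steps without escaping "a" first.
without_escape() {
    sed -e 's:^/:aB:' -e 's:/$:aB:' \
        -e 's:/:+:g' \
        -e 's:aB:/:g'
}

printf '%s\n' '/aB/c/' | with_escape      # the literal "aB" survives
printf '%s\n' '/aB/c/' | without_escape   # the literal "aB" is mistaken for a marker
```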

This might work for you (GNU sed):

sed ':a;s/\([^\/]\)\/\([^\/]\)/\1+\2/g;ta' file

Or visually easier:

sed -r ':a;s#([^/])/([^/])#\1+\2#g;ta' file

In practice this amounts to applying the same regexp twice:

sed 's/\([^\/]\)\/\([^\/]\)/\1+\2/g;s/\([^\/]\)\/\([^\/]\)/\1+\2/g' file

