sed - apply substitution between patterns

I have two patterns, START and END, and want to replace every space between them with an underscore.

Example

Lorem ipsum dolor START sit amet, consectetur END adipiscing elit.

should be transformed to

Lorem ipsum dolor START_sit_amet,_consectetur_END adipiscing elit.

I know the regex for replacing every space with an underscore:

sed 's/ /_/g'

And I also know how to match the part between the two patterns:

sed 's/.*START\(.*\)END.*/\1/g'

But I have no idea how to combine these two things.

As an alternative, you may use Perl:

perl -pe 's/(START.*?END)/$1=~s#\s#_#gr/ge'

The (START.*?END) pattern matches the shortest substring from START to END and captures it into Group 1. Because of the e modifier, the replacement is evaluated as Perl code: $1=~s#\s#_#gr replaces each whitespace character ( \s ) in the captured text with _ and, thanks to the r modifier, returns the result instead of modifying $1.

Or, if your Perl does not support the r modifier (it was added in Perl 5.14):

perl -pe 's/(?:START|\G(?!^))(?:(?!END).)*?\K\s/_/g'

The (?:START|\G(?!^))(?:(?!END).)*?\K\s pattern matches:

  • (?:START|\G(?!^)) - the START substring or the end of the previous successful match ( \G alone would also match at the start of the string, which (?!^) rules out)
  • (?:(?!END).)*? - any char other than a line break char that does not start an END substring, as few times as possible
  • \K - a match reset operator that discards the text matched so far
  • \s - a whitespace char.

You may use this awk to do the job:

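awk -v ts='START ' -v te='END ' '{
   # handle every start tag that still has a blank after it
   while (n = index($0, ts)) {
      # search for the end tag only after the start tag
      m = index(substr($0, n + length(ts)), te)
      if (!m)
         break                    # no closing tag: stop instead of looping forever
      m += n + length(ts) - 1     # convert to a position in $0
      s = substr($0, n, m-n)      # text from the start tag up to the end tag
      gsub(/[[:blank:]]+/, "_", s)
      $0 = substr($0, 1, n-1) s substr($0, m)
   }
} 1' file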

Lorem ipsum dolor START_sit_amet,_consectetur_END adipiscing elit.

Using GNU awk:

awk -v RS='(START|END)' 'RT=="END"{gsub(" ","_")}{printf "%s%s",$0,RT}' file

This relies on the record separator RS being set to a regex that matches either START or END.

If a record was terminated by END, it is the text between the two tags, so gsub() replaces its spaces with underscores.

The last statement prints every record followed by its record terminator RT (the text that RS matched).

Note that this solution also works when START and END are on different lines; they do not have to be on the same line.
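A quick sketch of the multi-line case, assuming gawk is installed (RT is a GNU awk extension). The function name underscore_multiline and the two-line input are invented for illustration.

```shell
# Hypothetical helper wrapping the GNU awk one-liner; gawk is needed for RT.
underscore_multiline() {
  gawk -v RS='(START|END)' 'RT=="END"{gsub(" ","_")}{printf "%s%s",$0,RT}'
}

# START is on the first line and END on the second.
printf '%s\n' 'a b START c d' 'e f END g h' | underscore_multiline
# prints:
# a b START_c_d
# e_f_END g h
```

The line break inside the span is preserved because gsub(" ","_") targets only the space character, not other whitespace.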