sed - apply substitution between patterns
I have two patterns, START and END, and want to replace every space between these patterns with an underscore.
Example:
Lorem ipsum dolor START sit amet, consectetur END adipiscing elit.
should be transformed to
Lorem ipsum dolor START_sit_amet,_consectetur_END adipiscing elit.
I know the regex for replacing every space with an underscore:
sed 's/ /_/g'
And I also know how to match the part between the two patterns:
sed 's/.*START\(.*\)END.*/\1/g'
But I have no idea how to combine these two things.
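One way to combine the two within sed itself is a substitution loop. The following is only a minimal sketch, not taken from the answers here, and it assumes at most one START ... END pair per line and plain spaces (not tabs). The function name underscore_between_sed is hypothetical.

```shell
# Hypothetical helper: loop until no space is left between START and END.
# Assumes a single START ... END pair per line.
underscore_between_sed() {
  # :a  defines a label
  # s   replaces the first space that follows START (plus any non-space
  #     characters) and still has an END somewhere after it
  # ta  jumps back to :a as long as the last s command changed something
  sed -e ':a' -e 's/\(START[^ ]*\) \(.*END\)/\1_\2/' -e 'ta'
}

printf '%s\n' 'Lorem ipsum dolor START sit amet, consectetur END adipiscing elit.' |
  underscore_between_sed
# prints: Lorem ipsum dolor START_sit_amet,_consectetur_END adipiscing elit.
```

Because .*END is greedy, a line with several END markers is rewritten up to the last one, which is why the single-pair assumption matters.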
As an alternative, you may use Perl:
perl -pe 's/(START.*?END)/$1=~s#\s#_#gr/ge'
The (START.*?END) pattern matches the shortest substring from START through the next END while capturing it into Group 1, and then s#\s#_#gr replaces each single whitespace character (\s) with _ in the contents of the group. The e flag makes Perl evaluate the replacement as code, and the r flag makes the inner substitution return the modified copy instead of changing $1 in place.
Or, if you are using a Perl version that does not support the r option (it was added in Perl 5.14):
perl -pe 's/(?:START|\G(?!^))(?:(?!END).)*?\K\s/_/g'
See the online demo and the second regex demo online.
The (?:START|\G(?!^))(?:(?!END).)*?\K\s pattern matches:
(?:START|\G(?!^)) - the START substring or the end of the previous successful match (with \G(?!^))
(?:(?!END).)*? - any char but a line break char, not starting the END substring, as few as possible
\K - a match reset operator discarding the previously matched text
\s - a whitespace char.
You may use this awk to do your job:
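awk -v ts='START ' -v te='END ' '{
   # Loop while an unprocessed "START " marker remains on the line
   while (n = index($0, ts)) {
      # Position of te at or after n, relative to n (0 if absent)
      m = index(substr($0, n), te)
      if (!m) break   # no closing marker: stop instead of looping forever
      s = substr($0, n, m-1)
      gsub(/[[:blank:]]+/, "_", s)
      $0 = substr($0, 1, n-1) s substr($0, n+m-1)
   }
} 1' file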
Lorem ipsum dolor START_sit_amet,_consectetur_END adipiscing elit.
Using GNU awk:
awk -v RS='(START|END)' 'RT=="END"{gsub(" ","_")}{printf "%s%s",$0,RT}' file
This relies on the record separator RS being set to a regex that matches either START or END.
If the END tag is reached, that is, if the record terminator RT equals END, the record is updated to replace spaces with underscores using the function gsub().
The last statement prints the whole record followed by the record terminator RT (the text that matched RS).
Note that this solution allows START and END to be on different lines (they are not necessarily on the same line).
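The multi-line case can be tried with a small sketch. It assumes GNU awk is installed under the name gawk (RT is a gawk extension); the helper name underscore_between_gawk and the two-line sample are made up for illustration.

```shell
# Hypothetical wrapper around the GNU awk one-liner from this answer.
# Assumes the gawk binary is available; RT does not exist in POSIX awk.
underscore_between_gawk() {
  gawk -v RS='(START|END)' 'RT=="END"{gsub(" ","_")}{printf "%s%s",$0,RT}'
}

# START and END sit on different lines; the newline between them is kept
# because gsub only targets the space character.
printf 'a b START c d\ne f END g h\n' | underscore_between_gawk
# prints:
# a b START_c_d
# e_f_END g h
```

Note that the rule fires for every record that is terminated by END, so text in front of a stray END with no preceding START would be rewritten as well.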