
sed - apply substitution between patterns

I have two patterns, START and END, and I want to replace every space between them with an underscore.

Example

Lorem ipsum dolor START sit amet, consectetur END adipiscing elit.

should be transformed to

Lorem ipsum dolor START_sit_amet,_consectetur_END adipiscing elit.

I know the sed command that replaces every space with an underscore:

sed 's/ /_/g'

I also know how to extract the part between the two patterns:

sed 's/.*START\(.*\)END.*/\1/g'

But I have no idea how to combine these two things.
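For reference, the two commands can be combined in plain sed with a label/branch loop: each pass turns the first space after START into an underscore (as long as END still follows it), and t jumps back to the label while a substitution succeeds. This is a minimal sketch, not taken from the answers that follow; it assumes at most one START...END pair per line, because the greedy .*END could otherwise reach a later END. The function name underscore_between is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: reads stdin, writes stdout.
# Assumes at most one START...END pair per line.
underscore_between() {
  # \(START[^ ]*\)  START plus any non-space chars (including _ already inserted)
  # " "             the next space, which becomes _
  # \(.*END\)       requires END somewhere after that space
  # ta              repeat while the substitution keeps succeeding
  sed -e ':a' \
      -e 's/\(START[^ ]*\) \(.*END\)/\1_\2/' \
      -e 'ta'
}

printf '%s\n' 'Lorem ipsum dolor START sit amet, consectetur END adipiscing elit.' |
  underscore_between
# prints: Lorem ipsum dolor START_sit_amet,_consectetur_END adipiscing elit.
```

Spaces before START and after END are untouched, because every replaced space must sit between START and a later END.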

As an alternative, you can use Perl:

perl -pe 's/(START.*?END)/$1=~s#\s#_#gr/ge'

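The (START.*?END) pattern matches the shortest substring from START to END and captures it into Group 1. Because of the e flag, the replacement is evaluated as Perl code: s#\s#_#gr replaces each whitespace character (\s) with _ in the contents of the group, and the r flag makes it return the modified copy instead of changing $1 in place. The outer g flag handles several START...END pairs on one line.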

Or, if your Perl does not support the r modifier (it was added in Perl 5.14):

perl -pe 's/(?:START|\G(?!^))(?:(?!END).)*?\K\s/_/g'


The (?:START|\G(?!^))(?:(?!END).)*?\K\s pattern matches:

  • (?:START|\G(?!^)) - either the START substring or the end of the previous successful match (\G); the (?!^) lookahead keeps \G from matching at the start of the string
  • (?:(?!END).)*? - any character except a line break that does not start an END substring, repeated as few times as possible
  • \K - the match reset operator, which drops the text matched so far from the match, so only the whitespace is replaced
  • \s - a whitespace character.

You can also use this awk script:

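awk -v ts='START ' -v te='END ' '{
   while (n = index($0, ts)) {
      m = index(substr($0, n), te)   # look for END only after START
      if (!m) break                  # no END after START: stop instead of looping forever
      m += n - 1                     # turn it into a position within $0
      s = substr($0, n, m-n)         # from START up to (not including) END
      gsub(/[[:blank:]]+/, "_", s)   # each run of blanks becomes one _
      $0 = substr($0, 1, n-1) s substr($0, m)
   }
} 1' file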

Lorem ipsum dolor START_sit_amet,_consectetur_END adipiscing elit.

Using GNU awk:

awk -v RS='(START|END)' 'RT=="END"{gsub(" ","_")}{printf "%s%s",$0,RT}' file

This sets the record separator RS to a regular expression matching either START or END, so each record is the text up to the next START or END marker.

When a record ends with END (RT == "END"), gsub() replaces every space in it with an underscore.

The last action prints the record followed by RT, the text that RS actually matched (RT is a GNU awk extension), so the START and END markers are written back unchanged.

Note that this solution also works when START and END are on different lines; they do not have to be on the same line.
