
substitute text with equal length using sed

Is there a way to replace a pattern with an equal-length run of something else (e.g. dots, zeros, etc.) using sed? Like this:

maci:/ san$ echo "She sells sea shells by the sea shore" | sed 's/\(sh[a-z]*\)/../gI'
.. sells sea .. by the sea ..

(The I flag, which makes the match case-insensitive, requires a newer version of sed.)
This was easy: each word that starts with "sh" is replaced by two dots (..). But how do I get something like this instead: ... sells sea ...... by the sea .....

Any idea? Cheers!

My suspicion is that you can't do it in standard sed, but you could do it with Perl or something else with more powerful regex handling.

$ echo "She sells sea shells by the sea shore" |
> perl -pe 's/(sh[a-z]*)/"." x length($1)/gei'
... sells sea ...... by the sea .....
$

The e modifier means that the replacement is evaluated as Perl code; in this case, it repeats the character . as many times as there are characters in the matched text. The g modifier repeats the substitution across the line; the i modifier makes the match case-insensitive. The -p option to Perl prints each line after it has been processed by the script given with the -e option (here, the substitute command).

Does this awk one-liner do the job for you?

awk '{for(i=1;i<=NF;i++)if($i~/^[Ss]h/)gsub(/./,".",$i)}1' file

Test with your data:

kent$  echo "She sells sea shells by the sea shore"|awk '{for(i=1;i<=NF;i++)if($i~/^[Ss]h/)gsub(/./,".",$i)}1'
... sells sea ...... by the sea .....

An old question, but I found a nice and relatively short one-line sed solution:

sed ':a;s/\([Ss]h\.*\)[^\. ]/\1./;ta;s/[Ss]h/../g'

Works by replacing one character at a time in a loop.

:a; starts a loop.

s/\([Ss]h\.*\)[^\. ] searches for an sh followed by any number of dots (our completed work so far), followed by a character that is neither a dot nor a space (the one we're going to replace).

/\1./; replaces it with our completed work so far plus another dot.

ta; loops if we made any substitution; otherwise...

s/[Ss]h/../g replaces each sh with two dots and calls it a day.
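To watch the loop at work, the sketch below runs the same single-character substitution once per pass from a shell loop and prints each intermediate result. The helper name trace_dots is made up for this illustration, and the shell loop only stands in for sed's own :a ... ta loop.

```shell
# trace_dots TEXT: apply the one-character substitution repeatedly,
# printing the text after every pass, then apply the final sh -> .. step.
trace_dots() {
  line=$1
  while :; do
    # One pass: turn the first non-dot, non-space character after "sh..." into a dot.
    next=$(printf '%s\n' "$line" | sed 's/\([Ss]h\.*\)[^\. ]/\1./')
    # Nothing changed, so the loop (ta in the original) would fall through here.
    [ "$next" = "$line" ] && break
    line=$next
    printf '%s\n' "$line"
  done
  # Final step: replace the remaining sh prefixes with two dots.
  printf '%s\n' "$line" | sed 's/[Ss]h/../g'
}

trace_dots "shore"
# prints:
# sh.re
# sh..e
# sh...
# .....
```

Note that the pattern is not anchored to the start of a word, so an sh in the middle of a word is dotted from that point on as well.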

$ echo "She sells sea shells by the sea shore" |
awk '{
   head = ""
   tail = $0
   while ( match(tolower(tail),/sh[a-z]*/) ) {
      dots = sprintf("%*s",RLENGTH,"")
      gsub(/ /,".",dots)
      head = head substr(tail,1,RSTART-1) dots
      tail = substr(tail,RSTART+RLENGTH)
   }
   print head tail
}'
... sells sea ...... by the sea .....

As noted by others, sed is not well suited to this task. It is of course possible; here's one example that works on single lines with space-separated words:

echo "She sells sea shells by the sea shore" |
sed 's/ /\n/g' | sed '/^[Ss]h/ s/[^[:punct:]]/./g' | sed ':a;N;$!ba;s/\n/ /g'

Output:

... sells sea ...... by the sea .....

The first sed replaces spaces with newlines, the second does the dotting, and the third joins the lines back together with spaces, as shown in this answer.

If you have unpredictable word separators and/or paragraphs, this approach soon becomes unmanageable.

Edit - multi-line alternatives

Here's one way to handle multi-line input, inspired by Kent's comments (GNU sed):

echo "
She sells sea shells by the sea shore She sells sea shells by the sea shore,
She sells sea shells by the sea shore She sells sea shells by the sea shore
 She sells sea shells by the sea shore She sells sea shells by the sea shore
" |

# Add a \0 to the end of the line and surround punctuations and whitespace by \n 
sed 's/$/\x00/; s/[[:punct:][:space:]]/\n&\n/g' |

# Replace the matched word by dots
sed '/^[Ss]h.*/ s/[^\x00]/./g' | 

# Join lines that were separated by the first sed
sed ':a;/\x00/!{N;ba}; s/\n//g'

Output:

... sells sea ...... by the sea ..... ... sells sea ...... by the sea .....,
... sells sea ...... by the sea ..... ... sells sea ...... by the sea .....
 ... sells sea ...... by the sea ..... ... sells sea ...... by the sea .....

This might work for you (GNU sed):

sed -r ':a;/\b[Ss]h\S+/!b;s//\n&\n/;h;s/.*\n(.*)\n.*/\1/;s/././g;G;s/(.*)\n(.*)\n.*\n/\2\1/;ta' file

In essence, it copies a word beginning with sh or Sh, replaces each character with . and then re-inserts the new string back into the original. When all occurrences of the search string have been exhausted, it prints out the line.
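For readers who find the one-liner hard to follow, here is the same script spread over several lines with a comment per command. It is only an annotated sketch: the function name mask_words is made up, -E is used in place of -r, and it still relies on GNU sed (\b, \S and \n in the replacement are GNU extensions).

```shell
# mask_words: read lines on stdin and dot out each word starting with sh/Sh.
mask_words() {
  sed -E '
    :a
    # No word starting with sh/Sh left: branch to the end, which prints the line.
    /\b[Ss]h\S+/!b
    # Fence the first such word with newlines (the empty regex reuses the last one).
    s//\n&\n/
    # Save the fenced line in the hold space.
    h
    # Keep only the fenced word in the pattern space ...
    s/.*\n(.*)\n.*/\1/
    # ... and turn each of its characters into a dot.
    s/././g
    # Append the saved line, giving: dots \n prefix \n word \n suffix.
    G
    # Reassemble as prefix + dots + suffix, dropping the original word.
    s/(.*)\n(.*)\n.*\n/\2\1/
    # A substitution succeeded, so go back and look for the next word.
    ta
  '
}

echo "She sells sea shells by the sea shore" | mask_words
# prints: ... sells sea ...... by the sea .....
```

The loop ends because a word that has already been turned into dots no longer matches the search pattern.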

An alternative:

sed -E 's/\S+/\n&/g;s#.*#echo "&"|sed "/^sh/Is/\\S/./g"#e;s/\n//g' file

