Substitute text with equal length using sed

Is there a way to replace a pattern with something else of equal length (e.g. dots, zeros, etc.) using sed? Like this:

maci:/ san$ echo "She sells sea shells by the sea shore" | sed 's/\(sh[a-z]*\)/../gI'
.. sells sea .. by the sea ..

(The "I" flag, which ignores case, requires a newer version of sed.)
This was easy: each word that starts with "sh" is replaced by two dots (..). But how do I make it produce something like this: ... sells sea ...... by the sea .....

Any ideas? Cheers!

My suspicion is that you can't do it in standard sed, but you could do it with Perl or something else with more powerful regex handling.

$ echo "She sells sea shells by the sea shore" |
> perl -pe 's/(sh[a-z]*)/"." x length($1)/gei'
... sells sea ...... by the sea .....
$

The e modifier means that the replacement is evaluated as Perl code; in this case, it repeats the character . as many times as there are characters in the matched text. The g modifier repeats the substitution across the line; the i modifier is for case-insensitive matching. The -p option to Perl prints each line after it has been processed by the script given with the -e option, which here is the substitute command.

Does this awk one-liner do the job for you?

awk '{for(i=1;i<=NF;i++)if($i~/^[Ss]h/)gsub(/./,".",$i)}1' file

Test with your data:

kent$  echo "She sells sea shells by the sea shore"|awk '{for(i=1;i<=NF;i++)if($i~/^[Ss]h/)gsub(/./,".",$i)}1'
... sells sea ...... by the sea .....

An old question, but I found a nice and relatively short one-line sed solution:

sed ':a;s/\([Ss]h\.*\)[^\. ]/\1./;ta;s/[Ss]h/../g'

It works by replacing one character at a time in a loop.

:a; starts a loop.

s/\([Ss]h\.*\)[^\. ] searches for an sh followed by any number of dots (our completed work so far), followed by a character that is neither a dot nor a space (the one we're going to replace).

/\1./; replaces it with our completed work so far plus another dot.

ta; loops back if we made a substitution; otherwise...

s/[Ss]h/../g replaces each sh with two dots, and we call it a day.
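To watch the loop at work, here is a tracing variant of the same script. It is only a sketch: the function name trace_dots is invented, the commands are split across several -e options so the label does not depend on GNU's one-line syntax, and -n plus the p flag on each s command is added so the pattern space is printed after every successful substitution.

```shell
# trace_dots: hypothetical tracing wrapper around the loop described above.
trace_dots() {
    sed -n \
        -e ':a' \
        -e 's/\([Ss]h\.*\)[^\. ]/\1./p' \
        -e 'ta' \
        -e 's/[Ss]h/../gp'
}

echo "She shore" | trace_dots
# prints:
# Sh. shore
# Sh. sh.re
# Sh. sh..e
# Sh. sh...
# ... .....
```

Each line but the last shows one more character turned into a dot; the last line is the result after the final s/[Ss]h/../g. A line with no sh in it prints nothing in this tracing form, since no substitution ever succeeds.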

$ echo "She sells sea shells by the sea shore" |
awk '{
   head = ""
   tail = $0
   while ( match(tolower(tail),/sh[a-z]*/) ) {
      dots = sprintf("%*s",RLENGTH,"")
      gsub(/ /,".",dots)
      head = head substr(tail,1,RSTART-1) dots
      tail = substr(tail,RSTART+RLENGTH)
   }
   print head tail
}'
... sells sea ...... by the sea .....

As noted by others, sed is not well suited for this task. It is of course possible; here's one example that works on single lines with space-separated words:

echo "She sells sea shells by the sea shore" |

sed 's/ /\n/g' | sed '/^[Ss]h/ s/[^[:punct:]]/./g' | sed ':a;N;$!ba;s/\n/ /g'

Output:

... sells sea ...... by the sea .....

The first sed replaces spaces with newlines, the second does the dotting, and the third turns the newlines back into spaces, as shown in this answer.

If you have unpredictable word separators and/or paragraphs, this approach soon becomes unmanageable.
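A small sketch of that limitation, using the same split, dot and join idea. Note the assumptions: tr and paste are swapped in here for the first and third sed purely to keep the example portable, and the function name dot_words is invented.

```shell
# dot_words: hypothetical helper; one word per line, dot the sh-words, join again.
dot_words() {
    tr ' ' '\n' |                            # split on spaces, one word per line
        sed '/^[Ss]h/ s/[^[:punct:]]/./g' |  # dot words that start with sh/Sh
        paste -s -d ' ' -                    # join the lines back with spaces
}

echo 'She sells sea shells' | dot_words
# prints: ... sells sea ......

echo 'She said (she did)' | dot_words
# prints: ... said (she did)
```

In the second call, "(she" is left alone because the word no longer starts with sh once a parenthesis is attached, which is the kind of separator problem described above.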

Edit - multi-line alternatives

Here's one way to handle multi-line input, inspired by Kent's comments (GNU sed):

echo "
She sells sea shells by the sea shore She sells sea shells by the sea shore,
She sells sea shells by the sea shore She sells sea shells by the sea shore
 She sells sea shells by the sea shore She sells sea shells by the sea shore
" |

# Add a \0 to the end of the line and surround punctuations and whitespace by \n 
sed 's/$/\x00/; s/[[:punct:][:space:]]/\n&\n/g' |

# Replace the matched word by dots
sed '/^[Ss]h.*/ s/[^\x00]/./g' | 

# Join lines that were separated by the first sed, then drop the \0 marker
sed ':a;/\x00/!{N;ba}; s/\n//g; s/\x00//'

Output:

... sells sea ...... by the sea ..... ... sells sea ...... by the sea .....,
... sells sea ...... by the sea ..... ... sells sea ...... by the sea .....
 ... sells sea ...... by the sea ..... ... sells sea ...... by the sea .....

This might work for you (GNU sed):

sed -r ':a;/\b[Ss]h\S+/!b;s//\n&\n/;h;s/.*\n(.*)\n.*/\1/;s/././g;G;s/(.*)\n(.*)\n.*\n/\2\1/;ta' file

In essence, it copies a word beginning with sh or Sh, replaces each character with ., and then re-inserts the new string back into the original. When all occurrences of the search string have been exhausted, it prints out the line.
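For readers who want to follow the hold-space juggling step by step, the one-liner can be unpacked into a commented script. This is a sketch of the same commands, assuming GNU sed (for \b, \S and \n in the replacement); the function name dot_sh_words is made up, and it reads standard input rather than a file.

```shell
# dot_sh_words: hypothetical wrapper around the commands of the one-liner above.
dot_sh_words() {
    sed -E '
        :a
        # No word starting with sh/Sh left: branch to the end, which prints the line
        /\b[Ss]h\S+/!b
        # Fence the matched word with newlines (the empty regex reuses the last one)
        s//\n&\n/
        # Save the fenced line in the hold space
        h
        # Keep only the fenced word, then turn each of its characters into a dot
        s/.*\n(.*)\n.*/\1/
        s/././g
        # Append the saved line, giving: dots, newline, prefix, newline, word, newline, suffix
        G
        # Rebuild as prefix followed by dots; the suffix after the match stays in place
        s/(.*)\n(.*)\n.*\n/\2\1/
        # A substitution was made, so look for the next word
        ta
    '
}

echo "She sells sea shells by the sea shore" | dot_sh_words
# prints: ... sells sea ...... by the sea .....
```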

An alternative:

sed -E 's/\S+/\n&/g;s#.*#echo "&"|sed "/^sh/Is/\\S/./g"#e;s/\n//g' file
