
Replace a specific character at the beginning and end of any word in bash

I need to remove the hyphen '-' character only when it matches the pattern 'space-[A-Z]' or '[A-Z]-space'. (Assume all letters are uppercase, and "space" can be a space or a newline, i.e. the start or end of a line.)

sample.txt

I AM EMPTY-HANDED AND I- WA-
-ANT SOME COO- COOKIES

I want the output to be:

I AM EMPTY-HANDED AND I WA
ANT SOME COO COOKIES

I've looked around for answers using sed, awk and perl, but I could only find answers about removing all characters between two patterns or specific strings, not a specific character between [A-Z] and a space.

Thanks heaps!!

If perl is an option for you, would you try the following:

perl -pe 's/(^|(?<=\s))-(?=[A-Z])//g; s/(?<=[A-Z])-((?=\s)|$)//g' sample.txt
  • (?<=\s) is a zero-width lookbehind assertion: it requires the preceding character to be whitespace without including that character in the matched substring. The ^ alternative in (^|(?<=\s)) covers a dash at the very start of the line, where there is no preceding character.
  • (?=[A-Z]) is a zero-width lookahead assertion: it requires the following character to be an uppercase letter from A to Z without including it in the matched substring.
  • As a result, only the dash characters which match the pattern above are removed from the original text.
  • The second statement s/..//g is the mirrored version of the first one: it removes a dash preceded by an uppercase letter and followed by whitespace or the end of the line.

Could you please try the following?

awk '{for(i=1;i<=NF;i++){if($i ~ /^-[a-zA-Z]+$|^[a-zA-Z]+-$/){sub(/-/,"",$i)}}} 1' Input_file

Adding a non-one-liner form of the solution:

awk '
{
  for(i=1;i<=NF;i++){
    if($i ~ /^-[a-zA-Z]+$|^[a-zA-Z]+-$/){
      sub(/-/,"",$i)
    }
  }
}
1
'  Input_file

Output will be as follows.

I AM EMPTY-HANDED AND I WA
ANT SOME COO COOKIES
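The awk loop above accepts [a-zA-Z]; since the question only concerns uppercase letters, a variant restricted to [A-Z] might look like the sketch below (strip_edge_hyphens_awk is a hypothetical name). One side effect worth knowing: assigning to $i makes awk rebuild the whole line with OFS (a single space by default), so runs of spaces or tabs on a modified line are collapsed.

```shell
# Field-by-field variant restricted to uppercase words.
strip_edge_hyphens_awk() {
  awk '{
    for (i = 1; i <= NF; i++)
      if ($i ~ /^-[A-Z]+$|^[A-Z]+-$/)   # whole field is -WORD or WORD-
        sub(/-/, "", $i)                # drop that single hyphen; $0 is rebuilt with OFS
  } 1' "$@"
}

# The two spaces and the tab collapse to single spaces because "I-" was modified.
printf 'AND  I-\tWA\n' | strip_edge_hyphens_awk
```

Lines where no field changes are printed with their original spacing.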

If you can provide Extended Regular Expressions to sed (generally with the -E or -r option), then you can shorten your sed expression to:

sed -E 's/(^|\s)-(\w)/\1\2/g;s/(\w)-(\s|$)/\1\2/g' file

The basic form is sed -E 's/find1/replace1/g;s/find2/replace2/g' file, which can also be written as separate expressions: sed -E -e 's/find1/replace1/g' -e 's/find2/replace2/g' file (your choice).
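For illustration, here is the same command in the separate -e form, wrapped in a hypothetical function so it can read either a file argument or standard input. It assumes GNU sed, since \s and \w are GNU extensions.

```shell
# Same two substitutions, passed as two -e expressions instead of one
# ';'-separated script. With no arguments, sed reads standard input.
strip_hyphens_two_e() {
  sed -E -e 's/(^|\s)-(\w)/\1\2/g' -e 's/(\w)-(\s|$)/\1\2/g' "$@"
}

printf 'I AM EMPTY-HANDED AND I- WA-\n-ANT SOME COO- COOKIES\n' | strip_hyphens_two_e
```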

The details of s/find1/replace1/g are:

  • find1 is:
    • (^|\s) locate and capture the beginning of the line or a whitespace character,
    • followed by the '-' hyphen,
    • then capture the next \w (word character); and
  • replace1 is simply \1\2, which reinserts both captures with the first two backreferences, leaving the hyphen out.
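Running only the first substitution on a small string shows its effect in isolation: the hyphen before a word is removed, while hyphens after a word are left for the second expression. This assumes GNU sed for \s and \w.

```shell
# First substitution only (s/find1/replace1/g).
printf 'I- WA- -ANT\n' | sed -E 's/(^|\s)-(\w)/\1\2/g'
# prints: I- WA- ANT
```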

The next substitution expression is similar, except now you are looking for the hyphen followed by whitespace or the end of the line. So you have:

  • find2 being:
    • a capture of \w (word character),
    • followed by the hyphen,
    • followed by a capture of either a following whitespace character or the end of the line (\s|$), then
  • replace2 is the same as before: just reinsert the captured characters using backreferences.
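The second substitution on its own is the mirror image: hyphens after a word go, and the leading hyphen of -ANT stays because a space is not a word character. Again assuming GNU sed.

```shell
# Second substitution only (s/find2/replace2/g).
printf 'I- WA- -ANT\n' | sed -E 's/(\w)-(\s|$)/\1\2/g'
# prints: I WA -ANT
```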

In each case the g flag makes the replacement global, so it applies to every occurrence on the line, not just the first.

(note: the \w word character also includes '_' (underscore) and digits, so while it is unlikely you would have a hyphen and underscore together, if you do, you need to use the [A-Za-z] list instead of \w. Also, \s and \w are not part of POSIX regular expressions; GNU sed supports them.)
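Following that note, a sketch of a more portable variant: POSIX bracket expressions replace the GNU \s and \w, and [[:upper:]] (the POSIX class for uppercase letters, used here instead of a literal [A-Z] range to avoid locale quirks) restricts the match to uppercase. The function name is hypothetical.

```shell
# [[:space:]] instead of \s, [[:upper:]] instead of \w, so '_', digits and
# lowercase letters next to the hyphen no longer trigger a removal.
strip_hyphens_posix() {
  sed -E 's/(^|[[:space:]])-([[:upper:]])/\1\2/g; s/([[:upper:]])-([[:space:]]|$)/\1\2/g' "$@"
}

# "FOO_-" keeps its hyphen here, whereas the \w version would remove it.
printf 'I- WA-\n-ANT FOO_- BAR\n' | strip_hyphens_posix
```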

Example Use/Output

In your case, the output is:

$ sed -E 's/(^|\s)-(\w)/\1\2/g;s/(\w)-(\s|$)/\1\2/g' file
I AM EMPTY-HANDED AND I WA
ANT SOME COO COOKIES

"remove the hyphen '-' character only when it matches the pattern 'space-[A-Z]' or '[A-Z]-space'. Assuming all letters are uppercase, and space could be a space, or newline"

It's:

sed 's/\( \|^\)-\([A-Z]\)/\1\2/g; s/\([A-Z]\)-\( \|$\)/\1\2/g'
  • s - substitute
    • /
    • \( \|^\) - a space or the beginning of the line (\| alternation in a basic regular expression is a GNU sed extension)
    • - - the hyphen
    • \([A-Z]\) - a single uppercase character
    • /
    • \1\2 - \1 is replaced by whatever the first \(...\) group matched, so it becomes a space or nothing. \2 is replaced by the single uppercase character found. Effectively, the - is removed.
    • /
    • g - apply the regex globally
  • ; - separates the two s commands
  • s
    • Same as above, with the groups in the opposite order. The $ means end of the line.
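Because \| is a GNU extension, a strictly POSIX BRE version has to split each alternation into two substitutions. A sketch (hypothetical function name); as in the answer, "space" means a literal space character only.

```shell
# POSIX BRE has no alternation, so "start of line" vs "after a space" and
# "before a space" vs "end of line" become four separate substitutions.
strip_hyphens_bre() {
  sed -e 's/^-\([[:upper:]]\)/\1/' \
      -e 's/ -\([[:upper:]]\)/ \1/g' \
      -e 's/\([[:upper:]]\)- /\1 /g' \
      -e 's/\([[:upper:]]\)-$/\1/' \
      "$@"
}

printf 'I AM EMPTY-HANDED AND I- WA-\n-ANT SOME COO- COOKIES\n' | strip_hyphens_bre
```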
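
A shorter awk alternative (note that it removes a hyphen at a word edge without checking that the neighbouring character is an uppercase letter):

awk '{gsub(/ -/," "); sub(/^-/,""); sub(/-$/,""); gsub(/- /," ")}1' file
I AM EMPTY-HANDED AND I WA
ANT SOME COO COOKIES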
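awk's sub() replaces only the first match in each record, so a one-liner built on sub() can miss a second word-edge hyphen on the same line; gsub() replaces every match. A quick check:

```shell
# sub(): first match only; gsub(): all matches.
printf 'A- B- C\n' | awk '{ sub(/- /, " ") } 1'    # prints: A B- C
printf 'A- B- C\n' | awk '{ gsub(/- /, " ") } 1'   # prints: A B C
```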

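Statement: The technical posts on this site follow the CC BY-SA 4.0 license. If you repost them, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.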

 
© 2020-2024 STACKOOM.COM