In bash/sed, how do I match a lowercase letter followed by the same letter in uppercase?

Question

I want to delete all instances of "aA", "bB" ... "zZ" from an input string.

e.g.

echo "foObar" |
sed -Ee 's/([a-z])\U\1//'

should output "fbar"

But the \U syntax only works in the latter half (the replacement part) of the sed expression; it fails to resolve in the matching clause.

I'm having difficulty converting the matched character to uppercase so that I can reuse it in the matching clause.

If anyone could suggest a working regex that can be used in sed (or awk), that would be great.

Scripting solutions in pure shell are OK too (I'm trying to think of solving the problem this way).

Working PCRE (Perl-compatible regular expressions) are OK too, but I have no idea how they work, so it would be nice if you could provide an explanation to go with your answer.

Unfortunately, I don't have perl or python installed on the machine that I am working with.

Answer 1

You may use the following perl solution:

echo "foObar" | perl -pe 's/([a-z])(?!\1)(?i:\1)//g'

See the online demo.

Details

([a-z]) - Group 1: a lowercase ASCII letter
(?!\1) - a negative lookahead that fails the match if the next char is the same as the one captured in Group 1
(?i:\1) - the same char as captured in Group 1, matched case-insensitively; since the lookahead has already ruled out an identical char, it can only match the other case (here, the uppercase letter).

The -e option lets you supply the Perl code to run on the command line, and the -p option wraps that code in a loop over the input lines, printing the contents of $_ after each iteration. See more here.

Answer 2

This might work for you (GNU sed):

sed -r 's/aA|bB|cC|dD|eE|fF|gG|hH|iI|jJ|kK|lL|mM|nN|oO|pP|qQ|rR|sS|tT|uU|vV|wW|xX|yY|zZ//g' file
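Rather than typing the 26 alternatives by hand, the same alternation can be built in a shell loop. This is a sketch: strip_pairs_alt is a hypothetical name, tr produces the uppercase copy so the loop stays POSIX, and sed -E is used for the extended-regex alternation.

```shell
# Hypothetical helper: builds "aA|bB|...|zZ" and deletes every match.
strip_pairs_alt() (
  pairs=''
  for c in a b c d e f g h i j k l m n o p q r s t u v w x y z; do
    upper=$(printf '%s' "$c" | tr '[:lower:]' '[:upper:]')
    pairs="${pairs:+$pairs|}$c$upper"   # append "|" only after the first entry
  done
  sed -E "s/$pairs//g"
)

echo 'foObar foobAr' | strip_pairs_alt   # fbar foobAr
```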

A programmatic solution:

sed 's/[[:lower:]][[:upper:]]/\n&/g;s/\n\(.\)\1//ig;s/\n//g' file

This inserts a newline before every lowercase character that is followed by an uppercase character, marking each such pair. It then removes each marker together with its pair wherever the two characters match via a back-reference, irrespective of case. Finally, any remaining newlines are removed, leaving untouched the pairs whose letters are not the same.
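The three substitutions can be watched one at a time with GNU sed's l command, which prints the pattern space with embedded newlines shown as \n and the end marked with $. A sketch assuming GNU sed (for \n in the replacement, the I flag and l); strip_pairs and show_stages are hypothetical names.

```shell
# The answer's one-liner wrapped in a function (GNU sed assumed).
strip_pairs() {
  sed 's/[[:lower:]][[:upper:]]/\n&/g;s/\n\(.\)\1//Ig;s/\n//g'
}

# Same three steps, printing the pattern space after steps 1 and 2.
show_stages() {
  sed -n 's/[[:lower:]][[:upper:]]/\n&/g;l;s/\n\(.\)\1//Ig;l;s/\n//g;p'
}

echo 'foObar foobAr' | show_stages
# f\noObar foo\nbAr$    <- step 1: newline marker before each lower+upper pair
# fbar foo\nbAr$        <- step 2: marker and "oO" removed (case-insensitive \1)
# fbar foobAr           <- step 3: leftover marker stripped
```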

Answer 3

Here is a verbose awk solution, since the OP doesn't have perl or python available:

echo "foObar" |
awk -v ORS= -v FS='' '{
   for (i=2; i<=NF; i++) {
      if ($(i-1) == tolower($i) && $i ~ /[A-Z]/ && $(i-1) ~ /[a-z]/) {
         i++
         continue
      }
      print $(i-1)
   }
   print $(i-1)
}'

fbar
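Splitting a record into single characters with FS='' is an extension (gawk, mawk and BusyBox awk support it; POSIX leaves an empty FS unspecified), and ORS= means several input lines come out joined together. A sketch of the same left-to-right scan using only substr(), printing one output line per input line; strip_pairs_awk is a hypothetical name.

```shell
# Hypothetical helper: a portable take on the same scan (no FS='' needed).
strip_pairs_awk() {
  awk '{
    out = ""
    n = length($0)
    for (i = 1; i <= n; i++) {
      a = substr($0, i, 1)      # current character
      b = substr($0, i + 1, 1)  # next character ("" at end of line)
      # Skip a lowercase letter together with its uppercase twin.
      if (a ~ /[a-z]/ && b ~ /[A-Z]/ && a == tolower(b)) { i++; continue }
      out = out a
    }
    print out                   # one output line per input line
  }'
}

printf '%s\n' foObar xYzZ | strip_pairs_awk   # fbar, then xY
```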

Answer 4

Note: based on the OP's feedback, this solution is (unsurprisingly) slow:
"Unfortunately, due to the multiple passes, it makes it rather slow."

If there is a character sequence¹ that you know will never appear in the input,
you could use a 3-stage replacement to accomplish this with sed:

echo 'foObar foobAr' | sed -E -e 's/([a-z])([A-Z])/KEYWORD\1\l\2/g' -e 's/KEYWORD(.)\1//g' -e 's/KEYWORD(.)(.)/\1\u\2/g'

gives you: fbar foobAr

Replacement stages explained:

1. Look for a lowercase letter followed by ANY uppercase letter, and replace the pair with both letters in lowercase, preceded by KEYWORD: foObar foobAr -> fKEYWORDoobar fooKEYWORDbar
2. Remove KEYWORD followed by two identical characters (both are lowercase now, so the back-reference works): fKEYWORDoobar fooKEYWORDbar -> fbar fooKEYWORDbar
3. Strip the remaining² KEYWORD markers from the output and convert the second character after each one back to its original uppercase form: fbar fooKEYWORDbar -> fbar foobAr

¹ In this example I used KEYWORD for demonstration purposes. A single character, or at least a shorter character sequence, would be better/faster. Just make sure to pick something that cannot possibly ever appear in the input.
² The remaining occurrences are those where the lowercase versions of the two letters were not identical, so we have to revert them to their original state.
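Following footnote ¹, the marker can be a single control character. A sketch assuming GNU sed (for \l and \u in the replacement) and assuming the input never contains the byte 0x01 (SOH); strip_pairs_soh is a hypothetical name.

```shell
# Hypothetical helper: the same 3-stage replacement with a one-byte marker.
# Assumes GNU sed and that the byte \001 never occurs in the input.
strip_pairs_soh() {
  m=$(printf '\001')   # the marker byte
  sed -E \
    -e "s/([a-z])([A-Z])/$m\1\l\2/g" \
    -e "s/$m(.)\1//g" \
    -e "s/$m(.)(.)/\1\u\2/g"
}

echo 'foObar foobAr' | strip_pairs_soh   # fbar foobAr
```

Inside the double quotes the shell only expands $m; \1, \l and \u reach sed unchanged.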

Answer 5

There's an easy lex scanner for this:

%option main 8bit
    #include <ctype.h>
%%
[[:lower:]][[:upper:]] if ( toupper(yytext[0]) != yytext[1] ) ECHO;

(That's a tab before the #include; markdown loses those.) Just put that in, e.g., that.l and then run make that. Easy-peasy lex scanners are a nice addition to your toolkit.

Question

5 answers

Answer 1: 3 votes, 2018-12-11 17:18:05
Answer 2: 3 votes, 2018-12-12 13:13:23
Answer 3: 2 votes, accepted, 2018-12-11 17:38:03
Answer 4: 1 vote, 2018-12-11 17:51:30
Answer 5: 1 vote, 2018-12-11 21:06:33
