In bash/sed, how do you match on a lowercase letter followed by the SAME letter in uppercase?

I want to delete all instances of "aA", "bB", ..., "zZ" from an input string.

For example:

echo "foObar" |
sed -Ee 's/([a-z])\U\1//'

should output "fbar".

But the \U syntax only works in the latter half (the replacement part) of the sed expression; it fails to resolve in the matching clause.

I'm having difficulty converting the matched character to upper case so that I can reuse it in the matching clause.


If anyone could suggest a working regex that can be used in sed (or awk), that would be great.

Scripting solutions in pure shell are OK too (I'm trying to think of a way to solve the problem like this).

Working PCRE (Perl-compatible regular expressions) are OK too, but I have no idea how they work, so it would be nice if you could provide an explanation with your answer.

Unfortunately, I don't have perl or python installed on the machine that I am working with.

You may use the following perl solution:

echo "foObar" | perl -pe 's/([a-z])(?!\1)(?i:\1)//g'


Details

  • ([a-z]) - Group 1: a lowercase ASCII letter
  • (?!\1) - a negative lookahead that fails the match if the next char is the same as the one captured in Group 1
  • (?i:\1) - the same letter as captured in Group 1, matched case-insensitively; because the lookahead before it has already ruled out the identical char, only the opposite-case letter can match here

The -e option lets you supply the Perl code on the command line, and the -p option runs it in an implicit loop over the input lines, printing the contents of $_ after each iteration.

This might work for you (GNU sed):

sed -r 's/aA|bB|cC|dD|eE|fF|gG|hH|iI|jJ|kK|lL|mM|nN|oO|pP|qQ|rR|sS|tT|uU|vV|wW|xX|yY|zZ//g' file
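If typing out all 26 alternatives feels error-prone, the same pattern could be generated in a loop. This is only a sketch: the names pattern and strip_listed_pairs are hypothetical, and it assumes bash 4 or later for the ${c^^} upper-casing expansion.

```shell
#!/usr/bin/env bash
# Build the alternation "aA|bB|...|zZ" instead of typing it by hand.
pattern=
for c in {a..z}; do
  # ${pattern:+|} adds the "|" separator only once pattern is non-empty.
  pattern+="${pattern:+|}${c}${c^^}"
done

# Hypothetical helper: delete every listed pair from standard input.
strip_listed_pairs() {
  sed -E "s/${pattern}//g"
}

printf '%s\n' 'foObar foobAr' | strip_listed_pairs
```

The "oO" in the first word is removed, while "bA" in the second word is not one of the listed pairs and stays.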

A programmatic solution:

sed 's/[[:lower:]][[:upper:]]/\n&/g;s/\n\(.\)\1//ig;s/\n//g' file

This marks every pair consisting of a lower-case character followed by an upper-case character by inserting a newline in front of it. It then removes each marker together with its pair when the two characters match by a back-reference irrespective of case. Finally, any remaining newlines are removed, leaving the pairs whose letters differ untouched.
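A quick way to check the three cases described above (a matching pair, a non-matching pair, and several pairs in a row) is to wrap the command in a function and feed it a few inputs. The function name strip_marked_pairs is made up for this sketch, and GNU sed is assumed for \n in the replacement and for the i flag.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper around the sed command above; reads standard input.
strip_marked_pairs() {
  sed 's/[[:lower:]][[:upper:]]/\n&/g;s/\n\(.\)\1//ig;s/\n//g'
}

# foObar : "oO" is marked and then removed
# foobAr : "bA" is marked, does not match, and only the marker is removed
# aAbBcD : "aA" and "bB" are removed, "cD" is kept
printf '%s\n' foObar foobAr aAbBcD | strip_marked_pairs
```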

Here is a verbose awk solution, as OP doesn't have perl or python available:

echo "foObar" |
awk -v ORS= -v FS='' '{
   for (i=2; i<=NF; i++) {
      if ($(i-1) == tolower($i) && $i ~ /[A-Z]/ && $(i-1) ~ /[a-z]/) {
         i++
         continue
      }
      print $(i-1)
   }
   print $(i-1)
}'

fbar
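Since the question also allows pure-shell solutions, the same left-to-right scan can be sketched in bash without awk. This is a swapped-in illustration rather than part of the answer above: the function name strip_pairs_bash is hypothetical, and it assumes bash 4 or later for ${a^^}.

```shell
#!/usr/bin/env bash
# Hypothetical pure-bash version of the character scan: walk the string
# once and skip any lowercase letter directly followed by its uppercase.
strip_pairs_bash() {
  local s=$1 out='' a b
  local i=0
  while (( i < ${#s} )); do
    a=${s:i:1}          # current character
    b=${s:i+1:1}        # next character (empty at the end of the string)
    if [[ $a == [[:lower:]] && $b == [[:upper:]] && ${a^^} == "$b" ]]; then
      i=$(( i + 2 ))    # drop the pair
    else
      out+=$a           # keep the current character
      i=$(( i + 1 ))
    fi
  done
  printf '%s\n' "$out"
}

strip_pairs_bash 'foObar foobAr'
```

Unlike the awk version, this prints a trailing newline; the bash loop is unlikely to be fast on large inputs.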

Note: This solution is (unsurprisingly) slow, based on OP's feedback:
"Unfortunately, due to the multiple passes - it makes it rather slow."


If there is a character sequence¹ that you know won't ever appear in the input,
you could use a 3-stage replacement to accomplish this with sed:

 echo 'foObar foobAr' | sed -E -e 's/([a-z])([A-Z])/KEYWORD\1\l\2/g' -e 's/KEYWORD(.)\1//g' -e 's/KEYWORD(.)(.)/\1\u\2/g'

gives you: fbar foobAr

Replacement stages explained:

  • Look for lowercase letters followed by ANY uppercase letter and replace them with both letters in lowercase, with the KEYWORD in front of them: foObar foobAr -> fKEYWORDoobar fooKEYWORDbar
  • Remove KEYWORD followed by two identical characters (both are lowercase now, so the back-reference works): fKEYWORDoobar fooKEYWORDbar -> fbar fooKEYWORDbar
  • Strip the remaining² KEYWORD from the output and convert the second character after it back to its original, uppercase version: fbar fooKEYWORDbar -> fbar foobAr

¹ In this example I used KEYWORD for demonstration purposes. A single character, or at least a shorter character sequence, would be better/faster. Just make sure to pick something that cannot possibly ever be in the input.
² The remaining occurrences are those where the lowercase versions of the two letters were not identical, so we have to revert them to their original state.
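The three stages can also be run one at a time to inspect the intermediate strings listed above. This is a minimal sketch: the variable names are made up, and it assumes GNU sed for the \l and \u case conversions.

```shell
#!/usr/bin/env bash
input='foObar foobAr'

# Stage 1: mark every lower+UPPER pair and lowercase its second letter.
stage1=$(printf '%s\n' "$input" | sed -E 's/([a-z])([A-Z])/KEYWORD\1\l\2/g')

# Stage 2: remove marked pairs whose two letters are now identical.
stage2=$(printf '%s\n' "$stage1" | sed -E 's/KEYWORD(.)\1//g')

# Stage 3: drop leftover markers and restore the uppercase second letter.
stage3=$(printf '%s\n' "$stage2" | sed -E 's/KEYWORD(.)(.)/\1\u\2/g')

printf '%s\n' "$stage1" "$stage2" "$stage3"
```

Each printed line should correspond to the right-hand side of one arrow in the stage list.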

There's an easy lex for this:

%option main 8bit
    #include <ctype.h>
%%
[[:lower:]][[:upper:]] if ( toupper(yytext[0]) != yytext[1] ) ECHO;

(That's a tab before the #include; markdown loses those.) Just put that in e.g. that.l and then run make that. Easy-peasy lex programs are a nice addition to your toolkit.


