In bash/sed, how do you match on a lowercase letter followed by the SAME letter in uppercase?
I want to delete all instances of "aA", "bB" ... "zZ" from an input string.
e.g.
echo "foObar" |
sed -Ee 's/([a-z])\U\1//'
should output "fbar"
But the \U syntax works only in the latter half (replacement part) of the sed expression; it fails to resolve in the matching clause.
I'm having difficulty converting the matched character to upper case to reuse in the matching clause.
If anyone could suggest a working regex which can be used in sed (or awk), that would be great.
Scripting solutions in pure shell are OK too (I'm trying to think of solving the problem this way).
Working PCREs (Perl-compatible regular expressions) are OK too, but I have no idea how they work, so it would be nice if you could provide an explanation to go with your answer.
Unfortunately, I don't have perl or python installed on the machine that I am working with.
You may use the following perl solution:
echo "foObar" | perl -pe 's/([a-z])(?!\1)(?i:\1)//g'
See the online demo.
Details
([a-z]) - Group 1: a lowercase ASCII letter
(?!\1) - a negative lookahead that fails the match if the next char is the same as captured with Group 1
(?i:\1) - the same char as captured with Group 1 but in a different case (due to the lookahead before it).
The -e option allows you to define Perl code to be executed by the compiler and the -p option always prints the contents of $_ each time around the loop. See more here.
This might work for you (GNU sed):
sed -r 's/aA|bB|cC|dD|eE|fF|gG|hH|iI|jJ|kK|lL|mM|nN|oO|pP|qQ|rR|sS|tT|uU|vV|wW|xX|yY|zZ//g' file
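If typing the 26 alternatives by hand feels error-prone, the alternation can be generated instead. This is only a sketch: build_pairs is a hypothetical helper name, and it assumes a POSIX shell with tr and a sed that accepts -E for extended regular expressions.

```shell
# Hypothetical helper: builds the string aA|bB|...|zZ.
build_pairs() {
    pattern=""
    for c in a b c d e f g h i j k l m n o p q r s t u v w x y z; do
        upper=$(printf '%s' "$c" | tr 'a-z' 'A-Z')
        # Prefix with "|" only when the pattern is already non-empty.
        pattern="${pattern:+$pattern|}$c$upper"
    done
    printf '%s\n' "$pattern"
}

# Same substitution as above, with the alternation generated at run time.
echo "foObar" | sed -E "s/$(build_pairs)//g"   # prints: fbar
```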
A programmatic solution:
sed 's/[[:lower:]][[:upper:]]/\n&/g;s/\n\(.\)\1//ig;s/\n//g' file
This marks every pair of a lower-case character followed by an upper-case character with a preceding newline. It then removes each marker together with its pair when the two characters match by back reference irrespective of case. Any remaining newlines are then removed, leaving untouched the pairs that are not the same letter.
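A quick way to check this on a few inputs is to wrap the command in a function. The name strip_pairs is hypothetical, and the sketch assumes GNU sed, since it relies on \n in the replacement and on the case-insensitive flag of the s command.

```shell
# Hypothetical wrapper around the three substitutions; GNU sed assumed.
strip_pairs() {
    # 1) put a newline marker before every lower+upper pair
    # 2) drop marker+pair when both letters match ignoring case
    # 3) drop the markers that are left over
    sed 's/[[:lower:]][[:upper:]]/\n&/g;s/\n\(.\)\1//ig;s/\n//g'
}

printf '%s\n' "foObar" "foobAr" "aAbBcD" | strip_pairs
```

With GNU sed this should print fbar, foobAr and cD on separate lines: "bA" and "cD" are marked in the first step but survive the second because the letters differ.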
Here is a verbose awk solution, as OP doesn't have perl or python available:
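echo "foObar" |
awk -v ORS= -v FS='' '{
    for (i=2; i<=NF; i++) {
        # skip a lowercase letter followed by the same letter in uppercase
        if ($(i-1) == tolower($i) && $i ~ /[A-Z]/ && $(i-1) ~ /[a-z]/) {
            i++
            continue
        }
        print $(i-1)
    }
    print $(i-1)   # last character (empty if the line ended with a removed pair)
}'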
fbar
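The script above relies on an empty FS splitting the record into single characters, which, as far as I know, is an extension (gawk documents it) rather than guaranteed awk behavior. A sketch of the same single-pass scan using substr instead is shown below; strip_pairs_awk is a hypothetical name, and unlike the original it prints a trailing newline per line.

```shell
# Hypothetical wrapper; uses substr so it does not depend on FS=''.
strip_pairs_awk() {
    awk '{
        out = ""
        n = length($0)
        for (i = 1; i <= n; i++) {
            cur = substr($0, i, 1)
            nxt = substr($0, i + 1, 1)   # empty string past the end
            # lowercase letter followed by the same letter in uppercase
            if (cur ~ /[a-z]/ && nxt ~ /[A-Z]/ && cur == tolower(nxt)) {
                i++          # skip the uppercase half as well
                continue
            }
            out = out cur
        }
        print out
    }'
}

echo "foObar" | strip_pairs_awk   # prints: fbar
```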
Note: The sed solution below is (unsurprisingly) slow, based on OP's feedback:
"Unfortunately, due to the multiple passes - it makes it rather slow."
You can do this with a three-stage substitution in sed:
echo 'foObar foobAr' | sed -E -e 's/([a-z])([A-Z])/KEYWORD\1\l\2/g' -e 's/KEYWORD(.)\1//g' -e 's/KEYWORD(.)(.)/\1\u\2/g'
gives you: fbar foobAr
Replacement stages explained:
Find lowercase letters followed by ANY uppercase letter and replace them with both letters in lowercase, with the KEYWORD¹ in front of them:
foObar foobAr -> fKEYWORDoobar fooKEYWORDbar
Remove KEYWORD followed by two identical characters (both are lowercase now, so the backreference works):
fKEYWORDoobar fooKEYWORDbar -> fbar fooKEYWORDbar
Strip the remaining² KEYWORD from the output and convert the second character after it back to its original, uppercase version:
fbar fooKEYWORDbar -> fbar foobAr
¹ In this example I used KEYWORD for demonstration purposes. A single character, or at least a shorter character sequence, would be better/faster. Just make sure to pick something that cannot possibly ever be in the input.
² The remaining occurrences are those where the lowercase versions of the letters were not identical, so we have to revert them back to their original state.
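Following the first footnote, here is a sketch of the same three stages with a single-character marker. The choice of the control character \001 is an assumption that it never occurs in the input, strip_pairs_marker is a hypothetical name, and GNU sed is assumed because \l and \u are GNU extensions.

```shell
# Hypothetical wrapper; GNU sed assumed (\l and \u in the replacement).
strip_pairs_marker() {
    # Assumption: the byte \001 never appears in the input.
    m=$(printf '\001')
    sed -E \
        -e "s/([a-z])([A-Z])/${m}\1\l\2/g" \
        -e "s/${m}(.)\1//g" \
        -e "s/${m}(.)(.)/\1\u\2/g"
}

echo 'foObar foobAr' | strip_pairs_marker   # prints: fbar foobAr
```

Inside double quotes the shell leaves \1, \l and \u untouched, so sed still sees them as written.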
There's an easy lex for this,
%option main 8bit
#include <ctype.h>
%%
[[:lower:]][[:upper:]] if ( toupper(yytext[0]) != yytext[1] ) ECHO;
(that's a tab before the #include, markdown loses those). Just put that in e.g. that.l and then make that. Easy-peasy lex's are a nice addition to your toolkit.