Detect symbols that are not enclosed within double quotes (regex)
I'd like to design a regex that can match the characters [].\\,();~- that are not enclosed within double quotes.
For example, this string:
do Output.printString("Test 1: expected result: 5; actual result: ");
should return matches:
['.', '(', ')', ';']
I tried using negative lookahead and negative lookbehind, to no avail.
You can use this regex with a lookahead that makes sure each symbol is matched only outside a pair of double quotes:
>>> import re
>>> s = 'do Output.printString("Test 1: expected result: 5; actual result: ");'
>>> print(re.findall(r'[][.,();~-](?=(?:(?:[^"]*"){2})*[^"]*$)', s))
['.', '(', ')', ';']
(?:[^"]*"){2} finds a pair of quotes
(?:(?:[^"]*"){2})* finds 0 or more such pairs
[^"]*$ makes sure that we don't have any more quotes after the last matched quote
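To see the three pieces of the lookahead working together, here is a minimal sketch that wraps the pattern in a helper. The function name symbols_outside_quotes and the second test string are made up for illustration; the pattern itself is the one from the answer.

```python
import re

# Symbol class, followed by a lookahead that only succeeds when an even
# number of double quotes remains between the symbol and the end of the
# string (zero or more complete pairs, then no further quotes).
OUTSIDE_QUOTES = re.compile(r'[][.,();~-](?=(?:(?:[^"]*"){2})*[^"]*$)')

def symbols_outside_quotes(line):
    """Return the symbols in `line` that are not inside a "..." pair."""
    return OUTSIDE_QUOTES.findall(line)

# Two quoted strings on one line: the ';' in the first and the '(' and ')'
# in the second are followed by an odd number of quotes, so they are skipped.
print(symbols_outside_quotes('a.b("x;y", c[0], "(z)");'))
# ['.', '(', ',', '[', ']', ',', ')', ';']
```

Because the lookahead only counts quote characters, this assumes the quotes on the line are balanced and that no quoted string contains an escaped quote (\").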
You need two steps, as Python regular expressions are not powerful enough to do it in one go.
re.findall(r'[\[\].\\,();~-]', re.sub(r'"(?:\\.|[^"\\])*"', '', s))
# => ['.', '(', ')', ';']
The inner re.sub deletes all double-quoted strings (treating escaped double quotes as part of the string); then you can use re.findall to easily pick up what you want.
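The handling of escaped quotes is where the two steps pay off. The sketch below, with hypothetical helper names and a made-up input line, runs the strip-then-find approach next to a quote-counting lookahead on a string that contains \" inside the quoted text.

```python
import re

SYMBOLS = re.compile(r'[\[\].\\,();~-]')
# A double-quoted string: each inner item is either a backslash escape (\\.)
# or a character that is neither a quote nor a backslash.
QUOTED = re.compile(r'"(?:\\.|[^"\\])*"')
# Quote-counting lookahead, shown only for comparison.
LOOKAHEAD = re.compile(r'[][.,();~-](?=(?:(?:[^"]*"){2})*[^"]*$)')

def strip_then_find(line):
    """Delete quoted strings first, then collect the remaining symbols."""
    return SYMBOLS.findall(QUOTED.sub('', line))

# The quoted string here contains two escaped quotes: "say \"hi;\" now"
line = r'log("say \"hi;\" now");'

print(strip_then_find(line))    # ['(', ')', ';']
print(LOOKAHEAD.findall(line))  # ['(', ';', ')', ';']
```

The lookahead version reports the ';' after hi because it counts the escaped quotes as ordinary ones, while the escape-aware pattern removes the whole string before any symbols are collected.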
We could do something like this:
Remove text inside double quotes
import re
# Non-greedy match from an opening quote to the nearest closing quote
pattern = r'".*?"'
text = 'do Output.printString("Test 1: expected result: 5; actual result: ");'
new_text = re.sub(pattern, '', text)
# O/P 'do Output.printString();'
Match all characters you need
# Raw string, so the backslashes reach the regex engine unchanged
pattern_2 = r"[\[\].,();~-]"
matches = re.findall(pattern_2, new_text)
# O/P ['.', '(', ')', ';']
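The non-greedy quantifier in the removal pattern matters as soon as a line holds more than one quoted string. A small sketch, using a made-up input line, compares the non-greedy form with its greedy counterpart:

```python
import re

text = 'f("a;b", g("c.d"));'

# Non-greedy: each quoted string is removed on its own.
non_greedy = re.sub(r'".*?"', '', text)
# Greedy: a single match runs from the first quote to the last quote,
# swallowing the code between the two strings as well.
greedy = re.sub(r'".*"', '', text)

print(non_greedy)  # f(, g());
print(greedy)      # f());
```

With the greedy form, the ',' and the second '(' between the two strings would be lost before the symbol search runs. Note that neither form treats \" as part of the string, so input with escaped quotes needs an escape-aware pattern instead.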