
Detect symbols that are not enclosed within double quotes (regex)

I'd like to design a regex that matches the characters [].\\,();~- when they are not enclosed within double quotes.

For example, this string:

do Output.printString("Test 1: expected result: 5; actual result: ");

should return the matches:

['.', '(', ')', ';']

I tried using negative lookahead and negative lookbehind, to no avail.

You can use this regex, whose lookahead makes sure that a matched symbol lies outside any pair of double quotes:

>>> import re
>>> s = 'do Output.printString("Test 1: expected result: 5; actual result: ");'
>>> print(re.findall(r'[][.,();~-](?=(?:(?:[^"]*"){2})*[^"]*$)', s))
['.', '(', ')', ';']


  • This regex matches any of the given special characters that lies outside double quotes, using a lookahead to make sure an even number of quotes follows the matched character.
  • (?:[^"]*"){2} matches a pair of quotes, together with any non-quote text before each one
  • (?:(?:[^"]*"){2})* matches 0 or more such pairs
  • [^"]*$ makes sure there are no more quotes between the last matched pair and the end of the string
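The even-number-of-quotes check described above can be wrapped in a small helper. This is only a sketch: the names OUTSIDE_QUOTES and symbols_outside_quotes are made up for illustration, and the pattern assumes the quotes in the line are balanced and never escaped (a \" inside a string would throw off the count).

```python
import re

# Same pattern as in the answer: one symbol, then a lookahead that only
# succeeds when an even number of double quotes remains before end of string.
OUTSIDE_QUOTES = re.compile(r'[][.,();~-](?=(?:(?:[^"]*"){2})*[^"]*$)')

def symbols_outside_quotes(line):
    """Return the symbols in `line` that are followed by an even number of quotes."""
    return OUTSIDE_QUOTES.findall(line)

print(symbols_outside_quotes('do Output.printString("a; b");'))
print(symbols_outside_quotes('let x = a[i] - f("(,)");'))
```

Because the lookahead rescans the rest of the line for every candidate symbol, this is best suited to short inputs such as single source lines.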

Alternatively, do it in two steps, which is easier to get right than a single pattern, particularly when the strings may contain escaped quotes.

re.findall(r'[\[\].\\,();~-]', re.sub(r'"(?:\\.|[^"\\])*"', '', s))
# => ['.', '(', ')', ';']

The inner re.sub deletes all double-quoted strings (treating escaped double quotes as part of the string); re.findall then picks up the symbols that remain.
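To see the escaped-quote handling at work, here is a minimal sketch of the same two steps as a function. The names STRING_LITERAL, SYMBOL and symbols_outside_strings are hypothetical, and the sample line with \" inside the string is invented for the demonstration.

```python
import re

# A double-quoted string: either a backslash escape (\\.) or any character
# that is neither a quote nor a backslash, repeated between two quotes.
STRING_LITERAL = re.compile(r'"(?:\\.|[^"\\])*"')
# The symbols from the question, including the backslash itself.
SYMBOL = re.compile(r'[\[\].\\,();~-]')

def symbols_outside_strings(line):
    """Step 1: drop quoted strings. Step 2: collect the remaining symbols."""
    stripped = STRING_LITERAL.sub('', line)
    return SYMBOL.findall(stripped)

# The string argument contains escaped quotes and a semicolon.
line = r'do Output.printString("say \"hi\"; bye");'
print(symbols_outside_strings(line))
```

The semicolon inside the string is removed along with the rest of the literal, so only the symbols in the surrounding code are reported.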

We could do something like this:

Remove the text inside double quotes

import re
# Lazily match everything from one double quote to the next one
pattern = r'".*?"'
text = 'do Output.printString("Test 1: expected result: 5; actual result: ");'
new_text = re.sub(pattern, '', text)
# O/P 'do Output.printString();'

Match all the characters you need

# Raw string, so the backslash escapes reach the regex engine unchanged
pattern_2 = r"[\[\].,();~-]"
matches = re.findall(pattern_2, new_text)
# O/P ['.', '(', ')', ';']
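The two steps can also be folded into one pass, using a different technique from the answers above: an alternation that consumes each quoted string first and captures a symbol only when no string starts at that position. This is a sketch under the same assumptions as before (strings do not span lines); TOKEN and symbols_single_pass are illustrative names.

```python
import re

# First alternative swallows a whole quoted string (escapes allowed) without
# capturing it; second alternative captures a single symbol in group 1.
TOKEN = re.compile(r'"(?:\\.|[^"\\])*"|([\[\].\\,();~-])')

def symbols_single_pass(line):
    """Collect symbols outside double-quoted strings in one scan of `line`."""
    return [m.group(1) for m in TOKEN.finditer(line) if m.group(1) is not None]

text = 'do Output.printString("Test 1: expected result: 5; actual result: ");'
print(symbols_single_pass(text))
```

Matches where group 1 is None are the quoted strings themselves, so filtering them out leaves only the symbols found in the code.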
