Detect symbols that are not enclosed within double quotes (regex)

I'd like to design a regex that can match the characters [].\\,();~- that are not enclosed within double quotes.

For example, this string:

do Output.printString("Test 1: expected result: 5; actual result: ");

should return matches:

['.', '(', ')', ';']

I tried using negative lookahead and negative lookbehind to no avail.

You can use this regex, whose lookahead ensures that a symbol is matched only when it lies outside a pair of double quotes:

>>> import re
>>> s = 'do Output.printString("Test 1: expected result: 5; actual result: ");'
>>> print(re.findall(r'[][.,();~-](?=(?:(?:[^"]*"){2})*[^"]*$)', s))
['.', '(', ')', ';']

  • This regex matches any of the given special characters that lies outside double quotes, using a lookahead to make sure that an even number of quotes follows the matched character.
  • (?:[^"]*"){2} finds a pair of quotes
  • (?:(?:[^"]*"){2})* finds 0 or more such pairs
  • [^"]*$ makes sure that there are no more quotes between the last matched pair and the end of the string
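The pieces listed above may be easier to follow with the same pattern written in verbose mode. This is only a minimal sketch: the names OUTSIDE_QUOTES and symbols_outside_quotes are made up for illustration, and it assumes the quotes in the input are balanced and never backslash-escaped.

```python
import re

# Verbose spelling of the lookahead pattern (names are illustrative only).
OUTSIDE_QUOTES = re.compile(r'''
    [][.,();~-]             # one of the symbols to detect
    (?=                     # lookahead over the rest of the string:
        (?:
            (?:[^"]*"){2}   #   one pair of double quotes,
        )*                  #   repeated zero or more times,
        [^"]*$              #   then no further quotes up to the end
    )
''', re.VERBOSE)

def symbols_outside_quotes(text):
    """Return the symbols followed by an even number of double quotes."""
    return OUTSIDE_QUOTES.findall(text)

s = 'do Output.printString("Test 1: expected result: 5; actual result: ");'
print(symbols_outside_quotes(s))  # ['.', '(', ')', ';']
```

Note that the lookahead rescans the remainder of the string for every candidate symbol, so on very long inputs this is likely to be slower than stripping the quoted strings first.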

A simpler option is to do it in two steps, since handling escaped quotes inside a single Python regular expression is awkward.

re.findall(r'[\[\].\\,();~-]', re.sub(r'"(?:\\.|[^"\\])*"', '', s))
# => ['.', '(', ')', ';']

The inner re.sub deletes all double-quoted strings (treating backslash-escaped double quotes as part of the string); re.findall then picks up the symbols that remain.
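To see the effect of the escaped-quote handling, here is a small sketch of the same two steps wrapped in a function. The names STRING_LITERAL, SYMBOLS and symbols_after_stripping_strings are hypothetical, and the sample input is invented for the demonstration.

```python
import re

# One double-quoted string: \\. consumes any backslash-escaped character
# (including \"), so an escaped quote does not terminate the string.
STRING_LITERAL = re.compile(r'"(?:\\.|[^"\\])*"')
# The symbols from the question, including the backslash itself.
SYMBOLS = re.compile(r'[\[\].\\,();~-]')

def symbols_after_stripping_strings(text):
    """Delete quoted strings first, then collect the remaining symbols."""
    without_strings = STRING_LITERAL.sub('', text)
    return SYMBOLS.findall(without_strings)

# The string literal below contains two escaped quotes and a semicolon.
s = r'print("say \"hi\"; ok", x[1]);'
print(symbols_after_stripping_strings(s))  # ['(', ',', '[', ']', ')', ';']
```

After the substitution the text is `print(, x[1]);`, so the semicolon inside the quoted string is not reported.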

We could do something like this:

Remove text inside double quotes

import re
# Non-greedy match of a double-quoted string (escaped quotes are not handled)
pattern = r'".*?"'
text = 'do Output.printString("Test 1: expected result: 5; actual result: ");'
new_text = re.sub(pattern, '', text)
# Output: 'do Output.printString();'

Match all characters you need

# Raw string avoids invalid escape sequences; only [ and ] need escaping here
pattern_2 = r"[\[\].,();~-]"
matches = re.findall(pattern_2, new_text)
# Output: ['.', '(', ')', ';']
