I'd like to design a regex that matches the characters [].\\,();~-
when they are not enclosed within double quotes.
For example, this string:
do Output.printString("Test 1: expected result: 5; actual result: ");
should return matches:
['.', '(', ')', ';']
I tried using negative lookahead and negative lookbehind to no avail.
You can use this regex with a lookahead that makes sure a symbol is matched only when it lies outside a pair of double quotes:
>>> import re
>>> s = 'do Output.printString("Test 1: expected result: 5; actual result: ");'
>>> print(re.findall(r'[][.\\,();~-](?=(?:(?:[^"]*"){2})*[^"]*$)', s))
['.', '(', ')', ';']
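The lookahead accepts a symbol only when the rest of the string contains an even number of double quotes. A minimal sketch of that same check without regex, assuming quotes are balanced and never escaped (the helper name symbols_by_quote_parity is hypothetical), run side by side with the regex:

```python
import re

# The symbols from the question: [ ] . \ , ( ) ; ~ -
SYMBOLS = set('[].\\,();~-')
# The lookahead regex from the answer, with the backslash included in the class.
LOOKAHEAD = re.compile(r'[][.\\,();~-](?=(?:(?:[^"]*"){2})*[^"]*$)')


def symbols_by_quote_parity(s):
    """Hypothetical helper: keep symbols followed by an even number of quotes."""
    found = []
    for i, ch in enumerate(s):
        # Count the quotes after position i; an even count means ch is not
        # inside a quoted string (assuming balanced, unescaped quotes).
        if ch in SYMBOLS and s.count('"', i + 1) % 2 == 0:
            found.append(ch)
    return found


s = 'do Output.printString("Test 1: expected result: 5; actual result: ");'
print(symbols_by_quote_parity(s))  # ['.', '(', ')', ';']
print(LOOKAHEAD.findall(s))        # ['.', '(', ')', ';']
```

Because the lookahead rescans the remainder of the string for every candidate symbol, the work grows with the square of the input length on long lines.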
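(?:[^"]*"){2} finds a pair of quotes.
(?:(?:[^"]*"){2})* finds 0 or more such pairs.
[^"]*$ makes sure there are no more quotes after the last matched quote.
Together, the lookahead succeeds only when an even number of double quotes follows the symbol, that is, when the symbol is not inside a quoted string.
Alternatively, you can do it in two steps instead of with a single regular expression: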
import re
s = 'do Output.printString("Test 1: expected result: 5; actual result: ");'
print(re.findall(r'[\[\].\\,();~-]', re.sub(r'"(?:\\.|[^"\\])*"', '', s)))
# => ['.', '(', ')', ';']
The inner re.sub deletes all double-quoted strings, treating a backslash-escaped quote such as \" as part of the string rather than as its end; re.findall can then easily pick up the characters you want from what is left.
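The same two steps wrapped in a function, as a sketch (the names SYMBOL_CLASS, QUOTED_STRING and symbols_outside_quotes_two_step are hypothetical). The second call shows why the \\. alternative matters: the escaped quotes inside the string do not end it early, so the ; inside it is dropped.

```python
import re

# Characters from the question: [ ] . \ , ( ) ; ~ -
SYMBOL_CLASS = r'[\[\].\\,();~-]'
# A double-quoted string; \\. consumes any backslash-escaped character
# (such as \") so that it does not terminate the string.
QUOTED_STRING = r'"(?:\\.|[^"\\])*"'


def symbols_outside_quotes_two_step(s):
    # Step 1: delete every double-quoted string.
    unquoted = re.sub(QUOTED_STRING, '', s)
    # Step 2: collect the wanted symbols from what is left.
    return re.findall(SYMBOL_CLASS, unquoted)


example = 'do Output.printString("Test 1: expected result: 5; actual result: ");'
print(symbols_outside_quotes_two_step(example))  # ['.', '(', ')', ';']

# The Python literal below is the text: print("say \"hi\"; ok", x);
print(symbols_outside_quotes_two_step('print("say \\"hi\\"; ok", x);'))  # ['(', ',', ')', ';']
```

The single-regex lookahead version counts escaped quotes as real ones, so it has no equivalent of this protection.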
We could also do it in two steps.
Remove the text inside double quotes:
import re
pattern = r'".*?"'
text = 'do Output.printString("Test 1: expected result: 5; actual result: ");'
new_text = re.sub(pattern, '', text)
# O/P 'do Output.printString();'
Match all the characters you need:
pattern_2 = r'[\[\].\\,();~-]'
matches = re.findall(pattern_2, new_text)
# O/P ['.', '(', ')', ';']
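Hand-escaping every symbol in pattern_2 is easy to get wrong. As a sketch, the character class can instead be built from the plain list of characters with re.escape, which backslash-escapes regex metacharacters so they are safe inside [...]. The names WANTED, symbol_class, quoted and find_symbols are hypothetical, and like pattern above, the non-greedy ".*?" assumes the quoted strings contain no escaped quotes.

```python
import re

# The characters from the question, written once as a plain string.
WANTED = '[].\\,();~-'
# re.escape escapes [, ], ., \, (, ), ~ and -, so the joined string is a
# valid character class without escaping each symbol by hand.
symbol_class = re.compile('[' + re.escape(WANTED) + ']')
# Non-greedy: from one double quote to the next one.
quoted = re.compile(r'".*?"')


def find_symbols(text):
    # Strip the quoted strings first, then collect the symbols that remain.
    return symbol_class.findall(quoted.sub('', text))


print(find_symbols('do Output.printString("Test 1: expected result: 5; actual result: ");'))
# ['.', '(', ')', ';']
```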