I have a use case where I want to replace multiple spaces with a single space unless they appear within quotes. For example:

Original:
this is the first    a   b   c
this is the second "a      b      c"

After:
this is the first a b c
this is the second "a      b      c"
I believe a regular expression should do the trick, but I don't have much experience with them. Here's the code I have so far:
import re

text = 'this is the second "a   b   c"'
# Replace every run of whitespace with a single space (this also collapses the quoted part)
print(re.sub(r'\s\s+', ' ', text))
# Doesn't work, but something like this
print(re.sub(r'["]^.*\s\s+.*["]^', ' ', text))
I understand why my second attempt above doesn't work, so I would just like some alternative approaches. If possible, could you explain the parts of your regex solution? Thanks.
Assuming there is no " inside the quoted substring:
import re

text = 'a  b   c  "d   e   f"'
# Group 1 captures a quoted substring; keep it as is, otherwise emit one space.
text = re.sub(r'("[^"]*")|[ \t]+', lambda m: m.group(1) if m.group(1) else ' ', text)
print(text)
# a b c "d   e   f"
The regex ("[^"]*")|[ \t]+ matches either a quoted substring or a run of one or more spaces or tabs. The alternatives are tried left to right, and scanning resumes after each match, so a quoted substring is consumed whole. The whitespace inside it is therefore never seen by the alternative subpattern [ \t]+ and is left untouched.
The pattern that matches the quoted substring is enclosed in () so the callback can check whether it matched. If it did, m.group(1) is truthy and its value is returned unchanged. If not, whitespace was matched, so a single space is returned as the replacement.
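Since the question asked for the parts of the regex to be explained, here is a minimal sketch of the same pattern written with re.VERBOSE so each piece carries its own comment. The names PATTERN and collapse_spaces are made up for this illustration and are not part of the answer above; the behavior is intended to be the same, under the same assumption that quoted substrings contain no embedded ".

```python
import re

# The same two alternatives as above, annotated piece by piece.
# In re.VERBOSE mode whitespace in the pattern is ignored, except
# inside a character class such as [ \t], where it stays literal.
PATTERN = re.compile(r'''
    (            # start of group 1
      "          #   an opening double quote
      [^"]*      #   any run of characters that are not a double quote
      "          #   the closing double quote
    )            # end of group 1
    |            # OR
    [ \t]+       # one or more spaces or tabs
''', re.VERBOSE)


def collapse_spaces(text):
    # Group 1 is None when the whitespace alternative matched,
    # so "or" falls through to a single space in that case.
    return PATTERN.sub(lambda m: m.group(1) or ' ', text)


print(collapse_spaces('a  b   c  "d   e   f"'))
```

Compiling the pattern once is only a convenience here; passing the same string to re.sub directly should give the same result.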
Without the lambda:

def repl(match):
    # Group 1 is set only when the quoted alternative matched.
    quoted = match.group(1)
    return quoted if quoted else ' '

text = 'a  b   c  "d   e   f"'
text = re.sub(r'("[^"]*")|[ \t]+', repl, text)
If you want a solution that works reliably regardless of the input, without caveats such as not allowing embedded quotes, write a simple parser rather than using a regular expression or splitting on quotes.
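def parse(s):
    last = ''
    result = ''
    toggle = 0  # 1 while inside a quoted section
    for c in s:
        # A double quote not preceded by a backslash opens or closes a quoted section
        if c == '"' and last != '\\':
            toggle ^= 1
        # Outside quotes, drop a space that follows another space
        if c == ' ' and toggle == 0 and last == ' ':
            continue
        result += c
        last = c
    return result

test = r'"  <  >"test   1   2   3   "a \"<  >\"  b  c"'
print(test)
print(parse(test))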
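One case the parser above does not cover is a quote preceded by an escaped backslash (for example \\" at the end of a quoted section), because it only looks at the single previous character. The following is a hedged sketch of a variant that tracks escape state explicitly. The function name collapse_outside_quotes is invented here, and the assumption that a backslash escapes exactly the next character is mine, not something stated in the answer.

```python
def collapse_outside_quotes(s):
    out = []
    in_quotes = False
    escaped = False  # True when the previous character was an unescaped backslash
    for c in s:
        if escaped:
            # This character is escaped: take it literally.
            escaped = False
        elif c == '\\':
            escaped = True
        elif c == '"':
            in_quotes = not in_quotes
        elif c == ' ' and not in_quotes and out and out[-1] == ' ':
            # Outside quotes, skip a space that follows another space.
            continue
        out.append(c)
    return ''.join(out)


print(collapse_outside_quotes(r'"a \"  b"  c   d'))
print(collapse_outside_quotes(r'"x\\"  y  z'))
```

Collecting characters in a list and joining at the end avoids repeated string concatenation, which is a common Python idiom but not essential to the technique.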