
Python Regular expression must strip whitespace except between quotes

I need a way to remove all whitespace from a string, except when that whitespace is between quotes.

result = re.sub('".*?"', "", content)

This matches anything between quotes, but what I actually need is for those matches to be left alone and only the whitespace outside them to be removed.

I don't think you're going to be able to do that with a single regex. One way to do it is to split the string on quotes, apply the whitespace-stripping regex to every other item of the resulting list, and then re-join the list.

import re

def stripwhite(text):
    # Splitting on quotes puts text outside quotes at even indices
    # and quoted text at odd indices.
    lst = text.split('"')
    for i, item in enumerate(lst):
        if not i % 2:
            lst[i] = re.sub(r"\s+", "", item)
    return '"'.join(lst)

print(stripwhite('This is a string with some "text in quotes."'))

Here is a one-liner version based on @kindall's idea, yet it does not use regex at all. First split on ", then split() every other item and re-join the pieces, which takes care of the whitespace:

stripWS = lambda txt:'"'.join( it if i%2 else ''.join(it.split())
    for i,it in enumerate(txt.split('"'))  )

Usage example:

>>> stripWS('This is a string with some "text in quotes."')
'Thisisastringwithsome"text in quotes."'

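You can use shlex.split for a quotation-aware split and join the result using " ".join. Note that this collapses whitespace outside quotes to single spaces rather than removing it, and it drops the quote characters themselves. E.g.:

import shlex
print(" ".join(shlex.split('Hello "world     this    is" a    test')))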
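To make the behaviour of the shlex approach concrete, here is a minimal sketch that prints the intermediate token list as well as the joined result. The variable names text and tokens are only illustrative.

```python
import shlex

text = 'Hello "world     this    is" a    test'

# shlex.split treats the quoted part as a single token and strips the quotes.
tokens = shlex.split(text)
print(tokens)

# Joining with a single space normalizes the whitespace outside quotes;
# it does not remove it, and the quote characters are not restored.
print(" ".join(tokens))
```

Since the quotes are gone after the split, this is a fit only when normalized spacing without quotes is acceptable.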

Oli, resurrecting this question because it had a simple regex solution that wasn't mentioned. (Found your question while doing some research for a regex bounty quest.)

Here's the small regex:

"[^"]*"|(\s+)

The left side of the alternation matches complete "quoted strings". We will ignore these matches. The right side matches and captures spaces to Group 1, and we know they are the right spaces because they were not matched by the expression on the left.

Here is working code (and an online demo):

import re
subject = 'Remove Spaces Here "But Not Here" Thank You'
regex = re.compile(r'"[^"]*"|(\s+)')
def myreplacement(m):
    if m.group(1):
        return ""
    else:
        return m.group(0)
replaced = regex.sub(myreplacement, subject)
print(replaced)
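As a rough illustration of why the Group 1 test works, the sketch below lists each match of the same pattern and labels it by whether Group 1 took part, then wraps the substitution in a helper. The names pattern and strip_outside_quotes are hypothetical, chosen only for this example.

```python
import re

# Same pattern as above: quoted strings on the left, whitespace captured on the right.
pattern = re.compile(r'"[^"]*"|(\s+)')

def strip_outside_quotes(text):
    # Drop a match only when Group 1 (whitespace outside quotes) matched;
    # otherwise put the quoted string back unchanged.
    return pattern.sub(lambda m: "" if m.group(1) else m.group(0), text)

subject = 'Remove Spaces Here "But Not Here" Thank You'

# Show which alternative produced each match.
for m in pattern.finditer(subject):
    kind = "whitespace" if m.group(1) else "quoted"
    print(kind, repr(m.group(0)))

print(strip_outside_quotes(subject))
```

With an unbalanced quote, the left alternative cannot match, so the lone quote is kept and all whitespace is treated as being outside quotes.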

Reference

  1. How to match pattern except in situations s1, s2, s3
  2. How to match a pattern unless...

Here is a slightly longer version that checks for an unpaired quote. It handles only one style of start and end delimiter (adaptable, for example, with start, end = '()'). Note that it removes only space characters, not other whitespace.

start, end = '"', '"'

for test in ('Hello "world this is" atest',
             'This is a string with some " text inside in quotes."',
             'This is without quote.',
             'This is sentence with bad "quote'):
    result = ''

    while start in test :
        clean, _, test = test.partition(start)
        clean = clean.replace(' ','') + start
        inside, tag, test = test.partition(end)
        if not tag:
            raise SyntaxError('Missing end quote %s' % end)
        else:
            clean += inside + tag  # keep whitespace inside the quotes
        result += clean
    result += test.replace(' ','')
    print(result)
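The adaptation to other delimiters mentioned above could look roughly like the following sketch, which wraps the same partition loop in a function. The name strip_outside and its parameters are assumptions made for this example, and it raises ValueError rather than SyntaxError for an unclosed delimiter, which is a substitution on my part.

```python
def strip_outside(text, start='"', end='"'):
    """Remove spaces outside start/end pairs (hypothetical helper)."""
    result = ''
    while start in text:
        # Everything before the opening delimiter is outside: strip its spaces.
        clean, _, text = text.partition(start)
        # Everything up to the closing delimiter is inside: keep it as-is.
        inside, tag, text = text.partition(end)
        if not tag:
            raise ValueError('Missing end delimiter %s' % end)
        result += clean.replace(' ', '') + start + inside + tag
    # Whatever remains has no opening delimiter, so it is all outside.
    return result + text.replace(' ', '')

print(strip_outside('Hello "world this is" a test'))
print(strip_outside('f (a b) g (c d) h', '(', ')'))
```

Like the loop it is based on, this removes only space characters and does not handle nested delimiters.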
