简体   繁体   中英

A pythonic way to insert a space before capital letters

I've got a file whose format I'm altering via a python script. I have several camel cased strings in this file where I just want to insert a single space before the capital letter - so "WordWordWord" becomes "Word Word Word".

My limited regex experience just stalled out on me - can someone think of a decent regex to do this, or (better yet) is there a more pythonic way to do this that I'm missing?

You could try:

>>> re.sub(r"(\w)([A-Z])", r"\1 \2", "WordWordWord")
'Word Word Word'

If there are consecutive capitals, then Gregs result could not be what you look for, since the \\w consumes the caracter in front of the captial letter to be replaced.

>>> re.sub(r"(\w)([A-Z])", r"\1 \2", "WordWordWWWWWWWord")
'Word Word WW WW WW Word'

A look-behind would solve this:

>>> re.sub(r"(?<=\w)([A-Z])", r" \1", "WordWordWWWWWWWord")
'Word Word W W W W W W Word'

Have a look at my answer on .NET - How can you split a “caps” delimited string into an array?

Edit: Maybe better to include it here.

re.sub(r'([a-z](?=[A-Z])|[A-Z](?=[A-Z][a-z]))', r'\1 ', text)

For example:

"SimpleHTTPServer" => ["Simple", "HTTP", "Server"]

也许更短:

>>> re.sub(r"\B([A-Z])", r" \1", "DoIThinkThisIsABetterAnswer?")

也许您会对不使用正则表达式的单行实现感兴趣:

''.join(' ' + char if char.isupper() else char.strip() for char in text).strip()

With regexes you can do this:

re.sub('([A-Z])', r' \1', str)

Of course, that will only work for ASCII characters, if you want to do Unicode it's a whole new can of worms :-)

If you have acronyms, you probably do not want spaces between them. This two-stage regex will keep acronyms intact (and also treat punctuation and other non-uppercase letters as something to add a space on):

re_outer = re.compile(r'([^A-Z ])([A-Z])')
re_inner = re.compile(r'(?<!^)([A-Z])([^A-Z])')
re_outer.sub(r'\1 \2', re_inner.sub(r' \1\2', 'DaveIsAFKRightNow!Cool'))

The output will be: 'Dave Is AFK Right Now! Cool' 'Dave Is AFK Right Now! Cool'

I agree that the regex solution is the easiest, but I wouldn't say it's the most pythonic.

How about:

text = 'WordWordWord'
new_text = ''

for i, letter in enumerate(text):
    if i and letter.isupper():
        new_text += ' '

    new_text += letter

I think regexes are the way to go here, but just to give a pure python version without (hopefully) any of the problems ΤΖΩΤΖΙΟΥ has pointed out:

def splitCaps(s):
    result = []
    for ch, next in window(s+" ", 2):
        result.append(ch)
        if next.isupper() and not ch.isspace():
            result.append(' ')
    return ''.join(result)

window() is a utility function I use to operate on a sliding window of items, defined as:

import collections, itertools

def window(it, winsize, step=1):
    it=iter(it)  # Ensure we have an iterator
    l=collections.deque(itertools.islice(it, winsize))
    while 1:  # Continue till StopIteration gets raised.
        yield tuple(l)
        for i in range(step):
            l.append(it.next())
            l.popleft()

To the old thread - wanted to try an option for one of my requirements. Of course the re.sub() is the cool solution, but also got a 1 liner if re module isn't (or shouldn't be) imported.

st = 'ThisIsTextStringToSplitWithSpace'
print(''.join([' '+ s if s.isupper()  else s for s in st]).lstrip())

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM