A pythonic way to insert a space before capital letters

Question

I've got a file whose format I'm altering via a Python script. The file contains several camel-cased strings where I just want to insert a single space before the capital letters, so "WordWordWord" becomes "Word Word Word".

My limited regex experience just stalled out on me. Can someone think of a decent regex to do this, or (better yet) is there a more pythonic way to do this that I'm missing?

Answer 1

You could try:

>>> import re
>>> re.sub(r"(\w)([A-Z])", r"\1 \2", "WordWordWord")
'Word Word Word'

Answer 2

If there are consecutive capitals, Greg's result might not be what you are looking for, since \w consumes the character in front of the capital letter to be replaced.

>>> re.sub(r"(\w)([A-Z])", r"\1 \2", "WordWordWWWWWWWord")
'Word Word WW WW WW Word'

A look-behind would solve this:

>>> re.sub(r"(?<=\w)([A-Z])", r" \1", "WordWordWWWWWWWord")
'Word Word W W W W W W Word'
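To compare the two patterns side by side, here is a minimal sketch assuming Python 3; the helper names `split_consuming` and `split_lookbehind` are hypothetical, and the regexes are exactly the two used above.

```python
import re

def split_consuming(text):
    # (\w) consumes the character before each capital, so within a run of
    # capitals a letter matched as \2 cannot also be the \1 of the next
    # match; only every second gap gets a space.
    return re.sub(r"(\w)([A-Z])", r"\1 \2", text)

def split_lookbehind(text):
    # (?<=\w) only checks the preceding character without consuming it,
    # so every capital that follows a word character gets a space.
    return re.sub(r"(?<=\w)([A-Z])", r" \1", text)

for sample in ["WordWordWord", "WordWordWWWWWWWord"]:
    print(repr(split_consuming(sample)), repr(split_lookbehind(sample)))
```

On "WordWordWord" both give the same result; they differ only on runs of capitals.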

Answer 3

Have a look at my answer on .NET - How can you split a “caps” delimited string into an array?

Edit: Maybe better to include it here.

re.sub(r'([a-z](?=[A-Z])|[A-Z](?=[A-Z][a-z]))', r'\1 ', text)

For example, "SimpleHTTPServer" becomes "Simple HTTP Server", which splits into ["Simple", "HTTP", "Server"].

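The re.sub() call above returns a string rather than a list. One way to get the list, sketched here assuming Python 3 (the helper name `caps_words` is hypothetical), is to split the result on whitespace:

```python
import re

# Pattern from the answer above. A space is added after:
#   - a lowercase letter followed by a capital ("e" in "SimpleHTTP"), or
#   - a capital followed by a capital and then a lowercase letter, i.e. the
#     last letter of an acronym ("P" in "HTTPServer").
CAPS_BOUNDARY = re.compile(r'([a-z](?=[A-Z])|[A-Z](?=[A-Z][a-z]))')

def caps_words(text):
    # Insert the spaces, then split on whitespace to get a list of words.
    return CAPS_BOUNDARY.sub(r'\1 ', text).split()

print(caps_words("SimpleHTTPServer"))  # ['Simple', 'HTTP', 'Server']
```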
Answer 4

Maybe shorter:

>>> re.sub(r"\B([A-Z])", r" \1", "DoIThinkThisIsABetterAnswer?")
'Do I Think This Is A Better Answer?'
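Why this works: \B matches only at positions that are not word boundaries, and since a capital letter is itself a word character, \B([A-Z]) only matches a capital that directly follows another word character (letter, digit or underscore). A small sketch assuming Python 3 (`space_caps` is a hypothetical name):

```python
import re

def space_caps(text):
    # \B([A-Z]) matches a capital only where it is not at a word boundary,
    # i.e. where it directly follows another word character.
    return re.sub(r"\B([A-Z])", r" \1", text)

print(space_caps("Already Spaced, NotHere"))  # Already Spaced, Not Here
```

Capitals at the start of the string or after a space or punctuation are left alone, so already-spaced text does not get doubled spaces. Digits and underscores count as word characters too, so "Page2Title" becomes "Page2 Title".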

Answer 5

Perhaps you'd be interested in a one-line implementation that doesn't use regular expressions:

''.join(' ' + char if char.isupper() else char.strip() for char in text).strip()
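Wrapping the one-liner in a function (the name `space_caps_no_regex` is hypothetical, Python 3 assumed) makes one side effect easy to see: char.strip() turns every whitespace character into an empty string, so existing whitespace is dropped and a space survives only where a capital letter re-adds one.

```python
def space_caps_no_regex(text):
    # Capitals get a leading space; every other character goes through
    # char.strip(), which leaves letters alone but turns whitespace into ''.
    # The final .strip() removes the space added before a leading capital.
    return ''.join(' ' + char if char.isupper() else char.strip()
                   for char in text).strip()

print(space_caps_no_regex("Hello world"))  # Helloworld
```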

Answer 6

With regexes you can do this:

re.sub(r'([A-Z])', r' \1', text).lstrip()  # lstrip() drops the space this adds before the first capital

Of course, that will only work for ASCII characters; if you want to handle Unicode, it's a whole new can of worms :-)
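One way around the ASCII limitation, sketched here assuming Python 3 strings (`space_before_upper` is a hypothetical name), is to skip the regex and test each character with str.isupper(), which follows Unicode case properties rather than the ASCII-only class [A-Z]. The standard re module has no \p{Lu} class; the third-party regex module does, if you would rather stay with regular expressions.

```python
def space_before_upper(text):
    # str.isupper() follows Unicode case rules, so "É" or "Ж" count as
    # capitals too, unlike the ASCII-only class [A-Z].
    out = []
    for i, ch in enumerate(text):
        # Add a space before every capital except the first character,
        # unless the previous character is already whitespace.
        if i and ch.isupper() and not text[i - 1].isspace():
            out.append(" ")
        out.append(ch)
    return "".join(out)

print(space_before_upper("ÉcoleNormaleSupérieure"))  # École Normale Supérieure
```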

Answer 7

If you have acronyms, you probably do not want spaces between their letters. This two-stage regex keeps acronyms intact (and also treats punctuation and other non-uppercase characters as a place to add a space):

import re

re_outer = re.compile(r'([^A-Z ])([A-Z])')
re_inner = re.compile(r'(?<!^)([A-Z])([^A-Z])')
re_outer.sub(r'\1 \2', re_inner.sub(r' \1\2', 'DaveIsAFKRightNow!Cool'))

The output will be: 'Dave Is AFK Right Now! Cool'
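The same two passes wrapped in a function, as a sketch assuming Python 3 (`split_keep_acronyms` is a hypothetical name), also handle the "SimpleHTTPServer" example from Answer 3:

```python
import re

RE_OUTER = re.compile(r'([^A-Z ])([A-Z])')       # non-capital, non-space char, then a capital
RE_INNER = re.compile(r'(?<!^)([A-Z])([^A-Z])')  # capital followed by a non-capital, not at position 0

def split_keep_acronyms(text):
    # Inner pass: space before each capital that starts a word, including the
    # word right after an acronym ("S" in "HTTPServer").
    # Outer pass: space between a non-capital and the capital that follows it.
    return RE_OUTER.sub(r'\1 \2', RE_INNER.sub(r' \1\2', text))

print(split_keep_acronyms('SimpleHTTPServer'))  # Simple HTTP Server
```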

Answer 8

I agree that the regex solution is the easiest, but I wouldn't say it's the most pythonic.

How about:

text = 'WordWordWord'
new_text = ''

for i, letter in enumerate(text):
    if i and letter.isupper():
        new_text += ' '

    new_text += letter

print(new_text)  # Word Word Word
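A variant of the loop above, assuming Python 3 (the function name `space_before_capitals` is hypothetical), collects the pieces in a list and joins them once; PEP 8 recommends ''.join() over relying on CPython's in-place string concatenation, which other implementations may not optimize.

```python
def space_before_capitals(text):
    parts = []
    for i, letter in enumerate(text):
        # Every capital except the first character gets a space in front.
        if i and letter.isupper():
            parts.append(' ')
        parts.append(letter)
    # One join at the end instead of repeated += on a string.
    return ''.join(parts)

print(space_before_capitals('WordWordWord'))  # Word Word Word
```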

Answer 9

I think regexes are the way to go here, but just to give a pure Python version without (hopefully) any of the problems ΤΖΩΤΖΙΟΥ has pointed out:

def splitCaps(s):
    result = []
    for ch, nxt in window(s + " ", 2):  # pair each character with the one after it
        result.append(ch)
        if nxt.isupper() and not ch.isspace():
            result.append(' ')
    return ''.join(result)

window() is a utility function I use to operate on a sliding window of items, defined as:

import collections, itertools

def window(it, winsize, step=1):
    it = iter(it)  # Ensure we have an iterator
    l = collections.deque(itertools.islice(it, winsize))
    while True:
        yield tuple(l)
        for i in range(step):
            try:
                l.append(next(it))  # it.next() only exists in Python 2
            except StopIteration:
                return  # Since Python 3.7 (PEP 479), a StopIteration escaping a generator becomes RuntimeError
            l.popleft()
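On Python 3.10 and later, itertools.pairwise() yields the same overlapping pairs as window(seq, 2) above, so the helper can be dropped. A sketch (the name `split_caps_pairwise` is hypothetical; the fallback is the tee-based recipe for older versions):

```python
import itertools

try:
    from itertools import pairwise  # Python 3.10+
except ImportError:
    def pairwise(iterable):
        # Fallback for older versions: the tee-based recipe from the itertools docs.
        a, b = itertools.tee(iterable)
        next(b, None)
        return zip(a, b)

def split_caps_pairwise(s):
    result = []
    # The trailing " " gives the last real character a partner, as in splitCaps above.
    for ch, nxt in pairwise(s + " "):
        result.append(ch)
        if nxt.isupper() and not ch.isspace():
            result.append(' ')
    return ''.join(result)

print(split_caps_pairwise("WordWordWord"))  # Word Word Word
```

Because pairwise() yields nothing for a one-item input, an empty string simply returns ''.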

Answer 10

Reviving an old thread: I wanted to try an option for one of my own requirements. Of course re.sub() is the neat solution, but here is also a one-liner for when the re module isn't (or shouldn't be) imported:

st = 'ThisIsTextStringToSplitWithSpace'
print(''.join([' ' + s if s.isupper() else s for s in st]).lstrip())

10 answers

Answer 1: score 46 (accepted), 2008-10-13 21:20:55
Answer 2: score 32, 2008-10-13 21:37:39
Answer 3: score 11, 2008-10-13 21:41:49
Answer 4: score 11, 2008-10-13 22:17:14
Answer 5: score 6, 2017-08-20 05:02:51
Answer 6: score 4, 2008-10-13 21:25:17
Answer 7: score 2, 2017-10-15 21:14:16
Answer 8: score 0, 2008-10-14 05:51:10
Answer 9: score 0, 2008-10-14 09:06:22
Answer 10: score 0, 2021-04-15 18:59:10
