
Insert space if uppercase letter is preceded and followed by one lowercase letter - Python

Is there a way to insert a space before each uppercase letter in a string (other than the first letter)?

For example, given "RegularExpression" I'd like to obtain "Regular Expression".

I tried the following regex:

re.sub("[a-z]{1}[A-Z][a-z]{1}", " ","regularExpression") 

Unfortunately, this replaces the entire match with the space, so the matched letters are deleted:

regula pression

I would prefer a regex solution, yet would be thankful for any working solution. Thanks!
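The deletion happens because re.sub replaces everything the pattern matched, not just the boundary between the letters. As a minimal sketch (the capture groups and the variable name `s` are additions for illustration, not part of the original attempt), wrapping the surrounding letters in groups and writing them back keeps them in the result:

```python
import re

s = "regularExpression"

# The original attempt: the three matched characters "rEx" are replaced by " ".
print(re.sub(r"[a-z][A-Z][a-z]", " ", s))             # regula pression

# Capture the letters on both sides and write them back around the space.
print(re.sub(r"([a-z])([A-Z][a-z])", r"\1 \2", s))    # regular Expression
```

The {1} quantifiers in the original pattern are redundant, so they are dropped here.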

In [1]: s = 'RegularExpression'

In [2]: answer = []

In [3]: breaks = [i for i,char in enumerate(s) if char.isupper()]

In [4]: breaks = breaks[1:]

In [5]: answer.append(s[:breaks[0]])

In [6]: for start,end in zip(breaks, breaks[1:]):
   ...:     answer.append(s[start:end])
   ...:

In [7]: answer.append(s[breaks[-1]:])

In [8]: answer
Out[8]: ['Regular', 'Expression']

In [9]: print(' '.join(answer))
Regular Expression
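The session above can be folded into one function. This is a sketch rather than a drop-in equivalent: the name `split_on_upper` is hypothetical, and it skips an uppercase letter only when it sits at index 0, instead of always dropping the first uppercase letter found. That also avoids the IndexError the session would raise on a string with no interior uppercase letter.

```python
def split_on_upper(s):
    # Positions of uppercase letters, ignoring one at the very start.
    breaks = [i for i, ch in enumerate(s) if ch.isupper() and i > 0]
    if not breaks:
        return s  # nothing to split
    # Slice between consecutive boundaries, including both ends of the string.
    bounds = [0] + breaks + [len(s)]
    return " ".join(s[a:b] for a, b in zip(bounds, bounds[1:]))


print(split_on_upper("RegularExpression"))  # Regular Expression
print(split_on_upper("Regular"))            # Regular
```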

You can do this with the following:

import re

s = "RegularExpression"
re.sub(r"([A-Z][a-z]+)([A-Z][a-z]+)", r"\1 \2", s)

This puts a space between the first and second match groups, where each group is an uppercase letter followed by one or more lowercase letters.
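One thing worth checking is how this pattern behaves when the string has more than two words. Each match consumes a whole pair, so only one space is inserted per pair. The sketch below shows that, together with a possible variant that uses a lookahead so the second word is not consumed; the names `space_pairs` and `space_words` and the input "OneTwoThree" are assumptions for illustration.

```python
import re

PAIR = re.compile(r"([A-Z][a-z]+)([A-Z][a-z]+)")
WORD_BEFORE_CAP = re.compile(r"([A-Z][a-z]+)(?=[A-Z])")


def space_pairs(s):
    # The pattern from the answer: each match covers two words.
    return PAIR.sub(r"\1 \2", s)


def space_words(s):
    # Variant: match one word, and only peek at the next uppercase letter.
    return WORD_BEFORE_CAP.sub(r"\1 ", s)


print(space_pairs("RegularExpression"))  # Regular Expression
print(space_pairs("OneTwoThree"))        # One TwoThree
print(space_words("OneTwoThree"))        # One Two Three
```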

Try using a lookbehind: "(?<=[a-z])([A-Z])"

Example:

import re

s = "RegularExpression"
print(re.sub(r"(?<=[a-z])([A-Z])", r" \1", s))

Output:

Regular Expression

As I understand, when an uppercase letter is preceded by a lowercase letter you wish to insert a space between them. You can do that by using re.sub to replace (zero-width) matches of the following regular expression with a space.

r'(?<=[a-z])(?=[A-Z])'

Regex demo | Python code

Note that the SUBSTITUTION box at the regex demo link contains one space.

Python's regex engine performs the following operations.

(?<=[a-z])  : use a positive lookbehind to assert that the match is preceded
              by a lowercase letter
(?=[A-Z])   : use a positive lookahead to assert that the match is followed
              by an uppercase letter

For the string 'RegularExpression' the regex matches the location between the letters 'r' and 'E' (i.e., a zero-width match).
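To see the zero-width match directly, one option is to list the match spans with re.finditer. This is a small sketch; the names `BOUNDARY`, `s` and `spans` are chosen for illustration.

```python
import re

BOUNDARY = re.compile(r"(?<=[a-z])(?=[A-Z])")

s = "RegularExpression"

# A zero-width match has equal start and end offsets.
spans = [m.span() for m in BOUNDARY.finditer(s)]
print(spans)                 # [(7, 7)]

# Substituting a space at that position consumes no characters.
print(BOUNDARY.sub(" ", s))  # Regular Expression
```

The span (7, 7) is the position just before 'E', which is why no letters are lost in the substitution.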

If I understand correctly, one way is to use re.findall:

import re

re.findall(r"[A-Z][a-z]+", "RegularExpression")

Output:

['Regular', 'Expression']
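To get the spaced string rather than a list, the pieces can be joined. A sketch follows, with the hypothetical helper name `split_words`. Note that findall keeps only what the pattern matches, so a leading lowercase word would be dropped.

```python
import re


def split_words(s):
    # Each item is one uppercase letter followed by lowercase letters.
    return re.findall(r"[A-Z][a-z]+", s)


print(" ".join(split_words("RegularExpression")))  # Regular Expression

# Text that does not start with an uppercase letter is not captured.
print(split_words("regularExpression"))            # ['Expression']
```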


 