
How to split a string into runs of identical characters in Python

I want to split a string into a list wherever the character changes. For example, given the string ccaaawq, I want my program to return ['cc', 'aaa', 'w', 'q']. Since there is no single delimiter between the pieces, I'm wondering what the best approach to this problem is. Thanks in advance for your answers.

You can use itertools.groupby :

from itertools import groupby

s = "ccaaawq"

out = ["".join(g) for _, g in groupby(s)]
print(out)

Prints:

['cc', 'aaa', 'w', 'q']
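For readers unfamiliar with groupby, the following is a rough sketch of what it does in this particular case, written as a plain loop. The function name split_runs is made up for illustration and is not part of the original answer.

```python
def split_runs(s):
    """Split s into runs of consecutive identical characters."""
    runs = []
    for ch in s:
        if runs and runs[-1][-1] == ch:
            # Same character as the end of the current run: extend it.
            runs[-1] += ch
        else:
            # Character changed (or first character): start a new run.
            runs.append(ch)
    return runs


print(split_runs("ccaaawq"))  # ['cc', 'aaa', 'w', 'q']
```

Like groupby, this only groups adjacent characters, so "aba" gives three separate runs rather than merging the two a's.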

Here is an approach using re.findall:

import re

inp = "ccaaawq"
# Each match is a tuple (whole run, single character); keep the whole run.
output = [x[0] for x in re.findall(r'((.)\2*)', inp)]
print(output)  # ['cc', 'aaa', 'w', 'q']

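The above works by matching any one character (the inner group, referenced as \2) followed by that same character zero or more times. The outer group captures the whole run. Because the pattern contains two groups, re.findall returns a list of tuples such as ('cc', 'c'), and x[0] extracts the full run from each tuple. Note that . does not match a newline unless the re.DOTALL flag is passed.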
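If the tuple indexing feels awkward, a possible variant (not from the original answer) is re.finditer, which yields match objects, so the whole run is available as group(0) and only one capture group is needed:

```python
import re

inp = "ccaaawq"
# (.) captures one character; \1* matches further copies of it.
# group(0) is the entire match, i.e. the full run.
runs = [m.group(0) for m in re.finditer(r"(.)\1*", inp)]
print(runs)  # ['cc', 'aaa', 'w', 'q']
```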


 