
Python String Split on pattern without removing delimiter

I have a long string, and I want to break it into smaller strings whenever a certain pattern shows up (in the case below, 123 my):

my_str = '123 my string is long 123 my string is very long 123 my string is so long'

I want the result to be:

result = ['123 my string is long ', '123 my string is very long ', '123 my string is so long ']

The length of the string is unknown, and I don't want to remove anything from the original string.

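You can also use a lookahead regex. Note that the leading . consumes the single character (here a space) in front of each delimiter, so the pieces lose their trailing space: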

import re
re.split(r'.(?=123 my)', my_str)
=>
['123 my string is long',
 '123 my string is very long',
 '123 my string is so long']
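
If the trailing spaces should be kept, one possible variant is to split on the lookahead alone. This is only a sketch: it assumes Python 3.7 or later, where re.split accepts a pattern that can match an empty string, and the name parts is just illustrative.

```python
import re

my_str = '123 my string is long 123 my string is very long 123 my string is so long'

# Zero-width lookahead: split *before* each '123 my' without consuming anything.
# Assumes Python 3.7+, where re.split splits on empty matches.
# The match at position 0 yields a leading '', which the filter drops.
parts = [p for p in re.split(r'(?=123 my)', my_str) if p]
print(parts)
# ['123 my string is long ', '123 my string is very long ', '123 my string is so long']
```

Because nothing is consumed, joining the pieces gives back the original string.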

You can split on the delimiter and then add it back in with a list comprehension:

my_str = '123 my string is long 123 my string is very long 123 my string is so long'
delimiter = '123 my'
result = ['{}{}'.format(delimiter, s) for s in my_str.split(delimiter) if s]
print(result)

Output

['123 my string is long ', '123 my string is very long ', '123 my string is so long']

I don't know where the trailing space in the last list item of your desired output comes from. It's not in the original string, so it should be absent from the result.

Note that this only works if the string begins with the delimiter; otherwise the text before the first delimiter also gets the delimiter prepended.
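
To cope with text in front of the first delimiter, a rough sketch is to treat the first piece separately and prepend the delimiter only to the pieces that follow it. The helper name split_keep_delimiter is made up for this illustration.

```python
def split_keep_delimiter(text, delimiter):
    """Split text before every delimiter and keep the delimiter on each piece."""
    # The first piece is whatever precedes the first delimiter (possibly '').
    head, *rest = text.split(delimiter)
    pieces = [delimiter + part for part in rest]
    if head:
        # Leading text never had a delimiter in front of it, so keep it as-is.
        pieces.insert(0, head)
    return pieces


print(split_keep_delimiter('intro 123 my string is long 123 my string is very long', '123 my'))
# ['intro ', '123 my string is long ', '123 my string is very long']
```

Unlike the list comprehension with "if s", this keeps a delimiter that is followed by nothing, so joining the pieces always reproduces the input.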

This is a little hacky, but you can do it in two steps:

 1. Find every match and replace it with a custom character sequence (or "\n") followed by the match itself.

 2. Split the new string on that custom sequence.

I did mine like this:

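import re

delimiter = "\n"   # or any custom sequence that cannot occur in the text

def break_line(match):
    # Put the marker in front of each match so the match itself is kept.
    return delimiter + match.group()

# regex_pattern and text_you_want_to_split are placeholders for your own values.
lines = re.sub(regex_pattern, break_line, text_you_want_to_split)
# The marker is a literal string, so a plain str.split is enough; the filter
# drops the empty piece produced when the text starts with a match.
lines = [line for line in lines.split(delimiter) if line]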
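
Applied to the string from the question, the two steps might look like the sketch below. The choice of '\x00' as the marker is an assumption (it must never occur in the text), and the names text, marker, marked and pieces are illustrative.

```python
import re

text = '123 my string is long 123 my string is very long 123 my string is so long'
marker = '\x00'  # assumption: this character never appears in the text

# Step 1: put the marker in front of every match, keeping the match itself.
marked = re.sub(r'123 my', lambda m: marker + m.group(), text)

# Step 2: split on the marker and drop the empty leading piece.
pieces = [p for p in marked.split(marker) if p]
print(pieces)
# ['123 my string is long ', '123 my string is very long ', '123 my string is so long']
```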


 