
Split string after certain integer character pattern

I have a string stored in the variable mystring. I want to split the string right after a pattern consisting of a 4-digit integer in parentheses, i.e. (4-digit-integer). I suppose this can be done with a Python regex.

mystring = 'Lorem Ipsum (2018) Amet (Lorem Dolor Amet Elit)'

Desired output:

splitstring = ['Lorem Ipsum (2018)', 'Amet (Lorem Dolor Amet Elit)']

If you don't mind doing some filtering you could do:

import re

string = 'Lorem Ipsum (2018) Amet (Lorem Dolor Amet Elit)'
result = [m for m in re.split(r'([^\d(]+\(\d{4}\))\s+', string) if m]
print(result)

Output

['Lorem Ipsum (2018)', 'Amet (Lorem Dolor Amet Elit)']

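When re.split is used with a capturing group, the text matched by the group is included in the result. Here the group is ([^\d(]+\(\d{4}\)): one or more characters that are neither digits nor an opening parenthesis, followed by exactly four digits enclosed in parentheses. Note that the trailing whitespace \s+ lies outside the group, so it is dropped. Because the match starts at the very beginning of the string, split also yields an empty string as the first element, which is why the result is filtered with if m.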
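To make the need for the filter concrete, here is a minimal sketch that prints the unfiltered result of re.split next to the filtered one. The variable names parts and filtered are only illustrative.

```python
import re

string = 'Lorem Ipsum (2018) Amet (Lorem Dolor Amet Elit)'

# The capturing group keeps the matched text in the output; the \s+ after
# the group is consumed by the split and discarded.
parts = re.split(r'([^\d(]+\(\d{4}\))\s+', string)
print(parts)
# ['', 'Lorem Ipsum (2018)', 'Amet (Lorem Dolor Amet Elit)']

# The leading '' is the (empty) text before the first match; drop it.
filtered = [m for m in parts if m]
print(filtered)
# ['Lorem Ipsum (2018)', 'Amet (Lorem Dolor Amet Elit)']
```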

Here is a simple way to do it.

Since parentheses have a special meaning in regular expressions (they create capturing groups), you need to escape them, e.g. \( for an opening parenthesis. Then you can search for a four-digit number in parentheses, such as (2018), and split the text at the end of the match:

import re
s = 'Lorem Ipsum (2018) Amet (Lorem Dolor Amet Elit)'
match = re.search(r'\(\d{4}\)', s)

# Cut at the end of the match; lstrip() removes the space left after ')'
split_string = [s[:match.end()], s[match.end():].lstrip()]
print(split_string)
# ['Lorem Ipsum (2018)', 'Amet (Lorem Dolor Amet Elit)']
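As a possible alternative to slicing by hand, the split can be done in one call by splitting on whitespace that is preceded by the parenthesised 4-digit number, using a lookbehind. This is only a sketch: the helper name split_after_year and the constant AFTER_YEAR are made up for illustration, and it assumes you want at most one split. If the pattern is not found, the whole string comes back as a single-element list.

```python
import re

# Whitespace that directly follows "(dddd)". The lookbehind has a fixed
# width of six characters, which Python's re module requires.
AFTER_YEAR = re.compile(r'(?<=\(\d{4}\))\s+')

def split_after_year(text):
    """Split text once, right after the first '(4-digit-integer)'."""
    return AFTER_YEAR.split(text, maxsplit=1)

s = 'Lorem Ipsum (2018) Amet (Lorem Dolor Amet Elit)'
print(split_after_year(s))
# ['Lorem Ipsum (2018)', 'Amet (Lorem Dolor Amet Elit)']

print(split_after_year('no year here'))
# ['no year here']
```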


 