I have a string stored in the variable mystring. I want to split the string right after a four-digit integer in parentheses, i.e. the pattern (4-digit-integer). I suppose this can be done using Python regex.
mystring = 'Lorem Ipsum (2018) Amet (Lorem Dolor Amet Elit)'
Desired output:
splitstring = ['Lorem Ipsum (2018)', 'Amet (Lorem Dolor Amet Elit)']
If you don't mind doing some filtering you could do:
import re
string = 'Lorem Ipsum (2018) Amet (Lorem Dolor Amet Elit)'
result = [m for m in re.split(r'([^\d(]+\(\d{4}\))\s+', string) if m]
print(result)
Output
['Lorem Ipsum (2018)', 'Amet (Lorem Dolor Amet Elit)']
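When re.split is used with a capturing group, the result also includes the text matched by the group, in this case ([^\d(]+\(\d{4}\)), i.e. one or more characters that are neither digits nor an opening parenthesis, followed by exactly four digits enclosed in parentheses. Note that the trailing whitespace \s+ sits outside the group, so it is left out of the result. Because the match starts at the very beginning of the string, the split also produces an empty string as the first item, which is what the if m filter removes.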
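If you would rather avoid the filtering step, one alternative (not part of the answer above, just a sketch) is to split on the whitespace that follows the pattern, using a lookbehind assertion. The variable names text and parts are only illustrative. Python's re module requires lookbehinds to have a fixed width, which \(\d{4}\) satisfies:

```python
import re

text = 'Lorem Ipsum (2018) Amet (Lorem Dolor Amet Elit)'

# (?<=\(\d{4}\)) asserts that the whitespace is preceded by "(dddd)"
# without consuming it, so nothing has to be captured or filtered out.
parts = re.split(r'(?<=\(\d{4}\))\s+', text)
print(parts)
```

This splits at every occurrence of the pattern. If the string may contain several such groups and you only want the first split, passing maxsplit=1 to re.split should limit it.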
Here is a simple way to do it.
Parentheses have a special meaning in regular expressions (they delimit capturing groups), so to match a literal one you need to escape it, e.g. \( for the opening parenthesis. You can then search for the (2018) part with \(\d{4}\) and split the text at the end of the match:
import re
s = 'Lorem Ipsum (2018) Amet (Lorem Dolor Amet Elit)'
match = re.search(r'\(\d{4}\)', s)
split_string = [ s[:match.end()], s[match.end():] ]
print(split_string)
# ['Lorem Ipsum (2018)', ' Amet (Lorem Dolor Amet Elit)']
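Note that the second element above keeps its leading space, and that re.search returns None when the string contains no such pattern, in which case match.end() would raise an AttributeError. A minimal sketch that handles both cases could look like this; split_after_year is a hypothetical helper name, not something from the original answer:

```python
import re

def split_after_year(text):
    """Split text right after the first "(dddd)" group (hypothetical helper)."""
    match = re.search(r'\(\d{4}\)', text)
    if match is None:
        # No four-digit group in parentheses: return the input unsplit.
        return [text]
    head = text[:match.end()]
    # Drop the whitespace that separated the two parts.
    tail = text[match.end():].lstrip()
    return [head, tail]

print(split_after_year('Lorem Ipsum (2018) Amet (Lorem Dolor Amet Elit)'))
print(split_after_year('No year here'))
```

Returning the unsplit string in a one-element list is just one possible choice for the no-match case; raising an error or returning None may suit other callers better.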