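In Python, how can I split a string with a regex according to the following ruleset:
1. Split on the split character (e.g. ;).
2. Do not split if the split character is escaped by the escape character (e.g. :).
3. Do split if the escape character is itself escaped (e.g. ::).
So splitting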
"foo;bar:;baz::;one:two;::three::::;four;;five:::;six;:seven;::eight"
should yield
["foo", "bar:;baz::", "one:two", "::three::::", "four", "", "five:::;six", ":seven", "::eight"]
My own attempt was:
re.split(r'(?<!:);', str)
This cannot handle rule #3.
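To make the failure concrete, here is a small sketch of what that attempt returns on the sample string (the variable names s and attempt are only illustrative):

```python
import re

s = "foo;bar:;baz::;one:two;::three::::;four;;five:::;six;:seven;::eight"

# Split on every ';' that is not directly preceded by ':'.
attempt = re.split(r"(?<!:);", s)
print(attempt)
# ['foo', 'bar:;baz::;one:two', '::three::::;four', '', 'five:::;six', ':seven', '::eight']
```

The ; after baz:: and the one after ::three:::: should split, because the colons in front of them escape each other, but the single-character lookbehind only sees the last : and skips both.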
If matching is also an option, and the empty match '' is not required, you could use:
(?::[:;]|[^;\n])+
Explanation:
(?: - Non-capturing group
:[:;] - Match : followed by either : or ;
| - Or
[^;\n] - Match any single char except ; or a newline
)+ - Close the non-capturing group and repeat it 1+ times
Example:
import re
regex = r"(?::[:;]|[^;\n])+"
# Named s rather than str to avoid shadowing the built-in type
s = "foo;bar:;baz::;one:two;::three::::;four;;five:::;six;:seven;::eight"
print(re.findall(regex, s))
Output
['foo', 'bar:;baz::', 'one:two', '::three::::', 'four', 'five:::;six', ':seven', '::eight']
If you also want the empty match, you could add an alternative with 2 lookarounds that matches the position where there is a ; directly to the left and to the right:
(?::[:;]|[^;\n]|(?<=;)(?=;))+
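As a quick check, here is a minimal sketch that runs this extended pattern through the standard re module on the sample string (the names pattern, s and parts are only illustrative):

```python
import re

# Same pattern as above; the extra alternative (?<=;)(?=;) matches the
# empty position that sits between two consecutive semicolons.
pattern = r"(?::[:;]|[^;\n]|(?<=;)(?=;))+"
s = "foo;bar:;baz::;one:two;::three::::;four;;five:::;six;:seven;::eight"

parts = re.findall(pattern, s)
print(parts)
# ['foo', 'bar:;baz::', 'one:two', '::three::::', 'four', '', 'five:::;six', ':seven', '::eight']
```

Note that this only recovers empty fields between two semicolons. An empty field at the very start or end of the string has no ; on both sides, so it would not be reported.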
You could use the third-party regex module with the following pattern to split on:
(?<!:)(?:::)*\K;
(?<!:) - Negative lookbehind: the position is not preceded by a colon.
(?:::)* - A non-capturing group matching 2 literal colons, 0+ times.
\K - Reset the starting point of the reported match.
; - A literal semicolon.
For example:
import regex as re
s = 'foo;bar:;baz::;one:two;::three::::;four;;five:::;six;:seven;::eight'
lst = re.split(r'(?<!:)(?:::)*\K;', s)
print(lst) # ['foo', 'bar:;baz::', 'one:two', '::three::::', 'four', '', 'five:::;six', ':seven', '::eight']
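If installing the regex module is not an option, the same rules can be approximated with the standard library alone. The following is only a sketch of a different technique, not the \K approach above: it finds each run of colons followed by ; and splits only when the run has even length. The function name split_unescaped and its parameters are hypothetical, and it assumes the separator and the escape are each a single character.

```python
import re

def split_unescaped(s, sep=";", esc=":"):
    """Split s on sep unless sep is preceded by an odd number of esc chars.

    Assumes sep and esc are single characters (hypothetical helper).
    """
    parts = []
    start = 0
    # Each match is a (possibly empty) run of esc chars followed by one sep.
    for m in re.finditer(re.escape(esc) + "*" + re.escape(sep), s):
        n_esc = m.end() - m.start() - 1  # number of esc chars before sep
        if n_esc % 2 == 0:
            # Even count: the esc chars escape each other, so sep is a real split.
            parts.append(s[start:m.end() - 1])
            start = m.end()
    parts.append(s[start:])  # remainder after the last real separator
    return parts

s = 'foo;bar:;baz::;one:two;::three::::;four;;five:::;six;:seven;::eight'
print(split_unescaped(s))
# ['foo', 'bar:;baz::', 'one:two', '::three::::', 'four', '', 'five:::;six', ':seven', '::eight']
```

Like the split-based answer, this keeps empty fields, including ones at the start or end of the string.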