Python split string with split character and escape character

In Python, how can I split a string with a regex according to the following rules:

  1. Split on a split character (e.g. ; )
  2. Don't split if that split character is escaped by an escape character (e.g. : ).
  3. Do split if the escape character is itself escaped by another escape character.

So splitting

"foo;bar:;baz::;one:two;::three::::;four;;five:::;six;:seven;::eight"

should yield

["foo", "bar:;baz::", "one:two", "::three::::", "four", "", "five:::;six", ":seven", "::eight"]

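My own attempt was:

re.split(r'(?<!:);', s)

This cannot handle rule #3, because the lookbehind only inspects the single character before the ; and cannot tell whether that : is itself escaped.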
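
Running the attempt against the sample string makes the failure visible. This is only a small sketch; the variable names s and naive are chosen here for illustration:

```python
import re

s = "foo;bar:;baz::;one:two;::three::::;four;;five:::;six;:seven;::eight"

# The ';' after 'baz::' and after '::three::::' should split (rule 3),
# but both are preceded by ':' and are therefore skipped.
naive = re.split(r"(?<!:);", s)
print(naive)
# ['foo', 'bar:;baz::;one:two', '::three::::;four', '', 'five:::;six', ':seven', '::eight']
```

Only seven fields come back instead of the expected nine.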

If matching the fields instead of splitting is also an option, and the empty match '' is not required:

(?::[:;]|[^;\n])+
  • (?: Non-capturing group
    • :[:;] Match : followed by either : or ; (an escaped character)
    • | Or
    • [^;\n] Match any single character except ; or a newline
  • )+ Close the non-capturing group and repeat it 1+ times

import re

regex = r"(?::[:;]|[^;\n])+"
# Named s rather than str, so the built-in str type is not shadowed
s = "foo;bar:;baz::;one:two;::three::::;four;;five:::;six;:seven;::eight"
print(re.findall(regex, s))

Output

['foo', 'bar:;baz::', 'one:two', '::three::::', 'four', 'five:::;six', ':seven', '::eight']

If you want the empty match as well, you could add two lookarounds that match at a position with a ; directly to its left and directly to its right:

(?::[:;]|[^;\n]|(?<=;)(?=;))+
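
A quick way to try this variant with the standard re module, reusing the sample string from the question (the names pattern, s and fields are just for illustration):

```python
import re

pattern = r"(?::[:;]|[^;\n]|(?<=;)(?=;))+"
s = "foo;bar:;baz::;one:two;::three::::;four;;five:::;six;:seven;::eight"

# The third alternative is zero-width: it matches only between two ';'
fields = re.findall(pattern, s)
print(fields)
# ['foo', 'bar:;baz::', 'one:two', '::three::::', 'four', '', 'five:::;six', ':seven', '::eight']
```

Note that the lookarounds only detect an empty field between two semicolons. An empty field at the very start or end of the string (for example in ";a") is not reported.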

Alternatively, you could use the third-party regex module (the standard re module does not support \K) and split on the following pattern:

(?<!:)(?:::)*\K;

  • (?<!:) - Negative lookbehind asserting that the preceding character is not a colon.
  • (?:::)* - A non-capturing group matching a pair of literal colons, 0+ times.
  • \K - Reset the starting point of the reported match.
  • ; - A literal semicolon.

For example:

import regex as re
s = 'foo;bar:;baz::;one:two;::three::::;four;;five:::;six;:seven;::eight'
lst = re.split(r'(?<!:)(?:::)*\K;', s)
print(lst) # ['foo', 'bar:;baz::', 'one:two', '::three::::', 'four', '', 'five:::;six', ':seven', '::eight']
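
If a regex is not a hard requirement, the three rules can also be written as a small hand-rolled scanner that needs only the standard library. The following is a minimal sketch and not one of the approaches above; the function name split_escaped and its parameters sep and esc are made up for illustration. It keeps the escape characters in the output, as in the expected result from the question.

```python
def split_escaped(text, sep=";", esc=":"):
    """Split text on sep unless sep is escaped by esc (hypothetical helper)."""
    fields = []
    current = []
    i = 0
    while i < len(text):
        ch = text[i]
        if ch == esc and i + 1 < len(text):
            # esc plus the following character form one unit, so an
            # escaped separator or an escaped escape is copied unchanged.
            current.append(text[i:i + 2])
            i += 2
        elif ch == sep:
            # Unescaped separator: close the current field.
            fields.append("".join(current))
            current = []
            i += 1
        else:
            current.append(ch)
            i += 1
    fields.append("".join(current))
    return fields


s = "foo;bar:;baz::;one:two;::three::::;four;;five:::;six;:seven;::eight"
print(split_escaped(s))
# ['foo', 'bar:;baz::', 'one:two', '::three::::', 'four', '', 'five:::;six', ':seven', '::eight']
```

Because the scanner consumes the escape character together with whatever follows it, an even run of colons before a ; leaves the ; unescaped and an odd run escapes it, which is the same behaviour the (?<!:)(?:::)*\K; pattern encodes.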
