
Python split string with split character and escape character

In Python, how can I split a string with a regex according to the following rule set:

  1. Split by a split char (e.g. ;).
  2. Don't split if that split char is escaped by an escape char (e.g. :).
  3. Do split if the escape char is itself escaped.

So splitting

"foo;bar:;baz::;one:two;::three::::;four;;five:::;six;:seven;::eight"

should yield

["foo", "bar:;baz::", "one:two", "::three::::", "four", "", "five:::;six", ":seven", "::eight"]

My own attempt was:

re.split(r'(?<!:);', str)

which cannot handle rule #3.
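To make the failure concrete, here is a minimal sketch that runs the attempt above on the sample string and compares it with the desired result. The variable names `s`, `expected` and `attempt` are only illustrative.

```python
import re

s = "foo;bar:;baz::;one:two;::three::::;four;;five:::;six;:seven;::eight"
expected = ["foo", "bar:;baz::", "one:two", "::three::::", "four", "",
            "five:::;six", ":seven", "::eight"]

# The lookbehind inspects only the single character before ';', so it cannot
# tell an escaping ':' apart from the second half of an escaped '::' pair.
attempt = re.split(r'(?<!:);', s)

print(attempt)
# ['foo', 'bar:;baz::;one:two', '::three::::;four', '', 'five:::;six', ':seven', '::eight']
print(attempt == expected)  # False
```

The semicolons after `baz::` and `::three::::` are wrongly kept, because each is preceded by a colon even though that colon is itself escaped.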

If matching is also an option, and the empty match '' is not required:

(?::[:;]|[^;\n])+
  • (?: Non-capturing group
    • :[:;] Match : followed by either : or ;
    • | Or
    • [^;\n] Match any single char except ; or a newline
  • )+ Close the non-capturing group and repeat it 1+ times


import re

regex = r"(?::[:;]|[^;\n])+"
# Named s rather than str, to avoid shadowing the built-in str type
s = "foo;bar:;baz::;one:two;::three::::;four;;five:::;six;:seven;::eight"
print(re.findall(regex, s))

Output

['foo', 'bar:;baz::', 'one:two', '::three::::', 'four', 'five:::;six', ':seven', '::eight']


If you want the empty match, you could add 2 lookarounds to assert a position that has a ; directly to the left and directly to the right:

(?::[:;]|[^;\n]|(?<=;)(?=;))+
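As a quick check, the pattern with the two lookarounds can be run through `re.findall` on the sample string from the question. This is only a sketch; the variable names are illustrative.

```python
import re

# Same alternatives as before, plus an empty alternative that succeeds only
# at a position with ';' immediately to the left and immediately to the right.
pattern = r"(?::[:;]|[^;\n]|(?<=;)(?=;))+"
s = "foo;bar:;baz::;one:two;::three::::;four;;five:::;six;:seven;::eight"

fields = re.findall(pattern, s)
print(fields)
# ['foo', 'bar:;baz::', 'one:two', '::three::::', 'four', '', 'five:::;six', ':seven', '::eight']
```

Note that the empty alternative only reports an empty field that sits between two adjacent semicolons; an empty field at the very start or end of the string is not returned.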


You could use the third-party regex module (the standard re module does not support \K) with the following pattern to split on:

(?<!:)(?:::)*\K;


  • (?<!:) - Negative lookbehind: the position must not be preceded by a colon.
  • (?:::)* - A non-capturing group matching 2 literal colons, repeated 0+ times.
  • \K - Reset the starting point of the reported match.
  • ; - A literal semicolon.

For example:

import regex as re
s = 'foo;bar:;baz::;one:two;::three::::;four;;five:::;six;:seven;::eight'
lst = re.split(r'(?<!:)(?:::)*\K;', s)
print(lst) # ['foo', 'bar:;baz::', 'one:two', '::three::::', 'four', '', 'five:::;six', ':seven', '::eight']
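If installing the regex module is not an option, the three rules from the question can also be sketched as a plain loop with no regex at all. The function name `split_escaped` and its parameters `sep` and `esc` are hypothetical, chosen only for this illustration; the idea is that a separator splits exactly when it is preceded by an even number of consecutive escape chars.

```python
def split_escaped(text, sep=";", esc=":"):
    """Split text on sep unless sep is preceded by an odd run of esc chars."""
    parts = []
    start = 0  # index where the current field begins
    run = 0    # number of consecutive esc chars directly before position i
    for i, ch in enumerate(text):
        if ch == esc:
            run += 1
            continue
        # An even run (including 0) means every esc is itself escaped,
        # so the separator is a real split point.
        if ch == sep and run % 2 == 0:
            parts.append(text[start:i])
            start = i + 1
        run = 0
    parts.append(text[start:])  # the last field has no trailing separator
    return parts


s = "foo;bar:;baz::;one:two;::three::::;four;;five:::;six;:seven;::eight"
print(split_escaped(s))
# ['foo', 'bar:;baz::', 'one:two', '::three::::', 'four', '', 'five:::;six', ':seven', '::eight']
```

Like the patterns above, this leaves the escape chars in the output untouched; it only decides where to split.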
