Python / Regex: Split a string only if the line does not contain certain special characters

Question

I'm trying to split a multiline string on a character, but only if the line does not contain :. Unfortunately, I can't see an easy way to use re.split() with a negative lookbehind on the character :, since it's possible that : occurred in another line earlier in the string.
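To make the limitation concrete, here is a small check with the standard re module. The pattern is a hypothetical "no : earlier on this line" negative lookbehind; because its body has no fixed width, re refuses to compile it. A fixed-width lookbehind such as (?<!:) does compile, but it only inspects the single character right before the ), so it cannot answer "does this line contain a colon?".

```python
import re

# Hypothetical pattern: a ")" with no ":" anywhere earlier on the same line.
# The lookbehind body ":[^\n]*" does not have a fixed width.
variable_width = r"(?<!:[^\n]*)\)"

try:
    re.compile(variable_width)
    compiled = True
except re.error as exc:
    # The standard re module only accepts fixed-width lookbehinds.
    compiled = False
    print("re.error:", exc)

# A fixed-width lookbehind compiles, but it only looks at the one
# character immediately before each ")".
fixed_width = re.compile(r"(?<!:)\)")
print(fixed_width.split("a : (),b ) c :)"))
# ['a : (', ',b ', ' c :)']
```

Note that the first ) is still split on even though its line contains a colon.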

As an example, I'd like to split the string below on ).

String:

Hello1 (
First : (),
Second )

Hello2 (
First 
)

Desired output:

['Hello1 (\nFirst : (),\nSecond', 'Hello2 (\nFirst \n']
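As a plain-Python reference for the intended rule (a line-by-line approach without any regex, not taken from the answer), the sketch below splits on ) only on lines that contain no :. The function name is hypothetical. It keeps all leading and trailing whitespace, so on the example string it returns untrimmed pieces rather than exactly the list above. Be aware that str.splitlines() also treats \r and a few other characters as line breaks.

```python
def split_on_paren_outside_colon_lines(text, sep=")"):
    """Split text on sep, but only on lines that contain no ':'."""
    parts = [""]
    for line in text.splitlines(keepends=True):
        if ":" in line:
            # Lines with a colon are copied through untouched.
            parts[-1] += line
            continue
        pieces = line.split(sep)
        parts[-1] += pieces[0]    # text before the first sep continues the current part
        parts.extend(pieces[1:])  # every further piece starts a new part
    return parts


data = (
    "\n"
    "Hello1 (\n"
    "First : (),\n"
    "Second )\n"
    "\n"
    "Hello2 (\n"
    "First \n"
    ")"
)

print(split_on_paren_outside_colon_lines(data))
# ['\nHello1 (\nFirst : (),\nSecond ', '\n\nHello2 (\nFirst \n', '']
```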

Answer 1

It is possible in Python, albeit not "out of the box" with the built-in re module.

First alternative

The newer regex module (a third-party package, installed with pip install regex) supports variable-length lookbehind, so you could use

(?<=^[^:]+)\)
# positive lookbehind: from a line start up to the ")" there must be no ":"

In Python:

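    import regex as re  # third-party module: pip install regex

    data = """
    Hello1 (
    First : (),
    Second )

    Hello2 (
    First 
    )"""

    # variable-length positive lookbehind, only possible with the regex module
    pattern = re.compile(r'(?<=^[^:]+)\)', re.MULTILINE)
    parts = pattern.split(data)
    print(parts)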

Which yields

    ['\nHello1 (\nFirst : (),\nSecond ', '\n\nHello2 (\nFirst \n', '']

Second alternative

Alternatively, you could match the lines in question (those containing a :) and let them fail with (*SKIP)(*FAIL) afterwards. These backtracking control verbs are also only supported by the regex module, not by re:

    ^[^:\n]*:.*     # match lines with at least one : in it
    (*SKIP)(*FAIL)  # let them fail
    |               # or
    \)              # match )

Again in Python:

    # re is still the third-party regex module imported above
    pattern2 = re.compile(r'^[^:\n]*:.*(*SKIP)(*FAIL)|\)', re.MULTILINE)
    parts2 = pattern2.split(data)
    print(parts2)

See a demo of the latter on regex101.com.
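The idea behind (*SKIP)(*FAIL) is "match what you don't want, then throw it away". As a sketch of how that can be emulated with the standard re module (an assumption-level emulation, not what the regex verbs do internally), you can keep the same alternation, capture the ) in a group, and ignore every match where that group did not participate. The helper name below is hypothetical.

```python
import re

# Branch 1 consumes whole lines that contain a colon; branch 2 captures ")".
TOKEN = re.compile(r"^[^:\n]*:.*|(\))", re.MULTILINE)


def split_outside_colon_lines(text):
    """Split text on ')' characters that sit on lines without a ':'."""
    parts = []
    start = 0
    for match in TOKEN.finditer(text):
        if match.group(1) is None:
            continue  # a colon line: consumed and discarded, like (*SKIP)(*FAIL)
        parts.append(text[start:match.start(1)])
        start = match.end(1)
    parts.append(text[start:])
    return parts


data = "\nHello1 (\nFirst : (),\nSecond )\n\nHello2 (\nFirst \n)"
print(split_outside_colon_lines(data))
# ['\nHello1 (\nFirst : (),\nSecond ', '\n\nHello2 (\nFirst \n', '']
```

Because a colon line is consumed as one match, finditer resumes after it, so any ) on that line is never seen on its own.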

Third alternative

OK, this answer is getting longer than I expected. You can even do it with the built-in re module, with the help of a replacement function: first substitute the ) characters in question with a placeholder, then split on that placeholder:

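    import re  # the standard-library module this time

    def replacer(match):
        # group 1 only participates when a bare ")" was matched
        if match.group(1) is not None:
            return "SUPERMAN"
        # lines containing ":" are returned unchanged
        return match.group(0)

    pattern3 = re.compile(r'^[^:\n]*:.*|(\))', re.MULTILINE)
    marked = pattern3.sub(replacer, data)
    parts3 = marked.split("SUPERMAN")
    print(parts3)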
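One caveat with a fixed placeholder such as "SUPERMAN" is that the input might already contain it, which would produce extra splits. A minimal sketch of a safer variant, assuming it is acceptable to pick a single Unicode private-use character that does not occur in the text (the function name is hypothetical, and it raises StopIteration in the unlikely case that all of U+E000 to U+F8FF are present):

```python
import re

# Same pattern as pattern3: colon lines are matched whole, a bare ")" is captured.
PATTERN = re.compile(r"^[^:\n]*:.*|(\))", re.MULTILINE)


def split_with_placeholder(text):
    # A single character that never occurs in text cannot overlap with
    # existing content, so splitting on it yields exactly the intended pieces.
    placeholder = next(chr(cp) for cp in range(0xE000, 0xF900) if chr(cp) not in text)
    marked = PATTERN.sub(
        lambda m: placeholder if m.group(1) is not None else m.group(0), text
    )
    return marked.split(placeholder)


data = "\nHello1 (\nFirst : (),\nSecond )\n\nHello2 (\nFirst \n)"
print(split_with_placeholder(data))
# ['\nHello1 (\nFirst : (),\nSecond ', '\n\nHello2 (\nFirst \n', '']
print(split_with_placeholder("SUPERMAN )"))
# ['SUPERMAN ', '']
```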

Answer 1: score 4, accepted, 2018-02-23 05:03:53