Python/Regex: Split string if a line does not contain a certain special character
I'm trying to split a multiline string on a character, but only if the line does not contain `:`. Unfortunately, I can't see an easy way to use re.split() with a negative lookbehind on the character `:`, since a `:` may have occurred on another line earlier in the string.
As an example, I'd like to split the string below on `)`.
String:

```
Hello1 (
First : (),
Second )
Hello2 (
First
)
```
Output:

```
['Hello1 (\nFirst : (),\nSecond', 'Hello2 (\nFirst \n']
```
It is possible in Python, albeit not "out of the box" with the native re module.
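To see why the native module needs a workaround: re only accepts lookbehinds of fixed width, so a variable-width lookbehind such as `(?<=^[^:]+)\)` is rejected at compile time with re.error. A minimal check (the helper name `compiles` is purely illustrative):

```python
import re

def compiles(pattern: str) -> bool:
    """Return True if the built-in re module accepts the pattern (illustrative helper)."""
    try:
        re.compile(pattern, re.MULTILINE)
    except re.error:
        return False
    return True

# Variable-width lookbehind ([^:]+ can match any length): rejected by re.
print(compiles(r'(?<=^[^:]+)\)'))   # False
# A fixed-width lookbehind (exactly one character) is accepted.
print(compiles(r'(?<=\w)\)'))       # True
```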
The newer regex module (third-party, on PyPI) supports variable-length lookbehind, so you could use

```
(?<=^[^:]+)\)   # positive lookbehind: no ":" on the line before the ")"
```
In Python:

```python
import regex as re  # third-party module; supports variable-length lookbehind

data = """
Hello1 (
First : (),
Second )

Hello2 (
First 
)"""

pattern = re.compile(r'(?<=^[^:]+)\)', re.MULTILINE)
parts = pattern.split(data)
print(parts)
```
Which yields

```
['\nHello1 (\nFirst : (),\nSecond ', '\n\nHello2 (\nFirst \n', '']
```
Alternatively, you could match the lines in question and then let them fail with (*SKIP)(*FAIL):

```
^[^:\n]*:.*(*SKIP)(*FAIL)   # match lines with at least one ":" in them and let them fail
|\)                         # or match ")"
```
In Python:

```python
# Still uses the regex module imported as re above; the built-in re has no (*SKIP)(*FAIL)
pattern2 = re.compile(r'^[^:\n]*:.*(*SKIP)(*FAIL)|\)', re.MULTILINE)
parts2 = pattern2.split(data)
print(parts2)
```
See a demo of the latter on regex101.com.
OK, now the answer is getting longer than I first thought. You can even do it with the native re module with the help of a function. Here, you first substitute the `)` in question and then split on the substitute:
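```python
import re  # the built-in module this time

def replacer(match):
    # Group 1 is only set when a bare ")" matched outside a line containing ":"
    if match.group(1) is not None:
        return "SUPERMAN"
    # A whole line containing ":" matched: put it back unchanged
    return match.group(0)

pattern3 = re.compile(r'^[^:\n]*:.*|(\))', re.MULTILINE)
data = pattern3.sub(replacer, data)
parts3 = data.split("SUPERMAN")
print(parts3)
```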
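One drawback of the placeholder is that the split goes wrong if the text itself happens to contain "SUPERMAN". A sketch of a variant that avoids the placeholder, assuming the same `^[^:\n]*:.*|(\))` pattern: walk the matches with re.finditer and cut the string wherever group 1 matched. The function name `split_on_paren` is hypothetical.

```python
import re

# Same idea as pattern3: the first alternative consumes whole lines that
# contain a ":", so a ")" on such a line is never captured by group 1.
PATTERN = re.compile(r'^[^:\n]*:.*|(\))', re.MULTILINE)

def split_on_paren(text: str) -> list[str]:
    """Split on ')' except on lines containing ':' (hypothetical helper)."""
    parts = []
    start = 0
    for m in PATTERN.finditer(text):
        if m.group(1) is not None:           # a ")" outside any colon line
            parts.append(text[start:m.start(1)])
            start = m.end(1)                 # continue after the ")"
    parts.append(text[start:])               # remainder after the last split
    return parts

data = "\nHello1 (\nFirst : (),\nSecond )\n\nHello2 (\nFirst \n)"
print(split_on_paren(data))
# ['\nHello1 (\nFirst : (),\nSecond ', '\n\nHello2 (\nFirst \n', '']
```

Because the colon lines are consumed as a whole, a `)` that appears before the `:` on the same line (as in "a ) : b") is also left alone.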