
Python/Regex: Split string if a line does not contain a certain special character

I'm trying to split a multiline string on a character, but only if the line does not contain : . Unfortunately, I can't see an easy way to use re.split() with a negative lookbehind on the character : , since the : may have occurred in another line earlier in the string.

As an example, I'd like to split the string below on ) .

String:

Hello1 (
First : (),
Second )

Hello2 (
First 
)

Output:

['Hello1 (\nFirst : (),\nSecond', 'Hello2 (\nFirst \n']

It is possible with Python, albeit not "out of the box" with the native re module.

First alternative

The newer regex module supports variable-length lookbehinds, so you could use

(?<=^[^:]+)\)
# positive lookbehind making sure there's no : in that line


In Python:

import regex as re

data = """
Hello1 (
First : (),
Second )

Hello2 (
First 
)"""

pattern = re.compile(r'(?<=^[^:]+)\)', re.MULTILINE)

parts = pattern.split(data)
print(parts)

Which yields

['\nHello1 (\nFirst : (),\nSecond ', '\n\nHello2 (\nFirst \n', '']
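For comparison, here is a small check of why the pattern above needs the regex module. It is a minimal sketch, assuming only the standard library: Python's built-in re rejects variable-width lookbehinds when the pattern is compiled, so the same pattern raises re.error there. The variable name accepted is only for illustration.

```python
import re  # standard-library module, NOT the third-party regex module

try:
    # Same pattern as above; [^:]+ has no fixed width, which re does not allow
    # inside a lookbehind.
    re.compile(r'(?<=^[^:]+)\)', re.MULTILINE)
    accepted = True
except re.error as exc:
    accepted = False
    print("re rejected the pattern:", exc)

print(accepted)  # False
```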


Second alternative

Alternatively, you could match the lines in question and let them fail with (*SKIP)(*FAIL) afterwards:

^[^:\n]*:.*(*SKIP)(*FAIL)|\)
# match lines with at least one : in it
# let them fail
# or match )


Again in Python:

# re is still the regex module imported above (import regex as re)
pattern2 = re.compile(r'^[^:\n]*:.*(*SKIP)(*FAIL)|\)', re.MULTILINE)

parts2 = pattern2.split(data)
print(parts2)

See a demo for the latter on regex101.com.


Third alternative

OK, now the answer is getting longer than I originally planned. You can even do it with the native re module with the help of a function. Here, you need to substitute the ) in question first and then split on the substitute:

import re  # the standard-library module is enough here

def replacer(match):
    # group(1) is only set when a ) outside a line containing : was matched
    if match.group(1) is not None:
        return "SUPERMAN"
    else:
        return match.group(0)

pattern3 = re.compile(r'^[^:\n]*:.*|(\))', re.MULTILINE)
data = pattern3.sub(replacer, data)
parts3 = data.split("SUPERMAN")
print(parts3)
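The "SUPERMAN" placeholder would break if that word ever appeared in the input. As a hedged variation on the same idea (not part of the original answer), the sketch below keeps the same pattern but collects the split positions directly with finditer, so no placeholder is needed. The function name split_outside_colon_lines is hypothetical.

```python
import re

# Same pattern as in the third alternative: a line containing ':' is consumed
# whole (group 1 stays unset); any other ')' is captured in group 1.
PATTERN = re.compile(r'^[^:\n]*:.*|(\))', re.MULTILINE)

def split_outside_colon_lines(text):
    """Split text on ')' except where the ')' sits on a line containing ':'."""
    parts = []
    start = 0
    for match in PATTERN.finditer(text):
        if match.group(1) is None:
            continue  # a whole line with ':' was matched, not a split point
        parts.append(text[start:match.start(1)])
        start = match.end(1)
    parts.append(text[start:])  # remainder after the last split point
    return parts

# The example string from the question, written with explicit \n escapes.
data = "\nHello1 (\nFirst : (),\nSecond )\n\nHello2 (\nFirst \n)"

parts = split_outside_colon_lines(data)
print(parts)
# ['\nHello1 (\nFirst : (),\nSecond ', '\n\nHello2 (\nFirst \n', '']
```

This returns the same list as the first alternative, including the trailing empty string after the final ) .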
