I'm trying to split a multiline string on a character, but only if the line it appears on does not contain `:`. Unfortunately I can't see an easy way to use `re.split()` with a negative lookbehind on the character `:`, since it's possible that `:` occurred in another line earlier in the string.
As an example, I'd like to split the string below on `)`.
String:

    Hello1 (
    First : (),
    Second )
    Hello2 (
    First
    )

Output:

    ['Hello1 (\nFirst : (),\nSecond', 'Hello2 (\nFirst \n']

It is possible with Python, albeit not "out of the box" with the native `re` module.
The newer `regex` module supports a variable-length lookbehind, so you could use

    (?<=^[^:]+)\)   # pos. lookbehind making sure there's no : in that line

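In Python:

    import regex as re  # third-party module (pip install regex), not the stdlib re

    data = """
    Hello1 (
    First : (),
    Second )

    Hello2 (
    First 
    )"""

    # split on ) only where no : precedes it on the same line
    pattern = re.compile(r'(?<=^[^:]+)\)', re.MULTILINE)
    parts = pattern.split(data)
    print(parts)
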
Which yields

    ['\nHello1 (\nFirst : (),\nSecond ', '\n\nHello2 (\nFirst \n', '']

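As a quick check of the "not out of the box" remark, the sketch below tries to compile the same lookbehind with the standard-library `re` module. The helper name `compiles_with_stdlib_re` is made up for this illustration; the only thing it relies on is that `re.compile` raises `re.error` for a pattern it cannot handle.

```python
import re  # standard library this time, not the third-party regex module


def compiles_with_stdlib_re(pattern_text):
    """Return True if the stdlib re module accepts the pattern (hypothetical helper)."""
    try:
        re.compile(pattern_text, re.MULTILINE)
        return True
    except re.error:
        return False


# Variable-width lookbehind ([^:]+ can be any length): rejected by re.
lookbehind_ok = compiles_with_stdlib_re(r'(?<=^[^:]+)\)')

# Fixed-width lookbehind (exactly one character): accepted by re,
# but it cannot look back over a whole line.
fixed_ok = compiles_with_stdlib_re(r'(?<=[^:])\)')

print(lookbehind_ok, fixed_ok)
```

The fixed-width restriction is why the lookbehind variant needs the `regex` module.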
Alternatively, you could match the lines in question and let them fail with `(*SKIP)(*FAIL)` afterwards:

    ^[^:\n]*:.*      # match lines with at least one : in it
    (*SKIP)(*FAIL)   # let them fail
    |                # or
    \)               # match )

In Python:

    # still the regex module (imported as re above);
    # the stdlib re does not support (*SKIP)(*FAIL)
    pattern2 = re.compile(r'^[^:\n]*:.*(*SKIP)(*FAIL)|\)', re.MULTILINE)
    parts2 = pattern2.split(data)
    print(parts2)

See a demo for the latter on regex101.com.
Ok, now the answer is getting longer than previously thought. You can even do it with the native `re` module with the help of a function. Here, you need to substitute the `)` in question first and then split by the substitute:
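
    import re  # the standard-library module, replacing the regex alias used above

    def replacer(match):
        # group 1 is only set when a ) outside of a ":" line was matched
        if match.group(1) is not None:
            return "SUPERMAN"
        else:
            # a whole line containing ":" was matched; leave it untouched
            return match.group(0)

    pattern3 = re.compile(r'^[^:\n]*:.*|(\))', re.MULTILINE)
    data = pattern3.sub(replacer, data)
    parts3 = data.split("SUPERMAN")
    print(parts3)
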
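A possible variation on the native `re` approach, sketched here as an assumption and not as part of the original answer: instead of substituting a placeholder (which would split in the wrong place if the text ever contained the placeholder string itself), walk over the matches and slice the string at each captured `)`. The names `split_on_paren` and `sample` are hypothetical, and the sample string is reconstructed from the output shown earlier.

```python
import re

# Either a whole line containing ':' (nothing captured),
# or a single ')' captured in group 1.
PATTERN = re.compile(r'^[^:\n]*:.*|(\))', re.MULTILINE)


def split_on_paren(text):
    """Split text on ')' except on lines that contain ':' (illustrative helper)."""
    parts = []
    start = 0
    for match in PATTERN.finditer(text):
        if match.group(1) is None:
            continue  # a line containing ':' was consumed; do not split here
        parts.append(text[start:match.start(1)])
        start = match.end(1)
    parts.append(text[start:])  # remainder after the last split point
    return parts


sample = "\nHello1 (\nFirst : (),\nSecond )\n\nHello2 (\nFirst \n)"
result = split_on_paren(sample)
print(result)
```

On this sample the result should match the list produced by the `regex`-based split, including the trailing empty string after the final `)`.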