How to split a string into multiple lines using regex before the pattern match
I have a file with SWIFT data in the format below that needs to be split into multiple lines using a regular expression in Python.
Original file:
ID Information
1 :20:Test1 :25:test2:28C:test3
Desired output:
ID Information
1 :20:Test1
1 :25:test2
1 :28C:test3
Using Notepad++, I am able to break the 'Information' column into multiple lines with:
Find: ^:[0-9]{2}:|\s:[0-9]{2}:|\s:[0-9]{2}[A-Za-z]{1}:
Replace: \n$0
I need to replicate the same in Python. So far I have tried the code below, but the result does not contain the pattern; the split drops the matched text:
import re
s = ':20:Test1 :25:test2:28C:test3'
# Raw string so that \s reaches the regex engine unchanged
l = re.compile(r'^:[0-9]{2}:|\s:[0-9]{2}:|\s:[0-9]{2}[A-Za-z]{1}:').split(s)
print(l)
Result: ['', 'Test1 ', 'test2 ', 'test3']
The result should also contain the matched regex pattern when splitting the string.
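A minimal sketch of two standard ways to keep the matched tags, assuming the three tag shapes seen here (:20:, :25:, :28C:) can be covered by one pattern. The name TAG is illustrative. re.split returns the text matched by any capturing group in the separator, and in re.sub the Notepad++ $1 is written \1 (the whole match, $0, is \g<0>). Neither variant adds the ID prefix; that part is left to the answers.

```python
import re

s = ':20:Test1 :25:test2:28C:test3'
# One pattern for all three tag shapes; the leading whitespace is optional,
# so ':28C:' (which has no space before it) is found as well.
TAG = r'\s*(:\d{2}[A-Za-z]?:)'

# 1) A capturing group in the separator makes re.split() keep the tags.
parts = re.split(TAG, s)
# Pair each tag with the value that follows it.
fields = [tag + value for tag, value in zip(parts[1::2], parts[2::2])]
print(fields)

# 2) Closest equivalent of the Notepad++ Find/Replace: insert a newline
#    before every tag, then drop the newline added before the first one.
print(re.sub(TAG, r'\n\1', s).lstrip('\n'))
```

Because the pattern matches at position 0, parts starts with an empty string, which the slicing parts[1::2] / parts[2::2] skips.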
How about this pattern:
import re
s = ':20:Test1 :25:test2:28C:test3'
p = re.compile(r'(:[0-9A-Za-z]{1,3}:)([0-9A-Za-z]+)')
print(p.findall(s))
#[(':20:', 'Test1'), (':25:', 'test2'), (':28C:', 'test3')]
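findall returns (tag, value) tuples, so producing the desired output still requires joining each tuple and putting the ID back in front. A rough sketch, assuming each data line is "ID" followed by a single space and the tags; expand_line is a hypothetical helper name, not part of the answer:

```python
import re

# Same tag/value pattern as above, written with [0-9A-Za-z]
# (a bare A-z range would also accept [, \, ], ^, _ and `).
TAG_VALUE = re.compile(r'(:[0-9A-Za-z]{1,3}:)([0-9A-Za-z]+)')

def expand_line(line):
    """Hypothetical helper: 'ID :tag:value...' -> one 'ID :tag:value' per tag."""
    record_id, _, info = line.partition(' ')
    return [f'{record_id} {tag}{value}' for tag, value in TAG_VALUE.findall(info)]

for out in expand_line('1 :20:Test1 :25:test2:28C:test3'):
    print(out)
```

Because the value part is [0-9A-Za-z]+, a value containing spaces or punctuation would be cut at the first such character, and header lines such as "ID Information" produce no fields, so they have to be passed through separately.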
Given that you have multiple types of output, it may be easier to use a little logic with a regex:
s='''\
ID Information
1 :20:Test1 :25:test2:28C:test3'''
import re
for line in s.splitlines():
    if m:=re.search(r'^(\d+)([ \t]+)(:.*)',line):
        data=re.findall(r'(:[^:]+:[^:]+(?=:|$))', m.group(3))
        for e in data:
            print(m.group(1)+m.group(2)+e.rstrip())
    else:
        print(line)
Prints:
ID Information
1 :20:Test1
1 :25:test2
1 :28C:test3
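The same two regexes can be wrapped in a generator that accepts any iterable of lines, such as an open file object, so the whole file does not have to be read into one string first. This is a sketch under that assumption; split_fields and the file name in the comment are hypothetical. It avoids :=, so it also runs on Python 3 versions before 3.8.

```python
import io
import re

# The two regexes from the loop above, compiled once.
LINE_RX = re.compile(r'^(\d+)([ \t]+)(:.*)')
FIELD_RX = re.compile(r'(:[^:]+:[^:]+(?=:|$))')

def split_fields(lines):
    """Hypothetical helper: yield output lines for an iterable of input lines."""
    for line in lines:
        line = line.rstrip('\n')          # file iteration keeps the newline
        m = LINE_RX.search(line)
        if m:
            for field in FIELD_RX.findall(m.group(3)):
                yield m.group(1) + m.group(2) + field.rstrip()
        else:
            yield line                    # header or other non-data line

# io.StringIO stands in for e.g. open('swift.txt') (hypothetical file name).
sample = io.StringIO('ID Information\n1 :20:Test1 :25:test2:28C:test3\n')
for out in split_fields(sample):
    print(out)
```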
As written, that is Python 3.8+ only (it uses the := assignment expression). If you want it on an earlier Python 3.x:
for line in s.splitlines():
    m=re.search(r'^(\d+)([ \t]+)(:.*)',line)
    if m:
        ...
You may use
import re
text = """ID Information
1 :20:Test1 :25:test2:28C:test3"""
valid_line_rx = r'^(\d+\s*)(:\d{2}[A-Za-z]?:.*)'
print( re.sub(valid_line_rx, lambda m:
"\n".join(["{}{}".format(m.group(1),x) for x in re.split(r'(?!^)(?=:\d{2}[A-Za-z]?:)', m.group(2))]),
text,
flags=re.M)
)
See the Python demo, output:
ID Information
1 :20:Test1
1 :25:test2
1 :28C:test3
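If the inline lambda is hard to read, the replacement can be a named callback function. This is a sketch of the same technique with the patterns precompiled; expand is a hypothetical name. It also strips the trailing space that the first piece otherwise keeps (the text before :25: ends in a space).

```python
import re

text = """ID Information
1 :20:Test1 :25:test2:28C:test3"""

# Same patterns as the answer; re.M lets ^ match at the start of every line.
VALID_LINE_RX = re.compile(r'^(\d+\s*)(:\d{2}[A-Za-z]?:.*)', re.M)
TAG_SPLIT_RX = re.compile(r'(?!^)(?=:\d{2}[A-Za-z]?:)')

def expand(m):
    """Replacement callback: one output line per tag, each prefixed with the ID."""
    prefix, info = m.group(1), m.group(2)
    return '\n'.join(prefix + field.rstrip() for field in TAG_SPLIT_RX.split(info))

print(VALID_LINE_RX.sub(expand, text))
```

Splitting on a zero-width match such as TAG_SPLIT_RX requires Python 3.7 or later.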
The ^(\d+\s*)(:\d{2}[A-Za-z]?:.*) regex matches
- ^ - start of a line (due to the re.M flag)
- (\d+\s*) - Group 1: one or more digits and then 0 or more whitespaces
- (:\d{2}[A-Za-z]?:.*) - Group 2: a :, two digits, an optional letter and a :, and then any 0 or more chars other than line break chars, as many as possible.
The (?!^)(?=:\d{2}[A-Za-z]?:) regex matches a location that is not the start of the string and is immediately followed by a :, 2 digits, an optional letter and a :. This pattern is used to split the Group 2 value of the above regex match.
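The role of (?!^) is easiest to see by splitting the Group 2 value with and without it. This is a small illustration on the sample value only; the variable names are ours. Without the guard, the lookahead also matches at position 0, so the first list item is an empty string.

```python
import re

value = ':20:Test1 :25:test2:28C:test3'

# Lookahead only: also matches before the very first tag, at position 0.
without_guard = re.split(r'(?=:\d{2}[A-Za-z]?:)', value)
# (?!^) rules out position 0, so only the second and later tags split.
with_guard = re.split(r'(?!^)(?=:\d{2}[A-Za-z]?:)', value)

print(without_guard)
print(with_guard)
```

Both calls rely on Python 3.7+, which allows re.split to split on empty matches. The first piece keeps the space that preceded :25:, which is why the output lines may need rstrip().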