How to split a string into multiple lines before the pattern match using a regular expression

Question

I have a file with SWIFT data in the format below that needs to be split into multiple lines using a regular expression in Python. Original file:

ID        Information

1         :20:Test1  :25:test2:28C:test3

Desired output:

ID  Information

1     :20:Test1  
1     :25:test2  
1     :28C:test3

Using Notepad++ I am able to break the 'Information' column into multiple lines using

Find: ^:[0-9]{2}:|\s:[0-9]{2}:|\s:[0-9]{2}[A-Za-z]{1}:

Replace: \n$0

I need to replicate the same in Python. So far I have tried the code below, but the result does not contain the pattern. It splits after the pattern match:

import re

s = ':20:Test1  :25:test2:28C:test3'

l = re.compile(r'^:[0-9]{2}:|\s:[0-9]{2}:|\s:[0-9]{2}[A-Za-z]{1}:').split(s)

Result: ['', 'Test1 ', 'test2 ', 'test3']

The result should also contain the text matched by the regex pattern when splitting the string.

Answer 1

How about this pattern:

import re

s = ':20:Test1  :25:test2:28C:test3'

p = re.compile(r'(:[0-9A-Za-z]{1,3}:)([0-9A-Za-z]+)')

print(p.findall(s))
#[(':20:', 'Test1'), (':25:', 'test2'), (':28C:', 'test3')]
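
To get from these tuples to the requested rows, each tag/value pair can be joined back together and prefixed with the ID. A minimal sketch, assuming the ID has already been separated from the Information column; the names `record_id` and `rows` are illustrative and not part of the original answer:

```python
import re

s = ':20:Test1  :25:test2:28C:test3'
record_id = '1'  # assumed: the ID column value for this row

# Tag in group 1, alphanumeric value in group 2
p = re.compile(r'(:[0-9A-Za-z]{1,3}:)([0-9A-Za-z]+)')

# Rebuild one output row per (tag, value) pair
rows = [f'{record_id}     {tag}{value}' for tag, value in p.findall(s)]
print('\n'.join(rows))
```

Note that `[0-9A-Za-z]+` stops at the first character that is not a letter or digit, so a value containing spaces or punctuation would be cut short with this pattern.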

Answer 2

Given that you have multiple types of output, it may be easier to use a little logic with a regex:

s='''\
ID        Information

1         :20:Test1  :25:test2:28C:test3'''
    

import re 

for line in s.splitlines():
    if m:=re.search(r'^(\d+)([ \t]+)(:.*)',line):
        data=re.findall(r'(:[^:]+:[^:]+(?=:|$))', m.group(3))
        for e in data:
            print(m.group(1)+m.group(2)+e.rstrip())
    else:
        print(line)

Prints:

ID        Information

1         :20:Test1
1         :25:test2
1         :28C:test3

As written, that is Python 3.8+ only (it uses the := assignment expression). If you are on an earlier Python 3.x:

for line in s.splitlines():
    m=re.search(r'^(\d+)([ \t]+)(:.*)',line)
    if m:
      ...
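
For reuse, the same logic can be wrapped in a function that returns the lines instead of printing them. This is only a sketch built from the two regexes above, written without `:=` so it also runs on older Python 3 versions; the function name `split_swift_lines` is hypothetical:

```python
import re

def split_swift_lines(text):
    """Return the input lines, with each tagged line split into one line per tag."""
    out = []
    for line in text.splitlines():
        m = re.search(r'^(\d+)([ \t]+)(:.*)', line)
        if m:
            # One entry per ':tag:value' pair found in the Information column
            for e in re.findall(r'(:[^:]+:[^:]+(?=:|$))', m.group(3)):
                out.append(m.group(1) + m.group(2) + e.rstrip())
        else:
            # Header and blank lines pass through unchanged
            out.append(line)
    return out

sample = 'ID        Information\n\n1         :20:Test1  :25:test2:28C:test3'
print('\n'.join(split_swift_lines(sample)))
```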

Answer 3

You may use

import re
text = """ID        Information

1         :20:Test1  :25:test2:28C:test3"""

valid_line_rx = r'^(\d+\s*)(:\d{2}[A-Za-z]?:.*)'
print( re.sub(valid_line_rx, lambda m:
  "\n".join(["{}{}".format(m.group(1),x) for x in re.split(r'(?!^)(?=:\d{2}[A-Za-z]?:)', m.group(2))]),
  text, 
  flags=re.M)
)

See the Python demo, output:

ID        Information

1         :20:Test1  
1         :25:test2
1         :28C:test3

The ^(\d+\s*)(:\d{2}[A-Za-z]?:.*) regex matches

^ - start of a line (due to the re.M flag)
(\d+\s*) - Group 1: one or more digits and then 0 or more whitespaces
(:\d{2}[A-Za-z]?:.*) - Group 2: a :, two digits, an optional letter and a :, and then any 0 or more chars other than line break chars, as many as possible.

The (?!^)(?=:\d{2}[A-Za-z]?:) regex matches a location that is not the start of the string and is immediately followed by a :, 2 digits, an optional letter and a :. This pattern is used to split the Group 2 value of the above regex match.
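
The split step can be tried on its own to see that the tags are kept. A minimal sketch using only the Information value from the question; the variable names `info` and `parts` are illustrative. Because the pattern matches an empty position rather than any characters, it relies on Python 3.7 or later, where `re.split` supports splitting on empty matches:

```python
import re

info = ':20:Test1  :25:test2:28C:test3'

# (?!^) skips the position at the very start of the string;
# (?=:\d{2}[A-Za-z]?:) asserts that a tag follows, without consuming it.
parts = re.split(r'(?!^)(?=:\d{2}[A-Za-z]?:)', info)
print(parts)
# [':20:Test1  ', ':25:test2', ':28C:test3']
```

Nothing is consumed by the split, so the whitespace before a tag stays attached to the previous piece; call `rstrip()` on each piece if it is unwanted.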
