Splitting on a regular expression

Question

I have a string that I want to split using a regular expression.

Input:

S1:1- first split begins.s2:1- first split ends.s1:2-second split begins.S2:2-second split ends,S1:3-third split begins.S2:3-third split ends.

Output: should be a list in which each element keeps the expression we split on:

[S1:1-first split begins,s2:1-first split ends,S1:2-second split begins,S2:2-second split ends,S1:3-third split begins.....]

I want to split on the pattern [s/S][1/2]:[0-9][0-9].
This is what I have, but it gives me an extra entry and removes the text matched by the regex I split on.

import re
text="""S1:1- first Split begins.continue the sentence
       s2:1- first split ends
       s1:2-second split begins
       S2:2-second split ends
       S1:3-third split begins
       S2:3-third split ends """
output = re.split(r"[Ss][12]:[0-9]*", text)

Answer 1

I'm not quite sure I understand where you want to split this input, but if you want the text you split on to be included in the output, the pattern needs a capturing group:

re.split(r"([Ss][1-2]:[0-9]-)\s*", text)

Result:

['',
 'S1:1-',
 'first Split begins.continue the sentence\n       ',
 's2:1-',
 'first split ends\n       ',
 's1:2-',
 'second split begins\n       ',
 'S2:2-',
 'second split ends\n       ',
 'S1:3-',
 'third split begins\n       ',
 'S2:3-',
 'third split ends ']
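The list above alternates between markers and the text that follows them, so getting to the shape the question asks for takes one more step. The following is only a sketch of one way to do that, not part of the original answer: it assumes the first element is the (empty) text before the first marker, and the names `parts`, `markers`, `bodies`, and `segments` are made up for illustration.

```python
import re

text = """S1:1- first Split begins.continue the sentence
       s2:1- first split ends
       s1:2-second split begins
       S2:2-second split ends
       S1:3-third split begins
       S2:3-third split ends """

# The capturing group keeps each marker as its own list element.
parts = re.split(r"([Ss][1-2]:[0-9]-)\s*", text)

# parts[0] is whatever precedes the first marker (empty here); after that
# the list alternates marker, body, marker, body, ...
markers = parts[1::2]
bodies = parts[2::2]

# Glue each marker back onto its body and trim the surrounding whitespace.
segments = [marker + body.strip() for marker, body in zip(markers, bodies)]
print(segments)
```

Because `\s*` in the pattern swallows the whitespace after each marker, `S1:1- first` comes back as `S1:1-first`, which happens to match the output format shown in the question.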

Answer 2

Try a positive lookahead, (?=...), so that the marker is not consumed by the split and stays in the output. Your regex will be something like this:

\s(?=[sS][12]:[0-9])

Complete code:

import re
text="""S1:1- first Split begins.continue the sentence
       s2:1- first split ends
       s1:2-second split begins
       S2:2-second split ends
       S1:3-third split begins
       S2:3-third split ends """
output = re.split(r"\s(?=[sS][12]:[0-9])", text)

Outcome:

['S1:1- first Split begins.continue the sentence\n      ',
 's2:1- first split ends\n      ',
 's1:2-second split begins\n      ',
 'S2:2-second split ends\n      ',
 'S1:3-third split begins\n      ',
 'S2:3-third split ends ']
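The pattern above needs a whitespace character in front of each marker, which the one-line input in the question does not have. As a hedged variation (not from the original answer), the sketch below splits on the lookahead alone, so the split point has zero width. This relies on `re.split` accepting patterns that can match an empty string, which requires Python 3.7 or later. The helper name `split_before_markers` and the constant `MARKER` are hypothetical.

```python
import re

# Zero-width position just before a marker such as "S1:3" or "s2:1".
MARKER = r"(?=[sS][12]:[0-9])"


def split_before_markers(s):
    """Split s in front of every marker, keeping the marker in each piece."""
    pieces = re.split(MARKER, s)
    # A marker at the very start produces an empty first piece; drop it,
    # and trim leftover newlines and indentation from the rest.
    return [piece.strip() for piece in pieces if piece.strip()]


multi_line = """S1:1- first Split begins.continue the sentence
       s2:1- first split ends
       s1:2-second split begins
       S2:2-second split ends
       S1:3-third split begins
       S2:3-third split ends """

one_line = (
    "S1:1- first split begins.s2:1- first split ends."
    "s1:2-second split begins.S2:2-second split ends,"
    "S1:3-third split begins.S2:3-third split ends."
)

print(split_before_markers(multi_line))
print(split_before_markers(one_line))
```

Each call should return six pieces, each starting with its own marker. The punctuation that separated the pieces in the one-line input stays attached to the end of the preceding piece.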

Answer 1
2 2017-02-13 03:22:16

Answer 2
2 Accepted 2017-02-13 03:37:35
