I have a string that I want to split using a regular expression.
Input:
S1:1- first split begins.s2:1- first split ends.s1:2-second split begins.S2:2-second split ends,S1:3-third split begins.S2:3-third split ends.
Output: should be a list in which each item still contains the expression we split on:
[S1:1-first split begins,s2:1-first split ends,S1:2-second split begins,S2:2-second split ends,S1:3-third split begins.....]
I want to split on the pattern [sS][12]:[0-9]+ (an s or S, then 1 or 2, a colon, and a number).
This is what I have, but it gives me an extra line and removes the regex on which I split.
import re
text="""S1:1- first Split begins.continue the sentence
s2:1- first split ends
s1:2-second split begins
S2:2-second split ends
S1:3-third split begins
S2:3-third split ends """
output=re.split("[Ss][12]:[0-9]*", text)
I'm not quite sure I understand where you want to split this input, but if you want the text that you're splitting on to be included in the output, then you need a capturing group in the pattern:
re.split(r"([Ss][1-2]:[0-9]-)\s*", text)
result:
['',
'S1:1-',
'first Split begins.continue the sentence\n ',
's2:1-',
'first split ends\n ',
's1:2-',
'second split begins\n ',
'S2:2-',
'second split ends\n ',
'S1:3-',
'third split begins\n ',
'S2:3-',
'third split ends ']
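If the goal is the list from the question, with each marker attached to its text, the captured delimiters can be paired back up with the pieces that follow them. This is only a sketch: the names parts, delims, bodies and joined are made up for illustration, and [0-9]+ is an assumption to allow multi-digit numbers.

```python
import re

text = """S1:1- first Split begins.continue the sentence
s2:1- first split ends
s1:2-second split begins
S2:2-second split ends
S1:3-third split begins
S2:3-third split ends """

# With a capturing group, re.split returns
# [prefix, delimiter, body, delimiter, body, ...].
parts = re.split(r"([Ss][12]:[0-9]+-)\s*", text)

# parts[0] is the (here empty) text before the first marker, so skip it
# and glue every delimiter onto the body that follows it.
delims = parts[1::2]
bodies = parts[2::2]
joined = [d + b.strip() for d, b in zip(delims, bodies)]

print(joined)
```

Slicing with a step of 2 relies on there being exactly one capturing group in the pattern; a second group would add another element per match.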
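Try a positive lookahead, (?=...), so that the text you split on is kept in the output. A lookahead only checks what comes next without consuming it, so only the whitespace in front of the marker is removed. Your regex will be something like this: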
\s(?=[sS][12]:[0-9])
Complete Code:
import re
text="""S1:1- first Split begins.continue the sentence
s2:1- first split ends
s1:2-second split begins
S2:2-second split ends
S1:3-third split begins
S2:3-third split ends """
output=re.split(r"\s(?=[sS][12]:[0-9])", text)
Outcome:
['S1:1- first Split begins.continue the sentence\n ', 's2:1- first split ends\n ', 's1:2-second split begins\n ', 'S2:2-second split ends\n ', 'S1:3-third split begins\n ', 'S2:3-third split ends ']
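The leftover newlines and spaces in that outcome come from \s consuming only a single whitespace character. A possible variant, assuming every marker is preceded by at least one whitespace character, is to use \s+ and strip the text first (pieces is just an illustrative name):

```python
import re

text = """S1:1- first Split begins.continue the sentence
s2:1- first split ends
s1:2-second split begins
S2:2-second split ends
S1:3-third split begins
S2:3-third split ends """

# \s+ consumes all whitespace in front of the next marker;
# the lookahead leaves the marker itself in the following piece.
# strip() removes the trailing space after the last sentence.
pieces = re.split(r"\s+(?=[sS][12]:[0-9])", text.strip())

print(pieces)
```

This will not split the single-line input from the question, where a marker follows a period with no whitespace in between. On Python 3.7 or later, re.split also accepts patterns that match the empty string, so a bare lookahead should handle that case, at the cost of an empty first element when the text starts with a marker.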