Splitting on a regular expression
I have a string that I want to split using a regular expression.
Input:
S1:1- first split begins.s2:1- first split ends.s1:2-second split begins.S2:2-second split ends,S1:3-third split begins.S2:3-third split ends.
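Output: a list in which each element still contains the expression it was split on:
[S1:1- first split begins, s2:1- first split ends, S1:2-second split begins, S2:2-second split ends, S1:3-third split begins.....]
I want to split on the pattern [s/S][1/2]:[0-9][0-9]
This is what I have so far, but it gives me an extra (empty) element and drops the expression I split on.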
import re
text="""S1:1- first Split begins.continue the sentence
s2:1- first split ends
s1:2-second split begins
S2:2-second split ends
S1:3-third split begins
S2:3-third split ends """
output=re.split("[Ss][12]:[0-9]*", text)
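To see where both problems come from, here is a minimal sketch that runs the question's own pattern on the sample text (assuming the lines carry no leading indentation). Because the pattern has no capturing group, re.split throws every matched marker away, and because the first marker sits at index 0, the list starts with an empty string.

```python
import re

# Sample text from the question (assumption: the lines carry no leading indentation).
text = """S1:1- first Split begins.continue the sentence
s2:1- first split ends
s1:2-second split begins
S2:2-second split ends
S1:3-third split begins
S2:3-third split ends """

# No capturing group: re.split discards every matched marker ("S1:1", "s2:1", ...).
# The first marker sits at index 0, so the list starts with an empty string.
output = re.split("[Ss][12]:[0-9]*", text)
print(output)
```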
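I'm not entirely sure I understand where you want to split this input, but if you want the text you split on to be included in the output, the pattern needs a capturing group: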
re.split(r"([Ss][1-2]:[0-9]-)\s*", text)
Result:
['',
'S1:1-',
'first Split begins.continue the sentence\n ',
's2:1-',
'first split ends\n ',
's1:2-',
'second split begins\n ',
'S2:2-',
'second split ends\n ',
'S1:3-',
'third split begins\n ',
'S2:3-',
'third split ends ']
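The capturing-group result interleaves markers and text, with an empty string in front. A minimal sketch for folding it into the list the question asks for: pair each marker (odd indices) with the text that follows it (even indices). As an assumption, this variant drops the \s* from the pattern so that the space after markers such as "S1:1-" is kept, and it strips trailing whitespace from each piece.

```python
import re

# Sample text from the question (assumption: the lines carry no leading indentation).
text = """S1:1- first Split begins.continue the sentence
s2:1- first split ends
s1:2-second split begins
S2:2-second split ends
S1:3-third split begins
S2:3-third split ends """

# With a capturing group, re.split returns the markers too:
# ['', 'S1:1-', ' first Split ...\n', 's2:1-', ...]
parts = re.split(r"([Ss][1-2]:[0-9]-)", text)

# parts[0] is whatever precedes the first marker (here an empty string);
# after that, each marker is immediately followed by its text.
chunks = [marker + body.rstrip() for marker, body in zip(parts[1::2], parts[2::2])]

for chunk in chunks:
    print(chunk)
```

Slicing with [1::2] and [2::2] avoids index arithmetic and silently ignores the leading empty string.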
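Try doing this with a positive lookahead, (?=...), in the regular expression; the lookahead checks for the marker without consuming it, so the marker stays in the output. Your regular expression would look like this: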
\s(?=[sS][12]:[0-9])
Full code:
import re
text="""S1:1- first Split begins.continue the sentence
s2:1- first split ends
s1:2-second split begins
S2:2-second split ends
S1:3-third split begins
S2:3-third split ends """
output=re.split(r"\s(?=[sS][12]:[0-9])", text)
Result:
['S1:1- first Split begins.continue the sentence\n ', 's2:1- first split ends\n ', 's1:2-second split begins\n ', 'S2:2-second split ends\n ', 'S1:3-third split begins\n ', 'S2:3-third split ends ']
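The \s-based split only works when whitespace precedes each marker. On the one-line input from the question, where markers follow "." or "," directly, re.split(r"\s(?=[sS][12]:[0-9])", line) returns the whole line as a single element. A sketch of an alternative, assuming Python 3.7 or later (which added support for splitting on empty matches): split on the lookahead alone, then drop the empty leading piece. Stripping the trailing "." and "," from each piece is an assumption about which separators should go.

```python
import re

# The one-line input from the question: no whitespace before "s2:1-", "s1:2-", etc.
line = ("S1:1- first split begins.s2:1- first split ends.s1:2-second split begins."
        "S2:2-second split ends,S1:3-third split begins.S2:3-third split ends.")

# Zero-width split: the lookahead consumes nothing, so each marker stays at the
# start of its piece. Splitting on an empty match requires Python 3.7+.
pieces = re.split(r"(?=[sS][12]:[0-9])", line)

# The match at position 0 yields an empty first element; drop empty strings
# and strip the trailing separator from each remaining piece.
pieces = [p.rstrip(".,") for p in pieces if p]
print(pieces)
```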