Punctuation not detected between words with no space
How can I split a text into sentences when a punctuation mark (. ? !) sits between two words with no space after it?
Example:
>>> splitText = re.split(r"(?<=[.?!])\s+", "This is an example. Not working as expected.Because there isn't a space after dot.")
Output:
['This is an example.',
"Not working as expected.Because there isn't a space after dot."]
Expected:
['This is an example.',
'Not working as expected.',
"Because there isn't a space after dot."]
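A minimal tweak of the question's own pattern, assuming Python 3.7 or later (earlier versions refuse to split on a pattern that can match an empty string): keep the zero-width lookbehind but change \s+ to \s*, so a split also happens right after a terminator that has no space after it.

```python
import re

text = "This is an example. Not working as expected.Because there isn't a space after dot."

# The lookbehind (?<=[.?!]) matches just after . ? or ! without consuming it,
# so the punctuation stays attached to its sentence. \s* then consumes any
# whitespace that follows, including none at all.
parts = re.split(r"(?<=[.?!])\s*", text)

# The empty match at the very end of the string leaves a trailing "",
# so drop empty pieces.
sentences = [p for p in parts if p]
print(sentences)
```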
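splitText = re.split(r"[.?!]\s*", "This is an example. Not working as expected.Because there isn't a space after dot.")
+ means one or more and * means zero or more, so \s* also splits where the punctuation is not followed by a space. Note that the punctuation marks themselves are consumed by the split.
If you need to keep the punctuation, you probably don't want to split at all, but you can do this: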
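A quick check of the re.split answer above on the question's sample text. It shows the two side effects worth knowing about: the terminators are removed, and the final "." leaves an empty string at the end of the list.

```python
import re

text = "This is an example. Not working as expected.Because there isn't a space after dot."

# \s* (zero or more) instead of \s+ (one or more): also splits at ".Because".
parts = re.split(r"[.?!]\s*", text)
print(parts)

# Filter out the trailing "" produced by the final "."
sentences = [p for p in parts if p]
```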
splitText = re.findall(".*?[.?!]", "This is an example. Not working as expected.Because there isn't a space after dot.")
This gives:
['This is an example.',
' Not working as expected.',
"Because there isn't a space after dot."]
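You can get rid of the leading space either with a regex that consumes the whitespace outside a capturing group, such as r"\s*(.*?[.?!])", or by calling .strip() on each result (Python strings have no .trim() method). You can experiment with the pattern at https://regex101.com/r/icrJNl/3/.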
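Both ways of removing the leading space can be sketched as follows. This relies on re.findall returning only the group's text when the pattern contains exactly one capturing group. As with the findall answer itself, any trailing text that has no terminator is not returned.

```python
import re

text = "This is an example. Not working as expected.Because there isn't a space after dot."

# Option 1: lazy match up to each terminator, then strip surrounding whitespace.
stripped = [s.strip() for s in re.findall(r".*?[.?!]", text)]

# Option 2: \s* consumes leading whitespace outside the group, and findall
# returns only what the single capturing group matched.
grouped = re.findall(r"\s*(.*?[.?!])", text)

print(stripped)
print(grouped)
```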
import re
from pprint import pprint
split_text = re.findall(".*?[?.!]", "This is an example! Working as "
"expected?Because.")
pprint(split_text)
Note: .*? is a lazy (non-greedy) quantifier, as opposed to .*, which is greedy.
Output:
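To make the lazy-versus-greedy note concrete, here is the same findall call run with both quantifiers on the answer's sample text. The greedy version swallows everything up to the last terminator in a single match.

```python
import re

text = "This is an example! Working as expected?Because."

# Greedy: .* runs as far as possible, then backtracks to the LAST terminator,
# so the whole string comes back as one match.
greedy = re.findall(r".*[?.!]", text)

# Lazy: .*? stops at the FIRST terminator, giving one match per sentence.
lazy = re.findall(r".*?[?.!]", text)

print(greedy)
print(lazy)
```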
['This is an example!',
' Working as expected?',
'Because.']
Another solution:
import re
from pprint import pprint
split_text = re.split("([?.!])", "This is an example! Working as "
"expected?Because.")
pprint(split_text)
Output:
['This is an example',
'!',
' Working as expected',
'?',
'Because',
'.',
'']
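The split output above keeps each terminator as a separate list item, so a small post-processing step is needed to get whole sentences back. A sketch of one way to re-pair them, assuming the text ends with a terminator (any trailing fragment without one is dropped by zip):

```python
import re

text = "This is an example! Working as expected?Because."

# With a capturing group, re.split keeps each delimiter as its own item:
# text pieces sit at even indexes, delimiters at odd indexes.
parts = re.split(r"([?.!])", text)

# Pair each text piece with the delimiter that follows it. zip stops at the
# shorter slice, which discards the trailing "" after the final ".".
sentences = [(body + punct).strip() for body, punct in zip(parts[0::2], parts[1::2])]
print(sentences)
```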