
Punctuation not detected between words with no space

How can I split sentences when a punctuation mark (.?!) appears between two words with no space after it?

Example:

>>> import re
>>> splitText = re.split(r"(?<=[.?!])\s+", "This is an example. Not "
...                      "working as expected.Because there isn't a space after dot.")

Output:

['This is an example.', 
"Not working as expected.Because there isn't a space after dot."] 

Expected:

['This is an example.', 
'Not working as expected.', 
"Because there isn't a space after dot."]

import re
splitText = re.split(r"[.?!]\s*", "This is an example. Not working as expected.Because there isn't a space after dot.")

+ means one or more, * means zero or more, so \s* also matches a punctuation mark that has no whitespace after it.
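The same idea also works with the question's own lookbehind, which keeps the punctuation attached to each sentence instead of consuming it. A minimal sketch, assuming Python 3.7 or later (earlier versions ignore empty matches in re.split); the variable names are illustrative:

```python
import re

text = "This is an example. Not working as expected.Because there isn't a space after dot."

# Question's pattern: \s+ needs at least one whitespace character after the
# punctuation, so "expected.Because" is never split.
plus = re.split(r"(?<=[.?!])\s+", text)

# Same lookbehind with \s*: whitespace is optional, so the string is also
# split right after a period that is directly followed by a letter.
star = re.split(r"(?<=[.?!])\s*", text)

# The zero-width match at the very end of the string leaves a trailing '',
# so drop empty pieces.
sentences = [s for s in star if s]

print(plus)
print(sentences)
```

Because the lookbehind only checks the preceding character and does not consume it, the ".", "?" or "!" stays at the end of each sentence.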

If you need to keep the punctuation, you may not want to split at all; instead you can do:

splitText = re.findall(".*?[.?!]", "This is an example. Not working as expected.Because there isn't a space after dot.")

This gives:

['This is an example.',
 ' Not working as expected.',
 "Because there isn't a space after dot."]

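You can get rid of the leading space with a capturing group, e.g. r"\s*(.*?[.?!])" (re.findall then returns only the group's contents), or by calling .strip() on each result.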
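Both ways of removing the leading space from the re.findall results can be sketched like this (the variable names are illustrative). Note that with either pattern, any trailing text that does not end in . ? or ! is silently dropped, because every match must end with one of those characters.

```python
import re

text = "This is an example. Not working as expected.Because there isn't a space after dot."

# Option 1: match leading whitespace outside a capturing group; when the
# pattern has exactly one group, re.findall returns only that group.
with_group = re.findall(r"\s*(.*?[.?!])", text)

# Option 2: keep the original pattern and strip each match afterwards.
stripped = [s.strip() for s in re.findall(r".*?[.?!]", text)]

print(with_group)
print(stripped)
```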

Using https://regex101.com/r/icrJNl/3/:

import re
from pprint import pprint

split_text = re.findall(".*?[?.!]", "This is an example! Working as "
                        "expected?Because.")

pprint(split_text)

Note: .*? is a lazy (non-greedy) quantifier, as opposed to .*, which is greedy.

Output:

['This is an example!', 
 ' Working as expected?', 
 'Because.']
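To see why the lazy quantifier matters here, a small comparison on the same input: the greedy .* runs as far as it can and then backtracks only to the last punctuation mark in the string, so re.findall finds a single match, while the lazy .*? stops at the first mark each time.

```python
import re

text = "This is an example! Working as expected?Because."

greedy = re.findall(r".*[?.!]", text)
lazy = re.findall(r".*?[?.!]", text)

print(greedy)  # ['This is an example! Working as expected?Because.']
print(lazy)    # ['This is an example!', ' Working as expected?', 'Because.']
```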

Another solution:

import re
from pprint import pprint

split_text = re.split("([?.!])", "This is an example! Working as "
                      "expected?Because.")

pprint(split_text)

Output:

['This is an example', 
'!', 
' Working as expected', 
'?', 
'Because', 
'.', 
'']
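Because the capturing group makes re.split return the delimiters as separate items, the pieces alternate text, punctuation, text, punctuation, ..., and end with whatever follows the last mark. A sketch of joining them back into sentences; split_keeping_punct is a hypothetical helper name, not part of the re module:

```python
import re


def split_keeping_punct(text):
    # The capturing group keeps the delimiters in the result, so the list
    # alternates: text, punct, text, punct, ..., trailing remainder.
    parts = re.split(r"([?.!])", text)
    # Pair each text piece with the punctuation mark that followed it.
    sentences = [
        (body + punct).strip()
        for body, punct in zip(parts[0::2], parts[1::2])
    ]
    # Text after the last punctuation mark has no delimiter to pair with;
    # keep it only if it is not empty.
    remainder = parts[-1].strip()
    if remainder:
        sentences.append(remainder)
    return sentences


print(split_keeping_punct("This is an example! Working as expected?Because."))
print(split_keeping_punct("One.Two"))
```

zip stops at the shorter slice, which is why the trailing remainder is handled separately.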


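Disclaimer: the technical posts on this site are licensed under CC BY-SA 4.0. If you republish them, please credit this site's URL or the original source. For any questions, contact: yoyou2525@163.com.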
