
Punctuation not detected between words with no space

How can I split text into sentences when a punctuation mark (. ? !) appears between two words with no space after it?

Example:

>>> import re
>>> splitText = re.split(r"(?<=[.?!])\s+", "This is an example. Not working as expected.Because there isn't a space after dot.")

Output:

['This is an example.', 
"Not working as expected.Because there isn't a space after dot."] 

Expected:

['This is an example.', 
'Not working as expected.', 
"Because there isn't a space after dot."]

splitText = re.split(r"[.?!]\s*", "This is an example. Not working as expected.Because there isn't a space after dot.")

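+ means one or more; * means zero or more. With \s*, the split no longer requires whitespace after the punctuation mark.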
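To make the difference between the two quantifiers concrete, here is a minimal sketch that runs both patterns on the question's text. The variable names (`with_plus`, `with_star`, `sentences`) are illustrative only. Note that this pattern consumes the punctuation mark, and the final period leaves an empty string at the end of the result.

```python
import re

text = ("This is an example. Not working as expected."
        "Because there isn't a space after dot.")

# \s+ needs at least one whitespace character after the punctuation,
# so "expected.Because" is not treated as a sentence boundary.
with_plus = re.split(r"[.?!]\s+", text)

# \s* also accepts zero whitespace characters, so every mark splits.
with_star = re.split(r"[.?!]\s*", text)

# The final "." produces a trailing empty string; filter it out.
sentences = [part for part in with_star if part]

print(with_plus)
print(sentences)
```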

If you need to keep the punctuation, you probably don't want to split at all; you can match the sentences instead:

splitText = re.findall(".*?[.?!]", "This is an example. Not working as expected.Because there isn't a space after dot.")

This gives:

['This is an example.',
 ' Not working as expected.',
 "Because there isn't a space after dot."]

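You can get rid of the leading space either by adjusting the regex (e.g. '\\s*(.*?[.?!])', where the capture group excludes the leading whitespace) or simply by calling .strip() on each result.

Using https://regex101.com/r/icrJNl/3/: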
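Both clean-up options can be sketched as follows. This is an illustration rather than the answerer's exact code: in Python the string method is `str.strip()`, and the capture-group variant relies on `re.findall` returning only the group's contents when the pattern has exactly one group. The names `stripped` and `grouped` are made up for this example.

```python
import re

text = ("This is an example. Not working as expected."
        "Because there isn't a space after dot.")

# Option 1: match each sentence, then strip surrounding whitespace.
stripped = [s.strip() for s in re.findall(r".*?[.?!]", text)]

# Option 2: consume leading whitespace outside the capture group;
# findall returns only what the single group matched.
grouped = re.findall(r"\s*(.*?[.?!])", text)

print(stripped)
print(grouped)
```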

import re
from pprint import pprint

split_text = re.findall(".*?[?.!]", "This is an example! Working as "
                        "expected?Because.")

pprint(split_text)

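Note: .*? is a lazy (non-greedy) quantifier, as opposed to .*, which is greedy. The lazy form stops at the first punctuation mark it can reach, which is what splits the text into separate sentences.

Output: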
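A short comparison shows why the lazy quantifier matters here. In this sketch (variable names are illustrative), the greedy pattern runs to the end of the string and backtracks only as far as the last punctuation mark, so it returns the whole text as a single match.

```python
import re

text = "This is an example! Working as expected?Because."

# Lazy: each match ends at the first punctuation mark encountered.
lazy = re.findall(r".*?[?.!]", text)

# Greedy: the match extends to the last punctuation mark in the string.
greedy = re.findall(r".*[?.!]", text)

print(lazy)
print(greedy)
```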

['This is an example!', 
 ' Working as expected?', 
 'Because.']

Another solution:

import re
from pprint import pprint

split_text = re.split("([?.!])", "This is an example! Working as "
    "expected?Because.")

pprint(split_text)

Output:

['This is an example', 
'!', 
' Working as expected', 
'?', 
'Because', 
'.', 
'']
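Because the pattern contains a capture group, `re.split` returns the punctuation marks as separate list items. If whole sentences are wanted, one possible follow-up step (an assumption, not part of the original answer) is to pair each text chunk with the mark that follows it. Any trailing text without a closing mark is dropped by `zip` in this sketch.

```python
import re

text = "This is an example! Working as expected?Because."

# The result alternates: text, mark, text, mark, ..., trailing remainder.
parts = re.split(r"([?.!])", text)

# Pair every text chunk (even indices) with its mark (odd indices).
sentences = [(body + mark).strip()
             for body, mark in zip(parts[0::2], parts[1::2])]

print(sentences)
```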

