Punctuation not detected between words with no space
How can I split sentences when a punctuation mark (. ? !) is detected between two words with no space after it?
Example:
>>> splitText = re.split(r"(?<=[.?!])\s+", "This is an example. Not working as expected.Because there isn't a space after dot.")
Output:
['This is an example.',
"Not working as expected.Because there isn't a space after dot."]
Expected:
['This is an example.',
'Not working as expected.',
"Because there isn't a space after dot."]
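splitText = re.split(r"[.?!]\s*", "This is an example. Not working as expected.Because there isn't a space after dot.")
+ means one or more; * means zero or more, so \s* still matches when no space follows the punctuation. Note that this pattern consumes the punctuation mark itself.
If you need to keep the punctuation, you probably don't want to split at all; you can do this instead: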
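To make the difference between the two quantifiers concrete, here is a minimal sketch on a shorter, made-up string (the variable names are only for illustration):

```python
import re

text = "One. Two.Three."

# \s+ needs at least one whitespace character after the punctuation,
# so the "." between "Two" and "Three" is not treated as a separator.
with_plus = re.split(r"[.?!]\s+", text)
print(with_plus)

# \s* also accepts zero whitespace characters, so every mark splits.
# The punctuation is consumed and a trailing empty string is left over.
with_star = re.split(r"[.?!]\s*", text)
print(with_star)
```

The trailing empty string in the second result comes from the final full stop; it can be filtered out if it is not wanted.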
splitText = re.findall(".*?[.?!]", "This is an example. Not working as expected.Because there isn't a space after dot.")
This gives:
['This is an example.',
' Not working as expected.',
"Because there isn't a space after dot."]
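You can drop the leading space either by adjusting the regex, for example r'\s*(.*?[.?!])' (the capture group keeps the whitespace out of each result), or simply by calling .strip() on each match.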
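Both ways of removing the leading space can be sketched as follows. This assumes Python's re module, where re.findall returns only the captured group when the pattern contains exactly one group; the variable names are illustrative.

```python
import re

text = ("This is an example. Not working as expected."
        "Because there isn't a space after dot.")

# Option 1: match the leading whitespace outside the capture group,
# so findall returns only the sentence itself.
grouped = re.findall(r"\s*(.*?[.?!])", text)

# Option 2: keep the simpler pattern and strip each match afterwards.
stripped = [s.strip() for s in re.findall(r".*?[.?!]", text)]

print(grouped)
print(stripped)
```

For this input both lists contain the three sentences without leading spaces.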
Using https://regex101.com/r/icrJNl/3/ :
import re
from pprint import pprint
split_text = re.findall(".*?[?.!]", "This is an example! Working as "
"expected?Because.")
pprint(split_text)
Note: .*? is a lazy (non-greedy) quantifier, as opposed to .*, which is a greedy quantifier.
Output:
['This is an example!',
' Working as expected?',
'Because.']
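To illustrate the note about lazy and greedy quantifiers, this small sketch runs both forms on the same string (variable names are illustrative):

```python
import re

text = "This is an example! Working as expected?Because."

# Lazy: .*? stops at the first punctuation mark it can, one sentence per match.
lazy = re.findall(r".*?[?.!]", text)

# Greedy: .* grabs as much as possible and backtracks only to the last mark,
# so the whole string comes back as a single match.
greedy = re.findall(r".*[?.!]", text)

print(lazy)
print(greedy)
```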
Another solution:
import re
from pprint import pprint
split_text = re.split("([?.!])", "This is an example! Working as "
"expected?Because.")
pprint(split_text)
Output:
['This is an example',
'!',
' Working as expected',
'?',
'Because',
'.',
'']
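Because the pattern is wrapped in a capture group, re.split keeps the delimiters as separate list items. If whole sentences are wanted, the pieces can be paired up again. The following is only a sketch, and it assumes every sentence ends with one of the three marks (any trailing text without punctuation would be dropped by zip):

```python
import re

text = "This is an example! Working as expected?Because."

# parts alternates: text, delimiter, text, delimiter, ..., trailing ''
parts = re.split(r"([?.!])", text)

# Pair each text chunk with the delimiter that follows it.
sentences = [(body + mark).strip()
             for body, mark in zip(parts[0::2], parts[1::2])]

print(sentences)
```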