
Python: How can I use a regex to split sentences onto new lines, and then separate punctuation from words using whitespace?

I have the following input:

input = "I love programming with Python-3.3! Do you? It's great... I give it a 10/10. It's free-to-use, no $$$ involved!"

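First, each sentence should move to a new line. Then all punctuation should be separated from the words by whitespace, except for "/", "'", "-", "+" and "$".

So the output should be: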

"I love programming with Python-3 . 3 ! 
Do you ?  
It's great . . . 
I give it a 10/10 . 
It's free-to-use , no $$$ involved !"

I am using the following code:

>>> import re
>>> re.sub(r"([\w/'+$\s-]+|[^\w/'+$\s-]+)\s*", r"\1 ", input)
"I love programming with Python-3 . 3 ! Do you ? It's great ... I give it a 10/10 . It's free-to-use , no $$$ involved ! "

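The problem is that it does not split the sentences onto new lines. How can I use a regex to do that before inserting the spaces between punctuation and characters?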

Something like:

>>> import re
>>> from string import punctuation
>>> print(re.sub(r'(?<=[' + re.escape(punctuation) + r'])\s+(?=[A-Z])', '\n', input))
I love programming with Python-3.3!
Do you?
It's great...
I give it a 10/10.
It's free-to-use, no $$$ involved!
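
To connect this to the second half of the question, here is a minimal sketch that first splits on the same lookbehind/lookahead idea and then pads the punctuation on each line. The helper names `split_sentences` and `space_punctuation` are made up for illustration. Padding each punctuation character individually is a substitution for the asker's pattern, which treats a run such as "..." as a single token.

```python
import re
from string import punctuation

text = ("I love programming with Python-3.3! Do you? It's great... "
        "I give it a 10/10. It's free-to-use, no $$$ involved!")

def split_sentences(s):
    # Hypothetical helper: split on whitespace that follows a punctuation
    # character and precedes an uppercase letter.
    pattern = r"(?<=[" + re.escape(punctuation) + r"])\s+(?=[A-Z])"
    return re.split(pattern, s)

def space_punctuation(sentence):
    # Hypothetical helper: pad every character that is not a word
    # character, whitespace, or one of / ' + $ - with spaces,
    # then collapse the repeated spaces this creates.
    padded = re.sub(r"([^\w/'+$\s-])", r" \1 ", sentence)
    return " ".join(padded.split())

result = "\n".join(space_punctuation(s) for s in split_sentences(text))
print(result)
```

For the sample input this should reproduce the five lines requested in the question, without trailing whitespace.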

([!?.])(?=\s*[A-Z])\s*

You can use this regex to split the text into sentences before applying your own regex. See the demo. Replace with \1\n

https://regex101.com/r/sH8aR8/5

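import re
x = "I love programming with Python-3.3! Do you? It's great... I give it a 10/10. It's free-to-use, no $$$ involved!"
# The trailing \s* consumes the whitespace after the punctuation, so new lines do not start with a space.
print(re.sub(r"([!?.])(?=\s*[A-Z])\s*", r"\1\n", x))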
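
The pattern ([!?.])(?=\s*[A-Z])\s* ends in \s*, and the following sketch shows what that trailing part changes. The shortened sample text and the variable names are chosen only for illustration.

```python
import re

text = "Do you? It's great... I give it a 10/10."

# Without the trailing \s*, the space after the punctuation is left in
# place, so every line after the first starts with a space.
without_ws = re.sub(r"([!?.])(?=\s*[A-Z])", r"\1\n", text)

# With the trailing \s*, that whitespace is part of the match and is
# replaced together with the punctuation.
with_ws = re.sub(r"([!?.])(?=\s*[A-Z])\s*", r"\1\n", text)

print(repr(without_ws))
print(repr(with_ws))
```

In "great..." only the last dot is followed by whitespace and an uppercase letter, so the ellipsis stays intact in both variants.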

Edit:

(?<![A-Z][a-z])([!?.])(?=\s*[A-Z])\s*

Try this. See the demo for a different data set.

https://regex101.com/r/sH8aR8/9
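
As a rough illustration of what the added negative lookbehind changes, the sketch below runs both patterns on a made-up sentence containing "Mr."; the sample text is an assumption and is not taken from the linked demo.

```python
import re

original = r"([!?.])(?=\s*[A-Z])\s*"
edited = r"(?<![A-Z][a-z])([!?.])(?=\s*[A-Z])\s*"

# Sample text invented for this example.
text = "Mr. Smith loves Python! Do you?"

# The original pattern also breaks after the abbreviation "Mr.".
split_original = re.sub(original, r"\1\n", text)

# The edited pattern skips punctuation that directly follows an
# uppercase letter plus one lowercase letter, such as "Mr".
split_edited = re.sub(edited, r"\1\n", text)

print(split_original)
print(split_edited)
```

The lookbehind only inspects the two characters before the punctuation, so a longer abbreviation such as "Mrs." would still be split, and a sentence that ends in a two-letter capitalized word such as "Go." would not be.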

