
Regex add character to matched string

I have a very long string that is a paragraph, but there is no space after some of the periods. For example:

para = "I saw this film about 20 years ago and remember it as being particularly nasty. I believe it is based on a true incident: a young man breaks into a nurses\' home and rapes, tortures and kills various women.It is in black and white but saves the colour for one shocking shot.At the end the film seems to be trying to make some political statement but it just comes across as confused and obscene.Avoid."

I tried to fix this with re.sub, but the output is not what I expected.

This is what I did:

re.sub("(?<=\.).", " \1", para)

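I match the first character of each sentence, and I want to put a space in front of it. My match pattern is (?<=\.). , which (supposedly) checks for any character that comes after a period. I learned from other Stack Overflow questions that \1 matches the last matched pattern, so I wrote my replacement pattern as " \1", a space followed by the previously matched string.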

Here is the output:

"I saw this film about 20 years ago and remember it as being particularly nasty. \x01I believe it is based on a true incident: a young man breaks into a nurses\' home and rapes, tortures and kills various women. \x01t is in black and white but saves the colour for one shocking shot. \x01t the end the film seems to be trying to make some political statement but it just comes across as confused and obscene. \x01void. \x01

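Instead of matching any character preceded by a period and adding a space before it, re.sub replaces the matched character with \x01. Why? How do I add a character before a matched string?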

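(?<=a)b is a positive lookbehind. It matches b following a. The a is not captured. So in your expression, I'm not sure what the value of \1 is in this case, but it's not what's inside of (?<=...).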
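As a possible explanation for the \x01 in the output, here is a minimal sketch. In a non-raw Python string literal, "\1" is an octal escape for the character with code 1, so re.sub never sees a backreference at all. The variable names (sample, raw_result, fixed) are made up for illustration and are not from the question.

```python
import re

# In a normal (non-raw) string literal, "\1" is an octal escape for chr(1),
# which is why the output contains "\x01".
assert " \1" == " \x01"

sample = "women.It is"

# With a raw string the backslash survives, but the pattern has no
# capture group 1 (a lookbehind does not capture), so re raises an error.
try:
    re.sub(r"(?<=\.).", r" \1", sample)
    raw_result = None
except re.error as exc:
    raw_result = f"re.error: {exc}"

# \g<0> refers to the whole match and needs no capture group.
fixed = re.sub(r"(?<=\.).", r" \g<0>", sample)
print(fixed)
```

Note that this variant still inserts a space when one is already present, so it only addresses the \x01 symptom.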

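Your current approach has another flaw: it would add a space after a . even when one is already there.

To add the missing space after ., I suggest a different strategy: replace a . that is followed by a character that is neither a space nor a dot with . and a space: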

re.sub(r'\.(?=[^ .])', '. ', para)

You may use the following regex (with a positive lookbehind and a negative lookahead assertion):

(?<=\.)(?!\s)

Python

re.sub(r"(?<=\.)(?!\s)", " ", para)

See the demo.
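One detail worth checking with this zero-width pattern: it also matches at the very end of the string, since no whitespace follows the final period. The sketch below shows that behaviour on a shortened sample, plus a hypothetical variant (not part of the answer above) that adds $ to the negative lookahead to avoid the trailing space.

```python
import re

sample = "kills various women.It is in black and white.Avoid."

# The original pattern inserts a space after every period that is not
# followed by whitespace, including the last one in the string.
with_trailing = re.sub(r"(?<=\.)(?!\s)", " ", sample)

# Assumed variant: also refuse to match at the end of the string.
without_trailing = re.sub(r"(?<=\.)(?!\s|$)", " ", sample)

print(repr(with_trailing))
print(repr(without_trailing))
```

If a trailing space is harmless for your use case, the original pattern followed by .rstrip() would work just as well.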

A slightly modified version of your regex will also work:

print(re.sub(r"([\.])([^\s])", r"\1 \2", para))

# I saw this film about 20 years ago and remember it as being particularly nasty. I believe it is based on a true incident: a young man breaks into a nurses' home and rapes, tortures and kills various women. It is in black and white but saves the colour for one shocking shot. At the end the film seems to be trying to make some political statement but it just comes across as confused and obscene. Avoid.

I think this is what you want to do. You can pass a function to do the replacement.

import re

def my_replace(match):
    return " " + match.group()

my_string = "dhd.hd hd hs fjs.hello"
print(re.sub(r'(?<=\.).', my_replace, my_string))

Prints:

dhd. hd hd hs fjs. hello

As @Seanny123 pointed out, this will add a space even if there is already a space after the period.
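One way to keep the function-based approach while avoiding the doubled space is to let the callback inspect the matched character. This is only a sketch; add_space_if_missing and the sample string are hypothetical names chosen for illustration.

```python
import re

def add_space_if_missing(match):
    """Return the matched character, prefixed with a space unless it is whitespace."""
    ch = match.group()
    return ch if ch.isspace() else " " + ch

sample = "dhd.hd hd. hs fjs.hello"
result = re.sub(r"(?<=\.).", add_space_if_missing, sample)
print(result)
```

The pattern is unchanged from the answer above; only the callback decides whether a space is needed.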

The simplest regex replacement you can use is:

re.sub(r'\.(?=\w)', '. ', para)

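It simply matches every period, uses the lookahead (?=\w) to make sure that a word character comes next (so there is not already a space after the period), and replaces the period with ". ".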
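Because \w also matches digits, this pattern will split decimal numbers such as 3.5. The sketch below illustrates that on a made-up sample and shows one possible variant, using the character class [^\W\d] (word characters that are not digits), which is an assumption of this example rather than something proposed in the thread.

```python
import re

sample = "Wait...What?It cost 3.5 dollars.Avoid."

# \w matches letters, digits and underscore, so "3.5" becomes "3. 5".
simple = re.sub(r"\.(?=\w)", ". ", sample)

# Assumed variant: only add a space when a non-digit word character follows.
letters_only = re.sub(r"\.(?=[^\W\d])", ". ", sample)

print(simple)
print(letters_only)
```

Neither version touches the "?" without a following space; handling other sentence-ending punctuation would need a broader character class.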
