Replacing multiple regex patterns together
I have a very long string in which I want to replace dozens of regular expressions, so I created a dictionary like this:
replacements = {
    r'\spunt(?!\s*komma)': r".",
    r'punt komma': r",",
    r'(?<!punt )komma': r",",
    "paragraaf": "\n\n",
}
The dictionary above is only a small selection.
How can I apply it to a string document? Example string:
text = "a punt komma is in this case not a komma and thats it punt"
I tried something like this:
import re

def multiple_replace(mapping, text):
    # Create a regular expression from the dictionary keys
    # (re.escape turns every key into a literal string, not a pattern)
    regex = re.compile("(%s)" % "|".join(map(re.escape, mapping.keys())))
    # For each match, look up the corresponding value in the dictionary
    return regex.sub(lambda mo: mapping[mo.group(0)], text)

if __name__ == "__main__":
    text = "Larry Wall is the creator of Perl"
    mapping = {
        "Larry Wall": "Guido van Rossum",
        "creator": "Benevolent Dictator for Life",
        "Perl": "Python",
    }
    print(multiple_replace(mapping, text))
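But this only works for plain string replacement, not for regex patterns like the ones in my dictionary.
Iterate over your dictionary and run one substitution per key/value pair: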
import re

replacements = {
    r'\spunt(?!\s*komma)': r".",
    r'punt komma': r",",
    r'(?<!punt )komma': r",",
    "paragraaf": "\n\n",
}
text = "a punt komma is in this case not a komma and thats it punt"
print(text)
# Apply the patterns one after another, in dictionary order
for key, value in replacements.items():
    text = re.sub(key, value, text)
print(text)
This outputs:
a punt komma is in this case not a komma and thats it punt
a , is in this case not a , and thats it.
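The loop above applies the patterns sequentially, so each later pattern sees the output of the earlier ones and the result can depend on dictionary order. If a single pass over the text is preferred, closer to what the question attempted, one possible sketch is to join the patterns into one alternation with a named group per rule. This is not part of the original answer: the helper name `replace_single_pass` and the group names `g0`, `g1`, ... are made up for illustration, and the sketch assumes the patterns contain no numbered backreferences and that the replacement values are literal strings.

```python
import re

replacements = {
    r'\spunt(?!\s*komma)': ".",
    r'punt komma': ",",
    r'(?<!punt )komma': ",",
    "paragraaf": "\n\n",
}

def replace_single_pass(rules, text):
    """Hypothetical helper: apply all regex rules in one scan of the text."""
    # One named group per rule, so the match object reports which rule fired.
    names = [f"g{i}" for i in range(len(rules))]
    combined = re.compile("|".join(
        f"(?P<{name}>{pattern})" for name, pattern in zip(names, rules)
    ))
    values = dict(zip(names, rules.values()))
    # mo.lastgroup is the name of the outer group that matched.
    return combined.sub(lambda mo: values[mo.lastgroup], text)

text = "a punt komma is in this case not a komma and thats it punt"
result = replace_single_pass(replacements, text)
print(result)  # a , is in this case not a , and thats it.
```

Here lookaheads and lookbehinds are evaluated against the original text rather than against partially replaced text, and when several rules could match at the same position the one listed first wins.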
Note that you should probably wrap each key's regex term in word boundaries (\b) to avoid matching unintended substrings.
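A small illustration of the word-boundary point, using a made-up sample sentence (the word "kommando" is not from the question and is only there to show an accidental substring match):

```python
import re

sample = "het kommando komma stop"

# Without word boundaries, "komma" is also replaced inside "kommando".
without_boundaries = re.sub(r"komma", ",", sample)
# With \b on both sides, only the standalone word is replaced.
with_boundaries = re.sub(r"\bkomma\b", ",", sample)

print(without_boundaries)  # het ,ndo , stop
print(with_boundaries)     # het kommando , stop
```

Patterns that already begin or end with a non-word match such as `\s` may need the boundary placed inside the pattern rather than wrapped around it, so it is worth checking each key individually.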