
Replacing multiple regex patterns together

I have a long string in which I want to replace dozens of regular expressions, so I created a dictionary like this:

replacements = { r'\spunt(?!\s*komma)' : r".",
                 r'punt komma' : r",",
                 r'(?<!punt )komma' : r",",
                 "paragraaf" : "\n\n" }

The dictionary above is only a small selection.

How can I apply it to a string document? Example string:

text = "a punt komma is in this case not a komma and thats it punt"

I tried something like this:

import re

def multiple_replace(mapping, text):
    # Create a regular expression from the dictionary keys.
    # re.escape() makes every key match literally, so keys are treated as plain strings.
    regex = re.compile("(%s)" % "|".join(map(re.escape, mapping.keys())))

    # For each match, look up the matched text in the dictionary
    return regex.sub(lambda mo: mapping[mo.group(0)], text)

if __name__ == "__main__":

    text = "Larry Wall is the creator of Perl"

    mapping = {
        "Larry Wall": "Guido van Rossum",
        "creator": "Benevolent Dictator for Life",
        "Perl": "Python",
    }

    print(multiple_replace(mapping, text))

But this only works for plain string replacements, not for regex patterns like the ones in my dictionary.
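For comparison, the attempt above fails for two reasons: re.escape() turns each pattern into a literal string, and the lookup uses the matched text as the dictionary key, which is not the same as the pattern that matched it. A minimal sketch of how the single-pass idea could still work with regex keys is to wrap each pattern in its own named group and dispatch on Match.lastgroup. The helper name multiple_regex_replace and the group names g0, g1, ... are hypothetical. It assumes the patterns contain no global inline flags (such as a leading (?i)) and no group names that clash with the generated ones.

```python
import re

def multiple_regex_replace(replacements, text):
    """Replace every regex key of `replacements` in a single pass over `text`.

    Hypothetical helper: each value is inserted literally (no \\1-style expansion).
    """
    values = {}
    parts = []
    for i, (pattern, value) in enumerate(replacements.items()):
        name = f"g{i}"  # hypothetical generated group name
        values[name] = value
        parts.append(f"(?P<{name}>{pattern})")

    # (?P<g0>...)|(?P<g1>...)|... ; alternatives are tried left to right at each position
    combined = re.compile("|".join(parts))

    # lastgroup is the name of the outer wrapper group that produced the match
    return combined.sub(lambda mo: values[mo.lastgroup], text)

replacements = {
    r'\spunt(?!\s*komma)': ".",
    r'punt komma': ",",
    r'(?<!punt )komma': ",",
    "paragraaf": "\n\n",
}

text = "a punt komma is in this case not a komma and thats it punt"
print(multiple_regex_replace(replacements, text))
# a , is in this case not a , and thats it.
```

Unlike running re.sub once per pattern, this scans the text only once, and no replacement can be matched again by a later pattern.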

Iterate over your dictionary, then perform a substitution with each key/value pair:

import re

replacements = {
    r'\spunt(?!\s*komma)': r".",
    r'punt komma': r",",
    r'(?<!punt )komma': r",",
    "paragraaf": "\n\n",
}

text = "a punt komma is in this case not a komma and thats it punt"
print(text)

# Dicts keep insertion order (Python 3.7+), so the substitutions run top to bottom
for pattern, replacement in replacements.items():
    text = re.sub(pattern, replacement, text)

print(text)

This outputs:

a punt komma is in this case not a komma and thats it punt
a , is in this case not a , and thats it.
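Because each re.sub call in the loop sees the output of the previous one, the order of the rules can change the result. A small sketch, using a hypothetical helper apply_in_order that takes an explicit list of (pattern, replacement) tuples so the order is visible:

```python
import re

def apply_in_order(rules, text):
    """Apply (pattern, replacement) rules one after another (hypothetical helper)."""
    for pattern, replacement in rules:
        text = re.sub(pattern, replacement, text)
    return text

text = "a punt komma b"

# The longer pattern first: "punt komma" becomes ","
print(apply_in_order([(r"punt komma", ","), (r"punt", ".")], text))  # a , b

# The shorter pattern first: "punt" is consumed, so "punt komma" never matches
print(apply_in_order([(r"punt", "."), (r"punt komma", ",")], text))  # a . komma b
```

The lookarounds in the dictionary above (such as (?!\s*komma) and (?<!punt )) are one way to make the rules independent of each other's order.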

Note that you should probably put word boundaries (\b) around each key pattern to avoid matching unintended substrings.
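A short illustration of the word-boundary advice, using a made-up input string in which "komma" also appears inside a longer word:

```python
import re

text = "two kommas and a komma"

# Without word boundaries, "komma" also matches inside "kommas"
loose = re.sub(r"(?<!punt )komma", ",", text)

# With \b on both sides, only the standalone word "komma" is replaced
bounded = re.sub(r"(?<!punt )\bkomma\b", ",", text)

print(loose)    # two ,s and a ,
print(bounded)  # two kommas and a ,
```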
