Python: using re.sub to replace multiple substrings multiple times

Question

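I'm trying to correct a text that has some very typical scanning errors (l mistaken for I and vice versa). Basically, I would like the replacement string in re.sub to depend on the number of times the "I" is detected, something like: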

re.sub("(\\w+)(I+)(\\w*)", "\\g<1>l+\\g<3>", "I am stiII here.")

What's the best way to achieve this?

Answer 1

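Pass a function as the replacement argument instead of a string, as described in the docs. re.sub calls it with each match object, so your function can identify the mistake and build the best replacement based on that.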

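import re

def replacement(match):
    # group(2) holds the run of misread "I"s; emit one "l" per "I"
    if "I" in match.group(2):
        return match.group(1) + "l" * len(match.group(2)) + match.group(3)
    # Add additional cases here and as ORs in your regex
    return match.group(0)  # no case applied: leave the match unchanged

print(re.sub(r"(\w+)(II+)(\w*)", replacement, "I am stiII here."))
# I am still here.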

(Note that I modified your regex so the repeated Is end up in one group.)
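
To see why the change from I+ to II+ matters, it can help to print what each pattern captures. This is only an illustrative sketch; the variable names loose and tight are made up for the comparison.

```python
import re

word = "stiII"

# Original pattern: the greedy \w+ keeps every "I" except the last one
loose = re.search(r"(\w+)(I+)(\w*)", word)
# Modified pattern: II+ forces at least two "I"s into group 2
tight = re.search(r"(\w+)(II+)(\w*)", word)

print(loose.groups())  # ('stiI', 'I', '')
print(tight.groups())  # ('sti', 'II', '')
```

With the original pattern, len(match.group(2)) would always be 1, so the replacement function could not tell how many letters to fix.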

Answer 2

You could use lookarounds to replace only an I that is preceded or followed by another I:

import re

print(re.sub(r"(?<=I)I|I(?=I)", "l", "I am stiII here."))
# I am still here.
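
A quick way to get a feel for what this pattern does and does not touch is to run it over a few inputs. The sketch below reuses the lookaround pattern as-is; the helper name fix_double_i and the sample strings are hypothetical.

```python
import re

# Same lookaround pattern as above, compiled once for reuse
DOUBLE_I = re.compile(r"(?<=I)I|I(?=I)")

def fix_double_i(text):
    # Replace every "I" that touches another "I" with "l"
    return DOUBLE_I.sub("l", text)

print(fix_double_i("I am stiII here."))  # I am still here.
print(fix_double_i("I wiII caII"))       # I will call
print(fix_double_i("Henry VIII"))        # Henry Vlll  (false positive)
```

A lone "I" is left alone, but any run of capital Is is rewritten, so Roman numerals and some acronyms would need separate handling.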

Answer 3

It seems to me that you could do something like this:

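import re

def replace_L(match):
    # group(1) is the captured run of "I"s; str.replace swaps every
    # occurrence of that run inside the matched word
    return match.group(0).replace(match.group(1), 'l' * len(match.group(1)))

string_I_want = re.sub(r'\w+(I+)\w*', replace_L, 'I am stiII here.')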

Answer 4

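Building on the answer proposed by DNS, I put together something a bit more complex to catch all cases (or at least most of them), while trying not to introduce too many new errors: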

import re

def Irepl(matchobj):
    # Catch acronyms: leave all-uppercase matches untouched
    if matchobj.group(0).isupper():
        return matchobj.group(0)
    else:
        # Replace group 2 (the run of "I"s) with the same number of "l"s
        return matchobj.group(1) + 'l' * len(matchobj.group(2)) + matchobj.group(3)


# Impossible to know if first letter is correct or not (possibly a name)
I_FOR_l_PATTERN = r"([a-zA-HJ-Z]+?)(I+)(\w*)"
# `lines` is assumed to be an iterable of text lines defined elsewhere
for line in lines:
    # Fixed-string fixes first: "l'" -> "I'", "'I" -> "'l", " l " -> " I "
    tmp_line = line.replace("l'", "I'").replace("'I", "'l").replace(" l ", " I ")
    tmp_line = re.sub(r"^l ", "I ", tmp_line)

    cor_line = re.sub(I_FOR_l_PATTERN, Irepl, tmp_line)

    # Loop to catch all errors in a word (iIIegaI for example)
    while cor_line != tmp_line:
        tmp_line = cor_line
        cor_line = re.sub(I_FOR_l_PATTERN, Irepl, tmp_line)
    # cor_line now holds the corrected line

Hope this helps others!
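
For trying this approach on individual lines, the loop can be wrapped in a function. The following is a minimal sketch under that assumption: fix_line and the sample sentences are hypothetical, while the pattern, the fixed-string swaps, and Irepl follow the code in this answer.

```python
import re

I_FOR_l_PATTERN = re.compile(r"([a-zA-HJ-Z]+?)(I+)(\w*)")

def Irepl(matchobj):
    # Leave all-uppercase matches (likely acronyms) alone
    if matchobj.group(0).isupper():
        return matchobj.group(0)
    return matchobj.group(1) + "l" * len(matchobj.group(2)) + matchobj.group(3)

def fix_line(line):
    # Hypothetical wrapper: apply the fixed-string swaps, then repeat
    # the regex pass until the line stops changing.
    line = line.replace("l'", "I'").replace("'I", "'l").replace(" l ", " I ")
    line = re.sub(r"^l ", "I ", line)
    while True:
        fixed = I_FOR_l_PATTERN.sub(Irepl, line)
        if fixed == line:
            return fixed
        line = fixed

print(fix_line("l am stiII here."))   # I am still here.
print(fix_line("That is iIIegaI."))   # That is illegal.
print(fix_line("NASA and the FBI"))   # NASA and the FBI
```

The repeat-until-stable loop is what handles words with more than one run of Is, because each pass fixes only the first run in a word.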

4 answers

Answer 1
3, accepted, 2012-03-28 12:05:50

Answer 2
1, 2012-03-28 12:49:29

Answer 3
0, 2012-03-28 12:14:45

Answer 4
0, 2012-03-29 06:02:07
