Match whole words in a string using a dynamic regex

Question

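I want to use a regular expression to check whether a word appears in a sentence. Words are separated by spaces, but may have punctuation on either side. If the word is in the middle of the string, the following pattern works (it prevents partial-word matches and allows punctuation on either side of the word):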

match_middle_words = " [^a-zA-Z\d ]{0,}" + word + "[^a-zA-Z\d ]{0,} "

However, this does not match the first or last word, because there is no leading/trailing space. So, for those cases, I have also been using:

match_starting_word = "^[^a-zA-Z\d]{0,}" + word + "[^a-zA-Z\d ]{0,} "
match_end_word = " [^a-zA-Z\d ]{0,}" + word + "[^a-zA-Z\d]{0,}$"

and then combining them:

match_string = match_middle_words + "|" + match_starting_word + "|" + match_end_word

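Is there a simple way to avoid needing three alternatives? Specifically, is there a way to say "space or start of string (i.e. ^)" and, similarly, "space or end of string (i.e. $)"?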
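Taken literally, "space or start of string" is the group (?:^| ) and "space or end of string" is (?: |$). A minimal sketch, assuming Python 3, that compares the three-branch pattern above with a single grouped version; three_branch, one_branch and PUNCT are hypothetical names, and re.escape is added as an assumption so the word is matched literally:

```python
import re

# Punctuation class from the question ([^a-zA-Z\d ]{0,} written as *)
PUNCT = r"[^a-zA-Z\d ]*"


def three_branch(word):
    """The question's middle|start|end pattern (re.escape added as an assumption)."""
    w = re.escape(word)
    middle = " " + PUNCT + w + PUNCT + " "
    start = r"^[^a-zA-Z\d]*" + w + PUNCT + " "
    end = " " + PUNCT + w + r"[^a-zA-Z\d]*$"
    return middle + "|" + start + "|" + end


def one_branch(word):
    """Same idea, with "space or ^" and "space or $" as grouped alternations."""
    return "(?:^| )" + PUNCT + re.escape(word) + PUNCT + "(?: |$)"


for s in ["word is first", "the word, here", "last is (word)", "password here", "word"]:
    print(repr(s),
          bool(re.search(three_branch("word"), s)),
          bool(re.search(one_branch("word"), s)))
```

Both versions agree on the first four strings; only the grouped version matches a string consisting of the word alone, because each of the three original branches needs at least one space. Since the grouped version consumes the surrounding spaces, re.findall(one_branch("word"), "word word") returns a single match, which is one reason the zero-width boundaries in the answer are preferable.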

Answer 1

Why not use word boundaries?

match_string = r'\b' + word + r'\b'
match_string = r'\b{}\b'.format(word)
match_string = rf'\b{word}\b'          # Python 3.6+ required
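A quick check, assuming Python 3.6 or later for the f-string line, that the three spellings produce the same pattern string and that it skips words that merely contain "word" (the sample text is invented):

```python
import re

word = "word"

# Three ways of building the same whole-word pattern
p1 = r'\b' + word + r'\b'
p2 = r'\b{}\b'.format(word)
p3 = rf'\b{word}\b'  # f-strings need Python 3.6+

print(p1 == p2 == p3)                            # True
print(re.findall(p1, "word, sword, word."))      # ['word', 'word']
```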

If you have a list of words (e.g. in a variable named words) to match as whole words, use

match_string = r'\b(?:{})\b'.format('|'.join(words))
match_string = rf'\b(?:{"|".join(words)})\b'         # Python 3.6+ required
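The non-capturing group (?:...) matters because | has the lowest precedence in a regex: without it, the leading \b applies only to the first alternative and the trailing \b only to the last. A sketch with made-up words and text:

```python
import re

words = ["cat", "dog"]
text = "cat catalog hotdog dog"

grouped = r'\b(?:{})\b'.format('|'.join(words))
# Without the group the pattern is \bcat|dog\b, i.e. "\bcat" OR "dog\b"
ungrouped = r'\b{}\b'.format('|'.join(words))

print(re.findall(grouped, text))    # ['cat', 'dog']
print(re.findall(ungrouped, text))  # ['cat', 'cat', 'dog', 'dog']
```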

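This ensures that a word is matched only when it is surrounded by non-word characters or by the start/end of the string. Note that \b also matches at the start and end of the string (when the adjacent character is a word character), so the three alternatives are unnecessary.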

Sample code:

import re
strn = "word hereword word, there word"
search = "word"
print(re.findall(r"\b" + search + r"\b", strn))

We get 3 matches (hereword is not matched):

['word', 'word', 'word']
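Applied to the situations described in the question (punctuation on either side, word at the very start or end of the string), the single \b pattern behaves as follows; the test strings are made up for illustration:

```python
import re

search = "word"
pattern = r"\b" + search + r"\b"

# First, middle and last word with punctuation around them, plus a partial-word case
for text in ['"word" starts here', 'in the (word) middle', 'ends with word.', 'no match: swordfish']:
    print(repr(text), bool(re.search(pattern, text)))
```

The first three lines print True and the last prints False, which covers the three cases the question needed separate patterns for.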

A note on "word" boundaries

When the "words" are actually arbitrary chunks of characters, you should re.escape them before passing them into the regex pattern:

match_string = r'\b{}\b'.format(re.escape(word)) # a single escaped "word" string passed
match_string = r'\b(?:{})\b'.format("|".join(map(re.escape, words))) # words list is escaped
match_string = rf'\b(?:{"|".join(map(re.escape, words))})\b' # Same as above for Python 3.6+
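To illustrate why escaping matters, a sketch with made-up words containing the metacharacters . and +:

```python
import re

word = "a.b"
text = "axb a.b"

raw = r'\b{}\b'.format(word)                 # '.' is a metacharacter: any character
escaped = r'\b{}\b'.format(re.escape(word))  # '.' now matches only a literal dot

print(re.findall(raw, text))      # ['axb', 'a.b']
print(re.findall(escaped, text))  # ['a.b']

# The list form: each word is escaped before joining with |
words = ["a.b", "x+y"]
alt = r'\b(?:{})\b'.format("|".join(map(re.escape, words)))
print(re.findall(alt, "x+y xxy a.b"))  # ['x+y', 'a.b']
```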

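If the words to be matched as whole words may start or end with special characters, \b will not work (it only matches where a word character meets a non-word character or a string edge), so use unambiguous word boundaries: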

match_string = r'(?<!\w){}(?!\w)'.format(re.escape(word))
match_string = r'(?<!\w)(?:{})(?!\w)'.format("|".join(map(re.escape, words)))
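To see why \b falls short here, consider a hypothetical word c++, which ends in non-word characters. A sketch (the sample text is made up) comparing the start offsets of the matches:

```python
import re

word = "c++"
text = "c++ and c++11"

b_pattern = r'\b{}\b'.format(re.escape(word))
look_pattern = r'(?<!\w){}(?!\w)'.format(re.escape(word))

# \b after '+' needs a word character next, so it fires inside "c++11" but not after "c++ "
print([m.start() for m in re.finditer(b_pattern, text)])     # [8]
# The lookarounds only require "no word character" on either side
print([m.start() for m in re.finditer(look_pattern, text)])  # [0]
```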

If the word boundaries are whitespace characters or the start/end of the string, use whitespace boundaries, (?<!\S)...(?!\S):

match_string = r'(?<!\S){}(?!\S)'.format(re.escape(word))
match_string = r'(?<!\S)(?:{})(?!\S)'.format("|".join(map(re.escape, words)))
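The difference between the two kinds of explicit boundary shows up when punctuation touches the word. A sketch with an invented sample string, printing the start offsets of each match:

```python
import re

word = "word"
text = "word, word #word word"

ws = r'(?<!\S){}(?!\S)'.format(re.escape(word))  # whitespace or string edge on both sides
wb = r'(?<!\w){}(?!\w)'.format(re.escape(word))  # no word character on either side

print([m.start() for m in re.finditer(ws, text)])  # [6, 17]
print([m.start() for m in re.finditer(wb, text)])  # [0, 6, 12, 17]
```

With the whitespace variant, adjacent punctuation counts as part of the token, so "word," and "#word" are not matches; that suits tokens such as #tags, while the \w-based variant is closer to what the question asks for.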

Answer 1: accepted, score 14, posted 2015-05-01 22:30:20