Remove all occurrences of words in a Python list from a string

Question

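I'm trying to use a compiled regular expression to match and remove every word in a list from a string, but I'm struggling to avoid matching those words when they occur inside other words.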

Current:

 import re

 REMOVE_LIST = ["a", "an", "as", "at", ...]

 remove = '|'.join(REMOVE_LIST)
 regex = re.compile(r'('+remove+')', flags=re.IGNORECASE)
 out = regex.sub("", text)

In: "The quick brown fox jumped over an ant"

Out: "The quick brown fox jumped over t"

Expected: "The quick brown fox jumped over ant"
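To see where the stray "t" comes from, here is a reduced sketch that assumes a hypothetical one-word list ["an"] in place of the elided REMOVE_LIST. Without word boundaries, the pattern matches "an" anywhere, including inside "ant".

```python
import re

text = "The quick brown fox jumped over an ant"
# Hypothetical one-word list standing in for the elided REMOVE_LIST
pattern = re.compile(r'(an)', flags=re.IGNORECASE)

# Both the word "an" and the "an" inside "ant" are removed
print(repr(pattern.sub("", text)))  # 'The quick brown fox jumped over  t'
```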

I tried changing the pattern string to compile it as follows, but it didn't help:

 regex = re.compile(r'\b('+remove+')\b', flags=re.IGNORECASE)

Any suggestions, or am I missing something obvious?

Answer 1

Here is a suggestion that doesn't use regular expressions, which you may want to consider:

>>> sentence = 'word1 word2 word3 word1 word2 word4'
>>> remove_list = ['word1', 'word2']
>>> word_list = sentence.split()
>>> ' '.join([i for i in word_list if i not in remove_list])
'word3 word4'
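The question compiles its pattern with re.IGNORECASE, which the list comprehension above does not replicate. A minimal variant, assuming whitespace-separated words with no attached punctuation (a token like "ant." would not match "ant"), lower-cases both sides and uses a set for the membership test:

```python
sentence = 'The quick brown fox jumped over an ant'
remove_list = ['a', 'an', 'as', 'at']

# A set gives constant-time membership tests; lower-casing both sides
# mimics the case-insensitive matching of re.IGNORECASE
remove_set = {w.lower() for w in remove_list}
kept = [w for w in sentence.split() if w.lower() not in remove_set]
print(' '.join(kept))  # The quick brown fox jumped over ant
```

Because the result is rebuilt with ' '.join, runs of whitespace in the input are also collapsed to single spaces.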

Answer 2

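One problem is that only the first \b is inside a raw string. The second one is in an ordinary string literal, so Python interprets it as a backspace character (ASCII 8) rather than as a word boundary.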
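A quick way to confirm this in the interpreter, using a hypothetical single-word remove string "an": the non-raw '\b' is one backspace character, while r'\b' is the two characters the regex engine reads as a word boundary.

```python
import re

# In an ordinary string literal, \b is the backspace escape (ASCII 8)
assert '\b' == '\x08' and len('\b') == 1
# In a raw string it stays as a backslash followed by 'b'
assert r'\b' == '\\b' and len(r'\b') == 2

remove = 'an'  # hypothetical single-word alternation
broken = r'\b(' + remove + ')\b'   # second \b becomes a backspace character
fixed = r'\b(' + remove + r')\b'   # both \b are word boundaries

print(repr(broken))  # '\\b(an)\x08'
print(repr(fixed))   # '\\b(an)\\b'
print(re.search(broken, 'an ant'))  # None: no backspace in the text
```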

To fix it, change

regex = re.compile(r'\b('+remove+')\b', flags=re.IGNORECASE)

to

regex = re.compile(r'\b('+remove+r')\b', flags=re.IGNORECASE)
                                 ^ THIS
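Putting the fix together, here is a minimal sketch. The helper name remove_words is hypothetical, and the re.escape call, the non-capturing group, and the whitespace cleanup go beyond the answer above: re.escape protects against list entries containing regex metacharacters, and the final re.sub tidies the double spaces left where words were removed.

```python
import re

def remove_words(text, words):
    """Remove whole-word, case-insensitive occurrences of `words` from `text`."""
    # re.escape guards against entries containing regex metacharacters
    alternation = '|'.join(re.escape(w) for w in words)
    # Both string pieces around the alternation are raw, so both \b are boundaries
    pattern = re.compile(r'\b(?:' + alternation + r')\b', flags=re.IGNORECASE)
    out = pattern.sub('', text)
    # Collapse the runs of whitespace left behind by removed words
    return re.sub(r'\s+', ' ', out).strip()

print(remove_words("The quick brown fox jumped over an ant", ["a", "an", "as", "at"]))
# The quick brown fox jumped over ant
```

With the boundaries in place, the order of the alternatives no longer matters: for "an", the engine first tries "a", fails the trailing \b, and backtracks to "an".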

2 answers

Answer 1
Score 18, 2013-03-15 15:19:03

Answer 2
Score 11, accepted, 2013-03-15 15:11:33
