Iterate over a list of strings and remove all banned words from each string item

Question

I have the following list:

dirtylist = ["lemons zested", "grated cheddar cheese", "carrots, thinly chopped"]

This is the list of words I want to remove from each string item in the list:

bannedWord = ['grated', 'zested', 'thinly', 'chopped', ',']

The resulting list I am trying to produce is:

cleaner_list = ["lemons", "cheddar cheese", "carrots"]

So far I have not been able to get this to work. Here is my attempt:

import re

dirtylist = ["lemons zested", "grated cheddar cheese", "carrots, thinly chopped"]
cleaner_list = []
    
def RemoveBannedWords(ing):
    pattern = re.compile("\\b(grated|zested|thinly|chopped)\\W", re.I)
    return pattern.sub("", ing)
    
for ing in dirtylist:
    cleaner_ing = RemoveBannedWords(ing)
    cleaner_list.append(cleaner_ing)
    
print(cleaner_list)

This returns:

['lemons zested', 'cheddar cheese', 'carrots, chopped']

I also tried:

import re

dirtylist = ["lemons zested", "grated cheddar cheese", "carrots, thinly chopped"]
cleaner_list = []

bannedWord = ['grated', 'zested', 'thinly', 'chopped']
re_banned_words = re.compile(r"\b(" + "|".join(bannedWord) + ")\\W", re.I)

def remove_words(ing):
    global re_banned_words
    return re_banned_words.sub("", ing)

for ing in dirtylist:
    cleaner_ing = remove_words(ing)
    cleaner_list.append(cleaner_ing)
  
print(cleaner_list)

This returns:

['lemons zested', 'cheddar cheese', 'carrots, chopped']

At this point I am a bit lost and cannot see where I went wrong. Any help is much appreciated.

Answer 1

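A few issues:

The final \W in your regex requires one more character after the banned word, so the match fails when the banned word is the last word of the input string. You can use \b again instead, just as you did at the start of the regex.
Since you also want to remove the comma, you need to add it as an alternative. Make sure not to put it inside the same capturing group, because the trailing \b would then require the comma to be followed by a word character. Add it as a separate alternative at the end (or start) of your regex instead.
You probably also want to call .strip() on the resulting string to remove any whitespace left over after the banned words are removed.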

So:

import re

def RemoveBannedWords(ing):
    # \b on both sides of the group; the comma is its own top-level alternative
    pattern = re.compile("\\b(grated|zested|thinly|chopped)\\b|,", re.I)
    return pattern.sub("", ing).strip()
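To see the first point in isolation, here is a self-contained check on the question's sample list. The names `with_w` and `with_b` are only for illustration: the first is the question's pattern with a trailing `\W`, the second is the pattern from this answer.

```python
import re

dirtylist = ["lemons zested", "grated cheddar cheese", "carrots, thinly chopped"]

# Question's pattern: the trailing \W needs one more character after the word,
# so a banned word at the very end of the string is never matched
with_w = re.compile(r"\b(grated|zested|thinly|chopped)\W", re.I)

# This answer's pattern: the trailing \b also matches at the end of the string,
# and the comma is a separate top-level alternative
with_b = re.compile(r"\b(grated|zested|thinly|chopped)\b|,", re.I)

print(with_w.sub("", "lemons zested"))  # lemons zested
print([with_b.sub("", s).strip() for s in dirtylist])
# ['lemons', 'cheddar cheese', 'carrots']
```

A banned word in the middle of an item, such as "cheddar grated cheese", still leaves two adjacent spaces, because .strip() only trims the ends of the string.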

Answer 2

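def clearList(dirtyList, bannedWords, splitChar):
    clean = []
    for dirty in dirtyList:
        kept = []
        # Split the item on splitChar and keep only the words that are not banned
        for w in dirty.split(splitChar):
            w = w.strip(",")  # drop commas attached to a word, e.g. "carrots,"
            if w and w not in bannedWords:
                kept.append(w)

        # Rebuild the item from the remaining words
        clean.append(splitChar.join(kept))

    return clean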

dirtyList is the list you want to clean

bannedWords is the list of words you do not want

splitChar is the character that separates the words (" ")
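A variant of the split-based idea, sketched under a few assumptions: the helper name `drop_banned_tokens` is hypothetical, words are separated by whitespace, commas only appear attached to words, and matching should be case-insensitive like `re.I` in the regex answers (hence `str.casefold` and a set for the lookups).

```python
def drop_banned_tokens(items, banned):
    # Hypothetical helper: casefold the banned words once for case-insensitive lookups
    banned_set = {w.casefold() for w in banned}
    cleaned = []
    for item in items:
        kept = []
        # split() with no argument splits on any run of whitespace
        for token in item.split():
            word = token.strip(",")  # drop commas attached to a word
            if word and word.casefold() not in banned_set:
                kept.append(word)
        cleaned.append(" ".join(kept))
    return cleaned


dirtylist = ["lemons zested", "grated cheddar cheese", "carrots, thinly chopped"]
bannedWord = ['grated', 'zested', 'thinly', 'chopped', ',']
print(drop_banned_tokens(dirtylist, bannedWord))
# ['lemons', 'cheddar cheese', 'carrots']
```

Because whole tokens are compared rather than substrings, a word such as "integrated" is left alone, and rejoining the kept tokens with a single space means a banned word in the middle of an item does not leave a double space.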

Answer 3

I would remove "," from the bannedWord list and strip it off with str.strip instead:

import re

dirtylist = [
    "lemons zested",
    "grated cheddar cheese",
    "carrots, thinly chopped",
]

bannedWord = ["grated", "zested", "thinly", "chopped"]

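# Wrap the alternation in a non-capturing group so both \b anchors
# apply to every banned word, not just to the first and last ones
pat = re.compile(
    r"\b(?:" + "|".join(re.escape(w) for w in bannedWord) + r")\b", flags=re.I
)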

for w in dirtylist:
    print("{:<30} {}".format(w, pat.sub("", w).strip(" ,")))

Prints:

lemons zested                  lemons
grated cheddar cheese          cheddar cheese
carrots, thinly chopped        carrots
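Alternation (`|`) has the lowest precedence in a regex, so gluing `\b` onto both ends of a joined word list anchors only the first and last alternatives. A quick check, using the made-up input "unchopped carrots" to show the difference:

```python
import re

bannedWord = ["grated", "zested", "thinly", "chopped"]
alternatives = "|".join(re.escape(w) for w in bannedWord)

# Without a group this is \bgrated|zested|thinly|chopped\b:
# only "grated" gets a leading \b and only "chopped" gets a trailing \b
ungrouped = re.compile(r"\b" + alternatives + r"\b", flags=re.I)

# A non-capturing group makes both anchors apply to every alternative
grouped = re.compile(r"\b(?:" + alternatives + r")\b", flags=re.I)

print(ungrouped.sub("", "unchopped carrots"))  # un carrots
print(grouped.sub("", "unchopped carrots"))    # unchopped carrots
```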

Answer 4

The following seems to work (a naive nested loop):

dirtylist = ["lemons zested", "grated cheddar cheese", "carrots, thinly chopped"]
bannedWords = ['grated', 'zested', 'thinly', 'chopped', ',']
result = []
for words in dirtylist:
    temp = words
    for bannedWord in bannedWords:
        temp = temp.replace(bannedWord, '')
    result.append(temp.strip())
print(result)

output

['lemons', 'cheddar cheese', 'carrots']
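Because str.replace works on raw substrings rather than whole words, this loop can also cut a banned word out of a longer word, and it can leave a double space inside an item. A quick check with two made-up inputs; `naive_clean` is a hypothetical single-string version of the loop above, and `" ".join(s.split())` is one way to collapse the leftover whitespace.

```python
bannedWords = ['grated', 'zested', 'thinly', 'chopped', ',']

def naive_clean(text):
    # Hypothetical helper: same substring-replace logic as the loop above
    for banned in bannedWords:
        text = text.replace(banned, '')
    return text.strip()

print(naive_clean("integrated cheese"))      # inte cheese ("grated" removed mid-word)
print(naive_clean("cheddar grated cheese"))  # cheddar  cheese (two spaces)

# split() + join() collapses runs of whitespace into single spaces
print(" ".join(naive_clean("cheddar grated cheese").split()))  # cheddar cheese
```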

4 solutions

Solution 1: 2 votes, accepted, 2022-08-13 16:01:51

Solution 2: 0 votes, 2022-08-13 15:58:43

Solution 3: 0 votes, 2022-08-13 16:00:01

Solution 4: 0 votes, 2022-08-13 16:06:14
