Iterate over a list of strings, removing all banned words from each string item

Question

I have the following list:

dirtylist = ["lemons zested", "grated cheddar cheese", "carrots, thinly chopped"]

This is the list of words that I want to remove from each string item in the list:

bannedWord = ['grated', 'zested', 'thinly', 'chopped', ',']

The resulting list that I am trying to generate is this:

cleaner_list = ["lemons", "cheddar cheese", "carrots"]

So far, I have been unable to achieve this. My attempt is as follows:

import re

dirtylist = ["lemons zested", "grated cheddar cheese", "carrots, thinly chopped"]
cleaner_list = []
    
def RemoveBannedWords(ing):
    pattern = re.compile("\\b(grated|zested|thinly|chopped)\\W", re.I)
    return pattern.sub("", ing)
    
for ing in dirtylist:
    cleaner_ing = RemoveBannedWords(ing)
    cleaner_list.append(cleaner_ing)
    
print(cleaner_list)

This returns:

['lemons zested', 'cheddar cheese', 'carrots, chopped']

I have also tried:

import re

dirtylist = ["lemons zested", "grated cheddar cheese", "carrots, thinly chopped"]
cleaner_list = []

bannedWord = ['grated', 'zested', 'thinly', 'chopped']
re_banned_words = re.compile(r"\b(" + "|".join(bannedWord) + ")\\W", re.I)

def remove_words(ing):
    global re_banned_words
    return re_banned_words.sub("", ing)

for ing in dirtylist:
    cleaner_ing = remove_words(ing)
    cleaner_list.append(cleaner_ing)
  
print(cleaner_list)

This returns:

['lemons zested', 'cheddar cheese', 'carrots, chopped']

I'm a bit lost at this point and not sure where I'm going wrong. Any help is much appreciated.

Answer 1

Some issues:

The final \W in your regex requires that some character follows the banned word. So if the banned word is the last word in the input string, the match fails. You could use \b again instead, as you did at the start of the regex.
Since you want to remove the comma as well, you need to add it as an alternative. Don't put it inside the same capture group, because the \b at the end would then require the comma to be followed by a word character (a letter, digit, or underscore). Instead, add it as a separate alternative at the very end (or start) of the regex.
You might want to call .strip() on the resulting string to remove any whitespace left over after the banned words have been removed.
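A minimal sketch of the two regex points above, using single-word patterns for brevity (the pattern names are made up for this illustration). repr() is used so that leftover spaces are visible:

```python
import re

# Point 1: a trailing \W needs a real character after the word,
# so a banned word at the very end of the string is never matched.
trailing_nonword = re.compile(r"\b(zested)\W", re.I)
trailing_boundary = re.compile(r"\b(zested)\b", re.I)

print(repr(trailing_nonword.sub("", "lemons zested")))   # 'lemons zested'
print(repr(trailing_boundary.sub("", "lemons zested")))  # 'lemons '

# Point 2: with "," inside the \b...\b group, the comma only matches when a
# word character follows it; in "carrots, thinly" a space follows it.
comma_in_group = re.compile(r"\b(thinly|,)\b", re.I)
comma_outside = re.compile(r"\b(thinly)\b|,", re.I)

print(repr(comma_in_group.sub("", "carrots, thinly")))  # 'carrots, '
print(repr(comma_outside.sub("", "carrots, thinly")))   # 'carrots '
```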

So:

import re

def RemoveBannedWords(ing):
    pattern = re.compile(r"\b(grated|zested|thinly|chopped)\b|,", re.I)
    return pattern.sub("", ing).strip()
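A sketch that builds the same kind of pattern directly from the bannedWord list in the question instead of hard-coding the words. It assumes that any entry that is not purely alphanumeric (such as ",") should be matched as a plain alternative outside the \b group; note that a multi-word entry like "thinly chopped" would also land there and lose its word-boundary check.

```python
import re

dirtylist = ["lemons zested", "grated cheddar cheese", "carrots, thinly chopped"]
bannedWord = ["grated", "zested", "thinly", "chopped", ","]

# Plain words go inside the \b...\b group; everything else (here just ",")
# is added as a separate alternative after the group.
words = [w for w in bannedWord if w.isalnum()]
symbols = [w for w in bannedWord if not w.isalnum()]

pattern = re.compile(
    r"\b(?:" + "|".join(re.escape(w) for w in words) + r")\b"
    + "".join("|" + re.escape(s) for s in symbols),
    re.I,
)

# Remove the matches, then trim the whitespace they leave at the edges
cleaner_list = [pattern.sub("", ing).strip() for ing in dirtylist]
print(cleaner_list)  # ['lemons', 'cheddar cheese', 'carrots']
```

re.escape keeps the pattern valid if a banned entry ever contains a regex metacharacter such as "." or "(".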

Answer 2

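def clearList(dirtyList, bannedWords, splitChar):
    # Keeps only the items that contain no banned word at all;
    # an item containing a banned word is dropped entirely
    clean = []
    for dirty in dirtyList:
        ban = any(w in bannedWords for w in dirty.split(splitChar))
        if not ban:
            clean.append(dirty)

    return clean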

dirtyList is the list you want to clean.

bannedWords is the list of words you don't want.

splitChar is the character that separates the words (e.g. " ").
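clearList keeps or drops whole items. If the goal is to remove the banned words from each item, as the question asks, a word-level variant of the same split-based idea could look like the following sketch. clean_items is a hypothetical name; it assumes that banned entries which are not plain words (such as ",") should be stripped from the edges of each token, and it compares words case-sensitively.

```python
def clean_items(dirty_list, banned_words, split_char=" "):
    """Remove banned words from each item instead of dropping the whole item."""
    # Banned entries that are not plain words (e.g. ",") are treated as
    # punctuation and stripped from the edges of each token.
    punctuation = "".join(w for w in banned_words if not w.isalnum())
    cleaned = []
    for dirty in dirty_list:
        kept = []
        for token in dirty.split(split_char):
            word = token.strip(punctuation)
            # Skip empty tokens and banned words; keep everything else
            if word and word not in banned_words:
                kept.append(word)
        cleaned.append(split_char.join(kept))
    return cleaned


dirtylist = ["lemons zested", "grated cheddar cheese", "carrots, thinly chopped"]
bannedWords = ["grated", "zested", "thinly", "chopped", ","]
print(clean_items(dirtylist, bannedWords, " "))
# ['lemons', 'cheddar cheese', 'carrots']
```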

Answer 3

I would remove the "," from the bannedWord list and strip it with str.strip instead:

import re

dirtylist = [
    "lemons zested",
    "grated cheddar cheese",
    "carrots, thinly chopped",
]

bannedWord = ["grated", "zested", "thinly", "chopped"]

pat = re.compile(
    r"\b(?:" + "|".join(re.escape(w) for w in bannedWord) + r")\b", flags=re.I
)

for w in dirtylist:
    print("{:<30} {}".format(w, pat.sub("", w).strip(" ,")))

Prints:

lemons zested                  lemons
grated cheddar cheese          cheddar cheese
carrots, thinly chopped        carrots
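One detail worth knowing when joining the words between two \b anchors: | has the lowest precedence in a regular expression, so without a group the leading \b applies only to the first alternative and the trailing \b only to the last. A small sketch with a made-up input string:

```python
import re

words = ["grated", "zested"]

# Without a group the pattern is \bgrated|zested\b: the leading \b guards
# only "grated" and the trailing \b guards only "zested".
ungrouped = re.compile(r"\b" + "|".join(words) + r"\b")
# A non-capturing group (?:...) applies both \b anchors to every alternative.
grouped = re.compile(r"\b(?:" + "|".join(words) + r")\b")

text = "unzested lemons"  # hypothetical input: "zested" inside a longer word
print(repr(ungrouped.sub("", text)))  # 'un lemons'
print(repr(grouped.sub("", text)))    # 'unzested lemons'
```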

Answer 4

The following seems to work (a naive nested loop):

dirtylist = ["lemons zested", "grated cheddar cheese", "carrots, thinly chopped"]
bannedWords = ['grated', 'zested', 'thinly', 'chopped', ',']
result = []
for words in dirtylist:
    temp = words
    for bannedWord in bannedWords:
        temp = temp.replace(bannedWord, '')
    result.append(temp.strip())
print(result)

Output:

['lemons', 'cheddar cheese', 'carrots']
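The "naive" part is that str.replace removes raw substrings, so a banned word is also removed from inside a longer word. A sketch that wraps the inner loop in a hypothetical helper, remove_banned, and runs it on one item from the question and one made-up item:

```python
def remove_banned(item, banned_words):
    """The inner loop from the answer, wrapped in a function (hypothetical name)."""
    for banned in banned_words:
        # str.replace works on raw substrings and ignores word boundaries
        item = item.replace(banned, "")
    return item.strip()


bannedWords = ['grated', 'zested', 'thinly', 'chopped', ',']

print(repr(remove_banned("carrots, thinly chopped", bannedWords)))  # 'carrots'
# Made-up input: "zested" is also removed from inside "unzested"
print(repr(remove_banned("unzested lemons", bannedWords)))  # 'un lemons'
```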

4 solutions

Solution 1
Score 2, accepted, 2022-08-13 16:01:51

Solution 2
Score 0, 2022-08-13 15:58:43

Solution 3
Score 0, 2022-08-13 16:00:01

Solution 4
Score 0, 2022-08-13 16:06:14
