
Loop through list of strings, remove all banned words from each string item

I have the following list:

dirtylist = ["lemons zested", "grated cheddar cheese", "carrots, thinly chopped"]

This is a list of words that I want to remove from each of the string items in the list:

bannedWord = ['grated', 'zested', 'thinly', 'chopped', ',']

The resulting list that I am trying to generate is this:

cleaner_list = ["lemons", "cheddar cheese", "carrots"]

So far, I have been unable to achieve this. My attempt is as follows:

import re

dirtylist = ["lemons zested", "grated cheddar cheese", "carrots, thinly chopped"]
cleaner_list = []
    
def RemoveBannedWords(ing):
    pattern = re.compile("\\b(grated|zested|thinly|chopped)\\W", re.I)
    return pattern.sub("", ing)
    
for ing in dirtylist:
    cleaner_ing = RemoveBannedWords(ing)
    cleaner_list.append(cleaner_ing)
    
print(cleaner_list)

This returns:

['lemons zested', 'cheddar cheese', 'carrots, chopped']

I have also tried:

import re

dirtylist = ["lemons zested", "grated cheddar cheese", "carrots, thinly chopped"]
cleaner_list = []

bannedWord = ['grated', 'zested', 'thinly', 'chopped']
re_banned_words = re.compile(r"\b(" + "|".join(bannedWord) + ")\\W", re.I)

def remove_words(ing):
    global re_banned_words
    return re_banned_words.sub("", ing)

for ing in dirtylist:
    cleaner_ing = remove_words(ing)
    cleaner_list.append(cleaner_ing)
  
print(cleaner_list)

This returns:

['lemons zested', 'cheddar cheese', 'carrots, chopped']

I'm a bit lost at this point and not sure where I'm going wrong. Any help is much appreciated.

Some issues:

  • The final \W in your regex requires that a character follows the banned word. So if the banned word is the last word in the input string, the match fails. You could just use \b again, as you did at the start of the regex.

  • Since you want to remove the comma as well, you need to add it as an alternative. Make sure not to put it inside that same capture group, because the \b at the end would then require the comma to be followed by an alphanumeric character. So it should be added as an alternative at the very end (or start) of your regex.

  • You might want to call .strip() on the resulting string to remove any white space that remains after the banned words have been removed.
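The first point is easy to confirm in isolation. The following minimal sketch (the variable names are illustrative, not from the question) compares a pattern ending in \W with one ending in \b on a string whose last word is banned:

```python
import re

# Pattern from the question: needs one non-word character after the banned word.
trailing_nonword = re.compile(r"\b(grated|zested|thinly|chopped)\W", re.I)
# Same alternation, but closed with a zero-width word boundary instead.
trailing_boundary = re.compile(r"\b(grated|zested|thinly|chopped)\b", re.I)

text = "lemons zested"  # the banned word is the last word

print(trailing_nonword.search(text))           # None: nothing follows "zested"
print(trailing_boundary.search(text).group())  # zested
```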

So:

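import re

def RemoveBannedWords(ing):
    # The closing \b lets a banned word match at the end of the string;
    # the "|," alternative removes commas as well.
    pattern = re.compile(r"\b(grated|zested|thinly|chopped)\b|,", re.I)
    return pattern.sub("", ing).strip()

def clearList(dirtyList, bannedWords, splitChar):
    # Keeps only the items that contain no banned word.
    # Items with a banned word are dropped whole, not edited.
    clean = []
    for dirty in dirtyList:
        ban = False
        for w in dirty.split(splitChar):
            if w in bannedWords:
                ban = True

        if not ban:
            clean.append(dirty)

    return clean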

dirtyList is the list that you want to clean.

bannedWords are the words that you don't want.

splitChar is the character that separates the words (" ").
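Note that checking membership word by word only tells you whether an item contains a banned word. If the goal is to strip those words and keep the rest of the item, as in the question, a split-based sketch could look like the following. The function name strip_banned_words and the choice to trim commas from each word are assumptions made for illustration:

```python
def strip_banned_words(items, banned_words, split_char=" "):
    """Return a copy of items with banned words removed from each string."""
    banned = {w.lower() for w in banned_words}
    cleaned = []
    for item in items:
        kept = []
        for word in item.split(split_char):
            bare = word.strip(",")  # treat "carrots," like "carrots"
            if bare and bare.lower() not in banned:
                kept.append(bare)
        cleaned.append(split_char.join(kept))
    return cleaned


dirtylist = ["lemons zested", "grated cheddar cheese", "carrots, thinly chopped"]
bannedWord = ["grated", "zested", "thinly", "chopped", ","]
print(strip_banned_words(dirtylist, bannedWord))
# ['lemons', 'cheddar cheese', 'carrots']
```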

I would remove "," from the bannedWord list and use str.strip to strip it:

import re

dirtylist = [
    "lemons zested",
    "grated cheddar cheese",
    "carrots, thinly chopped",
]

bannedWord = ["grated", "zested", "thinly", "chopped"]

# Group the alternation so that both \b anchors apply to every banned word.
pat = re.compile(
    r"\b(?:" + "|".join(re.escape(w) for w in bannedWord) + r")\b", flags=re.I
)

for w in dirtylist:
    print("{:<30} {}".format(w, pat.sub("", w).strip(" ,")))

Prints:

lemons zested                  lemons
grated cheddar cheese          cheddar cheese
carrots, thinly chopped        carrots
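One case the sample data does not exercise is a banned word in the middle of an item, which leaves two adjacent spaces behind after substitution. A possible way to handle that, sketched here with a hypothetical helper named clean_item, is to collapse runs of whitespace before stripping:

```python
import re

bannedWord = ["grated", "zested", "thinly", "chopped"]

# Group the alternation so both \b anchors apply to every word.
banned_pat = re.compile(
    r"\b(?:" + "|".join(re.escape(w) for w in bannedWord) + r")\b", flags=re.I
)


def clean_item(text):
    """Remove banned words, then tidy the leftover spaces and commas."""
    without_banned = banned_pat.sub("", text)
    collapsed = re.sub(r"\s+", " ", without_banned)  # "a  b" -> "a b"
    return collapsed.strip(" ,")


print(clean_item("cheddar grated cheese"))    # cheddar cheese
print(clean_item("carrots, thinly chopped"))  # carrots
```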

The below seems to work (a naive nested loop):

dirtylist = ["lemons zested", "grated cheddar cheese", "carrots, thinly chopped"]
bannedWords = ['grated', 'zested', 'thinly', 'chopped', ',']
result = []
for words in dirtylist:
    temp = words
    for bannedWord in bannedWords:
        temp = temp.replace(bannedWord, '')
    result.append(temp.strip())
print(result)

Output:

['lemons', 'cheddar cheese', 'carrots']
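A caveat with plain str.replace is that it matches substrings, not whole words, so a banned word embedded in a longer word is removed as well. The short comparison below illustrates this with a made-up input ("unzested lemons", which is not from the question):

```python
import re

banned = "zested"
text = "unzested lemons"  # hypothetical input, not from the question

# str.replace removes the substring even inside a longer word.
replaced = text.replace(banned, "").strip()
print(replaced)  # un lemons

# A word-boundary regex only matches the standalone word, so nothing changes here.
bounded = re.sub(r"\b" + re.escape(banned) + r"\b", "", text).strip()
print(bounded)  # unzested lemons
```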
