Loop through list of strings, remove all banned words from each string item
I have the following list:
dirtylist = ["lemons zested", "grated cheddar cheese", "carrots, thinly chopped"]
This is a list of words that I want to remove from each of the string items in the list:
bannedWord = ['grated', 'zested', 'thinly', 'chopped', ',']
The resulting list that I am trying to generate is this:
cleaner_list = ["lemons", "cheddar cheese", "carrots"]
So far, I have been unable to achieve this. My attempt is as follows:
import re

dirtylist = ["lemons zested", "grated cheddar cheese", "carrots, thinly chopped"]
cleaner_list = []

def RemoveBannedWords(ing):
    pattern = re.compile("\\b(grated|zested|thinly|chopped)\\W", re.I)
    return pattern.sub("", ing)

for ing in dirtylist:
    cleaner_ing = RemoveBannedWords(ing)
    cleaner_list.append(cleaner_ing)

print(cleaner_list)
This returns:
['lemons zested', 'cheddar cheese', 'carrots, chopped']
I have also tried:
import re

dirtylist = ["lemons zested", "grated cheddar cheese", "carrots, thinly chopped"]
cleaner_list = []
bannedWord = ['grated', 'zested', 'thinly', 'chopped']
re_banned_words = re.compile(r"\b(" + "|".join(bannedWord) + ")\\W", re.I)

def remove_words(ing):
    global re_banned_words
    return re_banned_words.sub("", ing)

for ing in dirtylist:
    cleaner_ing = remove_words(ing)
    cleaner_list.append(cleaner_ing)

print(cleaner_list)
This returns:
['lemons zested', 'cheddar cheese', 'carrots, chopped']
I'm a bit lost at this point and not sure where I'm going wrong. Any help is much appreciated.
Some issues:
The final \W in your regex requires that there is a character that follows the banned word. So if the banned word is the last word in the input string, that will fail. You could just use \b again, like you did at the start of the regex.
Since you wanted to replace the comma as well, you need to add it as an option. Make sure not to put it inside that same capture group, as then the \\b at the end would require that comma to be followed by an alphanumeric character. So it should be put as an option right at the very end (or start) of your regex.
You might want to call .strip() on the resulting string to remove any white space that remains after the banned words have been removed.
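The first two points can be checked directly with re.search. This is only a small sketch: the variable names are made up for illustration, and the inputs are taken from the question's list.

```python
import re

# Pattern from the question: the banned word must be followed by a non-word character.
trailing_W = re.compile(r"\b(grated|zested|thinly|chopped)\W", re.I)
# Same alternation ending on a word boundary, which also holds at the end of the string.
trailing_b = re.compile(r"\b(grated|zested|thinly|chopped)\b", re.I)

match_W = trailing_W.search("lemons zested")   # nothing follows "zested", so no match
match_b = trailing_b.search("lemons zested")   # matches "zested"

# Comma inside the group: the closing \b needs a word character right after the comma.
comma_inside = re.search(r"\b(zested|,)\b", "carrots, thinly")
# Comma as a separate alternative outside the group: matches the comma on its own.
comma_outside = re.search(r"\b(zested)\b|,", "carrots, thinly")

print(match_W, match_b.group(), comma_inside, comma_outside.group())
```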
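So:
import re

def RemoveBannedWords(ing):
    # \b on both sides matches whole words, even at the end of the string; "|," also removes commas
    pattern = re.compile("\\b(grated|zested|thinly|chopped)\\b|,", re.I)
    return pattern.sub("", ing).strip()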
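If the banned words should come from a list rather than being typed into the pattern, one possible sketch looks like this. The names banned_pattern and remove_banned_words are hypothetical, and collapsing leftover double spaces goes beyond what the answer above does; it is an assumption about what a "clean" string should look like.

```python
import re

dirtylist = ["lemons zested", "grated cheddar cheese", "carrots, thinly chopped"]
banned_words = ["grated", "zested", "thinly", "chopped"]

# \b(?:w1|w2|...)\b matches any banned word as a whole word;
# the trailing "|," removes commas as well. re.escape guards against
# banned words that contain regex metacharacters.
banned_pattern = re.compile(
    r"\b(?:" + "|".join(re.escape(w) for w in banned_words) + r")\b|,",
    re.I,
)

def remove_banned_words(text):
    """Hypothetical helper: drop banned words and commas, then tidy spacing."""
    stripped = banned_pattern.sub("", text)
    # split() without arguments splits on runs of whitespace, so re-joining
    # collapses doubled spaces and trims both ends.
    return " ".join(stripped.split())

cleaner_list = [remove_banned_words(item) for item in dirtylist]
print(cleaner_list)
```

Compiling the pattern once, outside the function, avoids rebuilding it for every string in the list.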
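def clearList(dirtyList, bannedWords, splitChar):
    clean = []
    for dirty in dirtyList:
        ban = False
        # Split on splitChar and flag the item if any of its words is banned
        for w in dirty.split(splitChar):
            if w in bannedWords:
                ban = True
        # Keep only the items that contain no banned word
        if ban is False:
            clean.append(dirty)
    return clean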
dirtyList is the list that you want to clean.
bannedWords are the words that you don't want.
splitChar is the character that sits between the words (" ").
Note that this function keeps only the items that contain no banned word; it does not remove the banned words from within an item.
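To get the cleaner_list from the question, the same splitting idea can be applied per word instead of per item. The sketch below is a hypothetical variant, not the answerer's code: the name remove_banned_from_items is invented, and it assumes that commas attached to a word (as in "carrots,") should simply be trimmed.

```python
def remove_banned_from_items(dirty_list, banned_words, split_char=" "):
    """Hypothetical variant: keep every item, minus its banned words."""
    banned = {w.lower() for w in banned_words}
    cleaned = []
    for dirty in dirty_list:
        kept = []
        for word in dirty.split(split_char):
            # Assumption: commas at the edges of a word are noise, so trim them.
            bare = word.strip(",")
            # Skip empty leftovers and banned words (compared case-insensitively).
            if bare and bare.lower() not in banned:
                kept.append(bare)
        cleaned.append(split_char.join(kept))
    return cleaned

dirtylist = ["lemons zested", "grated cheddar cheese", "carrots, thinly chopped"]
bannedWord = ["grated", "zested", "thinly", "chopped", ","]
result = remove_banned_from_items(dirtylist, bannedWord)
print(result)
```

Because words are compared whole, a banned word that only appears inside a longer word is left alone.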
I would remove "," from the bannedWord list and use str.strip to strip it:
import re

dirtylist = [
    "lemons zested",
    "grated cheddar cheese",
    "carrots, thinly chopped",
]
bannedWord = ["grated", "zested", "thinly", "chopped"]

# The non-capturing group makes \b apply to every alternative, not just the first and last
pat = re.compile(
    r"\b(?:" + "|".join(re.escape(w) for w in bannedWord) + r")\b", flags=re.I
)

for w in dirtylist:
    print("{:<30} {}".format(w, pat.sub("", w).strip(" ,")))
Prints:
lemons zested                  lemons
grated cheddar cheese          cheddar cheese
carrots, thinly chopped        carrots
The below seems to work (a naive nested loop):
dirtylist = ["lemons zested", "grated cheddar cheese", "carrots, thinly chopped"]
bannedWords = ['grated', 'zested', 'thinly', 'chopped', ',']
result = []
for words in dirtylist:
    temp = words
    for bannedWord in bannedWords:
        temp = temp.replace(bannedWord, '')
    result.append(temp.strip())
print(result)
Output:
['lemons', 'cheddar cheese', 'carrots']
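One caveat with the str.replace approach is that it works on substrings, so a banned word is also cut out of longer words that contain it. The comparison below is a sketch on a made-up input ("integrated" is not in the question's data) and shows how a word-boundary regex behaves differently.

```python
import re

banned_words = ["grated", "zested", "thinly", "chopped"]
item = "integrated zested lemons"  # invented input for illustration

# Substring replacement: "grated" is also removed from inside "integrated".
by_replace = item
for word in banned_words:
    by_replace = by_replace.replace(word, "")
by_replace = " ".join(by_replace.split())  # tidy leftover spaces

# Whole-word replacement: \b only lets the pattern match complete words.
pattern = re.compile(r"\b(?:" + "|".join(map(re.escape, banned_words)) + r")\b")
by_regex = " ".join(pattern.sub("", item).split())

print(by_replace)  # inte lemons
print(by_regex)    # integrated lemons
```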