Loop through list of strings, remove all banned words from each string item
I have the following list:
dirtylist = ["lemons zested", "grated cheddar cheese", "carrots, thinly chopped"]
This is the list of words I want to remove from each string item in the list:
bannedWord = ['grated', 'zested', 'thinly', 'chopped', ',']
The resulting list I am trying to generate is:
cleaner_list = ["lemons", "cheddar cheese", "carrots"]
So far I have been unable to achieve this. My attempt is as follows:
import re
dirtylist = ["lemons zested", "grated cheddar cheese", "carrots, thinly chopped"]
cleaner_list = []
def RemoveBannedWords(ing):
    pattern = re.compile("\\b(grated|zested|thinly|chopped)\\W", re.I)
    return pattern.sub("", ing)
for ing in dirtylist:
    cleaner_ing = RemoveBannedWords(ing)
    cleaner_list.append(cleaner_ing)
print(cleaner_list)
This returns:
['lemons zested', 'cheddar cheese', 'carrots, chopped']
I also tried:
import re
dirtylist = ["lemons zested", "grated cheddar cheese", "carrots, thinly chopped"]
cleaner_list = []
bannedWord = ['grated', 'zested', 'thinly', 'chopped']
re_banned_words = re.compile(r"\b(" + "|".join(bannedWord) + ")\\W", re.I)
def remove_words(ing):
    global re_banned_words
    return re_banned_words.sub("", ing)
for ing in dirtylist:
    cleaner_ing = remove_words(ing)
    cleaner_list.append(cleaner_ing)
print(cleaner_list)
This returns:
['lemons zested', 'cheddar cheese', 'carrots, chopped']
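At this point I am a bit lost and not sure where I went wrong. Any help is greatly appreciated.
Some issues:
The final \W in your regular expression requires a character to follow the banned word, so the match fails when the banned word is the last word in the input string. You can use \b there instead, just as you did at the start of the regex.
Since you also want to remove the comma, you need to add it as an alternative. Make sure not to put it inside the same capture group, because the trailing \b would then require the comma to be followed by a word character (a letter, digit, or underscore). Add it as a separate alternative at the end (or start) of your regex.
You may want to call .strip() on the resulting string to remove any whitespace left over after the banned words are removed.
So: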
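import re
def RemoveBannedWords(ing):
    # \b on both sides matches whole words, including at the end of the string;
    # the comma is a separate alternative outside the group.
    pattern = re.compile("\\b(grated|zested|thinly|chopped)\\b|,", re.I)
    return pattern.sub("", ing).strip()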
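To see concretely why the trailing \W was the culprit, here is a small comparison sketch. The names with_W and with_b are only illustrative and do not come from the question.

```python
import re

# The question's pattern: needs one non-word character after the banned word.
with_W = re.compile(r"\b(grated|zested|thinly|chopped)\W", re.I)
# The corrected pattern: a word boundary also matches at the end of the string.
with_b = re.compile(r"\b(grated|zested|thinly|chopped)\b|,", re.I)

sample = "lemons zested"
print(repr(with_W.sub("", sample)))          # 'lemons zested'
print(repr(with_b.sub("", sample).strip()))  # 'lemons'
```

Nothing follows "zested" in the sample, so \W has no character to consume and the first pattern leaves the string untouched.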
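def clearList(dirtyList, bannedWords, splitChar):
    # Keep only the items that contain no banned word.
    clean = []
    for dirty in dirtyList:
        ban = False
        # Split on splitChar so the parameter is actually used.
        for w in dirty.split(splitChar):
            if w in bannedWords:
                ban = True
        if ban is False:
            clean.append(dirty)
    return clean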
dirtyList is the list you want to clean
bannedWords is the list of words you do not want
splitChar is the character between words (" ")
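Note that clearList leaves out every item that contains a banned word rather than removing the word from the item, so for the question's data it returns an empty list. A minimal sketch of a variant that removes the words instead is shown here; strip_banned_words and its parameter names are hypothetical, and stripping commas from each token is my assumption about how the comma should be handled.

```python
def strip_banned_words(dirty_list, banned_words, split_char=" "):
    """Return a new list with banned words removed from each item."""
    banned = {w.lower() for w in banned_words}
    clean = []
    for dirty in dirty_list:
        kept = []
        for token in dirty.split(split_char):
            word = token.strip(",")  # ignore commas attached to a word
            if word and word.lower() not in banned:
                kept.append(word)
        clean.append(split_char.join(kept))
    return clean

dirtylist = ["lemons zested", "grated cheddar cheese", "carrots, thinly chopped"]
bannedWord = ["grated", "zested", "thinly", "chopped", ","]
print(strip_banned_words(dirtylist, bannedWord))
# ['lemons', 'cheddar cheese', 'carrots']
```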
I would remove the "," from the bannedWord list and strip it off with str.strip instead:
import re
dirtylist = [
    "lemons zested",
    "grated cheddar cheese",
    "carrots, thinly chopped",
]
bannedWord = ["grated", "zested", "thinly", "chopped"]
# The non-capturing group makes both \b anchors apply to every alternative,
# not only to the first and last word.
pat = re.compile(
    r"\b(?:" + "|".join(re.escape(w) for w in bannedWord) + r")\b", flags=re.I
)
for w in dirtylist:
    print("{:<30} {}".format(w, pat.sub("", w).strip(" ,")))
Prints:
lemons zested                  lemons
grated cheddar cheese          cheddar cheese
carrots, thinly chopped        carrots
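One thing str.strip does not cover is a banned word in the middle of an item, which leaves a double space behind. The sketch below adds a whitespace-collapsing step; the clean helper and the "cheddar grated cheese" input are hypothetical and not part of the question, and the alternation is wrapped in a non-capturing group so that both \b anchors apply to every word.

```python
import re

bannedWord = ["grated", "zested", "thinly", "chopped"]
pat = re.compile(
    r"\b(?:" + "|".join(re.escape(w) for w in bannedWord) + r")\b", flags=re.I
)

def clean(text):
    # Remove banned words, collapse runs of whitespace, then trim
    # any spaces and commas left at the ends.
    without_banned = pat.sub("", text)
    return " ".join(without_banned.split()).strip(" ,")

print(clean("cheddar grated cheese"))    # cheddar cheese
print(clean("carrots, thinly chopped"))  # carrots
```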
The following seems to work (a naive nested loop):
dirtylist = ["lemons zested", "grated cheddar cheese", "carrots, thinly chopped"]
bannedWords = ['grated', 'zested', 'thinly', 'chopped', ',']
result = []
for words in dirtylist:
    temp = words
    for bannedWord in bannedWords:
        temp = temp.replace(bannedWord, '')
    result.append(temp.strip())
print(result)
Output:
['lemons', 'cheddar cheese', 'carrots']
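One caveat with this approach is that str.replace works on substrings, so a banned word is also removed when it appears inside a longer word. The comparison below is a rough sketch; the "ungrated nutmeg" input and both helper names are hypothetical and only serve to show the difference from a whole-word regex.

```python
import re

bannedWords = ["grated", "zested", "thinly", "chopped"]

def remove_by_replace(text):
    # The nested-loop approach: plain substring replacement.
    for banned in bannedWords:
        text = text.replace(banned, "")
    return text.strip()

# Whole-word alternative: \b keeps "grated" from matching inside "ungrated".
whole_word = re.compile(r"\b(?:" + "|".join(map(re.escape, bannedWords)) + r")\b")

def remove_whole_words(text):
    return whole_word.sub("", text).strip()

sample = "ungrated nutmeg"  # hypothetical input, not from the question
print(remove_by_replace(sample))   # un nutmeg
print(remove_whole_words(sample))  # ungrated nutmeg
```

For the three items in the question both functions give the same result apart from the comma, so the difference only matters if the real data contains such longer words.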