
Loop through list of strings, remove all banned words from each string item

I have the following list:

dirtylist = ["lemons zested", "grated cheddar cheese", "carrots, thinly chopped"]

This is the list of words I want to remove from each string item in the list:

bannedWord = ['grated', 'zested', 'thinly', 'chopped', ',']

The resulting list I'm trying to generate is this:

cleaner_list = ["lemons", "cheddar cheese", "carrots"]

So far I have been unable to achieve this. My attempt is as follows:

import re

dirtylist = ["lemons zested", "grated cheddar cheese", "carrots, thinly chopped"]
cleaner_list = []
    
def RemoveBannedWords(ing):
    pattern = re.compile("\\b(grated|zested|thinly|chopped)\\W", re.I)
    return pattern.sub("", ing)
    
for ing in dirtylist:
    cleaner_ing = RemoveBannedWords(ing)
    cleaner_list.append(cleaner_ing)
    
print(cleaner_list)

This returns:

['lemons zested', 'cheddar cheese', 'carrots, chopped']

I have also tried:

import re

dirtylist = ["lemons zested", "grated cheddar cheese", "carrots, thinly chopped"]
cleaner_list = []

bannedWord = ['grated', 'zested', 'thinly', 'chopped']
re_banned_words = re.compile(r"\b(" + "|".join(bannedWord) + ")\\W", re.I)

def remove_words(ing):
    global re_banned_words
    return re_banned_words.sub("", ing)

for ing in dirtylist:
    cleaner_ing = remove_words(ing)
    cleaner_list.append(cleaner_ing)
  
print(cleaner_list)

This returns:

['lemons zested', 'cheddar cheese', 'carrots, chopped']

At this point I'm a bit lost and not sure where I've gone wrong. Any help is much appreciated.

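A few issues:

  • The final \W in your regex requires a non-word character after the banned word, so the match fails when the banned word is the last word in the input string. You can use \b there instead, as you already do at the start of the regex.

  • Since you also want to remove the comma, you need to add it as an alternative. Make sure not to put it inside the same capture group, because the \b at the end would then require the comma to be followed by an alphanumeric character. It should go as an alternative at the very end (or the very start) of your regex.

  • You may want to call .strip() on the resulting string to remove any whitespace left over after the banned words are removed.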
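
The first two points can be checked directly. The snippet below is only a small sketch; the variable names are made up for illustration, and the comma test uses a shortened word list.

```python
import re

# \W has to consume one non-word character, so a banned word at the very
# end of the string never matches.
with_W = re.compile(r"\b(grated|zested|thinly|chopped)\W", re.I)
# \b is zero-width, so it also succeeds at the end of the string.
with_b = re.compile(r"\b(grated|zested|thinly|chopped)\b", re.I)

result_W = with_W.sub("", "lemons zested")
result_b = with_b.sub("", "lemons zested")
print(repr(result_W))  # 'lemons zested'
print(repr(result_b))  # 'lemons '

# A comma inside the group is followed by \b, which needs a word character
# right after the comma; here a space follows, so the comma survives.
comma_inside = re.compile(r"\b(thinly|chopped|,)\b", re.I)
comma_outside = re.compile(r"\b(thinly|chopped)\b|,", re.I)

inside = comma_inside.sub("", "carrots, thinly chopped").strip()
outside = comma_outside.sub("", "carrots, thinly chopped").strip()
print(repr(inside))   # 'carrots,'
print(repr(outside))  # 'carrots'
```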

So:

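def RemoveBannedWords(ing):
    # \b on both sides matches whole words anywhere in the string; "," is a
    # separate alternative because \b does not match after a trailing comma.
    pattern = re.compile(r"\b(grated|zested|thinly|chopped)\b|,", re.I)
    return pattern.sub("", ing).strip()

def clearList(dirtyList, bannedWords, splitChar=" "):
    clean = []
    for dirty in dirtyList:
        # Keep only the words that are not banned; a comma attached to a
        # word is stripped before the comparison.
        kept = []
        for w in dirty.split(splitChar):
            w = w.strip(",")
            if w and w not in bannedWords:
                kept.append(w)
        clean.append(splitChar.join(kept))
    return clean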

dirtyList is the list you want to clean

bannedWords are the words you don't want

splitChar is the character between words (" ")

I'd remove "," from the bannedWord list and strip it off with str.strip instead:

import re

dirtylist = [
    "lemons zested",
    "grated cheddar cheese",
    "carrots, thinly chopped",
]

bannedWord = ["grated", "zested", "thinly", "chopped"]

pat = re.compile(
    # Group the alternation so both \b anchors apply to every banned word.
    r"\b(?:" + "|".join(re.escape(w) for w in bannedWord) + r")\b", flags=re.I
)

for w in dirtylist:
    print("{:<30} {}".format(w, pat.sub("", w).strip(" ,")))

Prints:

lemons zested                  lemons
grated cheddar cheese          cheddar cheese
carrots, thinly chopped        carrots
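
One thing .strip() does not handle is a banned word in the middle of a string, which leaves a run of spaces behind. A minimal sketch of one way to deal with that, assuming it is acceptable to collapse all whitespace runs into single spaces (clean_ingredient and the sample input are hypothetical, not from the question):

```python
import re

banned_words = ["grated", "zested", "thinly", "chopped"]
pattern = re.compile(
    r"\b(?:" + "|".join(re.escape(w) for w in banned_words) + r")\b",
    flags=re.I,
)

def clean_ingredient(text):
    # Remove banned words and commas first.
    without_banned = pattern.sub("", text).replace(",", "")
    # split() with no arguments splits on runs of whitespace, so joining
    # the parts collapses repeated spaces and trims both ends.
    return " ".join(without_banned.split())

cleaned = clean_ingredient("carrots, thinly chopped and peeled")
print(cleaned)  # carrots and peeled
```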

The following seems to work (a naive nested loop):

dirtylist = ["lemons zested", "grated cheddar cheese", "carrots, thinly chopped"]
bannedWords = ['grated', 'zested', 'thinly', 'chopped', ',']
result = []
for words in dirtylist:
    temp = words
    for bannedWord in bannedWords:
        temp = temp.replace(bannedWord, '')
    result.append(temp.strip())
print(result)

Output:

['lemons', 'cheddar cheese', 'carrots']
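
Note that str.replace removes the banned text wherever it appears, including inside longer words. The comparison below is just an illustrative sketch (the helper names and the sample string are made up) contrasting it with a word-boundary regex:

```python
import re

banned = ["grated", "zested", "thinly", "chopped"]

def remove_substrings(text, words):
    # Plain str.replace: deletes the characters wherever they occur.
    for w in words:
        text = text.replace(w, "")
    return text.strip()

def remove_whole_words(text, words):
    # Word-boundary regex: only deletes the words when they stand alone.
    pattern = re.compile(r"\b(?:" + "|".join(map(re.escape, words)) + r")\b")
    return pattern.sub("", text).strip()

sample = "integrated cheese"  # hypothetical input
print(remove_substrings(sample, banned))   # inte cheese
print(remove_whole_words(sample, banned))  # integrated cheese
```

For the three strings in the question both approaches give the same result; the difference only shows up when a banned word is part of another word.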

