
How to remove words from list of values in specific dictionary key?

I need to remove a list of words from the values of a specific key in my list of dictionaries.

Here is an example of what my data looks like:

words = ['cloves', 'packed']

data = [{'title': 'Simple Enchiladas Verdes',
         'prep_time': '15 min',
         'cook_time': '30 min',
         'ingredients': ['chicken breast', 'tomato sauce', 'garlic cloves', 'fresh packed cilantro'],
         'instructions': ['some text...'],
         'category': 'dessert',
         'cuisine': 'thai', 
         'article': ['some text...']
        },
        {...}, {...}]

Desired output:

data = [{'title': 'Simple Enchiladas Verdes',
         'prep_time': '15 min',
         'cook_time': '30 min',
         'ingredients': ['chicken breast', 'tomato sauce', 'garlic', 'fresh cilantro']
        },
        {...}, {...}]

I have tried several approaches:

remove = '|'.join(words)
regex = re.compile(r'\b('+remove+r')\b', flags=re.IGNORECASE)

for dct in data:
    dct['ingredients']= list(filter(lambda x: regex.sub('', x), dct['ingredients']))

But this returns the following error: TypeError: sub() missing 1 required positional argument: 'string'

Other approaches I tried:

for dct in data:
    dct['ingredients']= list(filter(lambda x: x != words, dct['ingredients']))
for dct in data:
    dct['ingredients']=[[el for el in string if el in words ] for string in dct['ingredients']]
for dct in data:
    for string in dct['ingredients']:
        dct['ingredients'] = list(filter(lambda x: x not in words, dct['ingredients']))

But none of them solves my problem.
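All three attempts compare at the wrong granularity: against the whole list, against single characters, or against the whole ingredient string, never against the individual words inside each ingredient. A small reproduction on one recipe's ingredient list (pulled out of the dict loop for brevity, and with the redundant outer loop of the third attempt dropped) shows what each one actually returns:

```python
words = ['cloves', 'packed']
ingredients = ['chicken breast', 'tomato sauce', 'garlic cloves', 'fresh packed cilantro']

# Attempt 1: compares each whole string with the *list* `words`;
# a str is never equal to a list, so every item is kept.
attempt1 = list(filter(lambda x: x != words, ingredients))

# Attempt 2: iterating over a string yields single characters,
# and no single character equals 'cloves' or 'packed'.
attempt2 = [[el for el in string if el in words] for string in ingredients]

# Attempt 3: tests whether the *whole* ingredient string is one of the words;
# 'garlic cloves' is not, so nothing is removed.
attempt3 = list(filter(lambda x: x not in words, ingredients))

print(attempt1)  # ['chicken breast', 'tomato sauce', 'garlic cloves', 'fresh packed cilantro']
print(attempt2)  # [[], [], [], []]
print(attempt3)  # ['chicken breast', 'tomato sauce', 'garlic cloves', 'fresh packed cilantro']
```

To remove words, each ingredient string has to be split into words (or matched with a word-level regex) and then rebuilt.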

Why not use a list comprehension together with a dict comprehension?

data = [{k: ([' '.join([s for s in x.split() if s not in words]) for x in v]
             if k == 'ingredients' else v)
         for k, v in i.items()}
        for i in data]
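Since the question asks about "a specific key", the same comprehension can be wrapped in a function that takes the key as a parameter. This is a minimal sketch; the helper name `remove_words_from_key` is hypothetical, and the `set` conversion is an addition for faster membership tests:

```python
def remove_words_from_key(records, key, words):
    """Hypothetical helper: return new dicts in which every string stored under
    `key` has the listed words removed (exact, case-sensitive word match)."""
    banned = set(words)  # set membership checks are O(1) on average
    return [
        {k: ([' '.join(tok for tok in s.split() if tok not in banned) for s in v]
             if k == key else v)
         for k, v in rec.items()}
        for rec in records
    ]

words = ['cloves', 'packed']
data = [{'title': 'Simple Enchiladas Verdes',
         'ingredients': ['chicken breast', 'tomato sauce', 'garlic cloves', 'fresh packed cilantro']}]

result = remove_words_from_key(data, 'ingredients', words)
print(result[0]['ingredients'])  # ['chicken breast', 'tomato sauce', 'garlic', 'fresh cilantro']
```

Because it builds new dicts and lists, the original `data` is left unchanged. Matching is case-sensitive, so `'Cloves'` would survive unless both sides are lowercased.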

In your re.sub approach, you should use map, not filter: you are not filtering out individual items, but replacing each whole string with the result of re.sub.

for dct in data:
    dct['ingredients']= list(map(lambda x: regex.sub('', x), dct['ingredients']))

Or, probably more readable, as a list comprehension:

for dct in data:
    dct['ingredients'] = [regex.sub("", x) for x in dct['ingredients']]
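The difference between filter and map is easy to see side by side. filter only uses the lambda's return value as a true/false test, and every substituted string here is non-empty (truthy), so all original items are kept; map keeps the returned string itself. This sketch rebuilds the question's regex, with `re.escape` added as an extra safeguard in case a word contains regex metacharacters:

```python
import re

words = ['cloves', 'packed']
# re.escape is an addition: it protects against metacharacters in the word list
regex = re.compile(r'\b(' + '|'.join(map(re.escape, words)) + r')\b', flags=re.IGNORECASE)
ingredients = ['chicken breast', 'tomato sauce', 'garlic cloves', 'fresh packed cilantro']

# filter(): the substituted string is only used as a truth value -> originals kept
filtered = list(filter(lambda x: regex.sub('', x), ingredients))

# map(): the substituted string becomes the new item
mapped = list(map(lambda x: regex.sub('', x), ingredients))

print(filtered)  # ['chicken breast', 'tomato sauce', 'garlic cloves', 'fresh packed cilantro']
print(mapped)    # ['chicken breast', 'tomato sauce', 'garlic ', 'fresh  cilantro']
```

Note the trailing space in `'garlic '` and the double space in `'fresh  cilantro'`: re.sub removes only the matched word, not the whitespace around it.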

Both will leave some excess spaces, though (for example `'garlic '` and `'fresh  cilantro'`). If words are always separated by whitespace, you can just use split and join (the membership test is faster if words is a set):

for dct in data:
    dct['ingredients'] = [' '.join(w for w in string.split() if w not in words)
                          for string in dct['ingredients']]
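The question's regex used `re.IGNORECASE`, which the split/join version does not replicate. A sketch of a case-insensitive variant, assuming it is acceptable to compare lowercased tokens against a lowercased set while keeping the original casing of the surviving words:

```python
words = ['cloves', 'packed']
banned = {w.lower() for w in words}  # lowercased set for case-insensitive, O(1) lookups

data = [{'title': 'Simple Enchiladas Verdes',
         'ingredients': ['chicken breast', 'tomato sauce', 'Garlic Cloves', 'fresh PACKED cilantro']}]

for dct in data:
    # split() also collapses runs of whitespace, so no double spaces remain
    dct['ingredients'] = [' '.join(w for w in s.split() if w.lower() not in banned)
                          for s in dct['ingredients']]

print(data[0]['ingredients'])  # ['chicken breast', 'tomato sauce', 'Garlic', 'fresh cilantro']
```

Tokens with attached punctuation (e.g. `'cloves,'`) will not match, since comparison is on whole whitespace-separated tokens.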

words = ['cloves', 'packed']

data = [{'title': 'Simple Enchiladas Verdes',
         'prep_time': '15 min',
         'cook_time': '30 min',
         'ingredients': ['chicken breast', 'tomato sauce', 'garlic cloves', 'fresh packed cilantro']}
        ]
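for recipe in data:
    # join with a marker that should not occur in any ingredient, strip the words, then split back
    joined = ' @! '.join(recipe['ingredients'])
    for k in words:
        joined = joined.replace(k, '').strip()

    recipe['ingredients'] = [s.strip() for s in joined.split('@!')]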

Output:

[{'title': 'Simple Enchiladas Verdes',
  'prep_time': '15 min',
  'cook_time': '30 min',
  'ingredients': ['chicken breast',
   'tomato sauce',
   'garlic',
   'fresh  cilantro']}]
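`str.replace` removes substrings, not words, so `'packed'` would also be cut out of a hypothetical ingredient like `'unpacked rice'`, and it leaves double spaces such as `'fresh  cilantro'`. A sketch that swaps in a word-boundary regex and then normalizes whitespace; it assumes the listed words start and end with word characters (letters, digits, underscore), since `\b` only behaves as expected at those edges:

```python
import re

words = ['cloves', 'packed']
# \b anchors each alternative, so 'packed' inside 'unpacked' is not matched
pattern = re.compile(r'\b(?:' + '|'.join(map(re.escape, words)) + r')\b', flags=re.IGNORECASE)

def clean(s):
    # remove the listed words, then collapse runs of whitespace and trim the ends
    return ' '.join(pattern.sub('', s).split())

ingredients = ['garlic cloves', 'fresh packed cilantro', 'unpacked rice']
print([clean(s) for s in ingredients])  # ['garlic', 'fresh cilantro', 'unpacked rice']

# For comparison, plain str.replace also hits substrings:
print('unpacked rice'.replace('packed', ''))  # un rice
```

The `' '.join(s.split())` idiom is what removes the leftover double spaces; `clean` is a hypothetical helper name.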
words = ['cloves', 'packed']

data = [{'title': 'Simple Enchiladas Verdes',
         'prep_time': '15 min',
         'cook_time': '30 min',
         'ingredients': ['chicken breast', 'tomato sauce', 'garlic cloves', 'fresh packed cilantro']
        },
        {'title': 'Simple Enchiladas Verdes11',
         'prep_time': '15 min11',
         'cook_time': '30 min11',
         'ingredients': ['chicken breast1', '1tomato sauce', '1garlic cloves', '1fresh packed cilantro']}
        ]

for d in data:
    n = []  # a fresh list per recipe; one shared list would collect every recipe's ingredients
    for item in d['ingredients']:
        for word in words:
            item = item.replace(word, '')
        n.append(item)
    d['ingredients'] = n

print(data)

Output:

[{'title': 'Simple Enchiladas Verdes', 'prep_time': '15 min', 'cook_time': '30 min', 'ingredients': ['chicken breast', 'tomato sauce', 'garlic ', 'fresh  cilantro']}, {'title': 'Simple Enchiladas Verdes11', 'prep_time': '15 min11', 'cook_time': '30 min11', 'ingredients': ['chicken breast1', '1tomato sauce', '1garlic ', '1fresh  cilantro']}]
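If the original `data` should stay untouched, each recipe can instead be copied with `{**d, ...}`, overriding only `'ingredients'`. Each inner comprehension creates its own list, so no list object is shared between recipes. This is a sketch with a hypothetical helper `strip_words`; it keeps the same `str.replace` logic (so substrings inside longer words are still removed) and additionally tidies leftover spaces:

```python
words = ['cloves', 'packed']
data = [
    {'title': 'Simple Enchiladas Verdes',
     'ingredients': ['chicken breast', 'tomato sauce', 'garlic cloves', 'fresh packed cilantro']},
    {'title': 'Simple Enchiladas Verdes11',
     'ingredients': ['chicken breast1', '1tomato sauce', '1garlic cloves', '1fresh packed cilantro']},
]

def strip_words(item, words):
    """Hypothetical helper: remove every occurrence of each word, then tidy spaces."""
    for word in words:
        item = item.replace(word, '')  # substring removal, as in the loop above
    return ' '.join(item.split())      # drop trailing and doubled spaces

# {**d, ...} makes a shallow copy of each dict with a brand-new 'ingredients' list
cleaned = [{**d, 'ingredients': [strip_words(i, words) for i in d['ingredients']]}
           for d in data]

for d in cleaned:
    print(d['ingredients'])
# ['chicken breast', 'tomato sauce', 'garlic', 'fresh cilantro']
# ['chicken breast1', '1tomato sauce', '1garlic', '1fresh cilantro']
```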

The technical posts on this site are licensed under CC BY-SA 4.0. If you reprint this content, please credit the site URL or the original address. For any questions, please contact: yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM