I need to remove a list of words from the values of a specific key in my list of dictionaries.
Here is an example of what my data looks like:
words = ['cloves', 'packed']
data = [{'title': 'Simple Enchiladas Verdes',
'prep_time': '15 min',
'cook_time': '30 min',
'ingredients': ['chicken breast', 'tomato sauce', 'garlic cloves', 'fresh packed cilantro'],
'instructions': ['some text...'],
'category': 'dessert',
'cuisine': 'thai',
'article': ['some text...']
},
{...}, {...}]
Desired output:
data = [{'title': 'Simple Enchiladas Verdes',
'prep_time': '15 min',
'cook_time': '30 min',
'ingredients': ['chicken breast', 'tomato sauce', 'garlic', 'fresh cilantro']
},
{...}, {...}]
I have tried several approaches:
remove = '|'.join(words)
regex = re.compile(r'\b('+remove+r')\b', flags=re.IGNORECASE)
for dct in data:
    dct['ingredients'] = list(filter(lambda x: regex.sub('', x), dct['ingredients']))
But this returns the following error: TypeError: sub() missing 1 required positional argument: 'string'
Other attempts:
for dct in data:
    dct['ingredients'] = list(filter(lambda x: x != words, dct['ingredients']))
for dct in data:
    dct['ingredients'] = [[el for el in string if el in words] for string in dct['ingredients']]
for dct in data:
    for string in dct['ingredients']:
        dct['ingredients'] = list(filter(lambda x: x not in words, dct['ingredients']))
But none of them solves my problem.
Why not use a list comprehension and a dict comprehension:
data = [{k:([' '.join([s for s in x.split() if s not in words]) for x in v] if k == 'ingredients' else v) for k, v in i.items()} for i in data]
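The one-liner above is dense, so here is a minimal sketch of the same idea split into a helper function. It assumes words are separated by whitespace; the names `strip_words` and `cleaned` are hypothetical and not part of the original answer, and the sample data is shortened for brevity.

```python
words = {'cloves', 'packed'}  # a set, so membership tests are fast

data = [{'title': 'Simple Enchiladas Verdes',
         'ingredients': ['chicken breast', 'tomato sauce',
                         'garlic cloves', 'fresh packed cilantro']}]

def strip_words(text, unwanted):
    """Return text without the whitespace-separated tokens found in unwanted."""
    return ' '.join(tok for tok in text.split() if tok not in unwanted)

# Build new dictionaries; the originals in `data` are left unmodified.
cleaned = [
    {key: [strip_words(s, words) for s in value] if key == 'ingredients' else value
     for key, value in recipe.items()}
    for recipe in data
]

print(cleaned[0]['ingredients'])
# ['chicken breast', 'tomato sauce', 'garlic', 'fresh cilantro']
```

Because the comprehension builds new dictionaries, the input list can still be inspected afterwards, which may help when debugging.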
In your re.sub approach, you should use map, not filter (you are not filtering out individual words, but replacing the whole string with the result of re.sub):
for dct in data:
    dct['ingredients'] = list(map(lambda x: regex.sub('', x), dct['ingredients']))
Or, probably more readable, as a list comprehension:
dct['ingredients'] = [regex.sub("", x) for x in dct['ingredients']]
Both will leave some excess spaces, though. If words are always separated with a space, you can just use split and join (faster if words is a set):
for dct in data:
    dct['ingredients'] = [' '.join(w for w in string.split() if w not in words)
                          for string in dct['ingredients']]
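If the regex approach is preferred (for example, to keep the case-insensitive matching), the excess spaces can be collapsed afterwards with split and join. The following is a rough sketch, not the original answer's code: the helper name `remove_words` is hypothetical, the sample data is shortened, and 'Garlic Cloves' is capitalized here only to illustrate re.IGNORECASE.

```python
import re

words = ['cloves', 'packed']
data = [{'ingredients': ['chicken breast', 'tomato sauce',
                         'Garlic Cloves', 'fresh packed cilantro']}]

# re.escape guards against words that contain regex metacharacters.
pattern = re.compile(r'\b(?:' + '|'.join(map(re.escape, words)) + r')\b',
                     flags=re.IGNORECASE)

def remove_words(text):
    """Delete whole-word matches, then collapse the leftover whitespace."""
    return ' '.join(pattern.sub('', text).split())

for dct in data:
    dct['ingredients'] = [remove_words(s) for s in dct['ingredients']]

print(data[0]['ingredients'])
# ['chicken breast', 'tomato sauce', 'Garlic', 'fresh cilantro']
```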
words = ['cloves', 'packed']
data = [{'title': 'Simple Enchiladas Verdes',
'prep_time': '15 min',
'cook_time': '30 min',
'ingredients': ['chicken breast', 'tomato sauce', 'garlic cloves', 'fresh packed cilantro']}
]
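for i in data:
    # Join with a sentinel so all ingredients can be processed as one string.
    word = ' @! '.join(i['ingredients'])
    for k in words:
        word = word.replace(k, '')
    # Split back on the sentinel and collapse leftover whitespace.
    i['ingredients'] = [' '.join(part.split()) for part in word.split('@!')]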
Output:
[{'title': 'Simple Enchiladas Verdes',
'prep_time': '15 min',
'cook_time': '30 min',
'ingredients': ['chicken breast',
'tomato sauce',
'garlic',
'fresh cilantro']}]
words = ['cloves', 'packed']
data = [{'title': 'Simple Enchiladas Verdes',
'prep_time': '15 min',
'cook_time': '30 min',
'ingredients': ['chicken breast', 'tomato sauce', 'garlic cloves', 'fresh packed cilantro']
},
{'title': 'Simple Enchiladas Verdes11',
'prep_time': '15 min11',
'cook_time': '30 min11',
'ingredients': ['chicken breast1', '1tomato sauce', '1garlic cloves', '1fresh packed cilantro']}
]
for d in data:
    n = []  # start a fresh list for each dictionary
    for item in d['ingredients']:
        for word in words:
            item = item.replace(word, '')
        n.append(' '.join(item.split()))  # collapse leftover whitespace
    d['ingredients'] = n
    print(d)
Output:
{'title': 'Simple Enchiladas Verdes', 'prep_time': '15 min', 'cook_time': '30 min', 'ingredients': ['chicken breast', 'tomato sauce', 'garlic', 'fresh cilantro']}
{'title': 'Simple Enchiladas Verdes11', 'prep_time': '15 min11', 'cook_time': '30 min11', 'ingredients': ['chicken breast1', '1tomato sauce', '1garlic', '1fresh cilantro']}
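One caveat with str.replace, as used in the loop above: it removes the text wherever it occurs, including inside longer words. A small sketch of the difference compared with a word-boundary regex, using a made-up ingredient string ('unpacked garlic cloves' is hypothetical and not from the question's data):

```python
import re

words = ['cloves', 'packed']
item = 'unpacked garlic cloves'  # hypothetical ingredient for illustration

# Plain replace removes the substring wherever it occurs.
replaced = item
for word in words:
    replaced = replaced.replace(word, '')
replaced = ' '.join(replaced.split())

# A word-boundary regex removes only standalone words.
pattern = re.compile(r'\b(?:' + '|'.join(map(re.escape, words)) + r')\b')
bounded = ' '.join(pattern.sub('', item).split())

print(replaced)  # un garlic
print(bounded)   # unpacked garlic
```

Which behavior is wanted depends on the data; if the words to remove never appear inside other words, the two approaches should give the same result.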