
Replace all the occurrences of specific words

Suppose I have the following sentence:

bean likes to sell his beans

and I want to replace all occurrences of specific words with other words. For example, "bean" with "robert" and "beans" with "cars".

I can't just use str.replace, because in this case it also changes "beans" to "roberts":

>>> "bean likes to sell his beans".replace("bean","robert")
'robert likes to sell his roberts'

I need to replace whole words only, not occurrences of the word inside another word. I think I can achieve this with regular expressions, but I don't know how to do it correctly.

If you use a regex, you can specify word boundaries with \b:

import re

sentence = 'bean likes to sell his beans'

sentence = re.sub(r'\bbean\b', 'robert', sentence)
# 'robert likes to sell his beans'

Here 'beans' is not changed (to 'roberts') because the 's' at the end is a word character, so there is no word boundary between 'bean' and 's': \b matches the empty string, but only at the beginning or end of a word.
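To see where \b actually matches, you can list the positions of its zero-width matches with re.finditer. This is a small illustrative sketch; the sample string "bean beans" is just an example input.

```python
import re

text = "bean beans"

# \b is zero-width, so every match is an empty string; only its position is informative.
boundaries = [m.start() for m in re.finditer(r"\b", text)]
print(boundaries)  # [0, 4, 5, 10]

# There is no boundary at position 9 (between the 'n' and the 's' of "beans"),
# so r"\bbean\b" only matches the standalone word.
print(re.findall(r"\bbean\b", text))  # ['bean']
```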

And the second replacement, for completeness:

sentence = re.sub(r'\bbeans\b', 'cars', sentence)
# 'robert likes to sell his cars'

If you replace the words one at a time, a word you have already replaced can get replaced again (and you won't get what you want). To avoid this, you can do all the replacements in a single pass with a function or lambda:

import re
d = {'bean': 'robert', 'beans': 'cars'}
str_in = 'bean likes to sell his beans'
str_out = re.sub(r'\b(\w+)\b', lambda m: d.get(m.group(1), m.group(1)), str_in)  # words not in d are kept as is
# 'robert likes to sell his cars'

That way, once "bean" is replaced by "robert", it won't be modified again (even if "robert" is also in your input list of words).
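The difference matters most when a replacement is itself one of the keys. The sketch below uses a hypothetical swap mapping {"cat": "dog", "dog": "cat"} (not from the question) to compare replacing word by word with the single-pass re.sub approach above.

```python
import re

swap = {"cat": "dog", "dog": "cat"}  # hypothetical mapping: each value is also a key
text = "the cat chases the dog"

# Word by word: each re.sub sees the output of the previous one.
sequential = text
for old, new in swap.items():
    sequential = re.sub(r"\b" + re.escape(old) + r"\b", new, sequential)

# Single pass: every word of the original string is looked up exactly once.
def lookup(match):
    word = match.group(1)
    return swap.get(word, word)

single_pass = re.sub(r"\b(\w+)\b", lookup, text)

print(sequential)   # the cat chases the cat
print(single_pass)  # the dog chases the cat
```

In the sequential version "cat" first becomes "dog", and the second substitution then turns every "dog" (including the new one) back into "cat".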

As suggested by georg, I edited this answer to use dict.get(key, default_value). An alternative solution (also suggested by georg):

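# Only the dictionary's keys can match; re.escape guards against regex metacharacters in them
str_out = re.sub(r'\b(%s)\b' % '|'.join(map(re.escape, d.keys())), lambda m: d.get(m.group(1), m.group(1)), str_in)
# 'robert likes to sell his cars'
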
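The same idea can be wrapped in a small reusable function. A minimal sketch, assuming a hypothetical helper name replace_words and a non-empty mapping: it escapes each key with re.escape so characters such as "." match literally, and tries longer keys first so that a multi-word key like "new york" wins over "new".

```python
import re

def replace_words(text, mapping):
    """Replace whole-word occurrences of mapping's keys in one pass (hypothetical helper)."""
    # Longest keys first, each escaped so it is matched literally.
    keys = sorted(mapping, key=len, reverse=True)
    pattern = re.compile(r"\b(%s)\b" % "|".join(map(re.escape, keys)))
    # Every match is exactly one of the keys, so a plain dict lookup is enough.
    return pattern.sub(lambda m: mapping[m.group(1)], text)

print(replace_words("bean likes to sell his beans", {"bean": "robert", "beans": "cars"}))
# robert likes to sell his cars
print(replace_words("from v1.2, not v132", {"v1.2": "v2.0"}))
# from v2.0, not v132
```

Without re.escape, the "." in "v1.2" would match any character and "v132" would be replaced too.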
"bean likes to sell his beans".replace("beans", "cars").replace("bean", "robert")

This replaces all instances of "beans" with "cars" and "bean" with "robert". It works because .replace() returns a new, modified copy of the original string, so you can think of it in stages, and because "beans" is replaced before the "bean" replacement can turn it into "roberts". It essentially works this way:

>>> first_string = "bean likes to sell his beans"
>>> second_string = first_string.replace("beans", "cars")
>>> third_string = second_string.replace("bean", "robert")
>>> first_string, second_string, third_string
('bean likes to sell his beans', 'bean likes to sell his cars', 'robert likes to sell his cars')
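Because each .replace() call works on the result of the previous one, the order of the calls matters, and the chain still matches inside longer words. A small sketch with a hypothetical chained_replace helper; "a beanstalk" is an example input not taken from the question.

```python
def chained_replace(text, pairs):
    """Apply str.replace for each (old, new) pair, in order (hypothetical helper)."""
    for old, new in pairs:
        text = text.replace(old, new)
    return text

sentence = "bean likes to sell his beans"

# Longer word first: "beans" is gone before "bean" is replaced.
print(chained_replace(sentence, [("beans", "cars"), ("bean", "robert")]))
# robert likes to sell his cars

# Shorter word first: "beans" has already become "roberts".
print(chained_replace(sentence, [("bean", "robert"), ("beans", "cars")]))
# robert likes to sell his roberts

# No word boundaries: parts of other words are replaced too.
print(chained_replace("a beanstalk", [("beans", "cars"), ("bean", "robert")]))
# a carstalk
```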

I know it's been a long time, but does this look more elegant?

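import re
from functools import reduce  # reduce is not a builtin in Python 3

reduce(lambda x, y: re.sub(r'\b(' + y[0] + r')\b', y[1], x), [("bean", "robert"), ("beans", "cars")], "bean likes to sell his beans")
# 'robert likes to sell his cars'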
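functools.reduce threads the sentence through re.sub once per (word, replacement) pair, so the one-liner is equivalent to a plain loop and, like the chained .replace() calls, is still a sequential replacement. A sketch comparing the two; the names pairs and sub_word are illustrative only.

```python
import re
from functools import reduce

pairs = [("bean", "robert"), ("beans", "cars")]
sentence = "bean likes to sell his beans"

def sub_word(text, pair):
    """Replace whole-word occurrences of pair[0] with pair[1]."""
    old, new = pair
    return re.sub(r"\b" + re.escape(old) + r"\b", new, text)

# reduce(f, [p1, p2], s) computes f(f(s, p1), p2).
with_reduce = reduce(sub_word, pairs, sentence)

# The same computation written as an explicit loop.
with_loop = sentence
for pair in pairs:
    with_loop = sub_word(with_loop, pair)

print(with_reduce)  # robert likes to sell his cars
print(with_loop)    # robert likes to sell his cars
```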
