Replace all the occurrences of specific words
Suppose that I have the following sentence:
bean likes to sell his beans
and I want to replace all occurrences of specific words with other words. For example, "bean" to "robert" and "beans" to "cars".
I can't just use str.replace, because in this case it will also change "beans" to "roberts":
>>> "bean likes to sell his beans".replace("bean","robert")
'robert likes to sell his roberts'
I need to replace whole words only, not occurrences of the word inside another word. I think I can achieve this with regular expressions, but I don't know how to do it correctly.
If you use a regex, you can specify word boundaries with \b:
import re
sentence = 'bean likes to sell his beans'
sentence = re.sub(r'\bbean\b', 'robert', sentence)
# 'robert likes to sell his beans'
Here 'beans' is not changed (to 'roberts') because the position between 'bean' and the trailing 's' is not a word boundary: \b matches the empty string, but only at the beginning or end of a word.
The second replacement, for completeness:
sentence = re.sub(r'\bbeans\b', 'cars', sentence)
# 'robert likes to sell his cars'
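A quick way to see what \b does is to compare matches with and without it. This is only a small sketch using re.findall and re.finditer from the standard library on the sentence from the question; it shows that \b is zero-width, i.e. it matches positions rather than characters.

```python
import re

text = 'bean likes to sell his beans'

# Without a trailing \b, 'bean' also matches the first four letters of 'beans'
print(re.findall(r'\bbean', text))    # ['bean', 'bean']

# With \b on both sides, only the standalone word 'bean' matches
print(re.findall(r'\bbean\b', text))  # ['bean']

# \b matches empty positions: start of string, around the space, end of string
print([m.start() for m in re.finditer(r'\b', 'his beans')])  # [0, 3, 4, 9]
```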
If you replace the words one at a time, a word produced by an earlier replacement can be rewritten again by a later one, and you won't get what you want. To avoid this, do all the replacements in a single pass, using a function or lambda as the replacement:
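import re
d = {'bean': 'robert', 'beans': 'cars'}
str_in = 'bean likes to sell his beans'
# Match every whole word, then map it through d (words not in d are kept as-is)
str_out = re.sub(r'\b(\w+)\b', lambda m: d.get(m.group(1), m.group(1)), str_in)
# 'robert likes to sell his cars'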
That way, once bean is replaced by robert, it won't be modified again (even if robert is also in your input list of words).
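To make the "replaced several times" problem concrete, here is a small sketch with a hypothetical mapping that swaps two words, so that "robert" is both a replacement and a key. The sequential loop is an assumed illustration of the one-word-at-a-time approach; the single-pass version is the lambda technique from the answer.

```python
import re

# Hypothetical mapping: 'robert' is both a replacement value and a key
swap = {'bean': 'robert', 'robert': 'bean'}
text = 'bean and robert'

# One word at a time: the output of the first substitution is rewritten by the second
sequential = text
for old, new in swap.items():
    sequential = re.sub(r'\b%s\b' % re.escape(old), new, sequential)
print(sequential)   # bean and bean

# Single pass: each word of the original string is looked up exactly once
single_pass = re.sub(r'\b(\w+)\b', lambda m: swap.get(m.group(1), m.group(1)), text)
print(single_pass)  # robert and bean
```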
As suggested by georg, I edited this answer to use dict.get(key, default_value). An alternative solution (also suggested by georg):
# Match only the dictionary keys; re.escape guards against regex metacharacters in them
str_out = re.sub(r'\b(%s)\b' % '|'.join(map(re.escape, d)), lambda m: d.get(m.group(1), m.group(1)), str_in)
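If you need this more than once, the alternation approach can be wrapped in a reusable function. The helper below, replace_whole_words, is a hypothetical name, not part of any library; it is a sketch that assumes the keys are plain words (or phrases) and that matching is case-sensitive. Sorting the keys longest-first is an added precaution: with \b alone, a multi-word key such as "ice cream" could otherwise lose to a shorter key "ice".

```python
import re


def replace_whole_words(text, mapping):
    """Replace whole-word occurrences of each key of mapping in a single pass.

    Hypothetical helper: builds one alternation from the escaped keys, so
    words that are not keys are never matched at all.
    """
    if not mapping:
        return text
    # Longest keys first, so a multi-word key like 'ice cream' wins over 'ice'
    keys = sorted(mapping, key=len, reverse=True)
    pattern = re.compile(r'\b(%s)\b' % '|'.join(map(re.escape, keys)))
    # The matched text is always exactly one of the keys
    return pattern.sub(lambda m: mapping[m.group(1)], text)


print(replace_whole_words('bean likes to sell his beans', {'bean': 'robert', 'beans': 'cars'}))
# robert likes to sell his cars
```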
"bean likes to sell his beans".replace("beans", "cars").replace("bean", "robert")
This will replace all instances of "beans" with "cars" and "bean" with "robert". It works because .replace() returns a new, modified copy of the original string, so the calls can be chained and you can think of it in stages. The order matters: "beans" has to be replaced before "bean", otherwise "beans" would first become "roberts". It essentially works this way:
>>> first_string = "bean likes to sell his beans"
>>> second_string = first_string.replace("beans", "cars")
>>> third_string = second_string.replace("bean", "robert")
>>> first_string, second_string, third_string
('bean likes to sell his beans', 'bean likes to sell his cars', 'robert likes to sell his cars')
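Because str.replace works on substrings rather than words, both the order of the calls and the surrounding words matter. The sketch below uses a hypothetical input that adds the word "beanbag" to the question's sentence, to show both effects.

```python
text = 'bean likes to sell his beans and his beanbag'

# Wrong order: replacing 'bean' first also rewrites the start of 'beans'
wrong_order = text.replace('bean', 'robert').replace('beans', 'cars')
print(wrong_order)  # robert likes to sell his roberts and his robertbag

# 'beans' first fixes the plural, but 'bean' still matches inside 'beanbag'
right_order = text.replace('beans', 'cars').replace('bean', 'robert')
print(right_order)  # robert likes to sell his cars and his robertbag
```

So chaining .replace() is fine for the question's exact sentence, but it does not give the whole-word behaviour that the \b-based answers provide.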
I know it's been a long time, but does this look much more elegant?
import re, functools  # reduce() lives in functools in Python 3
functools.reduce(lambda x, y: re.sub(r'\b(' + y[0] + r')\b', y[1], x), [("bean", "robert"), ("beans", "cars")], "bean likes to sell his beans")
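reduce folds each (word, replacement) pair over the string, one re.sub per pair. For readers who find reduce harder to follow, the same idea can be written as an explicit loop; this is a sketch of an equivalent, with re.escape added as an assumption (the one-liner does not escape the words).

```python
import re

# Same (word, replacement) pairs as the reduce() version, applied in order
pairs = [("bean", "robert"), ("beans", "cars")]
text = "bean likes to sell his beans"

for word, replacement in pairs:
    # \b on both sides restricts the match to whole words;
    # re.escape protects against regex metacharacters in the word
    text = re.sub(r'\b(' + re.escape(word) + r')\b', replacement, text)

print(text)  # robert likes to sell his cars
```

Like any one-at-a-time approach, this applies the pairs sequentially, so a replacement that is also a later key will be rewritten again.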