Replace a word by a word in a string

I have a dictionary like the one below:

word_dict = {'a': 'a1', 'winter': 'cold', 'summer': 'hot'}

and I have a string like this:

data = "It's winter not summer. Have a nice day"

What I want to do is replace the word a with a1, winter with cold, and so on in data. I tried the code below:

for word in word_dict:
    data = data.replace(word, word_dict[word])

But it fails because it replaces substrings of data rather than whole words. In fact, the word Have is replaced by Ha1ve.

The result should be:

data = "It's cold not hot. Have a1 nice day"

You could use re.sub. \b is a word boundary, which matches between a word character and a non-word character. We need the word boundaries to match the exact word; otherwise the pattern would also match the a in day.

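>>> import re
>>> word_dict = {'a': 'a1', 'winter': 'cold', 'summer': 'hot'}
>>> data = "It's winter not summer. Have a nice day"
>>> for word in word_dict:
...     # re.escape guards against keys that contain regex metacharacters
...     data = re.sub(r'\b' + re.escape(word) + r'\b', word_dict[word], data)
...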
>>> data
"It's cold not hot. Have a1 nice day"

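The loop above scans the string once per key. As a rough sketch of an alternative (not part of the original answer), the keys can be combined into a single alternation so the text is scanned once. The names keys, pattern and result are only illustrative.

```python
import re

word_dict = {'a': 'a1', 'winter': 'cold', 'summer': 'hot'}
data = "It's winter not summer. Have a nice day"

# Put longer keys first so a longer key is tried before a shorter one
# that shares its prefix.
keys = sorted(word_dict, key=len, reverse=True)

# One pattern of the form \b(?:winter|summer|a)\b, with every key escaped.
pattern = re.compile(r'\b(?:' + '|'.join(re.escape(k) for k in keys) + r')\b')

# The callback looks up each matched word in the dictionary.
result = pattern.sub(lambda m: word_dict[m.group(0)], data)
print(result)  # It's cold not hot. Have a1 nice day
```

Because the substitution happens in one pass, replacement text is never scanned again, which matters if a replacement value happens to also be a key.
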
There are multiple ways to achieve this apart from regular expressions:

ldata = data.split(' ')  # split on single spaces
res = []
for i in ldata:
    if i in word_dict:
        res.append(word_dict[i])
    else:
        res.append(i)
final = ' '.join(res)

The regular expression solution is more practical and fits what you want, but the str.split() and str.join() methods come in handy sometimes. :)

Use split with dict.get, splitting on " " to keep the correct spacing:

from string import punctuation

print(" ".join([word_dict.get(x.rstrip(punctuation), x) for x in data.split(" ")]))
It's cold not hot Have a1 nice day

We also need to strip punctuation so that summer. matches summer, etc. Note that when a stripped word matches, its trailing punctuation is dropped along with it, so summer. becomes hot rather than hot.
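
If the trailing punctuation should survive the replacement, one possible variation (a sketch, not the original answer's code) is to split each token into its word part and its punctuation tail and re-attach the tail afterwards. The helper name replace_token is hypothetical.

```python
from string import punctuation

word_dict = {'a': 'a1', 'winter': 'cold', 'summer': 'hot'}
data = "It's winter not summer. Have a nice day"

def replace_token(token, mapping):
    """Replace the word part of a token and keep its trailing punctuation."""
    core = token.rstrip(punctuation)   # e.g. "summer." -> "summer"
    tail = token[len(core):]           # e.g. "."
    if core in mapping:
        return mapping[core] + tail
    return token

# Splitting on " " (not on arbitrary whitespace) preserves the spacing.
result = " ".join(replace_token(tok, word_dict) for tok in data.split(" "))
print(result)  # It's cold not hot. Have a1 nice day
```

This only handles punctuation at the end of a token; leading punctuation such as an opening quote would need lstrip handled the same way.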

Some timings show that even with the splitting and stripping, the non-regex approach is still over twice as fast:

In [18]: %%timeit data = "It's winter not summer. Have a nice day"
   ....: for word in word_dict:
   ....:     data = re.sub(r'\b'+word+r'\b', word_dict[word], data)
   ....:
100000 loops, best of 3: 12.2 µs per loop

In [19]: timeit " ".join([word_dict.get(x.rstrip(punctuation), x) for x in data.split(" ")])
100000 loops, best of 3: 5.52 µs per loop
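
The numbers above come from an IPython session. To repeat the comparison outside IPython, something like the following sketch with the standard timeit module could be used. The function names with_regex and with_split and the iteration count are arbitrary choices, and the timings will differ by machine and Python version.

```python
import re
import timeit
from string import punctuation

word_dict = {'a': 'a1', 'winter': 'cold', 'summer': 'hot'}
data = "It's winter not summer. Have a nice day"

def with_regex(text):
    # One re.sub call per dictionary key, as in the regex answer.
    for word in word_dict:
        text = re.sub(r'\b' + word + r'\b', word_dict[word], text)
    return text

def with_split(text):
    # Split on single spaces, strip trailing punctuation, look up each word.
    return " ".join([word_dict.get(x.rstrip(punctuation), x) for x in text.split(" ")])

# Total time in seconds for 10000 calls of each function.
regex_time = timeit.timeit(lambda: with_regex(data), number=10000)
split_time = timeit.timeit(lambda: with_split(data), number=10000)
print("regex: %.4f s, split: %.4f s" % (regex_time, split_time))
```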

You can use a generator expression inside join():

>>> word_dict = {'a': 'a1', 'winter': 'cold', 'summer': 'hot'}
>>> data = "It's winter not summer. Have a nice day"
>>> ' '.join(word_dict[j] if j in word_dict else j for j in data.split())
"It's cold not summer. Have a1 nice day"

By splitting the data you can look up each word, then use a simple comprehension to replace the specific words. Note that split() leaves punctuation attached, so summer. is not found in the dictionary and stays unchanged in the output above.
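
One difference between split() and split(" ") is worth keeping in mind when rejoining: split() with no argument collapses runs of whitespace, while split(" ") keeps empty strings between consecutive spaces. A small illustration, using a made-up input with a double space:

```python
text = "Have  a nice day"  # note the two spaces after "Have"

# split() drops the extra space, so the original spacing is lost on rejoin.
collapsed = " ".join(text.split())

# split(" ") yields an empty string between the two spaces, so rejoining
# with " " reproduces the original spacing.
preserved = " ".join(text.split(" "))

print(repr(collapsed))  # 'Have a nice day'
print(repr(preserved))  # 'Have  a nice day'
```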
