
Replace a word by a word in a string

I have a dictionary like the one below:

word_dict = {'a': 'a1', 'winter': 'cold', 'summer': 'hot'}

and I have a string like this:

data = "It's winter not summer. Have a nice day"

I want to replace the word a with a1, winter with cold, and so on in data. I tried the code below:

for word in word_dict:
    data = data.replace(word, word_dict[word])

But it fails because str.replace works on substrings of data, not on whole words. In fact, the word Have is replaced by Ha1ve.

The result should be:

data = "It's cold not hot. Have a1 nice day"

You could use re.sub. \b is a word boundary, which matches between a word character and a non-word character. We need word boundaries to match the exact word; otherwise, the pattern would also match the a in day.

>>> import re
>>> word_dict = {'a': 'a1', 'winter': 'cold', 'summer': 'hot'}
>>> data = "It's winter not summer. Have a nice day"
>>> for word in word_dict:
...     data = re.sub(r'\b'+word+r'\b', word_dict[word], data)


>>> data
"It's cold not hot. Have a1 nice day"
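
If a key might contain regex metacharacters, or one replacement might itself be a key, a single-pass variant could be worth trying. This is only a sketch and not part of the answer above: the helper name replace_words is made up for illustration, and it assumes every key starts and ends with a word character so that \b behaves as described.

```python
import re

def replace_words(text, mapping):
    """Replace whole words in text using mapping, in one pass (hypothetical helper)."""
    if not mapping:
        return text  # nothing to replace; avoids building an empty pattern
    # Try longer keys first so a longer key is preferred over a shorter one.
    keys = sorted(mapping, key=len, reverse=True)
    # re.escape keeps characters such as '.' or '+' in a key from acting as regex syntax.
    pattern = re.compile(r'\b(?:' + '|'.join(re.escape(k) for k in keys) + r')\b')
    # The callback looks up each matched word, so replaced text is never scanned again.
    return pattern.sub(lambda m: mapping[m.group(0)], text)

word_dict = {'a': 'a1', 'winter': 'cold', 'summer': 'hot'}
data = "It's winter not summer. Have a nice day"
result = replace_words(data, word_dict)
print(result)
```

Because the text is scanned once, a mapping such as {'a': 'b', 'b': 'c'} turns "a b" into "b c", whereas replacing key by key in a loop could replace the freshly inserted b a second time.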

There are multiple ways to achieve this apart from regular expressions:

ldata = data.split(' ')  # split on single spaces only
res = []
for i in ldata:
    if i in word_dict:
        res.append(word_dict[i])
    else:
        res.append(i)
final = ' '.join(res)

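The regular expression solution is more practical and fits what you want, but the str.split() and str.join() methods come in handy sometimes. :) Note that this version leaves summer. unchanged, because the token still has the full stop attached and so is not a key in word_dict.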

Use str.split with dict.get, splitting on " " to keep the original spacing:

from string import punctuation

print(" ".join([word_dict.get(x.rstrip(punctuation), x) for x in data.split(" ")]))
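It's cold not hot Have a1 nice day

We also need to strip punctuation so that summer. matches summer, etc. Note that the stripped punctuation is not put back: the lookup returns hot for summer., so the full stop is lost.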
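
Because the lookup uses the stripped word, a replaced word does not get its trailing punctuation back. Here is a minimal sketch of one way to re-attach it. The function name replace_keep_punct is hypothetical, and it assumes punctuation only ever trails a word, so a token like (winter) would still be left alone.

```python
from string import punctuation

def replace_keep_punct(text, mapping):
    """Split on single spaces, replace whole words, keep trailing punctuation."""
    out = []
    for token in text.split(" "):
        core = token.rstrip(punctuation)   # the word without trailing punctuation
        tail = token[len(core):]           # whatever rstrip removed, e.g. "."
        if core in mapping:
            out.append(mapping[core] + tail)
        else:
            out.append(token)              # unknown words pass through untouched
    return " ".join(out)

word_dict = {'a': 'a1', 'winter': 'cold', 'summer': 'hot'}
data = "It's winter not summer. Have a nice day"
print(replace_keep_punct(data, word_dict))
```

Splitting on " " rather than on arbitrary whitespace means runs of spaces survive the round trip, which is the same design choice as in the one-liner.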

Some timings show that, even with the splitting and stripping, the non-regex approach is still over twice as fast:

In [18]: %%timeit data = "It's winter not summer. Have a nice day"
   ....: for word in word_dict:
   ....:     data = re.sub(r'\b'+word+r'\b', word_dict[word], data)
   ....: 
100000 loops, best of 3: 12.2 µs per loop

In [19]: timeit " ".join([word_dict.get(x.rstrip(punctuation), x) for x in data.split(" ")])
100000 loops, best of 3: 5.52 µs per loop
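
To repeat the comparison outside IPython, something like the following should work with the standard timeit module. It is a rough sketch: the function names with_regex and with_split and the loop counts are my own choices, and the numbers will differ by machine and Python version. Keep in mind that the two approaches do not return exactly the same string, since the split version drops the full stop after the replaced word.

```python
import re
import timeit
from string import punctuation

word_dict = {'a': 'a1', 'winter': 'cold', 'summer': 'hot'}
data = "It's winter not summer. Have a nice day"

def with_regex(text=data):
    # One re.sub call per dictionary key, as in the regex answer.
    for word in word_dict:
        text = re.sub(r'\b' + word + r'\b', word_dict[word], text)
    return text

def with_split(text=data):
    # Split on single spaces and look up each word without trailing punctuation.
    return " ".join([word_dict.get(x.rstrip(punctuation), x) for x in text.split(" ")])

loops = 10000  # arbitrary; raise it for steadier numbers
for func in (with_regex, with_split):
    best = min(timeit.repeat(func, number=loops, repeat=3))
    print(f"{func.__name__}: {best / loops * 1e6:.2f} us per loop")
```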

You can use a generator expression inside join():

>>> word_dict = {'a': 'a1', 'winter': 'cold', 'summer': 'hot'}
>>> data = "It's winter not summer. Have a nice day"
>>> ' '.join(word_dict[j] if j in word_dict else j for j in data.split())
"It's cold not summer. Have a1 nice day"

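After splitting data into words, a simple comprehension looks up each word and replaces the ones found in the dictionary. Because summer. keeps its full stop after the split, it is not a key in word_dict and is left unchanged here.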

