Replace a word by a word in a string
I have the following dictionary:
word_dict = {'a': 'a1', 'winter': 'cold', 'summer': 'hot'}
I also have a string like this:
data = "It's winter not summer. Have a nice day"
What I want to do is replace a by a1, winter by cold, and so on, in data. I tried the following code:
for word in word_dict:
    data = data.replace(word, word_dict[word])
But it fails because it replaces substrings of data rather than whole words. In fact, the word Have is replaced by Ha1ve.
The result should be:
data = "It's cold not hot. Have a1 nice day"
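To make the substring problem concrete, here is a minimal sketch that runs the str.replace loop from the question and prints the result. It assumes Python 3.7+, where dictionaries iterate in insertion order; the name broken is only illustrative.

```python
word_dict = {'a': 'a1', 'winter': 'cold', 'summer': 'hot'}
data = "It's winter not summer. Have a nice day"

# str.replace works on raw substrings, so every letter "a" is rewritten,
# including the ones inside "Have" and "day".
broken = data
for word in word_dict:
    broken = broken.replace(word, word_dict[word])

print(broken)  # It's cold not hot. Ha1ve a1 nice da1y
```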
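You can use re.sub. \b matches a word boundary, that is, the position between a word character and a non-word character. We need word boundaries to match the exact word; otherwise the pattern would also match the a in day.
>>> import re
>>> word_dict = {'a': 'a1', 'winter': 'cold', 'summer': 'hot'}
>>> data = "It's winter not summer. Have a nice day"
>>> for word in word_dict:
...     data = re.sub(r'\b'+word+r'\b', word_dict[word], data)
...
>>> data
"It's cold not hot. Have a1 nice day"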
Besides regular expressions, there are several other ways to achieve this:
ldata = data.split(' ')  # split on single spaces
res = []
for i in ldata:
    if i in word_dict:
        res.append(word_dict[i])  # token is a key: use its replacement
    else:
        res.append(i)  # otherwise keep the token unchanged
final = ' '.join(res)
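The regex solution is more practical and does what you need, but str.split() and str.join() sometimes come in handy. :) Note that this version only replaces tokens that equal a key exactly, so summer. (with its trailing period) is left unchanged.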
Use dict.get and split on " " to preserve the original spacing:
from string import punctuation
print(" ".join([word_dict.get(x.rstrip(punctuation), x) for x in data.split(" ")]))
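It's cold not hot Have a1 nice day
We also need to strip the punctuation so that summer. matches summer, and so on. Because the lookup uses the stripped word as the key and returns the bare replacement, the trailing punctuation of a replaced word is dropped: the period after summer does not appear in the output.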
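A variant that keeps trailing punctuation attached to the replaced word can be sketched as follows. It is an adaptation rather than the original answer's code; the helper name replace_token is hypothetical, and it only handles punctuation at the end of a token, not at the start.

```python
from string import punctuation

word_dict = {'a': 'a1', 'winter': 'cold', 'summer': 'hot'}
data = "It's winter not summer. Have a nice day"

def replace_token(token):
    # Separate the word from any trailing punctuation, e.g. "summer." -> "summer" + "."
    core = token.rstrip(punctuation)
    suffix = token[len(core):]
    if core in word_dict:
        return word_dict[core] + suffix  # re-attach the punctuation
    return token

result = " ".join(replace_token(t) for t in data.split(" "))
print(result)  # It's cold not hot. Have a1 nice day
```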
Some timings show that the non-regex approach, even with the splitting and stripping, is still about twice as fast:
In [18]: %%timeit data = "It's winter not summer. Have a nice day"
   ....: for word in word_dict:
   ....:     data = re.sub(r'\b'+word+r'\b', word_dict[word], data)
   ....:
100000 loops, best of 3: 12.2 µs per loop
In [19]: timeit " ".join([word_dict.get(x.rstrip(punctuation), x) for x in data.split(" ")])
100000 loops, best of 3: 5.52 µs per loop
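Outside IPython, a comparable measurement can be sketched with the standard timeit module. The function names with_regex and with_split and the run count are assumptions made for this example, and the absolute numbers will vary by machine and Python version.

```python
import re
import timeit
from string import punctuation

word_dict = {'a': 'a1', 'winter': 'cold', 'summer': 'hot'}
data = "It's winter not summer. Have a nice day"

def with_regex(text):
    # One re.sub call per dictionary key, using word boundaries.
    for word in word_dict:
        text = re.sub(r'\b' + word + r'\b', word_dict[word], text)
    return text

def with_split(text):
    # Split on spaces, strip trailing punctuation for the lookup, rejoin.
    return " ".join([word_dict.get(x.rstrip(punctuation), x) for x in text.split(" ")])

runs = 2000  # arbitrary run count for this sketch
regex_time = timeit.timeit(lambda: with_regex(data), number=runs)
split_time = timeit.timeit(lambda: with_split(data), number=runs)
print(f"regex: {regex_time:.4f}s, split: {split_time:.4f}s for {runs} runs each")
```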
You can use a generator expression inside join():
>>> word_dict = {'a': 'a1', 'winter': 'cold', 'summer': 'hot'}
>>> data = "It's winter not summer. Have a nice day"
>>> ' '.join(word_dict[j] if j in word_dict else j for j in data.split())
"It's cold not summer. Have a1 nice day"
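By splitting the data you can look at its individual words and then use a simple comprehension to replace specific ones. As the output shows, a word with punctuation attached, such as summer., is not replaced.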