How to replace words in a string using a dictionary mapping

I have the following sentence:

a = "you don't need a dog"

and a dictionary:

dict =  {"don't": "do not" }

But I can't use the dictionary to map the words in the sentence with the code below:

''.join(str(dict.get(word, word)) for word in a)

Output:

"you don't need a dog"

What am I doing wrong?

Here is one way.

a = "you don't need a dog"

d =  {"don't": "do not" }

res = ' '.join([d.get(i, i) for i in a.split()])

# 'you do not need a dog'

Explanation

  • Never name a variable after a built-in class, e.g. use d instead of dict.
  • Use str.split to split by whitespace.
  • There is no need to wrap str around values which are already strings.
  • str.join is marginally faster with a list comprehension than with a generator expression, because it has to materialize its input into a sequence before joining.
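The last point can be checked with a quick measurement. The following is only a sketch: the helper names with_list and with_generator are made up for illustration, and the timings depend on your machine and Python version, so none are quoted here.

```python
import timeit

a = "you don't need a dog"
d = {"don't": "do not"}

def with_list():
    # join receives a ready-made list
    return ' '.join([d.get(w, w) for w in a.split()])

def with_generator():
    # join has to consume the generator into a sequence first
    return ' '.join(d.get(w, w) for w in a.split())

# Both forms build the same string; only the speed may differ.
print(with_list() == with_generator())

print("list:     ", timeit.timeit(with_list, number=100_000))
print("generator:", timeit.timeit(with_generator, number=100_000))
```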

You need to split your sentence on ' ' with split(' '). If you simply iterate over a string, you iterate over its characters:

a = "you don't need a dog"

for word in a:  # that's what you are using as input to your dict-key-replace
    print(word) # the single characters never match a key, that's why yours does not work

Output:

y
o
u

d
o
n
'
t

n
e
e
d

a

d
o
g

Read How to debug small programs.

After that, read How to split a string into a list? or use jpp's solution.
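To see how the three ways of walking over a string differ, here is a small sketch. The sample text is made up and deliberately contains a double space and a tab.

```python
text = "you  don't need\ta dog"   # two spaces and a tab, on purpose

# Iterating a string directly yields single characters.
print(list(text)[:5])

# split() with no argument splits on any run of whitespace.
print(text.split())

# split(' ') splits on every single space, so a double space produces
# an empty string and the tab is not treated as a separator.
print(text.split(' '))
```

Keep in mind that joining the result of split() with ' ' normalizes all whitespace to single spaces, which may or may not be what you want.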

All answers are correct, but if your sentence is quite long and the mapping dictionary rather small, you should consider iterating over the items (key-value pairs) of the dictionary and applying str.replace to the original sentence.

The code as suggested by the others. It takes 6.35 µs per loop.

%%timeit

search = "you don't need a dog. but if you like dogs, you should think of getting one for your own. Or a cat?"
mapping =  {"don't": "do not" }

search = ' '.join([mapping.get(i, i) for i in search.split()])

Let's try using str.replace instead. It takes 633 ns per loop.

%%timeit 

search = "you don't need a dog. but if you like dogs, you should think of getting one for your own. Or a cat?"
mapping =  {"don't": "do not" }

for key, value in mapping.items():
    search = search.replace(key, value)

And let's use a list comprehension. It takes 1.09 µs per loop, so by the timings above it beats the split-and-join version but not the plain loop (633 ns). Note that taking element [0] only works here because the mapping has a single entry.

%%timeit 

search = "you don't need a dog. but if you like dogs, you should think of getting one for your own. Or a cat?"
mapping =  {"don't": "do not" }

search = [search.replace(key, value) for key, value in mapping.items()][0]
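With more than one entry in the mapping, taking element [0] of the comprehension keeps only the first replacement. The sketch below illustrates this with a made-up two-entry mapping and shows functools.reduce as one possible way to chain the replacements in a single expression; it is an illustration, not something the timings above cover.

```python
from functools import reduce

search = "you don't need a dog, and I can't have one"
mapping = {"don't": "do not", "can't": "cannot"}

# Each list element applies ONE replacement to the original string,
# so element [0] only reflects the first key of the mapping.
first_only = [search.replace(k, v) for k, v in mapping.items()][0]
print(first_only)

# reduce feeds each intermediate result into the next replacement,
# which is what the explicit for loop does as well.
chained = reduce(lambda s, kv: s.replace(kv[0], kv[1]), mapping.items(), search)
print(chained)
```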

You see the difference? For your short sentence the first and the third code are about the same speed. But the longer the sentence (search string) gets, the more obvious the difference in performance is.

Result string is:

'you do not need a dog. but if you like dogs, you should think of getting one for your own. Or a cat?'

Remark: str.replace also replaces occurrences inside longer words, so you need to make sure that only whole words are replaced. str.replace itself has no option for this (its only optional argument is a count). One idea is to use regular expressions, as explained in this posting, since they can also handle lower and upper case. Padding the keys in your lookup dictionary with spaces won't work, since you would then miss occurrences at the beginning or the end of a sentence.
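A minimal sketch of the regular-expression idea, assuming that "whole word" means "not directly preceded or followed by a word character". The helper name replace_words and the sample mapping are made up for illustration and are not taken from the linked posting.

```python
import re

def replace_words(text, mapping):
    """Replace whole-word occurrences of the keys in mapping (hypothetical helper)."""
    if not mapping:
        return text
    # Longest keys first, so a longer key wins over a shorter one
    # that matches at the same position.
    keys = sorted(mapping, key=len, reverse=True)
    # Lookarounds instead of \b, so keys may start or end with
    # non-word characters; re.escape keeps the keys literal.
    pattern = re.compile(
        r"(?<!\w)(?:" + "|".join(re.escape(k) for k in keys) + r")(?!\w)"
    )
    # One pass over the text: replaced text is never scanned again.
    return pattern.sub(lambda m: mapping[m.group(0)], text)

mapping = {"don't": "do not", "dog": "cat"}
result = replace_words("you don't need a dog, but dogs are nice", mapping)
print(result)
```

Here "dog," is replaced while "dogs" is left alone. Matching is case-sensitive in this sketch; to ignore case you could compile with re.IGNORECASE, but the lookup in the replacement function would then need normalized (for example lower-cased) keys.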

It's simpler than you think:

Just loop over the words, and if a word is a key in dict1, replace it with the value of that key.

dict1 = {"don't": "do not"}

a = "you don't need a dog"

data = a.split()  # split the sentence into words

for i, word in enumerate(data):
    if word in dict1:  # replace only words that have a mapping
        data[i] = dict1[word]
print(" ".join(data))

Output:

you do not need a dog
