Issues with replacing words in a string using a dictionary and the replace() function

Say I have a dictionary, a string and a list of the words in that string. Like this:

the_dictionary={'mine': 'yours', 'I': 'you', 'yours': 'mine', 'you': 'I'}

the_string='I thought that was yours'

list_string=['I','thought','that','was','yours']

This is my code:

for word in list_string:
    if word in the_dictionary:
        the_string = the_string.replace(word, the_dictionary[word], 1)
print(the_string)

Input: I thought that was yours

Output: you thought that was mine

Here everything works great, but if I change the input to:

the_string="That is mine that is yours"

Input: That is mine that is yours

Output: That is mine that is yours

Nothing changes.

Obviously it has something to do with the fact that they are a key-value pair, but my hope is that this can be solved somehow.

My question: Why does this happen and can it be fixed?

Please keep in mind that I am still sort of a beginner and would appreciate it if you could pretend I am a child while explaining it.

Thanks for taking the time /wazus

The issue is that you are calling replace on the_string each time, and when called with the optional count argument set to 1, replace only replaces the first occurrence of the source string.

So, the first time you encounter mine in list_string, the_string gets changed to That is yours that is yours. So far, this is what is expected.

But later, you encounter yours in list_string, and you say the_string = the_string.replace('yours', 'mine', 1). So, the first occurrence of yours in the_string gets replaced with mine, which brings us back to the original string.
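To watch the undo happen step by step, here is a minimal sketch of the same loop that records the string after every replacement. The history list is a name added only for this illustration, and list_string is assumed to come from the_string.split().

```python
the_dictionary = {'mine': 'yours', 'I': 'you', 'yours': 'mine', 'you': 'I'}
the_string = "That is mine that is yours"
list_string = the_string.split()  # assumption: the list is built with split()

# history is a hypothetical helper list: it stores the string after each step.
history = [the_string]
for word in list_string:
    if word in the_dictionary:
        # Replaces only the FIRST occurrence of word anywhere in the string.
        the_string = the_string.replace(word, the_dictionary[word], 1)
        history.append(the_string)

for step in history:
    print(step)
# That is mine that is yours
# That is yours that is yours
# That is mine that is yours
```

The middle line shows the first mine turned into yours; the last line shows that same word being turned back.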

Here's one way to fix it:

In [78]: the_string="That is mine that is yours"

In [79]: the_dictionary={'mine': 'yours', 'I': 'you', 'yours': 'mine', 'you': 'I'}

In [80]: list_string = the_string.split()

In [81]: for i, word in enumerate(list_string):
   ....:     if word in the_dictionary:
   ....:         list_string[i] = the_dictionary[word]
   ....:

In [82]: print(' '.join(list_string))
That is yours that is mine
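The same idea can be packed into a small function. This is only a sketch: swap_words is a hypothetical name, and it assumes words are separated by whitespace with no punctuation attached to them.

```python
def swap_words(text, mapping):
    """Replace every whitespace-separated word that is a key in mapping."""
    # mapping.get(word, word) returns the replacement if word is a key,
    # otherwise the word itself, so each word is looked at exactly once.
    return ' '.join(mapping.get(word, word) for word in text.split())


the_dictionary = {'mine': 'yours', 'I': 'you', 'yours': 'mine', 'you': 'I'}

print(swap_words("That is mine that is yours", the_dictionary))
# That is yours that is mine
print(swap_words("I thought that was yours", the_dictionary))
# you thought that was mine
```

Because each word is replaced in its own slot, a later replacement can never touch a word that was already swapped. One side effect of split() and join() is that runs of several spaces collapse into a single space.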

Here's what's happening in your second example. Originally, you have:

the_string = "That is mine, that is yours"

Your script changes the first "mine" into "yours", which gives:

the_string = "That is yours, that is yours"

Then, when scanning the string again, it changes the first "yours" (which was just changed!) back to "mine", giving you the original phrase again:

the_string = "That is mine, that is yours"

Well, then: why didn't it do the same for the first string? Because the loop walks through the words of list_string in order, and the first sentence never contains both a word and its replacement. "I" becomes "you" and "yours" becomes "mine", but neither "you" nor "mine" shows up later in the list, so nothing gets swapped back. The second sentence contains both "mine" and "yours", so the second replacement undoes the first.

First, you want to make sure that once a word is changed, it doesn't get changed back again. So, starting from the structure of your original script, it's better to change the list than the string. You enumerate each item in the list, and if the item is among the dictionary KEYS (yup: the lookup is against the keys, not the values), you replace it with the matching value. Then you change the list back into a string:

the_dictionary = {'I': 'you', 'mine': 'yours', 'yours': 'mine', 'you': 'I'}

the_string1 = 'I thought that was yours'
the_string2 = 'That is mine that is yours'

list_string1 = ['I', 'thought', 'that', 'was', 'yours']
list_string2 = ['That', 'is', 'mine', 'that', 'is', 'yours']

# Swap each word in its own list slot, so no word is ever changed twice
for i, word in enumerate(list_string1):
    if word in the_dictionary.keys():
        list_string1[i] = the_dictionary[word]
# Rebuild the string: one "%s " slot per word (this leaves a trailing space)
the_string1 = "%s " * len(list_string1) % tuple(list_string1)

for i, word in enumerate(list_string2):
    if word in the_dictionary.keys():
        list_string2[i] = the_dictionary[word]
the_string2 = "%s " * len(list_string2) % tuple(list_string2)

print(the_string1)
print(the_string2)

I used enumerate(), which makes it easier to access both the index and the item of a list. Then I used a little formatting trick to change the list back into a string. It's probably not the best way: it leaves a trailing space, and ' '.join(list_string1) is simpler. Of course, the better way would be to wrap all that up into a function. You can even change the string to a list with the regular expression module:

import re
the_string_list = re.findall(r'\w+',the_string)
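Going one step further with the re module, re.sub accepts a function as the replacement, which lets you swap every word in a single pass and keep punctuation where it is. This is a sketch under that assumption, not the only way to do it; swap_words_regex is a hypothetical name.

```python
import re


def swap_words_regex(text, mapping):
    """Swap each word found in mapping, in one left-to-right pass."""
    # r'\w+' matches each run of letters, digits and underscores.
    # Spaces and punctuation are not matched, so they stay untouched.
    # The lambda receives one match at a time and returns its replacement,
    # or the matched word itself when it is not a key in mapping.
    return re.sub(r'\w+', lambda m: mapping.get(m.group(0), m.group(0)), text)


the_dictionary = {'I': 'you', 'mine': 'yours', 'yours': 'mine', 'you': 'I'}

print(swap_words_regex("That is mine, that is yours", the_dictionary))
# That is yours, that is mine
```

re.sub does not rescan text it has already replaced, so a swapped word cannot be swapped back.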

Hope it helps!
