Unicode string key error in Python dict

Question

I have the following code:

import codecs

corpus_file = codecs.open("corpus_en-tr.txt", encoding="utf-8").readlines()

corpus = []
for a in range(0, len(corpus_file), 2):
     corpus.append({'src': corpus_file[a].rstrip(), 'tgt': corpus_file[a+1].rstrip()})

params = {}

for sentencePair in corpus:
     for tgtWord in sentencePair['tgt']:
          for srcWord in sentencePair['src']:
               params[srcWord][tgtWord] = 1.0

Basically, I am trying to create a dictionary of dictionaries of floats, but I get the following error:

Traceback (most recent call last):
  File "initial_guess.py", line 15, in <module>
    params[srcWord][tgtWord] = 1.0
KeyError: u'A'

UTF-8 string as key in dictionary causes KeyError

I checked the question above, but it doesn't help.

Basically, I don't understand why the Unicode string 'A' is not allowed as a dictionary key in Python. Is there any way to fix it?

Answer 1

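Your params dict is empty, so the lookup params[srcWord] raises KeyError before the inner assignment can happen. This has nothing to do with the key being a Unicode string.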
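A minimal sketch of why the error occurs, using only a plain dict: the statement params[srcWord][tgtWord] = 1.0 first evaluates params[srcWord], and that lookup fails on an empty dict whatever the key type is. The keys 'A' and 'B' are illustrative only.

```python
params = {}

try:
    # The assignment first evaluates params['A'], which does not exist yet.
    params['A']['B'] = 1.0
except KeyError as exc:
    error_key = exc.args[0]  # the missing outer key

# Creating the inner dict first makes the same assignment succeed.
params['A'] = {}
params['A']['B'] = 1.0
```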

You can use a tree (a defaultdict that creates nested defaultdicts on demand) for that:

from collections import defaultdict

def tree():
    return defaultdict(tree)

params = tree()
params['any']['keys']['you']['want'] = 1.0
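One caveat worth knowing with this approach, shown as a small sketch with made-up keys: because every missing key is created on access, merely reading a key adds it to the tree, while membership tests do not.

```python
from collections import defaultdict

def tree():
    # Each missing key creates another tree, to any depth.
    return defaultdict(tree)

params = tree()
params['a']['b'] = 1.0

# Reading a missing key with [] creates it as a side effect.
_ = params['x']
created_by_lookup = 'x' in params

# A membership test does not create the key.
untouched = 'y' in params
```

If that side effect is unwanted, test with `in` or use `.get()` before indexing.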

Or a simpler defaultdict case without tree:

from collections import defaultdict

params = defaultdict(dict)

for sentencePair in corpus:
    for tgtWord in sentencePair['tgt']:
        for srcWord in sentencePair['src']:
            params[srcWord][tgtWord] = 1.0
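As a self-contained sketch of the defaultdict(dict) variant, here it is run on a hypothetical one-pair corpus (the sample strings are made up, not taken from corpus_en-tr.txt). Iterating over a string yields single characters, as in the question's code.

```python
from collections import defaultdict

# Hypothetical one-pair corpus standing in for the real file contents.
corpus = [{'src': u'ab', 'tgt': u'xy'}]

params = defaultdict(dict)  # a missing srcWord gets an empty inner dict

for sentencePair in corpus:
    for tgtWord in sentencePair['tgt']:
        for srcWord in sentencePair['src']:
            params[srcWord][tgtWord] = 1.0

plain = dict(params)  # plain dict copy, e.g. for printing
```

Only the outer level is created automatically here; the inner dicts are ordinary dicts, so a lookup such as params['a']['z'] still raises KeyError.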

If you don't want to add anything like that, just add a dict to params on every iteration:

params = {}

for sentencePair in corpus:
    for srcWord in sentencePair['src']:
        params.setdefault(srcWord, {})
        for tgtWord in sentencePair['tgt']:
            params[srcWord][tgtWord] = 1.0

Please note that I've changed the order of the for loops, because you need to know srcWord first.

Otherwise, with the original loop order, you need to check for the key on every inner iteration:

params = {}

for sentencePair in corpus:
    for tgtWord in sentencePair['tgt']:
        for srcWord in sentencePair['src']:
            params.setdefault(srcWord, {})[tgtWord] = 1.0

Answer 2

You can just use setdefault:

Replace

params[srcWord][tgtWord] = 1.0

with

params.setdefault(srcWord, {})[tgtWord] = 1.0
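A side observation, offered as an assumption about intent: the key in the traceback, u'A', is a single character, because looping over a string yields characters rather than words. If word-level pairs were intended, a sketch using str.split() with the setdefault fix could look like this (the sample sentence pair is hypothetical):

```python
# Hypothetical sample pair; the real data comes from corpus_en-tr.txt.
corpus = [{'src': u'A house', 'tgt': u'Bir ev'}]

params = {}

for sentence_pair in corpus:
    # split() breaks each sentence on whitespace, giving words.
    for src_word in sentence_pair['src'].split():
        for tgt_word in sentence_pair['tgt'].split():
            params.setdefault(src_word, {})[tgt_word] = 1.0
```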


Answer 1: 2 votes, accepted, 2016-10-04 10:13:23

Answer 2: 1 vote, 2016-10-04 10:39:07
