
Using tuples as dictionary keys

The commented-out block of code below produces the output I want, while the uncommented block produces the wrong output.

Could somebody shed light on why the two blocks of code behave differently? The keys of self.q should be (state, action) pairs, so why does self.q[state][action] work? Shouldn't self.q only accept one key?

    def update_q_value(self, state, action, old_q, reward, future_rewards):
        # Q-values are stored in the dictionary self.q. The keys of self.q should be in the form of (state, action) pairs, where state is a tuple of all piles sizes in order, and action is a tuple (i, j) representing a pile and a number.

        state_pair = (tuple(state), action)
        if state_pair not in self.q:
            self.q[state_pair] = dict()

        print(old_q + self.alpha * (reward + future_rewards - old_q))

        self.q[state_pair] = old_q + self.alpha * (reward + future_rewards - old_q)

        # state = tuple(state)
        # if state not in self.q:
        #     self.q[state] = dict()

        # print(old_q + self.alpha * (reward + future_rewards - old_q))

        # self.q[state][action] = old_q + self.alpha * (reward + future_rewards - old_q)

Output from the first block looks like this:

Playing training game 1
0.0
0.0
0.0
0.0
0.0
0.0
0.0
-0.5
0.5
Playing training game 2
0.0
0.0
0.0
0.0
0.0
0.0
0.0
-0.75
0.75
...
Playing training game 9999
0.0
0.0
0.0
0.0
0.0
0.0
0.0
-1.0
1.0
Playing training game 10000
0.0
0.0
0.0
0.0
0.0
0.0
0.0
-1.0
1.0

Output from the second block looks like this:

Playing training game 1
0.0
0.0
0.0
0.0
0.0
0.0
-0.5
0.5
Playing training game 2
0.0
0.0
0.0
0.0
0.0
0.0
-0.25
-0.5
0.5
...
Playing training game 9999
0.0625
0.125
0.125
0.125
0.25
0.25
-0.25
-0.5
0.5
Playing training game 10000
0.0625
0.125
0.125
0.125
0.25
0.25
-0.25
-0.5
0.5

The full code is here if anyone is willing to look at it: https://d.pr/n/MKE8iH. It can be run with something like:

ai = train(10000)
play(ai)

As mentioned in the comments, self.q[state][action] works because you are creating another dictionary as the value, which has action as its key. self.q itself is still indexed by a single key (state); the second subscript, [action], indexes the inner dictionary that self.q[state] returns.

class foo():
    def __init__(self):
        self.qTuple = {}
        self.qDict = {}

    def update_q_value_tuple(self, state, action, value):
        state_pair = (tuple(state), action)
        if state_pair not in self.qTuple:
            self.qTuple[state_pair] = dict()
        self.qTuple[state_pair] = value


    def update_q_value_dict(self, state, action, value):
        state = tuple(state)
        if state not in self.qDict:
            self.qDict[state] = dict()
        self.qDict[state][action] = value


f = foo()
states = ['foo', 'bar']
actions = ['hold', 'release']

for s in states:
    for a in actions:
        for v in range(0, 5):
            f.update_q_value_tuple(s, a, v)
            f.update_q_value_dict(s, a, v)

print(f.qTuple)
print(f.qDict)

Output (key order may vary by Python version):

{(('f', 'o', 'o'), 'hold'): 4, (('b', 'a', 'r'), 'hold'): 4, (('b', 'a', 'r'), 'release'): 4, (('f', 'o', 'o'), 'release'): 4}
{('f', 'o', 'o'): {'release': 4, 'hold': 4}, ('b', 'a', 'r'): {'release': 4, 'hold': 4}}
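
The two layouts hold the same information but are not interchangeable: a value written under one layout cannot be found by a lookup that expects the other. The following is only a minimal sketch of that point. The helper names get_flat and get_nested and the sample state and action are hypothetical, and whether such a mismatch is what causes the different outputs in the question depends on how the rest of the linked code reads self.q, which is not shown here.

```python
# Hypothetical sketch: the code that writes Q-values and the code that
# reads them must agree on the key layout.
flat_q = {}     # keys are (state, action) tuples
nested_q = {}   # keys are states; values are {action: q_value} dicts

state, action = (1, 3, 5, 7), (0, 1)   # made-up sample values

flat_q[(state, action)] = 0.5
nested_q.setdefault(state, {})[action] = 0.5


def get_flat(q, state, action):
    """Look up a Q-value stored under a (state, action) key; default 0.0."""
    return q.get((state, action), 0.0)


def get_nested(q, state, action):
    """Look up a Q-value stored as q[state][action]; default 0.0."""
    return q.get(state, {}).get(action, 0.0)


flat_hit = get_flat(flat_q, state, action)        # finds the stored value
nested_hit = get_nested(nested_q, state, action)  # finds the stored value
mismatch = get_nested(flat_q, state, action)      # wrong layout: falls back to 0.0

print(flat_hit, nested_hit, mismatch)
```

If the reader silently falls back to a default, as these helpers do, a layout mismatch does not raise an error; it just makes every stored value look like it was never learned.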

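Note: be careful when creating a tuple with one element, and don't forget the trailing comma inside the parentheses: state = (state,). Writing tuple(state, ) is not the same thing; the trailing comma in a function call is ignored, so it is identical to tuple(state), which turns an iterable into a tuple of its items. That is why the string 'foo' shows up as ('f', 'o', 'o') in the output above.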
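
A short sketch of how tuple() and the trailing comma actually behave may help here. The variable names and sample values are made up for illustration.

```python
piles = [1, 3, 5, 7]                 # sample list-valued state

converted = tuple(piles)             # tuple of the list's items
with_comma = tuple(piles, )          # trailing comma in a call changes nothing
one_element = (piles,)               # one-element tuple that contains the list

from_string = tuple('foo')           # a string is iterated character by character
wrapped_string = ('foo',)            # one-element tuple holding the whole string

print(converted)
print(one_element)
print(from_string)
print(wrapped_string)

# Only tuples whose items are all hashable can be used as dictionary keys.
q = {}
q[converted] = 0.0                   # works: all items are ints
try:
    q[one_element] = 0.0             # fails: the tuple contains a list
    key_error = None
except TypeError as exc:
    key_error = exc
```

For a list-valued state such as the pile sizes in the question, tuple(state) is the conversion that produces a hashable key.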
