
Convolutional neural network: how to train it? (unsupervised)

I am trying to implement a CNN to play a game. I am using Python with Theano/Lasagne. I have built the network and am now working out how to train it.

So right now I have a batch of 32 states and, for each state in that batch, the action that was taken and the expected reward for that action.

How do I now train the network so that it learns that these actions in these states lead to these rewards?
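For reference, each expected reward in the batch is built the usual Q-learning way: the immediate reward plus, for non-terminal frames, the discounted best predicted reward of the next state. This is the same rule my _train code uses further down; the function below is only an illustrative sketch, not part of my actual code:

    import numpy as np

    # Sketch of how one expected reward in the mini-batch is built.
    # gamma plays the role of FUTURE_REWARD_DISCOUNT in my code.
    def expected_reward(reward, is_terminal, next_state_q_values, gamma=0.99):
        if is_terminal:
            return reward
        return reward + gamma * np.max(next_state_q_values)

    print(expected_reward(1.0, False, np.array([0.2, 0.5, 0.1])))  # 1.0 + 0.99 * 0.5 = 1.495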

Edit: clarifying my question.

Here is my full code: http://pastebin.com/zY8w98Ng and the snake import: http://pastebin.com/fgGCabzR

The part I am having trouble with is this:

def _train(self):
    # Prepare Theano variables for inputs and targets
    input_var = T.tensor4('inputs')
    target_var = T.ivector('targets')
    states = T.tensor4('states')
    print "sampling mini batch..."
    # sample a mini_batch to train on
    mini_batch = random.sample(self._observations, self.MINI_BATCH_SIZE)
    # get the batch variables
    previous_states = [d[self.OBS_LAST_STATE_INDEX] for d in mini_batch]
    actions = [d[self.OBS_ACTION_INDEX] for d in mini_batch]
    rewards = [d[self.OBS_REWARD_INDEX] for d in mini_batch]
    current_states = np.array([d[self.OBS_CURRENT_STATE_INDEX] for d in mini_batch])
    agents_expected_reward = []
    # print np.rollaxis(current_states, 3, 1).shape
    print "compiling current states..."
    current_states = np.rollaxis(current_states, 3, 1)
    current_states = theano.compile.sharedvalue.shared(current_states)

    print "getting network output from current states..."
    agents_reward_per_action = lasagne.layers.get_output(self._output_layer, current_states)


    print "rewards adding..."
    for i in range(len(mini_batch)):
        if mini_batch[i][self.OBS_TERMINAL_INDEX]:
            # this was a terminal frame so no need to scale the future reward...
            agents_expected_reward.append(rewards[i])
        else:
            agents_expected_reward.append(
                rewards[i] + self.FUTURE_REWARD_DISCOUNT * np.max(agents_reward_per_action[i].eval()))

    # figure out how to train the model (self._output_layer) with previous_states,
    # actions and agent_expected_rewards

I want to update the model with previous_states, actions and agent_expected_rewards so that it learns that these actions lead to these rewards.

I expect it to look something like this:

train_model = theano.function(inputs=[input_var],
    outputs=self._output_layer,
    givens={
        states: previous_states,
        rewards: agents_expected_reward,
        expected_rewards: agents_expected_reward})

I just do not understand how these givens would affect the model, since I never specified them when building the network. I also cannot find this in the Theano and Lasagne documentation.
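From what I can tell from the docs, givens only substitutes a symbolic variable with another expression (often a shared variable) when the function is compiled. A tiny standalone snippet of that mechanism, not taken from my code:

    import numpy as np
    import theano
    import theano.tensor as T

    x = T.matrix('x')
    # shared variable holding the actual data
    data = theano.shared(np.arange(6, dtype=theano.config.floatX).reshape(2, 3))

    # givens replaces the symbolic x with the shared data at compile time,
    # so the compiled function no longer needs x as an explicit input
    f = theano.function(inputs=[], outputs=x.sum(), givens={x: data})
    print(f())  # 15.0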

So how do I update the model/network so that it "learns"?

If this is still unclear, please comment on what other information is needed. I have been stuck on this for a few days now.

After going through the documentation again I finally found the answer; I had been looking in the wrong place.

    network = self._output_layer
    # symbolic expression for the network output
    prediction = lasagne.layers.get_output(network)
    # loss between the prediction and the training targets
    loss = lasagne.objectives.categorical_crossentropy(prediction, target_var)
    loss = loss.mean()

    # collect all trainable parameters and build SGD update rules for them
    params = lasagne.layers.get_all_params(network, trainable=True)
    updates = lasagne.updates.sgd(loss, params, self.LEARN_RATE)
    # 'expected' and 'real_rewards' are additional symbolic variables (defined elsewhere, not shown here)
    givens = {
        states: current_states,
        expected: agents_expected_reward,
        real_rewards: rewards
    }
    # compile a function that updates the parameters every time it is called
    train_fn = theano.function([input_var, target_var], loss,
                               updates=updates, on_unused_input='warn',
                               givens=givens,
                               allow_input_downcast=True)
    train_fn(current_states, agents_expected_reward)
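One thing worth noting: categorical_crossentropy treats target_var as class labels, while the expected rewards here are continuous values, so a squared-error objective is usually the more natural fit for this kind of target. A minimal standalone sketch of that variant; the layer sizes, shapes and variable names are illustrative only, not taken from my pastebin code:

    import numpy as np
    import theano
    import theano.tensor as T
    import lasagne

    # Toy network: the input shape and layer sizes are assumptions, not the real model.
    input_var = T.tensor4('inputs')
    target_var = T.matrix('targets')  # one target value per action
    l_in = lasagne.layers.InputLayer((None, 4, 20, 20), input_var=input_var)
    l_conv = lasagne.layers.Conv2DLayer(l_in, num_filters=16, filter_size=(3, 3))
    l_out = lasagne.layers.DenseLayer(l_conv, num_units=4, nonlinearity=None)  # linear Q-values

    prediction = lasagne.layers.get_output(l_out)
    # squared error between predicted rewards and the computed targets
    loss = lasagne.objectives.squared_error(prediction, target_var).mean()

    params = lasagne.layers.get_all_params(l_out, trainable=True)
    updates = lasagne.updates.sgd(loss, params, learning_rate=0.01)

    train_fn = theano.function([input_var, target_var], loss,
                               updates=updates, allow_input_downcast=True)

    # one gradient step on a dummy mini-batch of 32 states
    states = np.zeros((32, 4, 20, 20), dtype=np.float32)
    targets = np.zeros((32, 4), dtype=np.float32)
    print(train_fn(states, targets))

In practice you would also only push the output of the action that was actually taken towards its computed reward (for example by masking the other outputs), rather than regressing all four outputs at once.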
