
Reinforcement learning with neural networks

Original post: 2010-05-01 09:56:06. Tags: machine-learning, neural-network, reinforcement-learning, markov

I am working on a project that combines reinforcement learning (RL) and neural networks (NN). I need to determine the structure of the action vector that will be fed to the neural network.

I have 3 different actions (A, B, and Nothing), and A and B each come in different powers (e.g. A100, A50, B100, B50). What is the best way to feed these actions to a NN in order to get the best results?

1- Feed A/B to input 1, and the action power (100/50/Nothing) to input 2.

2- Feed A100/A50/Nothing to input 1, and B100/B50/Nothing to input 2.

3- Feed A100/A50 to input 1, B100/B50 to input 2, and a Nothing flag to input 3.

4- Also, should I feed 100 and 50 as they are, or normalize them to 2 and 1?
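To make the comparison concrete, the first three options can be written as small encoding functions. This is only an illustrative sketch: the function names, the +1/-1/0 coding of the action kind in option 1, and the scaling of the power to the range [0, 1] are assumptions, not something fixed by the problem.

```python
# Hypothetical action set: (kind, power); "Nothing" carries no power.
ACTIONS = [("A", 100), ("A", 50), ("B", 100), ("B", 50), ("Nothing", 0)]


def encode_option1(kind, power):
    """Option 1: input 1 = action kind, input 2 = action power."""
    # Assumed coding: A -> +1, B -> -1, Nothing -> 0; power scaled to [0, 1].
    kind_code = {"A": 1.0, "B": -1.0, "Nothing": 0.0}[kind]
    return [kind_code, power / 100.0]


def encode_option2(kind, power):
    """Option 2: input 1 = power of A, input 2 = power of B (0 = not used)."""
    scaled = power / 100.0
    return [scaled if kind == "A" else 0.0, scaled if kind == "B" else 0.0]


def encode_option3(kind, power):
    """Option 3: same as option 2, plus an explicit Nothing flag as input 3."""
    return encode_option2(kind, power) + [1.0 if kind == "Nothing" else 0.0]


for kind, power in ACTIONS:
    print(kind, power, encode_option1(kind, power),
          encode_option2(kind, power), encode_option3(kind, power))
```

Note that in option 2 the Nothing action is the all-zero vector, whereas option 3 gives it an input of its own.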

I need reasons for choosing one method over the others. Any suggestions are welcome.

Thanks

1 answer

What do you want to learn? What should the output be? Is the input just the action that was used? If you are learning a model of the environment, it is expressed by a probability distribution:

P(next_state | state, action)

It is common to use a separate model for each action. That makes the mapping between input and output simpler. The input is a vector of state features. The output is a vector of the features of the next state. The action used is implied by the choice of model.
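A minimal sketch of the one-model-per-action idea, in plain Python. The `TinyModel` class (a single sigmoid layer), the action names, the feature count, and the learning rate are all assumptions chosen for illustration; any function approximator could take its place.

```python
import math
import random


class TinyModel:
    """Single sigmoid layer mapping state features to next-state features."""

    def __init__(self, n_features, seed=0):
        rng = random.Random(seed)
        # weights[j][i] connects input i to output j; the last entry is the bias.
        self.weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_features + 1)]
                        for _ in range(n_features)]

    def predict(self, state):
        x = list(state) + [1.0]  # append a constant bias input
        return [1.0 / (1.0 + math.exp(-sum(w * xi for w, xi in zip(row, x))))
                for row in self.weights]

    def train(self, state, next_state, lr=0.5):
        x = list(state) + [1.0]
        for row, y, t in zip(self.weights, self.predict(state), next_state):
            # Cross-entropy gradient for a sigmoid output is (y - t) * x.
            for i, xi in enumerate(x):
                row[i] -= lr * (y - t) * xi


# One independent model per action; the action is never a network input.
ACTIONS = ["A100", "A50", "B100", "B50", "Nothing"]
models = {action: TinyModel(n_features=3) for action in ACTIONS}


def observe(state, action, next_state):
    """Update only the model that belongs to the action taken."""
    models[action].train(state, next_state)


def predict_next(state, action):
    """Predict next-state features with the model for this action."""
    return models[action].predict(state)
```

With this layout the question of how to encode the action as a network input disappears, at the cost of training one network per action.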

The state features could be encoded as bits. An active bit would indicate the presence of a feature.
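As a small illustration of the bit encoding, the sketch below maps a set of present features to a fixed-length bit vector and back. The feature names are hypothetical placeholders; a real task would define its own.

```python
# Hypothetical feature names, in a fixed order that defines the bit positions.
FEATURES = ["enemy_near", "low_health", "has_key"]


def encode_state(present):
    """Return one bit per known feature: 1 if it is present, else 0."""
    return [1 if name in present else 0 for name in FEATURES]


def decode_state(bits):
    """Recover the set of present features from a bit vector."""
    return {name for name, bit in zip(FEATURES, bits) if bit}


print(encode_state({"low_health"}))  # bit vector for a single active feature
```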

This would learn a deterministic model. I don't know of a good way to learn a stochastic model of the next states. One possibility may be to use stochastic neurons.
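One possible reading of "stochastic neurons" is a binary unit that fires with a probability given by the sigmoid of its weighted input, so that repeated evaluations on the same state yield different next-state bits. The sketch below assumes that interpretation; the function names and parameters are illustrative, and it shows only sampling, not a training procedure.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def stochastic_neuron(inputs, weights, bias, rng):
    """Return 1 with probability sigmoid(w . x + b), otherwise 0."""
    p = sigmoid(sum(w * x for w, x in zip(weights, inputs)) + bias)
    return 1 if rng.random() < p else 0


def sample_frequency(inputs, weights, bias, n=10000, seed=0):
    """Estimate how often the neuron fires for a fixed input."""
    rng = random.Random(seed)
    fired = sum(stochastic_neuron(inputs, weights, bias, rng) for _ in range(n))
    return fired / n
```

With a layer of such units as the output, each output bit of the predicted next state would be a sample rather than a fixed value.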