
Reinforcement learning DQN environment structure

Original post: 2021-02-02 10:41:21 · Tags: python, deep-learning, reinforcement-learning, dqn

I am wondering how best to feed the changes my DQN agent makes to its environment back to the agent itself.

I have a battery model in which an agent observes a time-series forecast of 17 steps and 5 features. It then decides whether to charge or discharge.

I want to include its current state of charge (empty, half full, full, etc.) in its observation space (i.e., somewhere within the (17, 5) dataframes I am feeding it).

I have several options: I can set a whole column to the state-of-charge value, set a whole row to it, or flatten the whole dataframe and set one value to the state-of-charge value.

Are any of these unwise? It seems a little rudimentary to me to set a whole column to a single value, but would it actually impact performance? I am wary of flattening the whole thing, as I plan to use either conv or LSTM layers (although the current model is just dense layers).
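For concreteness, the three options described in the question could look roughly like the NumPy sketch below. The function names are hypothetical, the state of charge is assumed to be normalised to [0, 1], and the column and row variants append to the (17, 5) array rather than overwrite an existing feature, which is only one reading of the question.

```python
import numpy as np

N_STEPS, N_FEATURES = 17, 5  # forecast shape taken from the question


def soc_as_column(forecast, soc):
    """Option 1: append a constant column, giving shape (17, 6)."""
    col = np.full((forecast.shape[0], 1), soc, dtype=forecast.dtype)
    return np.hstack([forecast, col])


def soc_as_row(forecast, soc):
    """Option 2: append a constant row, giving shape (18, 5)."""
    row = np.full((1, forecast.shape[1]), soc, dtype=forecast.dtype)
    return np.vstack([forecast, row])


def soc_flat(forecast, soc):
    """Option 3: flatten and append one scalar, giving shape (86,)."""
    return np.append(forecast.ravel(), soc).astype(forecast.dtype)


# Placeholder forecast data; a real environment would supply this.
forecast = np.random.default_rng(0).normal(size=(N_STEPS, N_FEATURES)).astype(np.float32)
soc = 0.5  # assumed: state of charge normalised to [0, 1]

print(soc_as_column(forecast, soc).shape)
print(soc_as_row(forecast, soc).shape)
print(soc_flat(forecast, soc).shape)
```

Options 1 and 2 keep a 2-D layout that conv or LSTM layers can consume, at the cost of repeating the same value 17 or 5 times. Option 3 stores the value once but discards the time-by-feature structure.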

1 answer

You would not want to add unnecessary, repetitive features to the state representation, as this might hamper your RL agent's convergence later, when you want to scale your model to larger input sizes (if that is in your plan).

Also, deciding how much information to give in the state representation is mostly experimental. The best way to start would be to pass in just a single value as the battery state. If the model does not converge, then you could try the other options you mentioned in your question.
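One possible way to pass a single value while keeping the (17, 5) forecast intact for a later conv or LSTM encoder is to treat the state of charge as a separate input and concatenate it after the sequence has been encoded. The sketch below is an illustration under assumptions, not the answerer's method: the weights are random, the names `W_seq`, `W_head` and `q_values` are hypothetical, and a single dense layer with `tanh` stands in for whatever encoder is eventually used.

```python
import numpy as np

rng = np.random.default_rng(0)
N_STEPS, N_FEATURES = 17, 5   # forecast shape from the question
HIDDEN, N_ACTIONS = 16, 2     # assumed sizes; 2 actions: charge / discharge

# Hypothetical, randomly initialised weights; a real agent would learn these.
W_seq = rng.normal(scale=0.1, size=(N_STEPS * N_FEATURES, HIDDEN))
W_head = rng.normal(scale=0.1, size=(HIDDEN + 1, N_ACTIONS))


def q_values(forecast, soc):
    """Encode the forecast, then append the scalar state of charge once."""
    # Sequence branch: a dense layer standing in for a conv/LSTM encoder.
    h = np.tanh(forecast.reshape(-1) @ W_seq)
    # The state of charge joins the encoded features as one extra input.
    joint = np.concatenate([h, [soc]])
    # Linear head producing one Q-value per action.
    return joint @ W_head


forecast = rng.normal(size=(N_STEPS, N_FEATURES))
print(q_values(forecast, soc=0.5))
```

With this layout the state of charge appears exactly once in the network input, and swapping the dense stand-in for a different encoder does not change how the scalar is fed in.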