Updating an old system to Q-learning with Neural Networks

Recently I've been reading a lot about Q-learning with Neural Networks, and I thought about updating an old optimization system for a power plant boiler. The existing system consists of a simple feed-forward neural network that approximates an output from many sensory inputs. That output is then fed into a linear, model-based controller, which in turn somehow outputs an optimal action so that the whole system converges to a desired goal.

Identifying linear models is a time-consuming task, so I thought about refurbishing the whole thing into model-free Q-learning with a Neural Network approximation of the Q-function. I drew a diagram to ask whether I'm on the right track or not.

[Diagram: model]

My question: if you think I have understood the concept correctly, should my training set be composed of State Feature vectors on one side and Q_target - Q_current (here I'm assuming an increasing reward) on the other, so as to force the whole model towards the target? Or am I missing something?

Note: The diagram shows a comparison between the old system in the upper part and my proposed change in the lower part.

EDIT: Does a State Neural Network guarantee Experience Replay?
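
To make the training-set idea concrete, here is a minimal sketch (names such as ReplayBuffer and q_network are hypothetical, not part of the original system) of how (state, action, reward, next_state) transitions could be stored in an experience-replay buffer and turned into (State Features, TD target) training pairs:

```python
import random
from collections import deque

import numpy as np

# Hypothetical replay buffer: stores (state, action, reward, next_state) transitions
# and turns random minibatches into supervised (state, TD-target) training pairs.
# q_network is assumed to be a callable that maps a batch of state-feature vectors
# to a numpy array of shape (batch, n_actions).
class ReplayBuffer:
    def __init__(self, capacity=10_000):
        self.buffer = deque(maxlen=capacity)

    def add(self, state, action, reward, next_state):
        self.buffer.append((state, action, reward, next_state))

    def sample_targets(self, q_network, batch_size=32, gamma=0.99):
        batch = random.sample(self.buffer, batch_size)
        states, actions, rewards, next_states = map(np.array, zip(*batch))

        # Bootstrapped target for the taken action:
        #   target = reward + gamma * max_a' Q(s', a')
        q_current = q_network(states)        # shape (batch, n_actions)
        q_next = q_network(next_states)      # shape (batch, n_actions)
        targets = q_current.copy()
        targets[np.arange(batch_size), actions] = rewards + gamma * q_next.max(axis=1)
        return states, targets               # supervised (input, target) pairs
```

Sampling minibatches at random from the buffer, rather than training on consecutive transitions, is what experience replay adds; the network itself does not provide it.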

You might just use the Q values of all the actions in the current state as the output layer of your network. A poorly drawn diagram is here

You can therefore take advantage of the NN's ability to output multiple Q values at a time. Then just backprop using the loss derived from Q(s, a) <- Q(s, a) + alpha * (reward + discount * max(Q(s', a')) - Q(s, a)), where max(Q(s', a')) can be easily computed from the output layer.
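
As a rough illustration of that layout, here is a minimal PyTorch sketch (layer sizes and names are my own assumptions, not from the original post): a network with one output per action, and a TD loss built from the update rule above.

```python
import torch
import torch.nn as nn

# Hypothetical Q-network: state features in, one Q value per action out.
class QNetwork(nn.Module):
    def __init__(self, n_features, n_actions, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_features, hidden),
            nn.ReLU(),
            nn.Linear(hidden, n_actions),
        )

    def forward(self, state):
        return self.net(state)  # shape (batch, n_actions)


def td_loss(q_net, states, actions, rewards, next_states, gamma=0.99):
    """Loss derived from Q(s,a) <- Q(s,a) + alpha*(r + gamma*max_a' Q(s',a') - Q(s,a)).

    actions is expected to be a LongTensor of shape (batch,).
    """
    q_all = q_net(states)                                     # (batch, n_actions)
    q_sa = q_all.gather(1, actions.unsqueeze(1)).squeeze(1)   # Q(s, a) for taken actions
    with torch.no_grad():                                     # targets are held fixed
        q_next_max = q_net(next_states).max(dim=1).values     # max_a' Q(s', a')
        target = rewards + gamma * q_next_max
    return nn.functional.mse_loss(q_sa, target)
```

Minimizing this mean-squared error with learning rate alpha nudges Q(s, a) towards reward + discount * max(Q(s', a')), which is the function-approximation analogue of the tabular update.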

Please let me know if you have further questions.
