
Using a Q-learning model without external libraries

I am trying to use reinforcement learning on a Pacman-based game, and I want to use Q-learning to generate my agent's actions. I was planning on using the openai-gym and keras libraries to train my model, but I was hoping there was a way to save the model and use it without openai-gym and keras (after it has been trained). From what I understand, Q-learning can be used to generate state-action pairs, and I was wondering if it is possible to save all possible combinations of these for a solvable system like Pacman. This seems somewhat unrealistic, so if you have any other ideas I would love to hear them.

From your question, it seems like you have a model of the world (a Pacman-based game) and want to train a Q-learning algorithm to solve the environment. After training completes, you want to save the model.

How you save the model depends entirely on which RL algorithm you are using. And, of course, all of them can be saved; otherwise they would be useless in the real world.

Tabular RL: Tabular Q-learning stores the agent's policy (its Q-values) in a matrix of shape (S x A), where S is the number of states and A the number of possible actions. After the environment is solved, just save this matrix as a CSV file. I have a quick implementation of this on my GitHub under Reinforcement Learning.
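
For concreteness, here is a minimal sketch of that idea in plain numpy (the state/action counts and hyperparameters are made-up placeholders):

import numpy as np

N_STATES, N_ACTIONS = 500, 4   # hypothetical sizes for a small Pacman grid
ALPHA, GAMMA = 0.1, 0.99       # learning rate and discount factor

Q = np.zeros((N_STATES, N_ACTIONS))  # the (S x A) Q-matrix

def update(s, a, r, s_next):
    # standard tabular Q-learning update:
    # Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))
    Q[s, a] += ALPHA * (r + GAMMA * Q[s_next].max() - Q[s, a])

# ... run your training episodes, calling update(...) at every step ...

np.savetxt("q_table.csv", Q, delimiter=",")   # save the learned Q-values
Q = np.loadtxt("q_table.csv", delimiter=",")  # reload later, no RL library needed

def act(s):
    return int(np.argmax(Q[s]))  # greedy action from the saved table

Once the table is on disk, acting is just a row lookup and an argmax, which is why this approach needs no RL library at run time.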

Linear RL: If the state space and/or action space is too large, you can use function approximation. In this case, you build a linear model that approximates the Q-matrix. To save this model, you simply save the weights of the linear model as a CSV or even a text file.
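
A rough sketch of what that looks like, assuming you already have a feature function that maps a state to a feature vector phi (the dimensions below are placeholders):

import numpy as np

N_FEATURES, N_ACTIONS = 16, 4          # hypothetical dimensions
ALPHA, GAMMA = 0.01, 0.99

W = np.zeros((N_ACTIONS, N_FEATURES))  # one weight vector per action

def q_values(phi):
    return W @ phi  # Q(s, a) ~ w_a . phi(s), one value per action

def update(phi, a, r, phi_next):
    # semi-gradient Q-learning update on the linear weights
    td_error = r + GAMMA * q_values(phi_next).max() - q_values(phi)[a]
    W[a] += ALPHA * td_error * phi

np.savetxt("weights.csv", W, delimiter=",")  # the whole "model" is this one matrix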

Deep RL: Same as linear RL; you would just have to save the weights of the neural network(s). If you coded the network yourself, it should be trivial to save them as a CSV file. If you are using tensorflow (1.x), you can create checkpoints with:

import tensorflow as tf

saver = tf.train.Saver()  # builds an op that saves and restores every variable in the graph

Wherever your training ends, put:

saver.save(sess, model_path)  # writes the checkpoint files to model_path

I have an implementation of this as well for deep deterministic policy gradient on my GitHub.
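
One caveat: a TensorFlow checkpoint can only be loaded back with TensorFlow, which conflicts with your goal of running the agent without the library. A workaround (a sketch, assuming a simple two-layer feed-forward Q-network; the file name is made up) is to pull the trained weights out as plain arrays and re-implement the forward pass in numpy:

import numpy as np

# inside the session, after training (TF 1.x):
#     weights = sess.run(tf.trainable_variables())  # list of plain numpy arrays
#     np.savez("q_net.npz", *weights)

# later, with no tensorflow installed, rebuild the forward pass by hand
params = np.load("q_net.npz")
W1, b1, W2, b2 = (params[k] for k in params.files)  # assumes variables were created in this order

def q_values(state):
    h = np.maximum(state @ W1 + b1, 0.0)  # ReLU hidden layer
    return h @ W2 + b2                    # one Q-value per action

def act(state):
    return int(np.argmax(q_values(state)))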
