
Reinforcement learning, pendulum python

I'm having trouble finding a good reward function for the pendulum problem. The function I'm using is -x ** 2 - 0.25 * (xdot ** 2), which is the quadratic error from the top, with x representing the current position (angle) of the pendulum and xdot the angular velocity.
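A rough sketch of the reward described above follows. It assumes x is the pendulum angle in radians measured from the upright position (0 means balanced) and xdot is the angular velocity; the function names are hypothetical. With this kind of reward it is worth checking angle wrapping. If the simulator reports an unbounded angle, a pendulum that has swung once around and is upright again at 2π would be penalised as if it were far from the top, so the sketch wraps the angle to [-π, π) first.

```python
import math


def wrap_angle(theta: float) -> float:
    """Map an angle in radians to the interval [-pi, pi)."""
    return (theta + math.pi) % (2 * math.pi) - math.pi


def quadratic_reward(theta: float, theta_dot: float, w_vel: float = 0.25) -> float:
    """Reward -x**2 - w_vel * xdot**2, where x is the angular error from upright.

    Assumption: theta is in radians and 0 means the pendulum points straight up.
    The angle is wrapped first so that, e.g., 2*pi also counts as upright.
    """
    x = wrap_angle(theta)
    return -x ** 2 - w_vel * theta_dot ** 2


print(quadratic_reward(0.0, 0.0))          # upright and at rest: best possible reward
print(quadratic_reward(math.pi, 0.0))      # hanging straight down: worst angle term
print(quadratic_reward(2 * math.pi, 0.0))  # one full turn: treated as upright again
```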

It takes a long time to learn with this function, and sometimes it doesn't work at all. Does anyone have other suggestions? I've searched on Google but didn't find anything I could use.

In this paper, the authors run experiments on both a simulated and a real inverted pendulum, using the following reward function:

[image: reward function]

Here, x is the state vector containing the current angle and angular velocity, and u is the action.
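The formula itself is only available as an image, so it is not reproduced here. As an illustration of the general idea of penalising both the state and the action, here is a sketch of a common quadratic state-action reward, r = -xᵀQx - R·u², with x = [angle, angular velocity] and u the action. This form, the function name, and the weight values below are assumptions chosen for illustration, not taken from the paper. Compared with the reward in the question, the extra -R·u² term discourages large control actions.

```python
import numpy as np


def quadratic_state_action_reward(x: np.ndarray, u: float,
                                  Q: np.ndarray, R: float) -> float:
    """Return -x^T Q x - R * u^2.

    x: state vector [angle, angular velocity]; the angle is assumed to be the
       (wrapped) error from upright, in radians.
    u: scalar action (e.g. torque or motor voltage).
    Q: 2x2 positive semi-definite weight matrix on the state.
    R: non-negative weight penalising large actions.
    """
    x = np.asarray(x, dtype=float)
    return float(-(x @ Q @ x) - R * u ** 2)


# Illustrative weights only (hypothetical, not the paper's values):
Q = np.diag([1.0, 0.1])   # angle error weighted more heavily than velocity
R = 0.01                  # small penalty on control effort

print(quadratic_state_action_reward(np.array([0.1, 0.5]), 0.2, Q, R))
```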

The experiments show that this reward function works reasonably well with the following algorithms: SARSA, LSPI, experience-replay SARSA, and experience-replay Q-learning.
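Experience replay means storing observed transitions (s, a, r, s') and repeatedly re-using random samples of them for updates, instead of learning from each transition only once. Below is a minimal tabular Q-learning sketch of that idea. It is not the implementation used in the paper: it assumes the pendulum state and action have already been discretised into integer indices (not shown), it treats the task as continuing (no terminal states), and the class name and all hyperparameter values are hypothetical.

```python
import random
from collections import deque

import numpy as np


class ReplayQLearner:
    """Tabular Q-learning with a simple experience-replay buffer (illustrative sketch)."""

    def __init__(self, n_states, n_actions, alpha=0.1, gamma=0.98,
                 buffer_size=10_000, batch_size=32, seed=0):
        self.q = np.zeros((n_states, n_actions))   # Q-table, one row per discrete state
        self.alpha = alpha                         # learning rate
        self.gamma = gamma                         # discount factor
        self.buffer = deque(maxlen=buffer_size)    # oldest transitions are dropped first
        self.batch_size = batch_size
        self.rng = random.Random(seed)

    def act(self, s, epsilon=0.1):
        """Epsilon-greedy action selection."""
        if self.rng.random() < epsilon:
            return self.rng.randrange(self.q.shape[1])
        return int(np.argmax(self.q[s]))

    def store(self, s, a, r, s_next):
        """Remember one transition for later replay."""
        self.buffer.append((s, a, r, s_next))

    def replay(self):
        """Apply Q-learning updates to a random minibatch of stored transitions."""
        n = min(self.batch_size, len(self.buffer))
        for s, a, r, s_next in self.rng.sample(list(self.buffer), n):
            target = r + self.gamma * np.max(self.q[s_next])
            self.q[s, a] += self.alpha * (target - self.q[s, a])


# Toy usage with a made-up transition: action 1 in state 0 gives reward 1
# and leads to state 1 (whose Q-values stay at zero).
learner = ReplayQLearner(n_states=2, n_actions=2)
for _ in range(50):
    learner.store(0, 1, 1.0, 1)
    learner.replay()
print(learner.q)
```

In a real pendulum loop, store() would be called after every environment step and replay() once or several times per step; replaying old transitions is what lets the agent extract more learning from each (possibly expensive) interaction.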

However, keep in mind that your problem may not be (only) related to the reward function, since the speed of convergence can be affected by many factors, as @Matheus Portela suggested in the comments.
