
Evaluate_policy records a much higher mean reward than the Stable Baselines 3 logger

As the title says, I am testing PPO on the CartPole environment using SB3. Looking at the performance measured by the evaluate_policy helper, I reliably reach a mean reward of 475 after about 20,000 timesteps, but going by the console log during learning I need roughly 90,000 timesteps to see comparable results.

Why does my model perform so much better using the evaluation helper?

I used the same hyperparameters in both cases, and I used a new environment for the evaluation with the helper method.
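For reference, a minimal sketch of the setup being described, assuming a recent SB3 release that uses Gymnasium, the CartPole-v1 environment id, default PPO hyperparameters, and 10 evaluation episodes (these specifics are assumptions, not taken from the original post):

import gymnasium as gym
from stable_baselines3 import PPO
from stable_baselines3.common.evaluation import evaluate_policy

# Train PPO on CartPole; the console log's rollout/ep_rew_mean comes from these training rollouts.
model = PPO("MlpPolicy", "CartPole-v1", verbose=1)
model.learn(total_timesteps=20_000)

# Evaluate on a separate, freshly created environment, as described in the question.
eval_env = gym.make("CartPole-v1")
mean_reward, std_reward = evaluate_policy(model, eval_env, n_eval_episodes=10)
print(f"mean reward: {mean_reward:.1f} +/- {std_reward:.1f}")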

I think I have solved the "problem": evaluate_policy uses deterministic actions in its default settings, which leads to better results sooner, whereas the episode reward in the training log comes from the stochastic rollouts PPO uses for exploration.
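To see the effect directly, you can evaluate the same trained model with both settings; this snippet continues from the sketch above, reusing model and eval_env, and the choice of 20 episodes is arbitrary:

# Deterministic evaluation (the evaluate_policy default): always pick the most likely action.
det_mean, det_std = evaluate_policy(model, eval_env, n_eval_episodes=20, deterministic=True)

# Stochastic evaluation: sample actions, the same way PPO acts while collecting training rollouts.
sto_mean, sto_std = evaluate_policy(model, eval_env, n_eval_episodes=20, deterministic=False)

print(f"deterministic: {det_mean:.1f} +/- {det_std:.1f}")
print(f"stochastic:    {sto_mean:.1f} +/- {sto_std:.1f}")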
