Questions tagged [policy-gradient-descent]

Parallel environments in Pong keep ending up in the same state despite random actions being taken

Hi, I am trying to use SubprocVecEnv to run 8 parallel Pong environment instances. I tried testing the state transitions using random actions, but a ...

Python policy gradient reinforcement learning with a continuous action space is not working

I am trying to train an agent to navigate to a target in my custom environment. The agent is learning with a neural net (2 hidden Dense layers, one dro ...

Action masking for continuous action space in reinforcement learning

Is there a way to model action masking for continuous action spaces? I want to model economic problems with reinforcement learning. These problems oft ...
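One common way to "mask" a continuous action is to squash the raw network output into a state-dependent feasible interval instead of zeroing out logits as in the discrete case. The sketch below assumes the constraint can be expressed as a lower and upper bound per state; `feasible_bounds` and its budget rule are hypothetical placeholders for whatever economic constraint applies. If the policy is a Gaussian whose log-probabilities are used in the gradient, the tanh change of variables also needs a log-Jacobian correction, which this sketch omits.

```python
import math
from typing import Tuple


def squash_to_bounds(raw_action: float, low: float, high: float) -> float:
    """Map an unbounded policy output into the feasible interval between low and high.

    tanh squashes raw_action into (-1, 1); the affine rescale then maps that
    interval onto (low, high), so an infeasible action is never produced.
    """
    if low > high:
        raise ValueError("empty feasible interval")
    return low + (math.tanh(raw_action) + 1.0) * 0.5 * (high - low)


def feasible_bounds(budget: float) -> Tuple[float, float]:
    # Hypothetical constraint: spend at least 0 and at most the current budget.
    return 0.0, max(budget, 0.0)


low, high = feasible_bounds(budget=40.0)
for raw in (-5.0, 0.0, 5.0):
    # Every printed action lies inside the budget-dependent interval.
    print(squash_to_bounds(raw, low, high))
```

Because the bounds are recomputed from the state at every step, the same network output can map to different actions as the constraint changes.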

PyTorch PPO implementation for CartPole-v0 getting stuck in local optima

DDPG Actor Update (PyTorch Implementation Issues)

This is from the https://github.com/MoritzTaylor/ddpg-pytorch/blob/master/ddpg.py implementation, and I guess most DDPG implementations are written t ...

ValueError: No gradients provided for any variable in policy gradient

I have been trying to implement a policy gradient algorithm in reinforcement learning. However, I am facing the error "ValueError: No gradients provided ...

DDPG not converging for a simple control problem

I am trying to solve a control problem with DDPG. The problem is simple enough so that I can do value function iteration for its discretized version, ...

MlpPolicy only returns 1 and -1 with action space [-1, 1]

I am trying to use Stable Baselines to train PPO2 with MlpPolicy. After 100k timesteps, I only get 1 and -1 as actions. I restrict the action space to [-1, 1] ...

PPO2 reinforcement learning 'catastrophic forgetting'?

I'm implementing PPO2 reinforcement learning on my self-built tasks and always encounter situations where the agent seems to be nearly matured th ...
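For reference, here is a minimal per-sample sketch of the clipped surrogate objective that PPO (and PPO2) maximizes, min(r·A, clip(r, 1−ε, 1+ε)·A), where r is the probability ratio between the new and old policy and A is the advantage. The ε here corresponds to the `cliprange` argument of Stable Baselines' PPO2. It only shows how the clip limits the incentive to move the policy far in one update; it is not a diagnosis of the forgetting described above.

```python
def ppo_clip_objective(ratio: float, advantage: float, eps: float = 0.2) -> float:
    """Per-sample PPO clipped surrogate (to be maximized).

    ratio = pi_new(a|s) / pi_old(a|s). Taking the minimum of the clipped and
    unclipped terms gives a pessimistic bound: once the ratio leaves
    [1 - eps, 1 + eps] in the direction the advantage favours, the objective
    stops increasing, so the gradient no longer pushes the ratio further.
    """
    clipped_ratio = min(max(ratio, 1.0 - eps), 1.0 + eps)
    return min(ratio * advantage, clipped_ratio * advantage)


# Positive advantage: the gain is capped at (1 + eps) * A.
print(ppo_clip_objective(1.5, 1.0))
# Negative advantage: the unclipped (worse) value is kept, so large
# harmful moves are still fully penalized.
print(ppo_clip_objective(1.5, -1.0))
```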

How to solve the zero-probability problem in policy gradient methods?

Recently, I have tried to apply the naive policy gradient method to my problem. However, I found that the difference between different outputs of the ...
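The excerpt does not show how the probabilities are computed, so the following is an assumption about the cause: a frequent source of "zero probability" trouble is taking the log of a softmax probability that has underflowed to exactly 0. Computing log-probabilities directly from the logits with the log-sum-exp trick keeps them finite. A minimal sketch in plain Python:

```python
import math
from typing import List


def softmax(logits: List[float]) -> List[float]:
    # Subtracting the max avoids overflow, but tiny probabilities can still underflow to 0.0.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def log_softmax(logits: List[float]) -> List[float]:
    # log pi(a) = z_a - logsumexp(z), computed without ever forming pi(a) itself.
    m = max(logits)
    lse = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - lse for x in logits]


logits = [0.0, 800.0]
probs = softmax(logits)
print(probs[0])                 # 0.0 (underflow)
try:
    math.log(probs[0])          # math.log raises here; NumPy/PyTorch would return -inf
except ValueError:
    print("log(0) is undefined")
print(log_softmax(logits)[0])   # -800.0, finite and usable in the gradient
```

Deep learning frameworks provide the same operation as a built-in log-softmax, which is usually preferable to `log(softmax(x))`.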

What Loss Or Reward Is Backpropagated In Policy Gradients For Reinforcement Learning?

I have made a small script in Python to solve various Gym environments with policy gradients. import gym, os import numpy as np #create environment e ...

Reward not increasing while training a Bipedal System

I am completely new to reinforcement learning and this is my first program in practice. I am trying to train the bipedal system in the OpenAI gym envi ...

PPO algorithm converges on only one action

I have taken some reference implementations of the PPO algorithm and am trying to create an agent that can play Space Invaders. Unfortunately, from the 2 ...

What is the meaning of a parameterized policy in reinforcement learning?

A policy is simply a mapping from states to actions. How is it parameterized? Can someone explain? ...
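A parameterized policy π_θ(a | s) is a mapping from states to action probabilities whose behaviour is controlled by a parameter vector θ. The sketch below uses one simple choice: a linear softmax policy over discrete actions with hand-picked state features φ(s). The feature values and step size are illustrative assumptions; in practice θ is usually the weights of a neural network. Because the policy is differentiable in θ, gradient ascent on θ can make a chosen action more likely.

```python
import math
from typing import List


def policy_probs(theta: List[List[float]], features: List[float]) -> List[float]:
    """pi_theta(a | s): softmax over one linear score per action.

    theta[a] is the weight vector for action a; features is phi(s).
    """
    scores = [sum(w * f for w, f in zip(row, features)) for row in theta]
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]


def grad_log_prob(theta: List[List[float]], features: List[float], action: int) -> List[List[float]]:
    """Gradient of log pi_theta(action | s) w.r.t. every entry of theta.

    For a linear softmax policy: d/d theta[b] = (1[b == action] - pi(b|s)) * phi(s).
    """
    probs = policy_probs(theta, features)
    return [[((1.0 if b == action else 0.0) - probs[b]) * f for f in features]
            for b in range(len(theta))]


theta = [[0.0, 0.0], [0.0, 0.0]]   # 2 actions x 2 state features
phi_s = [1.0, 0.5]                 # hypothetical features of one state
print(policy_probs(theta, phi_s))  # [0.5, 0.5]

# One gradient-ascent step on log pi(1 | s) makes action 1 more likely in this state.
g = grad_log_prob(theta, phi_s, action=1)
theta = [[w + 0.5 * gw for w, gw in zip(row, grow)] for row, grow in zip(theta, g)]
print(policy_probs(theta, phi_s))
```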

Can the output of DDPG policy network be a probability distribution instead of a certain action value?

We know that DDPG is a deterministic policy gradient method and the output of its policy network should be a certain action. But once I tried to let t ...

Policy gradient (REINFORCE) diverging when finding the shortest path in a graph with negative rewards

I want to use the policy gradient to find the shortest path among a group of nodes in a network. The network is represented using a graph with edges l ...
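The excerpt is truncated, so the actual cause of the divergence is unknown. One issue often raised with all-negative rewards (such as negative path lengths) is that every return is negative, so plain REINFORCE lowers the log-probability of every sampled action and relies on noise to find the better ones. Subtracting a baseline is a standard variance-reduction technique; the sketch below uses the batch-mean return, which is only one simple choice (a learned value function is also common). The returns are made-up example values.

```python
from typing import List


def advantages_with_baseline(returns: List[float]) -> List[float]:
    """Subtract the batch-mean return so better-than-average paths get positive weight."""
    baseline = sum(returns) / len(returns)
    return [g - baseline for g in returns]


# Hypothetical returns of three sampled paths (negative total edge length).
path_returns = [-4.0, -7.0, -10.0]
print(advantages_with_baseline(path_returns))  # [3.0, 0.0, -3.0]
```

The shortest sampled path now gets a positive weight and is reinforced, while the longest is discouraged.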

How do you evaluate whether a reinforcement learning agent is trained or not?

I am new to reinforcement learning agent training. I have read about the PPO algorithm and used the Stable Baselines library to train an agent using PPO. So m ...

Difficult reinforcement learning query

I'm struggling to figure out how I want to do this, so I hope someone here can offer some guidance. Scenario: I have a 10-character string, let's call ...

Loss Policy Gradient - Reinforcement Learning

I am training my network using policy gradient and defining the loss as: What I do not understand is that the loss function is sometimes positive o ...
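The poster's loss formula is missing from the excerpt, so this assumes the usual REINFORCE surrogate, L = −Σ_t log π(a_t | s_t) · G_t. Since log π ≤ 0, each term −log π · G_t has the sign of its return G_t, so the "loss" can be positive or negative and its value is not a performance measure; only its gradient matters. The log-probabilities below are illustrative numbers.

```python
from typing import List


def reinforce_loss(log_probs: List[float], returns: List[float]) -> float:
    """Surrogate loss whose gradient is the negated REINFORCE gradient estimate.

    -log pi(a_t|s_t) is always >= 0, so each term takes the sign of G_t.
    """
    return -sum(lp * g for lp, g in zip(log_probs, returns))


log_probs = [-0.1, -1.2, -0.7]   # log pi(a_t | s_t) of the actions taken
print(reinforce_loss(log_probs, [1.0, 1.0, 1.0]))     # positive
print(reinforce_loss(log_probs, [-1.0, -1.0, -1.0]))  # negative
```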

How do we assess each reward in the return in Policy Gradient Methods?

Hi Stack Overflow community, I have a problem with policy gradient methods in reinforcement learning. In policy gradient methods, we increase/de ...
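In the common "reward-to-go" form of the policy gradient, each action a_t is weighted by the discounted return that follows it, G_t = r_t + γ·G_{t+1}, rather than by the whole episode's reward. This way an action is credited only with rewards that come after it. A minimal sketch computing G_t backwards through one episode; the rewards and γ are example values:

```python
from typing import List


def discounted_returns(rewards: List[float], gamma: float) -> List[float]:
    """G_t = r_t + gamma * G_{t+1}, computed backwards through the episode."""
    returns = [0.0] * len(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


# Only the last step is rewarded; earlier actions receive a discounted share.
print(discounted_returns([0.0, 0.0, 1.0], gamma=0.9))
```

Each log π(a_t | s_t) in the gradient is then multiplied by its own G_t (often after subtracting a baseline).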