I have a deep SARSA algorithm which works great in PyTorch on LunarLander-v2, and I would like to use it with Keras/TensorFlow. It uses mini-batches of size 64 which ...
I am implementing a SARSA reinforcement learning function which chooses an action following the current policy and then updates its Q-values. This throws ...
I am trying to learn the concepts of reinforcement learning at the moment. Here, I tried to implement the SARSA algorithm for the cart-pole example using t ...
My problem is the following. I have a simple grid world: https://i.imgur.com/2QyetBg.png The agent starts at the initial state labeled START, a ...
I am reading Silver et al (2012) "Temporal-Difference Search in Computer Go", and trying to understand the update order for the eligibility trace algo ...
I have a question about my own project for testing reinforcement learning techniques. First, let me explain the purpose. I have an agent which can t ...
I have a question about this SARSA FA. In input cell 142 I see this modified update, where q_hat_next is Q(S', a') and q_hat_grad is the derivative o ...
So I've used the following code to implement Q-learning in Unity, which works fine with my environment. However, I'm also trying to implement SARSA as ...
I think I am messing something up. I always thought that:
- 1-step TD on-policy = Sarsa
- 1-step TD off-policy = Q-learning

Thus I conclude:
- n-st ...
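For the n-step case the on-policy character of Sarsa shows up in the return itself: the bootstrap uses the state-action pair the agent actually reached, not a max. A minimal sketch of the n-step Sarsa return (the function name and the dict-style Q-table are my own, not from the question):

```python
def n_step_sarsa_return(rewards, Q, s_n, a_n, gamma=0.99):
    """n-step Sarsa return: discounted sum of the n observed rewards,
    plus a bootstrap from Q at the pair (s_n, a_n) actually reached.
    An off-policy n-step method would instead need importance sampling
    or a max/expectation over actions at the bootstrap point."""
    G = sum(gamma**i * r for i, r in enumerate(rewards))
    return G + gamma**len(rewards) * Q[(s_n, a_n)]
```

With n = 1 this collapses to the familiar one-step Sarsa target r + gamma * Q(S', A').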
What does zeta represent in the critic method? I believe it keeps track of the state-action pairs and represents eligibility traces, which are a tempo ...
While trying to implement Episodic Semi-gradient Sarsa with a neural network as the approximator, I wondered how to choose the optimal action based ...
I was testing SARSA with lambda = 1 on Windy Grid World, and if exploration causes the same state-action pair to be visited many times before rea ...
I am implementing a SARSA(lambda) model in C++ to overcome some of the limitations (the sheer amount of time and space required) of DP models ...
I have a problem in my case study. I am interested in reinforcement learning for a gridworld model. The model is a maze of 7x7 fields for movement. Consider a m ...
I'm trying to implement linear gradient-descent Sarsa based on Sutton & Barto's book; see the algorithm in the picture below. However, I struggle ...
I'm trying to implement the Sarsa algorithm for solving the Frozen Lake environment from OpenAI Gym. I started working on this recently, but I think I under ...
I'm using joint positions from a Kinect camera as my state space, but I think it's going to be too large (25 joints x 30 per second) to just feed into ...
Since I am a beginner in this field, I have a doubt about how different epsilon values affect SARSA and Q-learning ...
The difference between Q-learning and SARSA is that Q-learning looks at the best possible action in the next state, whereas SARSA compares the ...
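That difference is easiest to see in the update rules themselves. A minimal tabular sketch (function names and the NumPy Q-table layout are my own, not from the question): Q-learning bootstraps from the greedy action in the next state, SARSA from the action actually selected.

```python
import numpy as np

def q_learning_update(Q, s, a, r, s_next, alpha=0.1, gamma=0.99):
    """Off-policy: bootstrap from the max over actions in s_next."""
    td_target = r + gamma * np.max(Q[s_next])
    Q[s, a] += alpha * (td_target - Q[s, a])

def sarsa_update(Q, s, a, r, s_next, a_next, alpha=0.1, gamma=0.99):
    """On-policy: bootstrap from a_next, the action actually chosen in s_next."""
    td_target = r + gamma * Q[s_next, a_next]
    Q[s, a] += alpha * (td_target - Q[s, a])
```

The only structural change is the extra argument a_next: SARSA must know the next action before it can update, which is why its acronym spells out the quintuple (S, A, R, S', A').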
I have successfully implemented a SARSA algorithm (both one-step and with eligibility traces) using table lookup. In essence, I have a q-value matrix ...
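For the eligibility-trace variant, the per-step update touches the whole q-value matrix rather than a single entry. A minimal sketch of one SARSA(lambda) step with accumulating traces (function name, trace matrix E, and default hyperparameters are my own assumptions, not from the question):

```python
import numpy as np

def sarsa_lambda_update(Q, E, s, a, r, s_next, a_next,
                        alpha=0.1, gamma=0.99, lam=0.9, terminal=False):
    """One SARSA(lambda) step: all recently visited state-action pairs
    share the current TD error, weighted by their eligibility trace."""
    target = r if terminal else r + gamma * Q[s_next, a_next]
    delta = target - Q[s, a]
    E[s, a] += 1.0           # accumulating trace for the visited pair
    Q += alpha * delta * E   # update every traced pair at once
    E *= gamma * lam         # decay all traces toward zero
```

E should be reset to zeros at the start of each episode; with lam = 0 the trace matrix never spreads credit and this reduces to one-step SARSA.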