Reward logic outside Unity3D in the ML-Agents package

Unity3D has a reinforcement learning package called ML-Agents, which I am experimenting with to understand its components. For my project, I need to write my own reward logic outside Unity3D (that is, not calling 'AddReward' from C#, but writing Python code that sets the reward outside Unity).

Can I use the Python API provided by ML-Agents to read the environment observations, compute the reward with custom logic outside Unity, and send it back to Unity? If so, where should I look?

To give an example: in the 3DBall example, the reward logic lives in Unity3D. The agent gets a positive reward while the ball stays on the platform and a negative reward if it falls off. This logic is implemented in C# and compares the position of the ball (a position vector) to the platform. For every action, the agent calls env.step(action) and gets back a tuple of (reward, state, ...). What if I want to write this logic outside Unity? For example, could a Python program read the observation (from Unity3D) and set the reward without using the Unity reward logic? Is this possible? I cannot find such an option in the ML-Agents Python API.

At the moment I am thinking of calling an external Python program at the point in the C# code where I set the reward, but I wonder whether this is overcomplicated and whether there is an easier solution.

Any help would be really appreciated.

Regards Guido

As I understand reinforcement learning, the reward is produced by the environment, and the agent simply receives it together with the next observation. You could say it is part of the observation.

Therefore, the logic that decides which reward is given, and when, is part of the environment logic. With ML-Agents the environment lives in Unity, so you have to implement the reward function in Unity (C#).

So, to keep a clear separation between the environment (Unity) and the agent (Python), I think it's best to keep the reward logic in Unity/C# and not tinker with it in Python.

tl;dr: I think it's intentional that you cannot set the reward via the Python API, so that the environment and the agent stay clearly separated.
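For what it's worth, if you run your own training loop in Python, nothing stops that loop from ignoring the reward it receives and computing its own from the observation. Below is a minimal sketch of that idea. It does not use the ML-Agents API: StubBallEnv is a hypothetical stand-in for the Unity environment, the observation layout [ball_x, ball_y, ball_z] is an assumption, and PythonRewardWrapper and ball_reward are names made up for illustration. The reward computed this way is only visible to your Python code; it is not sent back to Unity.

```python
from typing import Callable, Sequence, Tuple

Observation = Sequence[float]
StepResult = Tuple[Observation, float, bool, dict]


class StubBallEnv:
    """Hypothetical stand-in for a Unity environment; NOT the ML-Agents API.

    Assumed observation layout: [ball_x, ball_y, ball_z] relative to the platform.
    """

    def __init__(self) -> None:
        self._ball_y = 1.0

    def reset(self) -> Observation:
        self._ball_y = 1.0
        return [0.0, self._ball_y, 0.0]

    def step(self, action: float) -> StepResult:
        # Toy dynamics: the action directly moves the ball up or down.
        self._ball_y += action
        obs = [0.0, self._ball_y, 0.0]
        done = self._ball_y < 0.0  # the ball has dropped below the platform
        env_reward = 0.0  # placeholder for whatever the C# side would report
        return obs, env_reward, done, {}


class PythonRewardWrapper:
    """Replaces the environment's reward with one computed from the observation."""

    def __init__(self, env, reward_fn: Callable[[Observation], float]) -> None:
        self.env = env
        self.reward_fn = reward_fn

    def reset(self) -> Observation:
        return self.env.reset()

    def step(self, action) -> StepResult:
        obs, env_reward, done, info = self.env.step(action)
        # Keep the original reward around for debugging or comparison.
        info = dict(info, env_reward=env_reward)
        return obs, self.reward_fn(obs), done, info


def ball_reward(obs: Observation) -> float:
    """Hypothetical 3DBall-style rule: bonus while on the platform, penalty after a fall."""
    ball_y = obs[1]
    return -1.0 if ball_y < 0.0 else 0.1


env = PythonRewardWrapper(StubBallEnv(), ball_reward)
env.reset()
_, r_stay, done_stay, _ = env.step(0.0)           # ball stays on the platform
_, r_fall, done_fall, info_fall = env.step(-2.0)  # ball drops below the platform
```

The catch is that the Python side can only use what the observation contains. If the reward depends on scene state that is not observed, that state would have to be added to the observation in Unity first, which is another reason the C# side is the natural place for reward logic.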
