
Time step in reinforcement learning

For my first project in reinforcement learning I'm trying to train an agent to play a real-time game. This means that the environment constantly moves and changes, so the agent needs to be precise about its timing. To keep the sequence of observations and actions correct, I figured the agent will have to work at a certain frequency. By that I mean that if the agent runs at 10 Hz, it will have to take inputs every 0.1 s and make a decision. However, I couldn't find any sources on this problem, probably because I'm not using the correct terminology in my searches. Is this a valid way to approach the problem? If so, what can I use? I'm working with Python 3 on Windows (the game only runs on Windows); are there any libraries that could be used? I'm guessing time.sleep() is not a viable option, since it isn't very precise (at high frequencies) and since it just freezes the agent.

EDIT: So my main questions are:

a) Should I use a fixed frequency? Is this a normal way to operate a reinforcement learning agent?

b) If so, what libraries do you suggest?

There isn't a single clear answer to this question, as it depends on several factors, such as the inference time of your model, the maximum control rate the environment accepts, and the control rate required to solve the environment.
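To get a feel for the first of these factors, you could time your policy and derive a rough upper bound on the control rate. The following is only a sketch: `dummy_policy`, `measure_latency`, `max_control_rate`, and the safety factor of 2 are hypothetical names and values chosen for illustration, not part of any RL library, and your real model call would replace the stand-in.

```python
import time


def measure_latency(policy, observation, n_trials=200):
    """Time n_trials calls of policy(observation); return latencies in seconds."""
    latencies = []
    for _ in range(n_trials):
        start = time.perf_counter()
        policy(observation)
        latencies.append(time.perf_counter() - start)
    return latencies


def max_control_rate(latencies, safety_factor=2.0):
    """Rough upper bound on the control rate in Hz.

    Uses the worst observed latency and leaves headroom (safety_factor is an
    assumed value) for capturing the observation and sending the action.
    """
    worst = max(latencies)
    if worst <= 0:
        raise ValueError("latencies must contain a positive value")
    return 1.0 / (worst * safety_factor)


def dummy_policy(observation):
    # Hypothetical stand-in for a real model's forward pass.
    return sum(x * x for x in observation) > 0.5


if __name__ == "__main__":
    lat = measure_latency(dummy_policy, [0.1] * 1000)
    print(f"worst latency: {max(lat) * 1e3:.3f} ms")
    print(f"suggested ceiling: {max_control_rate(lat):.1f} Hz")
```

If the ceiling this reports is well above the rate you planned to use, inference time is unlikely to be the limiting factor.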

As you are trying to play a game, I am assuming that your eventual goal might be to compare the performance of the agent with that of a human. If so, a good approach would be to select a control rate similar to what humans might use in the same game, which is most likely lower than 10 Hz.

You could measure how many actions per second you use when playing the game yourself to get a good estimate. However, any reasonable frequency, such as the 10 Hz you suggested, should be a good starting point for working on your agent.
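As for running the agent at a fixed rate without relying on time.sleep() alone, one common pattern is to schedule against absolute deadlines from time.perf_counter(), sleep for most of the remaining time, and busy-wait for the last stretch. The sketch below uses only the standard library; `run_at_fixed_rate`, the `step` callback, and the 2 ms spin margin are assumptions made for illustration, and the margin you actually need depends on your Python version and the system timer settings on Windows.

```python
import time


def run_at_fixed_rate(step, rate_hz=10.0, n_steps=50, spin_margin=0.002):
    """Call step(i) once per period and return the time of each tick.

    step is a hypothetical callback that would read the observation,
    query the policy, and send the action to the game.
    """
    period = 1.0 / rate_hz
    next_deadline = time.perf_counter() + period
    tick_times = []
    for i in range(n_steps):
        step(i)
        # Sleep for most of the remaining time, which keeps CPU usage low.
        remaining = next_deadline - time.perf_counter()
        if remaining > spin_margin:
            time.sleep(remaining - spin_margin)
        # Busy-wait the final stretch to hit the deadline more precisely.
        while time.perf_counter() < next_deadline:
            pass
        now = time.perf_counter()
        tick_times.append(now)
        # Absolute deadlines avoid accumulating drift across iterations.
        next_deadline += period
        if next_deadline < now:
            # step overran a whole period: resync instead of bursting.
            next_deadline = now + period
    return tick_times


if __name__ == "__main__":
    ticks = run_at_fixed_rate(lambda i: None, rate_hz=10.0, n_steps=20)
    gaps = [b - a for a, b in zip(ticks, ticks[1:])]
    print(f"mean gap: {sum(gaps) / len(gaps) * 1e3:.2f} ms")
```

Because each deadline is computed from the previous deadline rather than from the current time, small delays in one iteration do not shift all later ticks. The busy-wait trades some CPU time for precision; a larger `spin_margin` is more precise but burns more CPU.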
