What does "num_envs_per_worker" in rllib do?
For the life of me I don't get what "num_envs_per_worker" does. If the limiting factor is policy evaluation, why would we need to create multiple environments? Wouldn't we need to create multiple policies?
ELI5 please?
The docs say:
Vectorization within a single process: Though many envs can achieve high frame rates per core, their throughput is limited in practice by policy evaluation between steps. For example, even small TensorFlow models incur a couple milliseconds of latency to evaluate. This can be worked around by creating multiple envs per process and batching policy evaluations across these envs. You can configure {"num_envs_per_worker": M} to have RLlib create M concurrent environments per worker. RLlib auto-vectorizes Gym environments via VectorEnv.wrap().
Src: https://ray.readthedocs.io/en/latest/rllib-env.html
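To make the quoted passage concrete, here is a minimal sketch of the batching idea in plain Python. It does not use RLlib at all: `ToyEnv`, `CountingPolicy` and `collect` are hypothetical names made up for illustration, and the "policy" is a trivial threshold rule rather than a real model. The only point is that stepping several envs in lockstep lets a single policy make one batched call per round, so the fixed per-call latency is paid far less often for the same number of env steps.

```python
import random


class ToyEnv:
    """Hypothetical stand-in for a Gym env; the observation is a single float."""

    def __init__(self, seed):
        self.rng = random.Random(seed)

    def reset(self):
        return self.rng.random()

    def step(self, action):
        # Returns only the next observation; rewards and dones are omitted.
        return self.rng.random()


class CountingPolicy:
    """Hypothetical policy that counts how many forward passes it performs."""

    def __init__(self):
        self.forward_passes = 0

    def compute_actions(self, obs_batch):
        # One call is one forward pass, no matter how many observations it holds.
        self.forward_passes += 1
        return [1 if obs > 0.5 else 0 for obs in obs_batch]


def collect(num_envs, total_steps):
    """Step num_envs envs in lockstep until total_steps env steps are collected."""
    envs = [ToyEnv(seed=i) for i in range(num_envs)]
    policy = CountingPolicy()  # a single policy shared by all envs
    obs = [env.reset() for env in envs]
    steps = 0
    while steps < total_steps:
        actions = policy.compute_actions(obs)  # one batched evaluation per round
        obs = [env.step(a) for env, a in zip(envs, actions)]
        steps += num_envs
    return steps, policy.forward_passes


single = collect(num_envs=1, total_steps=1000)
batched = collect(num_envs=8, total_steps=1000)
print(single)   # (1000, 1000): one forward pass per env step
print(batched)  # (1000, 125): same number of env steps, 8x fewer forward passes
```

In this toy setup there is still only one policy. The extra envs just give it a bigger batch to evaluate at once, which, as I read the docs, is the reason the setting adds environments rather than policies.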
Probably a bit late on this, but here's my understanding: