
What does "num_envs_per_worker" in rllib do?

For the life of me, I don't get what "num_envs_per_worker" does. If the limiting factor is policy evaluation, why would we need to create multiple environments? Wouldn't we need to create multiple policies instead?

ELI5 please?

The docs say:

Vectorization within a single process: Though many envs can achieve high frame rates per core, their throughput is limited in practice by policy evaluation between steps. For example, even small TensorFlow models incur a couple of milliseconds of latency to evaluate. This can be worked around by creating multiple envs per process and batching policy evaluations across these envs. You can configure {"num_envs_per_worker": M} to have RLlib create M concurrent environments per worker. RLlib auto-vectorizes Gym environments via VectorEnv.wrap().

Source: https://ray.readthedocs.io/en/latest/rllib-env.html
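To make the quoted config concrete, here is a sketch of a config dict using the classic RLlib config-dict API (the key names match the docs above; the environment name and worker counts are just illustrative choices):

```python
# Sketch of an RLlib config dict using num_envs_per_worker.
# "CartPole-v1" and the specific counts are illustrative, not required.
config = {
    "env": "CartPole-v1",
    "num_workers": 2,          # separate rollout worker processes
    "num_envs_per_worker": 8,  # 8 vectorized envs stepped in lockstep per worker
}

# Total environments sampled in parallel: 2 workers * 8 envs = 16.
total_envs = config["num_workers"] * config["num_envs_per_worker"]
```

A dict like this would then be passed to RLlib's training entry point (e.g. `tune.run("PPO", config=config)` in the classic API). Each worker batches the observations from its 8 envs into a single policy forward pass per step.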

Probably a bit late on this, but here's my understanding:

  • as the docs you cited mention, there's significant fixed per-call overhead in using TensorFlow (converting data into the appropriate structures, the overhead and coordination of passing data to the GPU, etc.)
  • however, you can call a TensorFlow model with a batch of data, and the required execution time generally scales nicely. It should scale linearly in the limit, and when going from a single row to a few rows it may actually scale sub-linearly. E.g. if you are going to pass 1 row of data to a vector processing unit like a GPU (or specialized CPU instructions), you might as well pass as many rows as it can handle in one go; it won't actually take any more time. (Those parallel execution units would otherwise just sit idle.)
  • therefore, you want to batch up rows of data so that you pay the fixed per-call cost as infrequently as possible. One way of doing this is to have several RL environments executing in lockstep. Say you have 8 of them: each of the 8 environments produces its own observation, and you call your TensorFlow model once on this batch of 8 observations to produce 8 new actions, which you then use to produce 8 new observations, and so on. Amortized, this will hopefully be only 1/8th of the TensorFlow evaluation cost compared to each environment making its own TensorFlow calls.
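The lockstep batching described above can be sketched as follows. This is a toy illustration of the idea, not RLlib's internals: the dummy argmax policy and the 4-dimensional observations are made up for the example.

```python
import numpy as np

def policy_batch(obs_batch):
    """Stand-in for one TensorFlow forward pass. The point is that the
    fixed per-call cost is paid once, regardless of how many rows
    (observations) are in the batch."""
    # Dummy policy: pick the index of the largest observation component.
    return np.argmax(obs_batch, axis=1)

n_envs = 8
# One observation per environment, stacked into a single (8, 4) batch.
obs = np.random.rand(n_envs, 4)

# Instead of 8 separate policy calls, evaluate the model once on the
# whole batch, getting one action per environment back.
actions = policy_batch(obs)  # shape: (8,)

# Each env would then step with its own action, producing the next
# batch of 8 observations, and the loop repeats.
```

The key property is that the per-call overhead is amortized over all 8 environments, which is exactly what `num_envs_per_worker` buys you inside a single worker process.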

