
Local devices vs. non-local devices in multi-GPU processing

Original post: 2022-08-23 12:14:11. Tags: deep-learning, multiprocessing, gpu, multi-gpu, jax

I'm reading the JAX documentation for jax.local_devices, which says:

Like jax.devices(), but only returns devices local to a given process.

And for jax.devices() it says:

Returns a list of all devices for a given backend.

I don't understand what exactly these local and non-local devices are. Could you please elaborate on the difference between them?

1 answer

This is discussed in the JAX documentation, in Using JAX in multi-host and multi-process environments:

A process's local devices are those that it can directly address and launch computations on. For example, on a GPU cluster, each host can only launch computations on the directly attached GPUs. On a Cloud TPU pod, each host can only launch computations on the 8 TPU cores attached directly to that host (see the Cloud TPU System Architecture documentation for more details). You can see a process's local devices via jax.local_devices().

The global devices are the devices across all processes. A computation can span devices across processes and perform collective operations via the direct communication links between devices, as long as each process launches the computation on its local devices. You can see all available global devices via jax.devices(). A process's local devices are always a subset of the global devices.
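
To make the distinction concrete, here is a minimal sketch (not taken from the original answer) that inspects both lists from inside one process. It assumes a working JAX install. When run as a single process, every device is local, so the two lists coincide. In a multi-process job, which would typically be set up with jax.distributed.initialize(), each process would see the same global list but a different local one. The variable names and the small pmap collective at the end are illustrative choices, not part of the quoted documentation.

```python
import jax
import jax.numpy as jnp

# Global view: every device across all processes taking part in the job.
global_devices = jax.devices()
# Local view: only the devices this process can address directly.
local_devices = jax.local_devices()

print("process", jax.process_index(), "of", jax.process_count())
print("global device count:", jax.device_count())
print("local device count: ", jax.local_device_count())

# Each device records the index of the process that owns it.
for d in global_devices:
    owner = "local" if d.process_index == jax.process_index() else "non-local"
    print(d.id, d.platform, "owned by process", d.process_index, "->", owner)

# The local devices are always a subset of the global devices.
local_ids = {d.id for d in local_devices}
global_ids = {d.id for d in global_devices}
is_subset = local_ids <= global_ids

# A collective launched on the local devices: each process supplies one
# value per local device, and psum adds them up over all participating
# devices, so every entry ends up equal to the global device count.
n_local = jax.local_device_count()
total = jax.pmap(lambda x: jax.lax.psum(x, "i"), axis_name="i")(jnp.ones(n_local))
print("psum result per local device:", total)
```

On a single machine with one process the loop labels every device as local. The "non-local" label only appears once several processes share one job, for example one process per host on a GPU cluster.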