PyTorch Multi-GPU Issue
I want to train my model with two GPUs (ids 5 and 6), so I run my code with CUDA_VISIBLE_DEVICES=5,6 python train.py. However, when I print torch.cuda.current_device() I still get id 0 rather than 5 or 6. But torch.cuda.device_count() is 2, which seems right. How can I use GPUs 5 and 6 correctly?
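The behavior in the question can be reproduced with a minimal sketch. This assumes PyTorch is installed and that the machine actually has GPUs 5 and 6; the key point is that the variable must be set before PyTorch initializes CUDA:

```python
import os

# Set visibility *before* PyTorch initializes CUDA; setting it
# after the first CUDA call has no effect.
os.environ["CUDA_VISIBLE_DEVICES"] = "5,6"

import torch  # imported after the variable is set

if torch.cuda.is_available():
    # PyTorch re-indexes the visible GPUs, so physical GPUs 5 and 6
    # appear inside this process as cuda:0 and cuda:1.
    print("device_count:", torch.cuda.device_count())
    print("current_device:", torch.cuda.current_device())
else:
    print("CUDA not available in this environment")
```

In practice it is usually more robust to set the variable on the command line (as the question does) rather than inside the script, since any import that touches CUDA first would make the in-script assignment a no-op.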
It is most likely correct. PyTorch only sees two GPUs (therefore indexed 0 and 1), which are actually your GPUs 5 and 6.
Check the actual usage with nvidia-smi. If it is still inconsistent, you might need to set an environment variable:
export CUDA_DEVICE_ORDER=PCI_BUS_ID
(See Inconsistency of IDs between 'nvidia-smi -L' and cuDeviceGetName())
You can check the device name to verify that it is the GPU you expect. However, when you set CUDA_VISIBLE_DEVICES outside the process, you have forced torch to look only at those two GPUs, so torch re-indexes them as 0 and 1. Because of this, when you check current_device(), it outputs 0.
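The device-name check suggested above can be done with torch.cuda.get_device_name. A minimal sketch, assuming PyTorch is installed:

```python
import torch

if torch.cuda.is_available():
    # Indices here run over the *visible* devices only (0 .. device_count - 1).
    for i in range(torch.cuda.device_count()):
        # Compare each name with the output of `nvidia-smi -L`
        # to confirm which physical GPU each index maps to.
        print(f"cuda:{i} -> {torch.cuda.get_device_name(i)}")
else:
    print("CUDA not available in this environment")
```

If the two visible devices are different GPU models, the names alone confirm the mapping; if they are identical models, compare memory usage in nvidia-smi while a tensor is allocated on one of them.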