
Pytorch incorrect value of member variable when using Multi-gpu

Here is a simple class for running in a multi-GPU environment. The member variable self.firstIter should be False after the first iteration.

import torch.nn as nn

class TestNetwork(nn.Module):

    def __init__(self):
        super(TestNetwork, self).__init__()
        self.firstIter = True  # indicates whether it's the first iteration

    def forward(self, input):
        print('is firstIter:', self.firstIter)  # always True!!
        if self.firstIter:
            self.firstIter = False
        # do other things

The code works as expected when using only one GPU.

However, when using multiple GPUs (i.e. nn.DataParallel), the value of self.firstIter is always printed as True.

Why does this happen? What is wrong with the code?

Using PyTorch version 0.3.1.

Basically, DataParallel operates on model replicas, and changes made to the replicas during forward are not visible outside the forward/backward calls when the number of devices is larger than 1.
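The effect can be illustrated without any GPUs: DataParallel copies the module onto each device, runs forward on the copies, and then discards them, so attribute writes inside forward never reach the original. A minimal plain-Python sketch of that lifecycle (the `Worker` class is a hypothetical stand-in for an nn.Module, not PyTorch's actual replication code):

```python
import copy

class Worker:
    """Hypothetical stand-in for a module with mutable state."""
    def __init__(self):
        self.firstIter = True

    def forward(self):
        # the attribute write lands on the replica, not on the original
        if self.firstIter:
            self.firstIter = False

original = Worker()

# mimic DataParallel: replicate the module, run forward on each replica,
# then throw the replicas away
replicas = [copy.deepcopy(original) for _ in range(2)]
for r in replicas:
    r.forward()

print(original.firstIter)               # the original never sees the write: True
print([r.firstIter for r in replicas])  # the replicas did flip: [False, False]
```

This is why the single-GPU case behaves differently: with one device, forward runs on the original module itself, so the write persists.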

Please refer to https://discuss.pytorch.org/t/nonetype-attribute-when-using-dataparallel/11566 for details.
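One common workaround (my own suggestion, not part of the original answer) is to keep such per-run state out of forward entirely and update it on the wrapped module from the training loop; nn.DataParallel exposes the original module as `model.module`. Mimicking that shape in plain Python (the `DataParallelLike` wrapper is hypothetical, sketching only the replicate-and-discard behavior):

```python
import copy

class Net:
    """Hypothetical module: doubles its input, carries a first-iteration flag."""
    def __init__(self):
        self.firstIter = True

    def forward(self, x):
        return x * 2

class DataParallelLike:
    """Hypothetical stand-in for nn.DataParallel: replicate, run, discard."""
    def __init__(self, module):
        self.module = module  # nn.DataParallel also exposes the original as .module

    def __call__(self, x):
        replica = copy.deepcopy(self.module)  # replica is thrown away after the call
        return replica.forward(x)

model = DataParallelLike(Net())

out = model(3)
# flip the flag on the original module, outside the replicated forward
model.module.firstIter = False

print(model.module.firstIter)  # stays False on subsequent iterations
```

Because the flag lives on the original module and is written outside forward, the replication no longer swallows the update.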

