
Pytorch incorrect value of member variable when using Multi-gpu

Here is a simple class for running in a multi-GPU environment. The member variable self.firstIter should be False after the first iteration.

import torch.nn as nn

class TestNetwork(nn.Module):

    def __init__(self):
        super(TestNetwork, self).__init__()
        self.firstIter = True  # indicates whether it's the first iteration

    def forward(self, input):
        print('is firstIter:', self.firstIter)  # always True!!
        if self.firstIter:
            self.firstIter = False
        # do other things

The code works as expected when using only one GPU.

However, when using multiple GPUs (i.e. nn.DataParallel), the value of self.firstIter is always printed as True.

Why does this happen? What is wrong with the code?

Using PyTorch version 0.3.1.

Basically, DataParallel operates on model replicas, and changes made to the replicas during forward are not visible outside the forward/backward calls when the number of devices is larger than 1.
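The effect can be illustrated without any GPUs: DataParallel copies the module onto each device, runs forward on the copies, and then discards them, so attribute writes inside forward never reach the original. A minimal plain-Python sketch of that lifecycle (the `Worker` class is a hypothetical stand-in for an nn.Module, not PyTorch's actual replication code):

```python
import copy

class Worker:
    """Hypothetical stand-in for a module with mutable state."""
    def __init__(self):
        self.firstIter = True

    def forward(self):
        # the attribute write lands on the replica, not on the original
        if self.firstIter:
            self.firstIter = False

original = Worker()

# mimic DataParallel: replicate the module, run forward on each replica,
# then throw the replicas away
replicas = [copy.deepcopy(original) for _ in range(2)]
for r in replicas:
    r.forward()

print(original.firstIter)               # the original never sees the write: True
print([r.firstIter for r in replicas])  # the replicas did flip: [False, False]
```

This is why the single-GPU case behaves differently: with one device, forward runs on the original module itself, so the write persists.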

Please refer to https://discuss.pytorch.org/t/nonetype-attribute-when-using-dataparallel/11566 for details.
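One common workaround (my own suggestion, not part of the original answer) is to keep such per-run state out of forward entirely and update it on the wrapped module from the training loop; nn.DataParallel exposes the original module as `model.module`. Mimicking that shape in plain Python (the `DataParallelLike` wrapper is hypothetical, sketching only the replicate-and-discard behavior):

```python
import copy

class Net:
    """Hypothetical module: doubles its input, carries a first-iteration flag."""
    def __init__(self):
        self.firstIter = True

    def forward(self, x):
        return x * 2

class DataParallelLike:
    """Hypothetical stand-in for nn.DataParallel: replicate, run, discard."""
    def __init__(self, module):
        self.module = module  # nn.DataParallel also exposes the original as .module

    def __call__(self, x):
        replica = copy.deepcopy(self.module)  # replica is thrown away after the call
        return replica.forward(x)

model = DataParallelLike(Net())

out = model(3)
# flip the flag on the original module, outside the replicated forward
model.module.firstIter = False

print(model.module.firstIter)  # stays False on subsequent iterations
```

Because the flag lives on the original module and is written outside forward, the replication no longer swallows the update.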

