
How to use multiple GPUs for multiple models that work together?

I have three models defined under different device scopes in TensorFlow, and I'm using GradientTape to train these networks. When I do this, the memory increases by a few hundred megabytes, showing that the models have been loaded onto their respective GPUs. The problem is that when I start training, even with a very small batch size, only the memory of the GPU at position 0 increases. I'm using GradientTape for the training process as well. Is there any way to ensure that only the GPUs assigned to the models are used for those models?

with tf.device('/device:GPU:0'):
    model1 = model1Class().model()

with tf.device('/device:GPU:1'):
    model2 = model2Class().model()

with tf.device('/device:GPU:2'):
    model3 = model3Class().model()


for epoch in range(10):
    dataGen = DataGenerator(...)
    X, y = next(dataGen)

    with tf.GradientTape() as tape1:
         X = model1(X)
         loss1 = lossFunc(X, y[1])
    grads1 = tape1.gradient(loss1, model1.trainable_weights)
    optimizer1.apply_gradients(zip(grads1, model1.trainable_weights))

    with tf.GradientTape() as tape2:
         X = model2(X)          # Uses output from model1
         loss2 = lossFunc(X, y[2])
    grads2 = tape2.gradient(loss2, model2.trainable_weights)
    optimizer2.apply_gradients(zip(grads2, model2.trainable_weights))

    with tf.GradientTape() as tape3:
         X = model3(X)          # Uses output from model2
         loss3 = lossFunc(X, y[3])
    grads3 = tape3.gradient(loss3, model3.trainable_weights)
    optimizer3.apply_gradients(zip(grads3, model3.trainable_weights))

I must admit that I had to search a bit to give you a correct solution to your problem. It seems that the answer to your question resides here (the credit goes to Laplace Ricky):

@Laplace Ricky: Any code outside of mirrored_strategy.run() is supposed to run on a single GPU (probably the first GPU, GPU:0). Also, since you want the gradients returned from the replicas, mirrored_strategy.gather() is needed as well.

Besides this, a distributed dataset must be created using mirrored_strategy.experimental_distribute_dataset. A distributed dataset tries to split a single batch of data evenly across the replicas. An example covering these points is included below.

model.fit(), model.predict(), etc. automatically run in a distributed manner because they already handle everything mentioned above for you.
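The quoted answer refers to an example; here is a minimal sketch of those points, assuming a toy one-layer Keras model, a mean-squared-error loss, and a global batch size of 32 (all of these are illustrative placeholders, not the asker's actual models):

import tensorflow as tf

strategy = tf.distribute.MirroredStrategy()   # uses all visible GPUs by default
GLOBAL_BATCH_SIZE = 32

with strategy.scope():
    # Variables must be created inside the strategy scope so they are
    # mirrored across all replicas (GPUs).
    model = tf.keras.Sequential([tf.keras.layers.Dense(1)])
    optimizer = tf.keras.optimizers.Adam()
    loss_obj = tf.keras.losses.MeanSquaredError(
        reduction=tf.keras.losses.Reduction.NONE)   # reduce manually per replica

def train_step(inputs):
    x, y = inputs
    with tf.GradientTape() as tape:
        preds = model(x, training=True)
        # Scale the per-example loss by the global batch size so the summed
        # gradients across replicas match single-GPU training.
        loss = tf.nn.compute_average_loss(loss_obj(y, preds),
                                          global_batch_size=GLOBAL_BATCH_SIZE)
    grads = tape.gradient(loss, model.trainable_variables)
    optimizer.apply_gradients(zip(grads, model.trainable_variables))
    return loss

@tf.function
def distributed_train_step(dist_inputs):
    # strategy.run() executes train_step once per replica; anything outside
    # of it runs on a single device (typically GPU:0).
    per_replica_losses = strategy.run(train_step, args=(dist_inputs,))
    # Per-replica results can be combined with strategy.reduce(), or collected
    # with strategy.gather() if you need the raw values (e.g. gradients).
    return strategy.reduce(tf.distribute.ReduceOp.SUM,
                           per_replica_losses, axis=None)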

See this thread: Tensorflow - Multi-GPU doesn't work for model(inputs) nor when computing the gradients.

You need to use mirrored_strategy.experimental_distribute_dataset(dataset) and adapt the code to your needs.
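As a minimal continuation of the sketch above, the dataset would be wrapped like this (x_data and y_data are placeholder NumPy arrays, not real data); each global batch is then split evenly across the GPUs:

import numpy as np

x_data = np.random.rand(640, 8).astype("float32")
y_data = np.random.rand(640, 1).astype("float32")

dataset = (tf.data.Dataset.from_tensor_slices((x_data, y_data))
           .shuffle(640)
           .batch(GLOBAL_BATCH_SIZE))

# Each element of dist_dataset holds one per-replica slice of a global batch.
dist_dataset = strategy.experimental_distribute_dataset(dataset)

for epoch in range(10):
    for dist_batch in dist_dataset:
        loss = distributed_train_step(dist_batch)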
