Train multiple keras/tensorflow models on different GPUs simultaneously
I would like to train multiple models on multiple GPUs simultaneously from within a Jupyter notebook. I am working on a node with 4 GPUs. I would like to assign one GPU to each model and train 4 different models at the same time. Right now, I select a GPU for one notebook like this:
import os
os.environ['CUDA_VISIBLE_DEVICES'] = '1'  # this notebook sees only GPU 1

def model(...):
    ...

model.fit(...)
in four different notebooks. However, the results and output of the fitting procedure are then spread across four different notebooks, while running the models sequentially in a single notebook takes a lot of time. How do you assign GPUs to individual functions and run them in parallel?
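One possible approach (a sketch, not from the answers below): launch one worker process per GPU from a single notebook, and restrict each process to its GPU via `CUDA_VISIBLE_DEVICES`. The names `build_model`, `x`, and `y` are hypothetical placeholders for your own model factory and training data.

```python
import multiprocessing as mp
import os

def train_on_gpu(gpu_id, epochs=10):
    """Worker: pin this process to one GPU, then build and fit a model."""
    # Must be set before TensorFlow is imported, hence the late import below.
    os.environ['CUDA_VISIBLE_DEVICES'] = str(gpu_id)
    import tensorflow as tf  # imported only after the env var is set

    model = build_model()            # hypothetical factory for your model
    model.fit(x, y, epochs=epochs)   # hypothetical training data
    model.save(f'model_gpu{gpu_id}.h5')

def launch_all(num_gpus=4):
    """Start one training process per GPU and wait for all of them."""
    # 'spawn' gives each worker a fresh interpreter; 'fork' can inherit
    # CUDA state from the parent process and break GPU initialisation.
    ctx = mp.get_context('spawn')
    workers = [ctx.Process(target=train_on_gpu, args=(i,))
               for i in range(num_gpus)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
```

Calling `launch_all(4)` runs the four fits concurrently; their log output interleaves in the one notebook, so saving each model (or its history) to a per-GPU file is the easiest way to keep the results separated.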
I recommend using TensorFlow device scopes, like so:
with tf.device('/gpu:0'):
    model1.fit()
with tf.device('/gpu:1'):
    model2.fit()
with tf.device('/gpu:2'):
    model3.fit()
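Note that written back-to-back like this, the three `fit()` calls still execute one after another, since each call blocks until training finishes. To actually overlap them in one notebook, each fit can run in its own thread; `fit_in_thread` below is a hypothetical helper sketching that idea.

```python
import threading

def fit_in_thread(device, model, *args, **kwargs):
    """Hypothetical helper: run model.fit pinned to `device` in a thread."""
    import tensorflow as tf  # late import keeps the helper self-contained

    def _run():
        with tf.device(device):   # place this model's ops on one GPU
            model.fit(*args, **kwargs)

    t = threading.Thread(target=_run)
    t.start()
    return t

# Usage sketch, assuming models and training data (X, Y) already exist:
#   threads = [fit_in_thread(f'/gpu:{i}', m, X, Y)
#              for i, m in enumerate([model1, model2, model3])]
#   for t in threads:
#       t.join()
```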
If you want to train models on different cloud GPUs (e.g. GPU instances from AWS), try this library:
!pip install aibro==0.0.45 --extra-index-url https://test.pypi.org/simple

from aibro.train import fit

machine_id = 'g4dn.4xlarge'  # instance name on AWS
job_id, trained_model, history = fit(
    model=model,
    train_X=train_X,
    train_Y=train_Y,
    validation_data=(validation_X, validation_Y),
    machine_id=machine_id,
)
Tutorial: https://colab.research.google.com/drive/19sXZ4kbic681zqEsrl_CZfB5cegUwuIB#scrollTo=ERqoHEaamR1Y