
How to share a single GPU deep learning server?

For our development team we want to build a central GPU server for deep learning / training tasks (with one or more powerful GPUs, instead of a separate workstation with its own GPU for each team member). I guess this is a common setup, but I am not sure how to make GPU sharing work for multiple team members simultaneously. We work with TensorFlow/Keras and Python scripts.

My question is: What is the typical approach to let team members train their models on that central server? Just give them SSH access and have them run network training directly from the command line? Or set up a JupyterHub server, so that our developers can run code from their browser?

My main question: If there is only one GPU, how can we make sure that multiple users cannot run their code (i.e. train their networks) at the same time? Is there a way to submit training jobs to some central server software so that they are executed on the GPU one after the other?

(Sorry if this is not the correct site to ask this question, but which other Stack Exchange site would be better?)

Even though we don't need this setup any more, one option to solve this is a workload manager like Slurm, which also supports GPU management (GPUs are scheduled as "generic resources", GRES).
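With Slurm, each user submits a batch script instead of running training directly; the scheduler queues jobs and, if only one GPU exists, runs GPU jobs one after the other. A minimal sketch of such a script (the environment path, script name, and resource numbers are placeholders, not from the original question):

```bash
#!/bin/bash
#SBATCH --job-name=train-model     # name shown in the queue
#SBATCH --gres=gpu:1               # request one GPU; jobs queue until a GPU is free
#SBATCH --cpus-per-task=4          # CPU cores, e.g. for data loading
#SBATCH --mem=16G                  # system RAM for the job
#SBATCH --time=04:00:00            # wall-clock limit
#SBATCH --output=train_%j.log      # stdout/stderr, %j = job id

# Activate the project environment and start training
# (both paths are examples for your own setup).
source ~/venvs/tf/bin/activate
python train.py
```

A user would submit this with `sbatch train_job.sh` and watch the queue with `squeue`; because the node advertises only one GPU, Slurm holds any further `--gres=gpu:1` jobs in the queue until the running one finishes, which gives exactly the one-after-the-other behavior asked about.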
