
Implementing a Dask scheduler and workers in Docker containers

I need to run a scikit-learn RandomForestClassifier with multiple processes in parallel. For that, I'm looking into implementing a Dask scheduler with N workers, where the scheduler and each worker run in a separate Docker container.

The client application, which also runs in a separate Docker container, will first connect to the scheduler and then start the scikit-learn training inside a with joblib.parallel_backend('dask'): block.
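For reference, a minimal sketch of that client-side workflow, assuming the scheduler container is reachable under the hostname "scheduler" on a shared Docker network (the address, the /data path, and the "label" column are placeholders, not part of the original question):

    import joblib
    import pandas as pd
    from dask.distributed import Client
    from sklearn.ensemble import RandomForestClassifier

    # Connect to the scheduler container; 8786 is the default scheduler port.
    client = Client("tcp://scheduler:8786")

    # Hypothetical training data; in this question it lives as Parquet files.
    df = pd.read_parquet("/data/train.parquet")
    X_train, y_train = df.drop(columns=["label"]), df["label"]

    clf = RandomForestClassifier(n_estimators=500, n_jobs=-1)

    # Route joblib's parallelism through the Dask workers instead of
    # local processes; fit() is split into tasks that run on the cluster.
    with joblib.parallel_backend("dask"):
        clf.fit(X_train, y_train)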

The data used to train the machine learning model is stored as Parquet files in the client application's Docker container. What is the best practice for giving the workers access to this data? Should the data be located somewhere else, such as in a shared directory?

Since Apache Parquet is file-system based, the answer depends on the architecture you are building: will your project run on a single server, or will it be distributed across multiple servers?

If you are running on a single server, then simply sharing a Docker volume between the containers, or even a common bind mount of a local directory, will do the job.
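As an illustration of the single-server case, here is a sketch of how the workers could read the Parquet data themselves, assuming the same volume is mounted at /data in the client and in every worker container (the path is an assumption for the example):

    import dask.dataframe as dd
    from dask.distributed import Client

    client = Client("tcp://scheduler:8786")

    # /data is the same Docker volume mounted into the client and every
    # worker container, so each worker can open the file at the same path.
    ddf = dd.read_parquet("/data/train.parquet")

    # Materialize the data on the cluster; partitions are loaded by the
    # workers directly, not shipped over from the client.
    ddf = ddf.persist()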

If, on the other hand, you are trying to set up distributed training across multiple servers, then you will need some kind of file server to serve the files.

One of the simplest ways to share files is through an NFS server, and a commonly used image for this is erichough/nfs-server . You use this container to export the local folder(s) where the files are stored, and then mount that export at the same path on the remaining servers.
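Whichever sharing mechanism you end up with (a shared Docker volume or an NFS mount), it can be worth checking from the client that every worker actually sees the data before training starts. A small sketch, assuming the data is mounted at /data inside each worker container:

    import os
    from dask.distributed import Client

    client = Client("tcp://scheduler:8786")

    # Client.run executes a function on every worker and returns a dict
    # keyed by worker address, so this confirms each worker container
    # can see the shared Parquet file before training begins.
    visibility = client.run(os.path.exists, "/data/train.parquet")
    print(visibility)  # e.g. {'tcp://worker-1:33551': True, ...}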
