
Julia cluster using docker

I am trying to connect to docker containers using the default SSHManager. These containers only have a running sshd, with public key authentication, and julia installed.

Here is my dockerfile:

FROM rastasheep/ubuntu-sshd
RUN apt-get update && apt-get install -y julia
RUN mkdir -p /root/.ssh
ADD id_rsa.pub /root/.ssh/authorized_keys

I am running the container using:

sudo docker run -d -p 3333:22 -it --name julia-sshd julia-sshd

Then, on the host machine, I get the following error in the Julia REPL:

julia> import Base:SSHManager
julia> addprocs(["root@localhost:3333"])
stdin: is not a tty
Worker 2 terminated.
ERROR (unhandled task failure): EOFError: read end of file
Master process (id 1) could not connect within 60.0 seconds.
exiting.

I have tested that I can connect to the container via ssh without password.

I have also tested that, from the Julia REPL, I can add a regular machine with Julia installed to the cluster, and that works fine.

But I cannot get these two things working together. Any help or suggestions will be appreciated.

I recommend also deploying the Master in a Docker container. That makes your environment easily and fully reproducible.

I'm working on a way to deploy Workers in Docker containers on demand, i.e., the Master, itself deployed in a container, can deploy further DockerizedJuliaWorker instances. It is similar to https://github.com/gsd-ufal/Infra.jl but assumes that the Master and Workers run on the same host, which makes things simpler.

It is ongoing work and I plan to finish it in the coming weeks. In a nutshell:

1) You'll need a simple DockerBackend and a wrapper that transparently runs containers, sets up SSH, and calls addprocs with all the low-level parameters (i.e., the DockerizedJuliaWorker.jl file):

https://github.com/NaelsonDouglas/DistributedMachineLearningThesis/tree/master/src/docker

2) Read here how to build the Docker image (Dockerfile is included):

https://github.com/NaelsonDouglas/DistributedMachineLearningThesis
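To give an idea of what such a wrapper does, here is a minimal sketch. It is not the code from the repository above. The helper names (docker_run_cmd, machine_spec, add_docker_worker) are hypothetical, and it assumes the julia-sshd image and the root@localhost:3333 setup from the question. The tunnel, exename and dir keywords are regular addprocs options; whether they are enough to fix the connection timeout in the question is an assumption I have not verified. The Julia version inside the container should also match the one on the Master.

```julia
using Distributed  # Julia 0.7 and later; on older versions addprocs lives in Base

# Hypothetical helper: build the `docker run` command that starts one sshd
# container and publishes its port 22 on the given host port.
function docker_run_cmd(name::AbstractString, host_port::Integer;
                        image::AbstractString = "julia-sshd")
    return `docker run -d -p $(host_port):22 --name $name $image`
end

# Hypothetical helper: build the "user@host:port" machine spec accepted by addprocs.
function machine_spec(host_port::Integer; user = "root", host = "localhost")
    return "$(user)@$(host):$(host_port)"
end

# Hypothetical wrapper: start a container, then add it as a worker over SSH.
function add_docker_worker(name, host_port; image = "julia-sshd")
    run(docker_run_cmd(name, host_port; image = image))
    sleep(2)  # crude wait for sshd to come up (assumption: 2 s is enough)
    return addprocs([machine_spec(host_port)];
                    tunnel  = true,     # reach the worker through the SSH connection
                    exename = "julia",  # assumes julia is on the PATH in the container
                    dir     = "/root")  # a directory that exists in the container
end
```

Calling add_docker_worker("julia-sshd-1", 3333) would then return the ids of the new workers. Without exename and dir, addprocs uses the Master's own julia path and current directory, which may not exist inside the container.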

Please tell me if you have any suggestions on how to improve it.

Best,

André Lage.


 