Question
How do I specify the correct address of Dask workers on a remote resource to a Dask scheduler running locally?
Situation
I have a remote resource I can ssh into. There, I have a Docker container that runs an image containing all the dependencies I need to run Dask and Distributed.
When run, the container executes the following:
dask-worker --nprocs 14 --nthreads 1 {inet_addr_local}:8786
In the same network, but on my laptop, I run another container of the same image. In this container, I run the Dask scheduler, like so:
dask-scheduler --port 8786
When I start up the scheduler, everything is fine. When I start up the container of workers, it seems to connect to the scheduler. In the status I see the following:
Waiting to connect to: tcp://{this_matches_inet_address_of_local}:8786
On the scheduler, I see the following logged repeatedly, in a loop as it continually tries to contact/respond to each of the workers:
distributed.scheduler - INFO - Remove worker tcp://172.18.0.10:41508
distributed.scheduler - INFO - Removed worker tcp://172.18.0.10:41508
distributed.scheduler - ERROR - Failed to connect to worker 'tcp://172.18.0.10:44590': Timed out trying to connect to 'tcp://172.18.0.10:44590' after 3 s: OSError: [Errno 113] No route to host
The issue (I think) can be seen here: tcp://172.18.0.10 is incorrect. The workers are running on a resource, db.foo.net, that I can ssh into as me@db.foo.net.
From the scheduler container, I can ping db.foo.net successfully. I think the workers are assuming their address is the local address of the container they are in, and not db.foo.net. I need to override this default with some sort of configuration for the workers. I thought the --host flag would do it, but that causes Tornado to throw the following error: OSError: [Errno 99] Cannot assign requested address.
Dask workers need to be able to contact the scheduler with the address given to them. It sounds like this isn't happening for you. This could be for many reasons associated with your network. A couple of possibilities:
Unfortunately there isn't much that Dask itself can do to help you identify these network issues. You might try running other services on the relevant ports and seeing if you can recreate the lack of connectivity with common tools like ping or python -m http.server 8786
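As one way to check raw TCP reachability with no Dask code involved, here is a minimal sketch using only the Python standard library. The helper name can_connect and its default timeout are illustrative assumptions, not part of Dask. It only reports whether a TCP connection to a given host and port can be opened from the machine it runs on.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to (host, port) opens within `timeout` seconds.

    Hypothetical helper for diagnosing connectivity; it is not a Dask API.
    """
    try:
        # create_connection resolves the host name and attempts a TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, DNS failures, and
        # "No route to host" (Errno 113), which is the error in the scheduler log.
        return False
```

For example, calling can_connect("172.18.0.10", 44590) from inside the scheduler container tests the worker address the scheduler was trying to reach, and calling it from the worker container with the scheduler's address and port 8786 tests the opposite direction. A False result in either direction would point to a networking problem between the containers rather than to Dask.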