
How to terminate workers started by dask multiprocessing scheduler?

After using the dask multiprocessing scheduler for a long period of time, I noticed that the Python processes started by it use a lot of memory. How can I restart the worker pool?

Update: You can do this to kill the workers started by the multiprocessing scheduler:

from dask.context import _globals
pool = _globals.pop('pool')  # remove the pool from globals to make dask create a new one
pool.close()
pool.terminate()
pool.join()
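
Note that dask.context._globals is an internal that was later replaced by dask.config in newer dask versions. As a minimal sketch of an alternative under that assumption (the pool size below is illustrative): create the multiprocessing pool yourself, hand it to the scheduler through dask.config.set, and tear it down explicitly when you are done.

import multiprocessing
import dask
import dask.bag

# Sketch for newer dask versions, where dask.config replaced dask.context._globals:
# supplying your own pool lets you control its lifecycle directly.
pool = multiprocessing.Pool(processes=4)

with dask.config.set(scheduler='processes', pool=pool):
    result = dask.bag.range(10000, npartitions=8).sum().compute()

# Shut the workers down (and reclaim their memory) yourself.
pool.close()
pool.join()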

First answer:

For tasks that consume a lot of memory, I prefer to use the distributed scheduler, even on localhost.

It's very straightforward:

  1. Start the scheduler in one shell:
$ dask-scheduler
distributed.scheduler - INFO - -----------------------------------------------
distributed.scheduler - INFO -   Scheduler at:       1.2.3.4:8786
distributed.scheduler - INFO -        http at:       1.2.3.4:9786
distributed.bokeh.application - INFO - Web UI: http://1.2.3.4:8787/status/
distributed.scheduler - INFO - -----------------------------------------------
distributed.core - INFO - Connection from 1.2.3.4:56240 to Scheduler
distributed.core - INFO - Connection from 1.2.3.4:56241 to Scheduler
distributed.core - INFO - Connection from 1.2.3.4:56242 to Scheduler
  2. Start the workers in another shell; you can adjust the parameters accordingly:
$ dask-worker  --nprocs 8 --nthreads 1 --memory-limit .8 1.2.3.4:8786
distributed.nanny - INFO -         Start Nanny at:            127.0.0.1:61760
distributed.nanny - INFO -         Start Nanny at:            127.0.0.1:61761
distributed.nanny - INFO -         Start Nanny at:            127.0.0.1:61762
distributed.nanny - INFO -         Start Nanny at:            127.0.0.1:61763
distributed.worker - INFO -       Start worker at:            127.0.0.1:61765
distributed.worker - INFO -              nanny at:            127.0.0.1:61760
distributed.worker - INFO -               http at:            127.0.0.1:61764
distributed.worker - INFO - Waiting to connect to:            127.0.0.1:8786
distributed.worker - INFO - -------------------------------------------------
distributed.worker - INFO -               Threads:                          1
distributed.nanny - INFO -         Start Nanny at:            127.0.0.1:61767
distributed.worker - INFO -                Memory:                    1.72 GB
distributed.worker - INFO -       Local Directory: /var/folders/55/nbg15c6j4k3cg06tjfhqypd40000gn/T/nanny-11ygswb9
...
  3. Finally, use the distributed.Client class to submit your jobs.
In [1]: from distributed import Client

In [2]: client = Client('1.2.3.4:8786')

In [3]: client
<Client: scheduler="127.0.0.1:61829" processes=8 cores=8>

In [4]: from distributed.diagnostics import progress

In [5]: import dask.bag

In [6]: data = dask.bag.range(10000, 8)

In [7]: data
dask.bag

In [8]: future = client.compute(data.sum())

In [9]: progress(future)
[########################################] | 100% Completed |  0.0s
In [10]: future.result()
49995000

I have found this approach more reliable than the default scheduler. I prefer to submit the task explicitly and handle the future so I can use the progress widget, which is really nice in a notebook. You can also keep doing other work while waiting for the results.
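
As a hedged sketch of what "doing other work while waiting" can look like (the polling pattern below is illustrative, not from the original post): the submission returns immediately, so you can check the future and only block when you actually need the result.

from distributed import Client, wait
import dask.bag

client = Client('1.2.3.4:8786')   # scheduler address from the output above
future = client.compute(dask.bag.range(10000, npartitions=8).sum())

# client.compute returns right away; other work can happen here.
print(future.done())              # False while the cluster is still computing

wait(future)                      # block only when the result is needed
print(future.result())            # 49995000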

If you get errors due to memory issues, you can restart the workers or the scheduler (start all over again), use smaller chunks of data, and try again.
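
If you start everything from Python instead of from two shells, restarting the worker pool is a single call. A minimal sketch, assuming a local-only setup (the LocalCluster parameters below are illustrative):

from distributed import Client, LocalCluster

# Spin up the scheduler and workers in-process instead of via dask-scheduler/dask-worker.
cluster = LocalCluster(n_workers=8, threads_per_worker=1, memory_limit='2GB')
client = Client(cluster)

# ... run computations ...

client.restart()   # kill all workers and start fresh ones, clearing their memory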
