How to terminate workers started by the dask multiprocessing scheduler?
After using the dask multiprocessing scheduler for a long period of time, I noticed that the Python processes it starts consume a lot of memory. How can I restart the worker pool?
Update: you can kill the workers started by the multiprocessing scheduler like this:
```python
from dask.context import _globals

pool = _globals.pop('pool')  # remove the pool from globals so dask creates a new one
pool.close()
pool.terminate()
pool.join()
```
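As a rough standard-library analogy (the pool kept in `dask.context._globals` is a regular `multiprocessing.Pool`), the close/terminate/join sequence above recycles the worker processes like this; the `square` helper is purely illustrative:

```python
from multiprocessing import Pool

def square(x):
    return x * x

# On platforms that spawn rather than fork (e.g. Windows), run this
# under an `if __name__ == "__main__":` guard.
pool = Pool(processes=4)
old_results = pool.map(square, range(10))

# Shut down the old pool, exactly as in the dask snippet above
pool.close()
pool.terminate()
pool.join()

# A fresh pool starts new worker processes with a clean memory footprint
pool = Pool(processes=4)
new_results = pool.map(square, range(10))
pool.close()
pool.join()
```

The new pool computes the same results, but its workers start from a fresh memory state, which is the whole point of recycling it.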
First answer:
For tasks that consume a lot of memory, I prefer to use the distributed scheduler, even on localhost.
It's very straightforward:
```shell
$ dask-scheduler
distributed.scheduler - INFO - -----------------------------------------------
distributed.scheduler - INFO - Scheduler at:  1.2.3.4:8786
distributed.scheduler - INFO -      http at:  1.2.3.4:9786
distributed.bokeh.application - INFO - Web UI: http://1.2.3.4:8787/status/
distributed.scheduler - INFO - -----------------------------------------------
distributed.core - INFO - Connection from 1.2.3.4:56240 to Scheduler
distributed.core - INFO - Connection from 1.2.3.4:56241 to Scheduler
distributed.core - INFO - Connection from 1.2.3.4:56242 to Scheduler
```
```shell
$ dask-worker --nprocs 8 --nthreads 1 --memory-limit .8 1.2.3.4:8786
distributed.nanny - INFO - Start Nanny at: 127.0.0.1:61760
distributed.nanny - INFO - Start Nanny at: 127.0.0.1:61761
distributed.nanny - INFO - Start Nanny at: 127.0.0.1:61762
distributed.nanny - INFO - Start Nanny at: 127.0.0.1:61763
distributed.worker - INFO - Start worker at: 127.0.0.1:61765
distributed.worker - INFO -      nanny at: 127.0.0.1:61760
distributed.worker - INFO -       http at: 127.0.0.1:61764
distributed.worker - INFO - Waiting to connect to: 127.0.0.1:8786
distributed.worker - INFO - -------------------------------------------------
distributed.worker - INFO - Threads: 1
distributed.nanny - INFO - Start Nanny at: 127.0.0.1:61767
distributed.worker - INFO - Memory: 1.72 GB
distributed.worker - INFO - Local Directory: /var/folders/55/nbg15c6j4k3cg06tjfhqypd40000gn/T/nanny-11ygswb9
...
```
Then use the `distributed.Client` class to submit your jobs:

```python
In [1]: from distributed import Client

In [2]: client = Client('1.2.3.4:8786')

In [3]: client
<Client: scheduler="127.0.0.1:61829" processes=8 cores=8>

In [4]: from distributed.diagnostics import progress

In [5]: import dask.bag

In [6]: data = dask.bag.range(10000, 8)

In [7]: data
dask.bag

In [8]: future = client.compute(data.sum())

In [9]: progress(future)
[########################################] | 100% Completed | 0.0s

In [10]: future.result()
49995000
```
I found this approach more reliable than the default scheduler. I prefer to explicitly submit the task and handle the future so I can use the progress widget, which is really nice in a notebook. Also, you can still do other things while waiting for the results.
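The submit-then-wait pattern isn't specific to dask. As a minimal standard-library sketch, `concurrent.futures` offers the same future-based workflow (the numbers mirror the session above):

```python
from concurrent.futures import ProcessPoolExecutor

with ProcessPoolExecutor(max_workers=4) as executor:
    # Submit the job and get a future back, analogous to client.compute(...)
    future = executor.submit(sum, range(10000))
    # ... do other work here while the computation runs in the background ...
    result = future.result()  # block only when the value is actually needed
```

As with `client.compute`, the call returns immediately and the process is free to do other work until `result()` is called.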
If you get errors due to memory issues, you can restart the workers or the scheduler (start all over again), use smaller chunks of data, and try again.
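A minimal plain-Python sketch of the "smaller chunks" advice (the chunk size of 1000 is an arbitrary choice): keeping only one chunk in flight bounds peak memory while producing the same total as the session above.

```python
def chunked(seq, size):
    """Yield successive fixed-size slices of seq."""
    for i in range(0, len(seq), size):
        yield seq[i:i + size]

data = list(range(10000))
total = 0
for chunk in chunked(data, 1000):
    total += sum(chunk)  # only one chunk is processed at a time
```

With dask, the equivalent knob is the number of partitions: more, smaller partitions mean each worker holds less data at any given moment.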