Dask execution gets stuck in LocalCluster
I am using an EC2 VM with 16 cores and 64GB RAM. I wrote a Dask program that applies a filter to a dataframe, concatenates the result with another dataframe, and then writes the data back to disk. If I run it in LocalCluster mode by simply calling client = Client(), the execution gets stuck at some point after writing some data. During this period CPU utilisation is very low, so nothing appears to be executing, and the part files stop growing. This goes on forever.
If I run the program without creating a LocalCluster, it runs very slowly (low CPU utilisation) but does finish. I am trying to understand how to fix this.
Note: Nobody else is using the VM, and the data size ranges from 3GB to 25GB.
Dask version: 2.15.0 and 2.17.2
Unfortunately, there is not enough information in your question to give a helpful answer. Many different things could be going on.
In this situation we recommend watching the Dask dashboard, which gives you much more information about what the scheduler and workers are doing. Hopefully that helps you identify the issue.
https://docs.dask.org/en/latest/diagnostics-distributed.html
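As a rough sketch of how to get at the dashboard from the setup described in the question, the cluster can be created explicitly rather than through a bare Client(), and the client then reports the dashboard URL. The helper name start_cluster and the sizing values (4 workers with 4 threads each, 14GB per worker) are assumptions chosen to fit a 16-core, 64GB machine; they are illustrative, not a diagnosed fix for the hang.

```python
from dask.distributed import Client, LocalCluster


def start_cluster(n_workers=4, threads_per_worker=4, memory_limit="14GB", **kwargs):
    """Start a LocalCluster with explicit sizing and return (cluster, client).

    The defaults are assumed values for a 16-core / 64GB machine.
    memory_limit applies per worker, not to the whole cluster.
    Extra keyword arguments are passed through to LocalCluster.
    """
    cluster = LocalCluster(
        n_workers=n_workers,
        threads_per_worker=threads_per_worker,
        memory_limit=memory_limit,
        **kwargs,
    )
    client = Client(cluster)
    return cluster, client
```

Calling cluster, client = start_cluster() and then printing client.dashboard_link shows the address to open in a browser while the filter, concat and write steps run. With process-based workers, the call belongs under an if __name__ == "__main__": guard. On a remote VM such as EC2, the dashboard port usually has to be reachable from your machine, for example through an SSH tunnel.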