
How to get information about a particular Dask task

I'm running into a problem where my distributed cluster appears to "hang": tasks stop processing and a backlog of unprocessed tasks builds up. I'm looking for some way to debug what's going on.

The Client has a processing method that tells me which tasks are currently running on each worker, but as far as I can see that is the only information about tasks available on the Client object.

What I'd like to do is query not just processing tasks but all tasks, including processed, processing and errored ones, and for each task get some statistics such as submitted_time and completion_time, which would allow me to find out which tasks are blocking the cluster.

This would be similar to the extended metadata on ipyparallel.AsyncResult.

A nice-to-have would be the ability to get the args/kwargs for any given task too. This would be especially helpful in debugging failed tasks.

Is any of this functionality available currently, or is there any other way to get the info I'm after?

Any other suggestions on how to debug the problem would be most welcome.

As of May 2017 no explicit "give me all of the information about a task" operation exists. However, you can use the client to investigate task state directly. This will require you to dive a bit into the information that the scheduler and workers track, which is described in the distributed documentation pages on the scheduler and the worker.

To query this state I would use the Client.run_on_scheduler and Client.run methods. These take a function to run on the scheduler or workers, respectively. If this function includes a dask_scheduler or dask_worker argument, then the function will be given the scheduler or worker object itself.

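def f(dask_scheduler):
    # The scheduler object is passed in because of the argument name.
    # task_state is the attribute as of this 2017 answer; later releases
    # keep per-task state under dask_scheduler.tasks instead.
    return dask_scheduler.task_state

client.run_on_scheduler(f)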
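
To make the argument rule concrete, here is a minimal sketch of the idea using only the standard library. It is not distributed's actual code: run_with_injection and FakeScheduler are hypothetical names made up for this illustration, and the task keys are invented. It only shows how a caller can inspect a function's signature and hand over an object when the function asks for it by name.

```python
import inspect


def run_with_injection(func, target, *args, keyword="dask_scheduler", **kwargs):
    """Call func, passing target under `keyword` only if func declares it.

    Hypothetical helper for illustration; not part of distributed.
    """
    if keyword in inspect.signature(func).parameters:
        kwargs[keyword] = target
    return func(*args, **kwargs)


class FakeScheduler:
    """Hypothetical stand-in for the scheduler object (invented data)."""

    def __init__(self):
        self.task_state = {"inc-1": "memory", "add-2": "processing"}


def wants_scheduler(dask_scheduler=None):
    # Declares dask_scheduler, so the stand-in scheduler is injected.
    return dict(dask_scheduler.task_state)


def plain(x):
    # Declares no dask_scheduler argument, so nothing is injected.
    return x + 1


scheduler = FakeScheduler()
states = run_with_injection(wants_scheduler, scheduler)
plain_result = run_with_injection(plain, scheduler, 41)
```

On a real cluster, Client.run_on_scheduler plays the role of run_with_injection, and the function runs inside the scheduler process rather than locally.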

You now have access to any state that the scheduler or workers know about and can run any internal diagnostic checks. What you choose to investigate, though, depends entirely on your use case.

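def f(keys, dask_scheduler=None):
    # Return the logged state transitions for the given task keys.
    return dask_scheduler.transition_story(*keys)

# key1, key2, key3 stand for the keys of the tasks you want to inspect.
client.run_on_scheduler(f, [key1, key2, key3])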
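
Once you have a list of transitions, you can derive the kind of timing information the question asks about. The sketch below is an illustration under stated assumptions, not a documented API: it assumes each record begins with (key, start_state, finish_state) and ends with a timestamp in seconds, and the sample records are invented. Check the actual layout returned by your version before relying on it. time_in_states is a hypothetical helper name.

```python
from collections import defaultdict


def time_in_states(story):
    """Sum the seconds each key spent in each state.

    Assumes every record starts with (key, start_state, finish_state)
    and ends with a timestamp in seconds; this layout is an assumption.
    Returns (durations, current), where current maps each key to the
    state it was last seen entering and the time it entered it.
    """
    durations = defaultdict(lambda: defaultdict(float))
    current = {}  # key -> (state entered, timestamp)
    for record in sorted(story, key=lambda r: r[-1]):
        key, finish, stamp = record[0], record[2], record[-1]
        if key in current:
            state, entered = current[key]
            durations[key][state] += stamp - entered
        current[key] = (finish, stamp)
    return {k: dict(v) for k, v in durations.items()}, current


# Invented sample data in the assumed layout.
story = [
    ("inc-1", "released", "waiting", {}, 100.0),
    ("inc-1", "waiting", "processing", {}, 100.5),
    ("inc-1", "processing", "memory", {}, 103.0),
    ("add-2", "released", "waiting", {}, 100.0),
    ("add-2", "waiting", "processing", {}, 101.0),
]
durations, current = time_in_states(story)
```

A key whose last entry in current is still "processing" long after its timestamp is a candidate for the task that is blocking the cluster.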

