
Dask dictionary to delayed object adapter

I've been searching around but have not found a solution. I have been building my workflow as a Dask graph dictionary, but my team works with delayed objects. I need to convert my dsk dictionary into a delayed object that represents its last step.

What I do now:

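import dask
from dask.distributed import Client

client = Client()  # assumption: a local cluster; the original does not show how client was created

def add(x, y):
    return x + y

dsk = {
    'step1': (add, 1, 2),
    'step2': (add, 'step1', 3),
    'final': (add, 'step2', 'step1'),
}

dask.visualize(dsk)
client.get(dsk, 'final')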

With this approach, all my functions are plain Python functions. However, this differs from how the rest of the team works.

What the team is doing:

@dask.delayed
def add(x, y):
    return x+y

step1 = add(1, 2)
step2 = add(step1, 3)
final = add(step2, step1)

final.visualize()
client.compute(final)  # client.submit expects a plain function; compute accepts a delayed object

They then schedule further work on top of the final delayed object. How can I convert the last step of dsk, 'final', into a delayed object?

My current thinking (not working yet)

from dask.optimization import cull

outputs = ['final']
dsk1, dependencies = cull(dsk, outputs)  # remove unnecessary tasks from the graph

After that, I'm not sure how to construct a delayed object.
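
For reference, here is a minimal sketch of what cull returns, assuming the same add function and a hypothetical extra task named 'unused' that 'final' does not depend on. It only trims the graph to the tasks needed for the requested outputs; the result is still a plain dictionary, not a delayed object.

```python
from dask.optimization import cull

def add(x, y):
    return x + y

dsk = {
    'step1': (add, 1, 2),
    'step2': (add, 'step1', 3),
    'final': (add, 'step2', 'step1'),
    'unused': (add, 10, 20),  # hypothetical task that 'final' never needs
}

# cull keeps only the tasks required to compute the requested keys
culled, dependencies = cull(dsk, ['final'])

remaining_keys = sorted(culled)       # 'unused' has been dropped
still_a_dict = isinstance(culled, dict)
```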

Thank you!

Finally, I found a workaround. The idea is to iterate through dsk and create a delayed object for each task, wiring up its dependencies along the way.

# Convert dsk dictionary to dask.delayed objects
for dsk_name, dsk_values in dsk.items():
    args = []
    dsk_function = dsk_values[0]
    dsk_arguments = dsk_values[1:]
    for arg in dsk_arguments:
        if isinstance(arg, str):
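            # Look up the already-created delayed object for this key in globals();
            # fall back to the raw string if none exists yet, so each task must
            # appear in dsk after the tasks it depends on.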
            args.append( globals().get(arg, arg) )
        else:
            args.append(arg)
    globals()[dsk_name] = dask.delayed(dsk_function)(*args)
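
The same idea can be sketched without writing into globals(). The helper below, graph_to_delayed, is a hypothetical name, not part of Dask. It assumes every task is a tuple of the form (function, arg1, arg2, ...) and that any string argument matching a key in dsk is a dependency. Dependencies are resolved recursively, so the order of the entries in dsk does not matter.

```python
import dask

def add(x, y):
    return x + y

dsk = {
    'step1': (add, 1, 2),
    'step2': (add, 'step1', 3),
    'final': (add, 'step2', 'step1'),
}

def graph_to_delayed(dsk):
    """Build one delayed object per key of a simple task graph (hypothetical helper)."""
    built = {}

    def build(key):
        if key not in built:
            func, *raw_args = dsk[key]
            # Replace string arguments that name another task with its delayed object
            args = [build(a) if isinstance(a, str) and a in dsk else a
                    for a in raw_args]
            # dask_key_name keeps the original graph key on the delayed object
            built[key] = dask.delayed(func)(*args, dask_key_name=key)
        return built[key]

    for key in dsk:
        build(key)
    return built

delayed_steps = graph_to_delayed(dsk)
final = delayed_steps['final']
result = final.compute(scheduler='synchronous')
```

Returning a dictionary keeps the delayed objects out of the module namespace, and final can be passed to other delayed functions like any other delayed object.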

We generally recommend that people use Dask delayed; it is less error-prone. Today, raw graph dictionaries are mostly used by people working on Dask itself. That said, if you want to convert a dictionary into a delayed object, I recommend looking at the dask.delayed.Delayed class.

In [1]: from dask.delayed import Delayed                                                                                             

In [2]: Delayed?                                                                                                                     
Init signature: Delayed(key, dsk, length=None)
Docstring:     
Represents a value to be computed by dask.

Equivalent to the output from a single key in a dask graph.
File:           ~/workspace/dask/dask/delayed.py
Type:           type
Subclasses:     DelayedLeaf, DelayedAttr

So in your case you want

value = Delayed("final", dsk)
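
A minimal end-to-end sketch of that line, assuming the add function and dsk dictionary from the question. The synchronous scheduler is used here only to keep the example self-contained; a distributed client should work the same way.

```python
import dask
from dask.delayed import Delayed

def add(x, y):
    return x + y

dsk = {
    'step1': (add, 1, 2),
    'step2': (add, 'step1', 3),
    'final': (add, 'step2', 'step1'),
}

# Wrap the 'final' key of the graph as a delayed object
value = Delayed('final', dsk)
result = value.compute(scheduler='synchronous')

# The wrapped object can feed further delayed calls, like any other delayed object
total = dask.delayed(add)(value, 1)
total_result = total.compute(scheduler='synchronous')
```

Here step1 is 3 and step2 is 6, so value computes to 9 and total to 10.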
