
Python : Dot product of dask array

I am trying to compute the dot product of two very large dask arrays, X (35000 x 7500) and Y (7500 x 10). Since the result will also be very large, I am storing it in HDF5:

f = h5py.File('output.hdf5')
f['output'] = X.dot(Y)

But the second command has produced no output even after almost an hour. What is wrong? Is there a faster technique? Could the problem be the "chunks" used when creating X and Y?

Consider the .to_hdf5 method or da.store function.

>>> X.dot(Y).to_hdf5('output.hdf5', 'output')

or

>>> result = X.dot(Y)
>>> output = f.create_dataset('/output', result.shape, result.dtype)
>>> da.store(result, output)

The to_hdf5 method is probably easier for you. The da.store function is more general and also works with other formats.
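For reference, here is a minimal end-to-end sketch of the da.store route. The shapes are small stand-ins for the ones in the question, and the chunk sizes, the temporary file path, and the variable names are assumptions made for illustration only.

```python
import os
import tempfile

import dask.array as da
import h5py
import numpy as np

# Small stand-in data (assumed); the question uses (35000, 7500) and (7500, 10).
rng = np.random.default_rng(0)
x_np = rng.random((350, 75))
y_np = rng.random((75, 10))

# Wrap the data as dask arrays; the chunk sizes here are illustrative.
X = da.from_array(x_np, chunks=(100, 75))
Y = da.from_array(y_np, chunks=(75, 10))

# Build the lazy result once and reuse it for shape, dtype and storage.
result = X.dot(Y)

# Hypothetical output location in a temporary directory.
path = os.path.join(tempfile.mkdtemp(), "output.hdf5")
with h5py.File(path, "w") as f:
    dset = f.create_dataset("output", shape=result.shape, dtype=result.dtype)
    # da.store computes the result chunk by chunk and writes each chunk
    # into the matching region of the target dataset.
    da.store(result, dset)

# Read the data back to check what was written.
with h5py.File(path, "r") as f:
    stored = f["output"][:]
```

Rows of X are split into blocks of 100 while the shared dimension is kept in a single chunk, so each output chunk depends on one block of X and all of Y. Whether that layout suits the real arrays depends on available memory.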

The __setitem__ function in h5py (what you're using when you say f['output'] = ...) is hardcoded to use NumPy arrays.
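To see why a NumPy-only code path matters, the sketch below shows what happens when a lazy dask array is converted to a NumPy array: the whole result is computed and held in memory at once. It illustrates the conversion in general and is not a reproduction of h5py's internals; the tiny shapes are assumed for illustration.

```python
import dask.array as da
import numpy as np

# A lazy dot product: only a task graph exists at this point.
lazy = da.ones((4, 3), chunks=(2, 3)).dot(da.ones((3, 2), chunks=(3, 2)))
print(type(lazy))

# Converting to NumPy computes every chunk and materializes the full array.
eager = np.asarray(lazy)
print(type(eager))
```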

Here is the appropriate section in the documentation.
