Python: Dot product of dask arrays

Question

I am trying to compute the dot product of two very large dask arrays, X (35000 x 7500) and Y (7500 x 10). Since the result will also be very large, I am storing it in HDF5:

f = h5py.File('output.hdf5')
f['output'] = X.dot(Y)

But the second command has not produced any output even after almost an hour. What is wrong? Is there a faster technique? Is there an issue with the "chunks" I chose when creating X and Y?

Answer 1

Consider the .to_hdf5 method or da.store function.

>>> X.dot(Y).to_hdf5('output.hdf5', 'output')

or

>>> result = X.dot(Y)  # build the lazy result once and reuse it
>>> output = f.create_dataset('/output', shape=result.shape, dtype=result.dtype)
>>> da.store(result, output)

The to_hdf5 method is probably easier for you. The da.store method is general to other formats as well.
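To make the da.store route concrete, here is a minimal end-to-end sketch. The small shapes, the chunk sizes, the temporary file path, and the names X_np, Y_np, result and stored are assumptions chosen for illustration; they are not taken from the question.

```python
import os
import tempfile

import dask.array as da
import h5py
import numpy as np

# Small stand-in inputs (assumption); the question uses 35000 x 7500 and 7500 x 10.
X_np = np.arange(350 * 75, dtype="float64").reshape(350, 75)
Y_np = np.arange(75 * 10, dtype="float64").reshape(75, 10)

# Wrap the inputs as dask arrays; chunk sizes here are illustrative only.
X = da.from_array(X_np, chunks=(100, 75))
Y = da.from_array(Y_np, chunks=(75, 10))

result = X.dot(Y)  # lazy: no computation has happened yet

# Hypothetical output location in a temporary directory.
path = os.path.join(tempfile.mkdtemp(), "output.hdf5")

with h5py.File(path, "w") as f:
    # Pre-allocate the on-disk dataset, then let dask fill it block by block.
    output = f.create_dataset("output", shape=result.shape, dtype=result.dtype)
    da.store(result, output)

# Read the dataset back to check what was written.
with h5py.File(path, "r") as f:
    stored = f["output"][:]
```

With real data, the same pattern should apply with the original X and Y in place of the stand-ins, though suitable chunk sizes will depend on available memory.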

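The __setitem__ function in h5py (what you're using when you write f['output'] = ...) is hardcoded to use NumPy arrays, so the dask array is most likely converted to a single in-memory NumPy array before anything is written.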
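The effect of such a conversion can be sketched without h5py at all. The snippet below assumes the assignment behaves roughly like calling numpy.asarray on the dask array; that is an assumption about h5py's internals, not something confirmed here. The shapes and the names lazy and materialized are illustrative.

```python
import dask.array as da
import numpy as np

# Small stand-in arrays (assumption); every entry is 1.0.
X = da.ones((350, 75), chunks=(100, 75))
Y = da.ones((75, 10), chunks=(75, 10))

lazy = X.dot(Y)  # still a dask array, nothing computed yet

# Converting to NumPy forces the whole result to be computed in memory at once.
materialized = np.asarray(lazy)
```

Here lazy remains a dask array, while materialized is an ordinary NumPy array holding the full result in memory, which is what da.store and to_hdf5 avoid by writing block by block.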

Here is the appropriate section in the documentation.

1 answer

Answer 1: accepted, 2016-03-25 14:21:40
