I am trying to compute the dot product of two very large dask arrays, X (35000 x 7500) and Y (7500 x 10). Since the result will also be very large, I am storing it in HDF5:
f = h5py.File('output.hdf5')
f['output'] = X.dot(Y)
But the second command has produced no output even after almost an hour. What is wrong? Is there a faster technique? Could the problem be the "chunks" I chose when creating X and Y?
Consider the .to_hdf5 method or the da.store function.
>>> X.dot(Y).to_hdf5('output.hdf5', 'output')
or
>>> result = X.dot(Y)  # build the lazy product once and reuse it
>>> output = f.create_dataset('/output', shape=result.shape, dtype=result.dtype)
>>> da.store(result, output)
The to_hdf5 method is probably easier for you. The da.store function is more general and works with other formats as well.
The __setitem__ function in h5py (what you're using when you say f['output'] = ...) is hardcoded to use NumPy arrays, so it does not know how to write a dask array chunk by chunk.
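For reference, here is a minimal end-to-end sketch of the da.store approach. The array sizes, chunk shapes, and temporary file path are assumptions chosen so that it runs quickly; they are small stand-ins for the X and Y in the question, not a tuned configuration.

```python
import os
import tempfile

import dask.array as da
import h5py
import numpy as np

# Small stand-ins for the question's X (35000 x 7500) and Y (7500 x 10).
# Shapes and chunk sizes here are illustrative assumptions.
rng = np.random.default_rng(0)
x_np = rng.random((400, 60))
y_np = rng.random((60, 10))
X = da.from_array(x_np, chunks=(100, 60))
Y = da.from_array(y_np, chunks=(60, 10))

# Build the lazy product once; nothing is computed yet.
result = X.dot(Y)

# Hypothetical output location in a temporary directory.
path = os.path.join(tempfile.mkdtemp(), "output.hdf5")

with h5py.File(path, "w") as f:
    # Create an empty dataset with the right shape and dtype ...
    output = f.create_dataset("output", shape=result.shape, dtype=result.dtype)
    # ... and let dask compute and write it block by block.
    da.store(result, output)

# Read the result back to check what was written.
with h5py.File(path, "r") as f:
    stored = f["output"][:]
```

The file is opened explicitly with mode "w" because the default mode of h5py.File differs between h5py versions.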