
Why is there no output in parallel assignment of h5py dataset?

I am trying to generate an HDF5 dataset from code that runs in parallel, but when I read the dataset back, it is blank: all entries are zero.

I have replaced the parallel code with a sequential for loop, and in that case the dataset works out fine, but I don't know why doing the same in parallel fails.

Here is the code for a minimal example:

import h5py
import scipy.stats as st

file = h5py.File('test.hdf5','a')

dset = file.create_dataset('x', (10,1024), maxshape=(None,1024), 
                           dtype='float32')

def assign(j):
    dset[j,:] = st.norm.rvs(0.,1.,1024)

from joblib import Parallel, delayed
import multiprocessing as mp

Parallel(n_jobs=4)(delayed(assign)(j) for j in range(10))

file.close()

The file is later read with:

import h5py

file = h5py.File('test.hdf5','r')
file['x'][:]

What is the issue with the code running in parallel?

Every worker process of your parallel code gets its own copy of dset, and they keep stepping on each other's toes: writes made in a child process never reach the file handle held by the parent. You may try something like this to get it working:

def get_row(j):
    return st.norm.rvs(0., 1., 1024)

dset[:,:] = Parallel(n_jobs=4)(delayed(get_row)(j) for j in range(10))

P.S. Thanks to @gapollo for the correction!
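The copy-per-process behaviour described above can be demonstrated with only the standard library, no h5py involved. A sketch, where a plain Python list stands in for the dataset:

```python
import multiprocessing as mp

data = [0] * 10  # a plain list standing in for the h5py dataset

def assign(j):
    # Runs in a child process: this mutates the *child's* copy of data.
    data[j] = j + 1

if __name__ == '__main__':
    with mp.Pool(4) as pool:
        pool.map(assign, range(10))
    print(data)  # the parent's list is unchanged: still all zeros
```

Exactly as with the h5py dataset in the question, every mutation happens in a worker's private copy, so the parent sees nothing but the initial zeros.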
