How do I add values in parallel to an existing HDF5 file with 3 groups and 12 datasets in each group using h5py?
I have installed the libraries using this link. I have already created an HDF5 file called test.h5 using mpiexec -n 1 python3 test.py. test.py is shown below; I'm not sure whether it is necessary to use mpi4py here, so please let me know.
from mpi4py import MPI
import h5py

comm = MPI.COMM_WORLD
rank = comm.Get_rank()
size = comm.Get_size()

f = h5py.File('test.h5', 'w', driver='mpio', comm=comm)
f.create_group('t1')
f.create_group('t2')
f.create_group('t3')
for i in range(12):
    f['t1'].create_dataset('test{0}'.format(i), (1,), dtype='f', compression='gzip')
    f['t2'].create_dataset('test{0}'.format(i), (1,), dtype='i', compression='gzip')
    f['t3'].create_dataset('test{0}'.format(i), (1,), dtype='i', compression='gzip')
f.close()
Now, I would like to write a test1.py file that will:

1. Open test.h5 and get all the unique keys (they are the same for all three groups).
2. Break those keys into chunks, e.g. chunks = [['test0','test1','test2'],['test3','test4','test5'],['test6','test7','test8'],['test9','test10','test11']]. I don't care about the order or grouping of these chunks, but I would like one chunk per process.
3. Run a function like the following in parallel, so that each rank processes one chunk:

def write_h5(f, rank, chunks):
    for key in chunks[rank]:
        f['t1'][key][:] += 0.5
        f['t2'][key][:] += 1
        f['t3'][key][:] += 1
How do I do this? Can you please explain in detail? Thanks a lot in advance!
test1.py should contain:
from mpi4py import MPI
import h5py

comm = MPI.COMM_WORLD
rank = comm.Get_rank()
size = comm.Get_size()

def chunk_seq(seq, num):
    avg = len(seq) / float(num)
    out = []
    last = 0.0
    while last < len(seq):
        out.append(seq[int(last):int(last + avg)])
        last += avg
    return out

def write_h5(f, chunk):
    for key in chunk:
        f['t1'][key][:] += 0.5
        f['t2'][key][:] += 1
        f['t3'][key][:] += 1

f = h5py.File('test.h5', 'a', driver='mpio', comm=comm)
chunks = chunk_seq(list(f['t1'].keys()), size)
write_h5(f, chunks[rank])
f.close()
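To see what chunk_seq does, here is a standalone sketch of it splitting twelve keys across four ranks (plain Python, no HDF5 file needed; note that h5py actually returns group keys in alphabetical order, so 'test10' and 'test11' would sort before 'test2', which is fine since the grouping doesn't matter here):

```python
# Standalone demonstration of chunk_seq from test1.py above.
def chunk_seq(seq, num):
    # Split seq into num roughly equal consecutive slices.
    avg = len(seq) / float(num)
    out = []
    last = 0.0
    while last < len(seq):
        out.append(seq[int(last):int(last + avg)])
        last += avg
    return out

keys = ['test{0}'.format(i) for i in range(12)]
print(chunk_seq(keys, 4))
# [['test0', 'test1', 'test2'], ['test3', 'test4', 'test5'],
#  ['test6', 'test7', 'test8'], ['test9', 'test10', 'test11']]
```

Each rank then indexes this list with its own rank number, so the four processes touch disjoint datasets and never write to the same one.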
Run it using: mpiexec -n 4 python3 test1.py. The problem is that this will only work if you don't set compression='gzip' while creating the datasets. For reference, check the question Does HDF5 support compression with parallel HDF5? If not, why?, but I'm not sure whether this still holds for the latest version. Looking at this, it seems you'll have to read each dataset serially and create a corresponding dataset, with compression, in a new HDF5 file.
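A minimal sketch of that serial fallback, run as a single process (python3, no mpiexec). The output filename test_gzip.h5 is an assumption; the group and dataset layout matches the test.h5 created above:

```python
import h5py

# Serial repack: copy every dataset from test.h5 into a new file,
# recreating each one with gzip compression. Run with ONE process only,
# since the gzip filter is not usable with collective parallel writes.
with h5py.File('test.h5', 'r') as src, h5py.File('test_gzip.h5', 'w') as dst:
    for group in src:                      # 't1', 't2', 't3'
        g = dst.create_group(group)
        for key in src[group]:             # 'test0' ... 'test11'
            data = src[group][key][:]      # read each dataset serially
            g.create_dataset(key, data=data, compression='gzip')
```

This trades parallel write speed for compression; if the datasets are as small as in this example, the serial copy is cheap anyway.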