
h5py open file blocks with MPI

I'm trying to open an HDF5 file using h5py with MPI by executing:

from mpi4py import MPI
import h5py

print("Opening...")
f = h5py.File(file_path, "r", driver='mpio', comm=MPI.COMM_WORLD)
print("Done")

For some reason, this line blocks when executed in my project. I tried to create a minimal reproducible example, without success: in those examples the line works as it should.

So something in my codebase that I can't track down is causing the line above to block.

Question: What can cause h5py.File to block?

Note: CPU usage goes to 100%, so MPI seems to be waiting for something.


Note 2: I've added some code from my codebase, though it may not help at all:

Opening the file before the if statement works; opening it inside the if just blocks.

from mpi4py import MPI
import h5py
from DataProviderH5PYPool import init_pool, new_worker
import Settings


rank = MPI.COMM_WORLD.Get_rank()
task = [
    "main",
    "h5py_worker"
]

task = task[rank] if rank < len(task)-1 else task[-1]
print("Starting new process:  {} with rank {}".format(task,rank))

def init():
    # works
    print(h5py.File(Settings.h5py.training[0], "r", driver='mpio', comm=MPI.COMM_WORLD)["0"][0])
    if task == "main":
        # blocks
        # print(h5py.File(Settings.h5py.training[0], "r", driver='mpio', comm=MPI.COMM_WORLD)["0"][0])

        init_pool(n=MPI.COMM_WORLD.Get_size()-1)
        return True
    elif task == "h5py_worker":
        # works too, but results in
        # RuntimeError: Can't decrement id ref count (Can't close file, there are objects still open)
        # print(h5py.File(Settings.h5py.training[0], "r", driver='mpio', comm=MPI.COMM_WORLD)["0"][0])

        new_worker()
        return False
    else:
        raise RuntimeError("Unsupported task '{}'".format(task))

The code is executed via:

mpiexec -n 2 python Test.py
or
mpiexec.mpich -n 2 python Test.py

I installed both and tried each of them, but got the same result.

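My guess would be that the open call is collective, so it needs to be called by all processes in the communicator; since you pass COMM_WORLD, that means every rank. If only a subset of the ranks calls it, those ranks will block waiting for the rest. That would also explain why opening the file before the if works (every rank reaches that call) while opening it inside the if blocks (only main does).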
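A rough way to see why a collective call hangs when only some ranks make it, without needing MPI or h5py installed, is a stdlib analogy: the sketch below uses threading.Barrier as a stand-in for a collective open over a communicator. It is not h5py or MPI code. SimComm, collective_open and run are hypothetical names for illustration only, and the timeout exists only so the demo terminates; a real MPI collective would simply keep waiting, which would be consistent with the 100% CPU you observed.

```python
import threading


class SimComm:
    """Hypothetical stand-in for an MPI communicator with `size` ranks.

    A threading.Barrier models the rule that a collective call returns only
    once every rank in the group has entered it.
    """

    def __init__(self, size):
        self._barrier = threading.Barrier(size)

    def collective_open(self, timeout):
        # Real MPI would wait forever; the timeout only lets the demo finish.
        try:
            self._barrier.wait(timeout=timeout)
            return "opened"
        except threading.BrokenBarrierError:
            return "blocked"


def run(comm, calling_ranks, timeout=0.5):
    """Start one thread per calling rank; each attempts the collective open."""
    results = {}

    def rank_body(rank):
        results[rank] = comm.collective_open(timeout)

    threads = [threading.Thread(target=rank_body, args=(r,)) for r in calling_ranks]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return dict(sorted(results.items()))


# Both ranks reach the open (like opening the file before the if).
print(run(SimComm(2), calling_ranks=[0, 1]))  # {0: 'opened', 1: 'opened'}

# Only rank 0 ("main") reaches it (like opening the file inside the if).
print(run(SimComm(2), calling_ranks=[0]))  # {0: 'blocked'}
```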

I'm not 100% clear on what you want to do: do you only want to open the file on the main task? That would mean that only the main task could actually access the file (read from it or write to it), so the workers would need to use MPI to exchange any data they need with main.

If so, you could call the open only on main but pass the communicator COMM_SELF instead. The open is then collective over that single rank only, so it will not wait for the other ranks to call it as well.
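The COMM_SELF suggestion can be illustrated with a small stdlib analogy, assuming threading.Barrier as a stand-in for a communicator: a barrier with one party (like COMM_SELF on main) completes immediately, while a two-party barrier (like COMM_WORLD in a 2-rank job) never completes if the worker never joins. collective_open, world and self_comm are hypothetical names, and this is not h5py or MPI code. In the real code, the corresponding change would be passing comm=MPI.COMM_SELF instead of comm=MPI.COMM_WORLD in the h5py.File call inside the main branch.

```python
import threading


def collective_open(group, timeout):
    """Hypothetical stand-in for a collective open over `group`.

    `group` is a threading.Barrier whose party count plays the role of the
    communicator size. Returns "opened" once every member has called in,
    or "blocked" after `timeout` seconds (real MPI would just hang).
    """
    try:
        group.wait(timeout=timeout)
        return "opened"
    except threading.BrokenBarrierError:
        return "blocked"


# Assumed 2-rank job: rank 0 is "main", rank 1 is "h5py_worker", and only
# main ever calls the open.
world = threading.Barrier(2)      # analogue of MPI.COMM_WORLD: both ranks
self_comm = threading.Barrier(1)  # analogue of MPI.COMM_SELF on main: itself only

print("COMM_WORLD analogue:", collective_open(world, timeout=0.2))      # blocked
print("COMM_SELF analogue: ", collective_open(self_comm, timeout=0.2))  # opened
```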

Regards,

David
