
Can we disable h5py file locking for python file-like object?

When opening an HDF5 file with h5py, you can pass in a Python file-like object. I have done so, where the file-like object is a custom implementation of my own network-based transport layer.
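A minimal sketch of this code path, using an in-memory `io.BytesIO` in place of the custom transport layer (h5py accepts any seekable object with `read`/`write` methods and selects `driver='fileobj'` automatically):

```python
import io
import numpy as np
import h5py

# Build a small HDF5 file entirely in memory.
buf = io.BytesIO()
with h5py.File(buf, "w") as f:
    f.create_dataset("data", data=np.arange(100))

# Reopen it through the file-like object -- the same mechanism a
# network-backed transport layer would use.
buf.seek(0)
with h5py.File(buf, "r") as f:
    chunk = f["data"][10:20]  # slice a region without reading the whole file
```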

This works great: I can slice large HDF5 files over a high-latency transport layer. However, HDF5 appears to provide its own file-locking functionality, so that even if you open multiple files read-only within the same process (threading model), it still effectively runs the operations in series.

There are drivers in HDF5 that support parallel operations, such as h5py.File(f, driver='mpio'), but this doesn't appear to apply to Python file-like objects, which use h5py.File(f, driver='fileobj').

The only solution I see is to use multiprocessing. However, its scalability is very limited: because of the overhead, you can realistically open only tens of processes. My transport layer uses asyncio and is capable of parallel operations on the scale of thousands or tens of thousands, allowing me to build a longer queue of slow file-read operations, which boosts my total throughput.

I can achieve 1.5 GB/sec of large-file, random-seek, binary reads with my transport layer against a local S3 interface when I queue 10k IO ops in parallel (requiring 50 GB of RAM to service the requests, an acceptable trade-off for the throughput).

Is there any way I can disable the h5py file locking when using driver='fileobj'?

You just need to set the environment variable HDF5_USE_FILE_LOCKING to FALSE. Note that HDF5 reads this variable when the library initializes, so it must be set before h5py is imported.

Examples are as follows:

On Linux or macOS, via a terminal: export HDF5_USE_FILE_LOCKING=FALSE

On Windows, via Command Prompt (CMD): set HDF5_USE_FILE_LOCKING=FALSE
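The variable can also be set from within Python itself, as long as it happens before the first h5py import. A minimal sketch:

```python
import os

# Must be set before h5py (and the underlying HDF5 C library) is imported;
# HDF5 reads this variable once, at library initialization.
os.environ["HDF5_USE_FILE_LOCKING"] = "FALSE"

import h5py  # noqa: E402 -- deliberately imported after setting the variable

# Newer versions of h5py (>= 3.5, with HDF5 >= 1.12.1) also expose a
# per-file switch, which avoids the import-order concern entirely:
#   h5py.File(f, "r", locking=False)
```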
