How to manage large h5 files

I have been encountering this issue for a while. I have an h5 file with a dataset "ds", which consists of a 170000x70000 matrix of float16 values. Evidently, I cannot load all of it at once, but I do not need to. I only need to work with vectors of shape 170000x1 and filter out the elements that fulfill a condition. However, I am finding this to be very inefficient. I have tried to load the data in chunks, as in the example below, but this operation takes several minutes.

import h5py
import numpy as np
from tqdm import tqdm

f = h5py.File(myfile, "r")
ds = f["ds"]
j = 1000  # index of the column to filter
for i in tqdm(range(0, 170000, 100)):
    # read a 100-element slice of column j
    chunk = ds[i:i+100, j]
    filtered_chunk = np.where(chunk > 0)

Have any of you encountered this issue before? How can I tackle this?

This is how you read vectors of 170000x1. I think this addresses your question.

import h5py

f = h5py.File(myfile, "r")
ds = f["ds"]
for cnt in range(ds.shape[1]):
    # read column cnt as a numpy array of shape (170000,)
    arr = ds[:, cnt]
    # ...do something with arr
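Whether this loop is fast depends on how "ds" is laid out on disk. HDF5 performs I/O one chunk at a time, so if the dataset is stored contiguously or chunked along rows, every ds[:, cnt] read has to touch the full width of the matrix, which would explain the several-minute timings. Below is a minimal sketch of checking the layout and, if it is row-oriented, doing the filtering in a single pass over row blocks instead; the >0 condition and the column list cols are hypothetical placeholders.

import h5py
import numpy as np

with h5py.File(myfile, "r") as f:
    ds = f["ds"]
    # None means contiguous storage; a tuple such as (512, 70000)
    # means row-oriented chunks -- both make single-column reads expensive
    print(ds.chunks)

    cols = [10, 20, 30]          # hypothetical columns of interest
    hits = {j: [] for j in cols}
    rstep = 1000                 # rows per block (~140 MB of float16 here)
    for i in range(0, ds.shape[0], rstep):
        block = ds[i:i+rstep, :]             # one cheap, contiguous read
        for j in cols:
            # global row indices in column j that satisfy the condition
            hits[j].append(np.where(block[:, j] > 0)[0] + i)

    result = {j: np.concatenate(hits[j]) for j in cols}

If you instead need fast repeated access to individual columns, another option is to rewrite the file once with column-shaped chunks, for example chunks=(170000, 1) in create_dataset, or with the h5repack command-line tool; after that, each ds[:, cnt] read in the loop above touches a single chunk.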
