How to manage large h5 files
I have been encountering this issue for a while. I have an h5 file with a dataset "ds", which consists of a 170000x70000 matrix of float16 values. Evidently, I cannot load all of it at once, but I do not need to: I only need to work with vectors of shape 170000x1 and filter out the elements that fulfill a condition.
However, I am finding this very inefficient. I have tried to load the data in chunks, as in the example below, but this operation takes several minutes.
import h5py
import numpy as np
from tqdm import tqdm

f = h5py.File(myfile, "r")
ds = f["ds"]
j = 1000  # column of interest
for i in tqdm(range(0, 170000, 100)):
    chunk = ds[i:i+100, j]
    filtered_chunk = np.where(chunk > 0)
Has any of you encountered this issue before? How can I tackle this?
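For scale, the loop above issues 1,700 separate 100-row reads against the same column; a single full-column read followed by the filter is usually far cheaper. A minimal self-contained sketch of that pattern, using a small throwaway file in place of the real 170000x70000 one (the file name `demo.h5` and column index are illustrative):

```python
import numpy as np
import h5py

# Build a small stand-in file (the real dataset is 170000x70000 float16).
with h5py.File("demo.h5", "w") as f:
    f.create_dataset("ds", data=np.arange(-6, 6, dtype=np.float16).reshape(4, 3))

with h5py.File("demo.h5", "r") as f:
    ds = f["ds"]
    j = 1                    # column of interest (hypothetical index)
    col = ds[:, j]           # one read of the whole column vector
    positive = col[col > 0]  # keep only the elements fulfilling the condition
    print(positive)          # prints [1. 4.]
```

Boolean indexing (`col[col > 0]`) returns the values themselves; `np.where(col > 0)` as in the snippet above returns their indices instead.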
This is how you read vectors of 170000x1. I think it addresses your question:
import h5py

f = h5py.File(myfile, "r")
ds = f["ds"]
for cnt in range(ds.shape[1]):
    # get arr as a NumPy array of shape (170000,)
    arr = ds[:, cnt]
    # ...do something with arr
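Whether that column loop is fast depends on the dataset's on-disk chunk layout: if the chunks span rows, every `ds[:, cnt]` read has to touch (and decompress) every chunk. You can inspect the layout via `Dataset.chunks` and, if it is row-oriented, make a one-time copy chunked along columns. A sketch, using a small stand-in file (all file names and sizes here are illustrative, not from the question):

```python
import numpy as np
import h5py

# Stand-in for the real file; chunked by rows, as is common.
with h5py.File("big.h5", "w") as f:
    f.create_dataset("ds", data=np.ones((8, 6), dtype=np.float16),
                     chunks=(2, 6))

with h5py.File("big.h5", "r") as f:
    print(f["ds"].chunks)  # (2, 6): every column read touches all chunks

# One-time copy into a file chunked along columns, so each
# ds[:, cnt] read touches a single chunk.
with h5py.File("big.h5", "r") as src, h5py.File("big_cols.h5", "w") as dst:
    ds = src["ds"]
    out = dst.create_dataset("ds", shape=ds.shape, dtype=ds.dtype,
                             chunks=(ds.shape[0], 1))
    step = 2  # copy a block of columns at a time to bound memory
    for j0 in range(0, ds.shape[1], step):
        out[:, j0:j0 + step] = ds[:, j0:j0 + step]
```

For the real 170000x70000 float16 matrix, a `(170000, 1)` chunk is about 340 KB, so whole-column reads become single-chunk reads; the copy costs disk space and time once, but every subsequent column pass is cheap.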