What is the most efficient way to find all paths to particular values in an HDF5 file with Python?
I am looking for negative values in a .hdf5 file that has the following structure:
- Incidence_0
  - Wavelength_0
    - (Table of size m * n)
  - Wavelength_1
    - (Table of size m * n)
  ...
- Incidence_1
  ...
...
My objective is to find every negative value and to get back its exact position in the file (i.e., the number of the incidence, the number of the wavelength, and its position in the associated table).
I am sorry that I cannot give a minimal reproducible example, because I cannot share the file that I'm using, but here is the idea.
import h5py
file = h5py.File('testFile.hdf5', 'r')
result = []
# nbIncidence, nbWavelength, nbTheta and nbPhi are assumed to be known beforehand
for incidence in range(nbIncidence):
    for wavelength in range(nbWavelength):
        for theta in range(nbTheta):
            for phi in range(nbPhi):
                # Each access reads a single element from the file
                value = file['Incidence_' + str(incidence)]['Wavelength_' + str(wavelength)][theta, phi]
                if value < 0:
                    result.append([value, incidence, wavelength, theta, phi])
This works perfectly, but using four nested loops is time-consuming, especially if I have to work on huge files, which will probably happen. I don't know the h5py library well enough, but I am pretty sure there is a way to do this much faster.
First, the bad news: h5py doesn't have a function to interrogate your data in the way you described. The good news: you can accomplish your task by extracting each Incidence/Wavelength dataset to a NumPy array, then combining 2 NumPy methods to operate on the extracted array. [Note: This assumes you have sufficient memory to load each dataset.]
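If a single table is too large to load at once, the same idea can be applied to one block of rows at a time. The following is only a minimal sketch under stated assumptions, not something h5py provides: the function name negative_indices_blockwise and the block_rows parameter are made up for illustration, and each dataset is assumed to be 2-D. The function relies only on .shape and slicing, which both h5py datasets and NumPy arrays support, so the demo at the bottom uses a plain NumPy array.

```python
import numpy as np

def negative_indices_blockwise(ds, block_rows=1024):
    """Return an (k, 2) array of [row, col] indices of negative values.

    ds is assumed to be a 2-D h5py dataset or NumPy array (hypothetical helper).
    """
    found = []
    n_rows = ds.shape[0]
    for start in range(0, n_rows, block_rows):
        stop = min(start + block_rows, n_rows)
        # Slicing reads only rows start..stop-1 into memory
        block = np.asarray(ds[start:stop])
        idx = np.argwhere(block < 0)
        # Shift row indices from block coordinates to whole-table coordinates
        idx[:, 0] += start
        found.append(idx)
    if not found:
        return np.empty((0, 2), dtype=int)
    return np.concatenate(found)

# Demo on an in-memory array standing in for one Wavelength_# table
demo = np.array([[1.0, -1.0], [2.0, 3.0], [-4.0, 5.0], [6.0, -7.0]])
print(negative_indices_blockwise(demo, block_rows=3))
```

With a real file, pass the dataset object itself (for example h5fr['Incidence_0']['Wavelength_0']) rather than the result of [()], so that only the requested rows are read on each pass.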
Some observations on working with this data (to help you follow my example):
- Use the .keys() method to get group and dataset names. (Or, you can get (name, object) tuples with the .items() method.)
- Read a dataset into a NumPy array with arr = file['Incidence_#']['Wavelength_#'][()]
- arr < 0 returns a boolean array that is True for all negative values (and False for other values).
- Use np.argwhere to find array element indices that are non-zero. Use this on the boolean array (remembering True is non-zero, and False is zero).
I created a simple example that mimics your schema to demonstrate the process. For completeness, the code that creates it is at the end. It's a small file that won't bury you in output.
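To see the two NumPy steps in isolation (the boolean comparison followed by np.argwhere), here is a small sketch on a hand-written array. The array contents and the variable names mask, neg_idx and neg_values are illustrative only; no file is involved.

```python
import numpy as np

# A tiny stand-in for one Wavelength_# table
arr = np.array([[ 1.5, -0.2,  3.0],
                [-4.0,  2.0, -0.1]])

mask = arr < 0               # boolean array, True where the value is negative
neg_idx = np.argwhere(mask)  # one [row, col] pair per True entry, in row-major order
neg_values = arr[mask]       # the negative values themselves, in the same order

for (i, j), value in zip(neg_idx, neg_values):
    print(i, j, value)
```

Indexing with the mask avoids looking up each value separately, since arr[mask] and np.argwhere(mask) list the matching elements in the same order.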
The code below reads the data, finds negative values, and adds them to a list. It has several print statements so you can see how each step works. They aren't needed once you are confident in the procedure.
import h5py
import numpy as np

with h5py.File('testFile.hdf5', 'r') as h5fr:
    result = []
    for i_grp in h5fr.keys():              # 'Incidence_#' group names
        for wave_ds in h5fr[i_grp].keys():  # 'Wavelength_#' dataset names
            # Read the whole table into a NumPy array
            wave_arr = h5fr[i_grp][wave_ds][()]
            # [row, col] index pairs of all negative values
            neg_idx = np.argwhere(wave_arr < 0.0)
            wave_res = []
            for n in neg_idx:
                i, j = n[0], n[1]
                result.append([wave_arr[i,j], i_grp, wave_ds, i, j])
                wave_res.append([wave_arr[i,j], i_grp, wave_ds, i, j])
            print(f'\nResults for {i_grp}; {wave_ds}:')
            print(wave_res)
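The question asks for the number of the incidence and of the wavelength, while the loop above stores the group and dataset names. A possible variation, sketched here under the assumption that every name ends in '_<integer>' as in the question's layout, parses that trailing integer and sorts the rows numerically, since name-ordered iteration can place 'Wavelength_10' before 'Wavelength_2'. The function name find_negatives is hypothetical. It uses only .items() and [()], which h5py groups and datasets provide and which nested dicts of NumPy arrays also support, so the demo runs without a file.

```python
import numpy as np

def find_negatives(root):
    """Collect [value, incidence, wavelength, row, col] for every negative value.

    root is assumed to be an open h5py.File laid out as Incidence_#/Wavelength_#,
    or any nested mapping of 2-D arrays with the same naming (hypothetical helper).
    """
    rows = []
    for i_name, grp in root.items():
        incidence = int(i_name.rsplit('_', 1)[1])       # 'Incidence_3' -> 3
        for w_name, ds in grp.items():
            wavelength = int(w_name.rsplit('_', 1)[1])  # 'Wavelength_12' -> 12
            arr = ds[()]                                # whole table as an array
            mask = arr < 0
            # arr[mask] and np.argwhere(mask) share the same row-major order
            for value, (i, j) in zip(arr[mask], np.argwhere(mask)):
                rows.append([float(value), incidence, wavelength, int(i), int(j)])
    # Sort by (incidence, wavelength, row, col) so the order is numeric
    rows.sort(key=lambda r: r[1:])
    return rows

# Demo with a nested dict standing in for the file; with h5py one would call
# find_negatives(h5fr) inside "with h5py.File('testFile.hdf5', 'r') as h5fr:"
demo = {
    'Incidence_0': {'Wavelength_0': np.array([[1.0, -2.0], [3.0, 4.0]])},
    'Incidence_1': {'Wavelength_0': np.array([[5.0, 6.0], [-7.0, 8.0]])},
}
rows = find_negatives(demo)
print(rows)
```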
Code to create the example file used above:
import h5py
import numpy as np

nbIncidence = 4
nbWavelength = 6
m, n = 10, 10 # m,n same as nbTheta, nbPhi??
with h5py.File('testFile.hdf5', 'w') as h5fw:
    for i_cnt in range(nbIncidence):
        grp = h5fw.create_group('Incidence_' + str(i_cnt))
        for w_cnt in range(nbWavelength):
            # Random values in [-1, 10), so some entries are negative
            arr = np.random.uniform(low=-1.0, high=10.0, size=(m,n))
            grp.create_dataset('Wavelength_' + str(w_cnt), data=arr)