
Initializing or populating multiple numpy arrays from h5 file groups

I have an h5 file with 5 groups, each group containing a 3D dataset. I am looking to build a for loop that allows me to extract each group into a numpy array and assign the numpy array to an object with the group header name. I am able to get a number of different methods to work with one group, but when I try to build a for loop that applies the code to all 5 groups, it breaks. For example:

import h5py as h5
import numpy as np

f = h5.File("FFM0012.h5", "r+") #read in h5 file
print(list(f.keys())) #['FFM', 'Image'] for my dataset
FFM = f['FFM'] #Generate object with all 5 groups
print(list(FFM.keys())) #['Amp', 'Drive', 'Phase', 'Raw', 'Zsnsr'] for my dataset

Amp = FFM['Amp'] #Generate object for 1 group
Amp = np.array(Amp) #Turn into numpy array, this works.

Now when I try to apply the same logic with a for loop:

h5_keys = [] 
FFM.visit(h5_keys.append) #Create list of group names ['Amp', 'Drive', 'Phase', 'Raw', 'Zsnsr']

for h5_key in h5_keys:
    tmp = FFM[h5_key]
    h5_key = np.array(tmp)

print(Amp[30,30,30]) #To check that array is populated

When I run this code I get "NameError: name 'Amp' is not defined". I've tried initializing the numpy array before the for loop with:

h5_keys = [] 
FFM.visit(h5_keys.append) #Create list of group names

Amp = np.array([])
for h5_key in h5_keys:
    tmp = FFM[h5_key]
    h5_key = np.array(tmp)

print(Amp[30,30,30]) #To check that array is populated

This produces the error message "IndexError: too many indices for array".

I've also tried generating a dictionary and creating numpy arrays from the dictionary. That is a similar story: I can get the code to work for one h5 group, but it falls apart when I build the for loop.

Any suggestions are appreciated!

You seem to have jumped to using h5py and numpy before learning much of Python.

Amp = np.array([])        # creates a numpy array with 0 elements
for h5_key in h5_keys:    # h5_key is set to a new value each iteration
    tmp = FFM[h5_key]
    h5_key = np.array(tmp)    # now you reassign h5_key

print(Amp[30,30,30])      # Amp is the original (0,) shape array

Try this basic Python loop, paying attention to the value of i:

alist = [1,2,3]
for i in alist:
    print(i)
    i = 10
    print(i)
print(alist)       # no change to alist
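As a contrast, here is a minimal sketch of two ways to keep what a loop computes: assigning back through an index, or storing each result in a dictionary under a key. The names `scaled` and `results` are made up for this illustration and do not come from the question.

```python
alist = [1, 2, 3]

# Rebinding the loop variable only changes the local name, not the list.
for i in alist:
    i = i * 10

# Assigning through an index changes the (copied) list itself.
scaled = list(alist)
for idx, value in enumerate(scaled):
    scaled[idx] = value * 10

# Storing under a key collects one result per name.
results = {}
for name in ["Amp", "Drive"]:
    results[name] = len(name)

print(alist, scaled, results)
```

The same idea applies to the h5 loop: the loop variable holds a name, so the array has to be stored somewhere that is indexed by that name.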

f is the file.

FFM = f['FFM'] 

is a group.

Amp = FFM['Amp']

is a dataset. There are various ways to load a dataset into a numpy array. I believe the [...] slicing is the currently preferred one. .value used to be used but is now deprecated (loading dataset).

Amp = FFM['Amp'][...]

is an array.
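The difference between a dataset and an array can be checked directly. The sketch below builds a tiny stand-in file in memory (the file name `demo.h5` and the 2x3x4 data are assumptions for illustration, not the question's real file) and loads the dataset with `[...]` and with `[()]`.

```python
import h5py
import numpy as np

# In-memory file: driver="core" with backing_store=False writes nothing to disk.
with h5py.File("demo.h5", "w", driver="core", backing_store=False) as f:
    grp = f.create_group("FFM")
    grp.create_dataset("Amp", data=np.arange(24.0).reshape(2, 3, 4))

    dset = f["FFM"]["Amp"]                        # h5py Dataset, data still in the file
    is_dataset = isinstance(dset, h5py.Dataset)
    arr = dset[...]                               # numpy array, copied into memory
    arr2 = dset[()]                               # same data, alternative spelling

# The arrays remain usable after the file is closed; the dataset does not.
print(is_dataset, type(arr).__name__, arr.shape)
```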

alist = [FFM[key][...] for key in h5_keys]

should create a list of arrays from the FFM group.

If the shapes are compatible, you can combine the arrays into one array:

np.array(alist)
np.stack(alist)
np.concatenate(alist, axis=0)   # or other axis

etc.
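These calls do not all give the same result shape. A small sketch with three dummy (2, 3, 4) arrays, chosen only for illustration, shows that `np.array` and `np.stack` add a new leading axis while `np.concatenate` extends an existing one.

```python
import numpy as np

# Three placeholder 3D arrays standing in for the loaded datasets.
alist = [np.zeros((2, 3, 4)) for _ in range(3)]

as_array = np.array(alist)               # new leading axis
stacked = np.stack(alist)                # new leading axis (axis=0 by default)
joined = np.concatenate(alist, axis=0)   # joins along the existing first axis

print(as_array.shape, stacked.shape, joined.shape)
# (3, 2, 3, 4) (3, 2, 3, 4) (6, 3, 4)
```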

adict = {key: FFM[key][...] for key in h5_keys}

should create a dictionary of arrays keyed by dataset name.

In Python, lists and dictionaries are the ways of accumulating objects. The h5py groups behave much like dictionaries. Datasets behave much like numpy arrays, though they remain on the disk until loaded with [...].
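Putting the pieces together, the following is a rough end-to-end sketch. It assumes a stand-in file built in memory (the name `FFM_demo.h5` and the constant 4x4x4 data are invented for this example; only the dataset names come from the question) and iterates the group with `.items()`, as one would a dictionary.

```python
import h5py
import numpy as np

names = ["Amp", "Drive", "Phase", "Raw", "Zsnsr"]

# Build a stand-in file in memory; a real script would open the existing file instead.
with h5py.File("FFM_demo.h5", "w", driver="core", backing_store=False) as f:
    FFM = f.create_group("FFM")
    for n, name in enumerate(names):
        FFM.create_dataset(name, data=np.full((4, 4, 4), float(n)))

    # Groups iterate like dictionaries: each item is (name, dataset).
    arrays = {key: dset[...] for key, dset in FFM.items()}

# Look arrays up by name instead of creating one variable per dataset.
Amp = arrays["Amp"]
print(sorted(arrays), Amp.shape, arrays["Zsnsr"][3, 3, 3])
```

Because the group here is flat, `.items()` or `.keys()` is enough; `visit` walks nested groups as well and would return path-like names for anything below the first level.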
