How to read multiple .mat files from a folder in Python?
I am trying to read multiple .mat files in Python, but every time I get an error. This is my code:
folder = "C:/Users/Sreeraj/Desktop/Me/PhD/Mahindra/brain_tumor_dataset/data/"
directs = sorted(listdir(folder))
labels = []
for file in directs:
    f = h5py.File(folder+file,'r')
    label = np.array(f.get("cjdata/label"))[0][0]
    labels.append(label)
labels = pd.Series(labels)
labels.shape
The error I am getting is:
---------------------------------------------------------------------------
OSError Traceback (most recent call last)
<ipython-input-11-e7d73f54f73d> in <module>
3 labels = []
4 for file in directs:
----> 5 f = h5py.File(folder+file,'r')
6 label = np.array(f.get("cjdata/label"))[0][0]
7 labels.append(label)
~\miniconda3\envs\tensorflow\lib\site-packages\h5py\_hl\files.py in __init__(self, name, mode, driver, libver, userblock_size, swmr, rdcc_nslots, rdcc_nbytes, rdcc_w0, track_order, **kwds)
404 with phil:
405 fapl = make_fapl(driver, libver, rdcc_nslots, rdcc_nbytes, rdcc_w0, **kwds)
--> 406 fid = make_fid(name, mode, userblock_size,
407 fapl, fcpl=make_fcpl(track_order=track_order),
408 swmr=swmr)
~\miniconda3\envs\tensorflow\lib\site-packages\h5py\_hl\files.py in make_fid(name, mode, userblock_size, fapl, fcpl, swmr)
171 if swmr and swmr_support:
172 flags |= h5f.ACC_SWMR_READ
--> 173 fid = h5f.open(name, flags, fapl=fapl)
174 elif mode == 'r+':
175 fid = h5f.open(name, h5f.ACC_RDWR, fapl=fapl)
h5py\_objects.pyx in h5py._objects.with_phil.wrapper()
h5py\_objects.pyx in h5py._objects.with_phil.wrapper()
h5py\h5f.pyx in h5py.h5f.open()
OSError: Unable to open file (file signature not found)
I have 5849 .mat files. Can anyone tell me where I am going wrong?
I used h5py to read the .mat files. I want to read the labels and images in each .mat file.
I believe the issue is in concatenating folder+file. Two things about that:
1. file is not a reserved keyword in Python 3, but it was a built-in name in Python 2, so it is better not to use it as a variable name.
2. Assuming you used os.listdir here (you didn't include the import itself), building the path by string concatenation only works as long as folder ends with a slash. os.path.join takes care of the separator for you. A fix for that (after I renamed file to filename):
full_file_path = os.path.join(folder, filename)
f = h5py.File(full_file_path,'r')
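A minimal sketch of that idea, using only the standard library so it can be tried without any data. The helper name list_mat_paths and the demo file names are made up for illustration; the extension filter is an extra assumption (that the folder might also hold non-.mat entries), not something the question confirms.

```python
import os
import tempfile


def list_mat_paths(folder):
    """Return sorted full paths of the entries in `folder` that end in .mat."""
    paths = []
    for filename in sorted(os.listdir(folder)):
        # Skip anything that is not a .mat file (notes, archives, hidden files).
        if not filename.lower().endswith(".mat"):
            continue
        # os.path.join inserts the separator only when it is needed.
        paths.append(os.path.join(folder, filename))
    return paths


# Throwaway folder with hypothetical file names, for demonstration only.
demo_dir = tempfile.mkdtemp()
for name in ("2.mat", "1.mat", "readme.txt"):
    open(os.path.join(demo_dir, name), "w").close()

mat_paths = list_mat_paths(demo_dir)
print([os.path.basename(p) for p in mat_paths])
```

Each path in mat_paths could then be passed to h5py.File(path, 'r') inside the loop.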
Here are 4 areas where the code could be improved:
1. I prefer the glob.iglob() method to get the files. It can use a wildcard to define the filenames, and it is a generator. That way you don't have to create a list with 5849 .mat filenames.
2. You open the file with h5py.File(), but don't close it. That probably won't cause a problem, but it is bad practice. It's better to use Python's with/as: context manager. (If you don't do that, add f.close() inside the loop.)
3. You are using the .get() method to retrieve the dataset object. It returns None without raising an error when the name does not exist, which can hide mistakes. Documented practice is to reference the dataset name like this: f["cjdata/label"]
4. Also, you have [0][0] after the dataset object. Are you sure you want to do that? Those are indices that access the single dataset value at [0][0]. If you want to create a numpy array of all the dataset values, use label = f["cjdata/label"][()]
Modified code that demonstrates all of these changes is below:
import glob
import h5py
import numpy as np
import pandas as pd

folder = "C:/Users/Sreeraj/Desktop/Me/PhD/Mahindra/brain_tumor_dataset/data/"
file_wc = folder + "*.mat" # assumes filename extension is .mat
labels = []
for fname in glob.iglob(file_wc):
    with h5py.File(fname,'r') as f:
        # index the file by dataset name instead of calling .get():
        label = np.array(f["cjdata/label"][0][0])
        # or, to read the whole dataset into a numpy array instead:
        # label = f["cjdata/label"][()]
        labels.append(label)
labels = pd.Series(labels)
labels.shape
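If the same OSError still appears after these changes, note that "file signature not found" means h5py did not find the HDF5 signature in the file it was given. One possible cause (an assumption, since the files themselves are not shown in the question) is that some files were saved in an older MAT-file format: only version 7.3 .mat files are HDF5-based. The sketch below reads the descriptive text at the start of each file to tell the two apart. The helper names mat_header and looks_like_v73 and the demo files are made up for illustration.

```python
import os
import tempfile


def mat_header(path):
    """Return the descriptive text at the start of a MAT-file header."""
    with open(path, "rb") as fh:
        # In MAT-files of version 5 and later, the first 116 bytes are text.
        text = fh.read(116)
    return text.decode("ascii", errors="replace").strip()


def looks_like_v73(path):
    """True if the header announces the HDF5-based version 7.3 format."""
    return mat_header(path).startswith("MATLAB 7.3 MAT-file")


# Fake headers written to a throwaway folder, for demonstration only.
demo_dir = tempfile.mkdtemp()
samples = {
    "old.mat": b"MATLAB 5.0 MAT-file, Platform: demo",
    "new.mat": b"MATLAB 7.3 MAT-file, Platform: demo",
}
for name, text in samples.items():
    with open(os.path.join(demo_dir, name), "wb") as fh:
        fh.write(text.ljust(116))  # pad with spaces to the 116-byte text field

for name in sorted(samples):
    print(name, looks_like_v73(os.path.join(demo_dir, name)))
```

Files for which looks_like_v73 returns False would not be readable with h5py; scipy.io.loadmat is the usual reader for the older formats.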