
How to read multiple .mat files from a folder in Python?

I am trying to read multiple .mat files in Python, but every time I get an error. This is my code:

folder = "C:/Users/Sreeraj/Desktop/Me/PhD/Mahindra/brain_tumor_dataset/data/"
directs = sorted(listdir(folder))
labels = []
for file in directs:
    f = h5py.File(folder+file,'r')
    label = np.array(f.get("cjdata/label"))[0][0]
    labels.append(label)
labels = pd.Series(labels)
labels.shape

The error I am getting is:

---------------------------------------------------------------------------
OSError                                   Traceback (most recent call last)
<ipython-input-11-e7d73f54f73d> in <module>
      3 labels = []
      4 for file in directs:
----> 5     f = h5py.File(folder+file,'r')
      6     label = np.array(f.get("cjdata/label"))[0][0]
      7     labels.append(label)

~\miniconda3\envs\tensorflow\lib\site-packages\h5py\_hl\files.py in __init__(self, name, mode, driver, libver, userblock_size, swmr, rdcc_nslots, rdcc_nbytes, rdcc_w0, track_order, **kwds)
    404             with phil:
    405                 fapl = make_fapl(driver, libver, rdcc_nslots, rdcc_nbytes, rdcc_w0, **kwds)
--> 406                 fid = make_fid(name, mode, userblock_size,
    407                                fapl, fcpl=make_fcpl(track_order=track_order),
    408                                swmr=swmr)

~\miniconda3\envs\tensorflow\lib\site-packages\h5py\_hl\files.py in make_fid(name, mode, userblock_size, fapl, fcpl, swmr)
    171         if swmr and swmr_support:
    172             flags |= h5f.ACC_SWMR_READ
--> 173         fid = h5f.open(name, flags, fapl=fapl)
    174     elif mode == 'r+':
    175         fid = h5f.open(name, h5f.ACC_RDWR, fapl=fapl)

h5py\_objects.pyx in h5py._objects.with_phil.wrapper()

h5py\_objects.pyx in h5py._objects.with_phil.wrapper()

h5py\h5f.pyx in h5py.h5f.open()

OSError: Unable to open file (file signature not found)

I have 5849 .mat files. Can anyone tell me where I am going wrong?

I used h5py to read the .mat files. I want to read the label and the image in each .mat file.

I believe the issue is in the paths you build with folder+file. Two things about that:

  1. file is not actually a Python keyword, but it was a built-in type in Python 2, so it's best not to use it as a variable name.
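  2. Assuming you used os.listdir here (you didn't include the import), folder+file only gives a valid path because your folder string ends with a slash; os.path.join() inserts the separator for you. More importantly, listdir returns every entry in the folder, not only .mat files. "file signature not found" means h5py found the file but it is not in HDF5 format: either a non-MAT file, or a MAT-file saved in a version older than v7.3 (only v7.3 MAT-files are HDF5, which is what h5py reads).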

A fix for the path (after I renamed file to filename):

import os  # needed for os.path.join

full_file_path = os.path.join(folder, filename)  # adds a separator only if needed
f = h5py.File(full_file_path, 'r')
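To see which files actually trigger the error, you can look for the HDF5 signature yourself before calling h5py. This is a standard-library-only sketch; h5py also provides h5py.is_hdf5() for the same check. The helper names looks_like_hdf5 and find_non_hdf5_files are hypothetical. It relies on the HDF5 file format placing its 8-byte signature at offset 0 or after a user block of 512, 1024, 2048, ... bytes (MATLAB v7.3 files use a 512-byte header).

```python
import os

# 8-byte HDF5 format signature. It appears at offset 0, or after a user block
# of 512, 1024, 2048, ... bytes (MATLAB v7.3 MAT-files use a 512-byte header).
HDF5_SIGNATURE = b"\x89HDF\r\n\x1a\n"


def looks_like_hdf5(path):
    """Return True if the HDF5 signature appears at one of the allowed offsets."""
    size = os.path.getsize(path)
    with open(path, "rb") as fh:
        offset = 0
        while offset + len(HDF5_SIGNATURE) <= size:
            fh.seek(offset)
            if fh.read(len(HDF5_SIGNATURE)) == HDF5_SIGNATURE:
                return True
            # next candidate offset: 0 -> 512 -> 1024 -> 2048 -> ...
            offset = 512 if offset == 0 else offset * 2
    return False


def find_non_hdf5_files(folder):
    """Return (sorted) names of regular files in folder that lack the HDF5
    signature, i.e. files h5py would likely reject with 'file signature not found'."""
    bad = []
    for filename in sorted(os.listdir(folder)):
        full_path = os.path.join(folder, filename)
        if os.path.isfile(full_path) and not looks_like_hdf5(full_path):
            bad.append(filename)
    return bad
```

Running print(find_non_hdf5_files(folder)) on your data folder would list the culprits; if they are older-format MAT-files rather than stray files, scipy.io.loadmat can read those instead of h5py.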

Here are 4 more areas where the code could be improved:

  1. I prefer glob.iglob() to get the list of files. It accepts a wildcard pattern for the filenames and returns an iterator, so you don't have to build a list of 5849 filenames. With a *.mat pattern it also skips any other files in the folder.
  2. You open each file with h5py.File() but never close it. That probably won't cause a problem, but it's bad practice. It's better to use Python's with/as context manager. (If you don't, add f.close() inside the loop.)
  3. You use the .get() method to retrieve the dataset object. get() works like dict.get() and returns None when the path doesn't exist, which hides typos until a later line fails. The documented idiom is to index by name, f["cjdata/label"], which raises a KeyError for a wrong path.
  4. Also, you added [0][0] after converting the dataset to an array. Are you sure you want that? Those indices return only the single value at [0][0]. If you want a NumPy array of all the dataset values, use label = f["cjdata/label"][()]
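Point 4 can be checked without any MAT-files, because h5py datasets follow NumPy-style indexing. In this sketch a plain NumPy array stands in for f["cjdata/label"]; the shape (1, 1) assumes the label was saved as a MATLAB scalar (MATLAB stores scalars as 1x1 matrices), and the value 3.0 is made up.

```python
import numpy as np

# Hypothetical stand-in for f["cjdata/label"]: a MATLAB scalar saved as a 1x1 matrix.
label_data = np.array([[3.0]])

one_value = label_data[0][0]   # chained indexing: first row, then its first element
same_value = label_data[0, 0]  # one tuple index selecting the same element
everything = label_data[()]    # empty tuple: the whole array, still shape (1, 1)

print(one_value, same_value, everything.shape)
```

So [0][0] (or the equivalent [0, 0]) is the right choice when you want the label as a single number, and [()] when you want the full array.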

Modified code that demonstrates all of these changes:

import glob
import os

import h5py
import pandas as pd

folder = "C:/Users/Sreeraj/Desktop/Me/PhD/Mahindra/brain_tumor_dataset/data/"
file_wc = os.path.join(folder, "*.mat")  # assumes filename extension is .mat
labels = []
for fname in glob.iglob(file_wc):
    # the context manager closes each file at the end of the block
    with h5py.File(fname, 'r') as f:
        # index by name instead of .get(); a wrong path raises KeyError
        label = f["cjdata/label"][0, 0]  # single value at [0, 0]
        # or, to keep the whole dataset as a NumPy array:
        # label = f["cjdata/label"][()]
        labels.append(label)
labels = pd.Series(labels)
print(labels.shape)
