Question: Read H5File from a folder inside a zipped folder into pandas dataframe

Background: The directory structure I have looks like this:
file.zip/2019/file.h5

file.zip is the zipped folder
2019 is the folder inside the zipped folder

I can extract the folder using extractall and read the h5 file from the extracted folder. However, I am looking to read it directly from the zipped folder into a pandas dataframe.
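For reference, the extract-then-read approach described above might look roughly like the sketch below. The function name `read_h5_via_extract` and its parameters are hypothetical, chosen only for illustration; the archive is unpacked into a temporary directory so nothing is left on disk afterwards.

```python
import os
import tempfile
import zipfile

import h5py
import pandas as pd


def read_h5_via_extract(zip_path, member, dataset):
    """Extract the archive to a temporary directory, then load one dataset.

    Hypothetical helper: `member` is the path of the .h5 file inside the
    archive (e.g. "2019/file.h5") and `dataset` is the dataset name.
    """
    with tempfile.TemporaryDirectory() as tmp:
        with zipfile.ZipFile(zip_path) as zf:
            zf.extractall(tmp)
        with h5py.File(os.path.join(tmp, member), "r") as hf:
            # [:] copies the data into memory before the file is closed
            # and the temporary directory is deleted.
            return pd.DataFrame(hf[dataset][:])
```

It could be called as `read_h5_via_extract("file.zip", "2019/file.h5", "dset")`, assuming the archive layout from the question.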

Code to create a sample file:
Here is the code to recreate a sample h5 file that I am trying to use in this scenario:

Step 1:

import h5py
file = h5py.File('sample.h5','w')
dataset = file.create_dataset("dset",(4, 6), h5py.h5t.STD_I32BE)
file.close()

Step 2:
After the file is created, put it in a folder "2019". Place "2019" inside another folder called zipfolder and zip it. So now the directory structure looks like "file.zip/2019/file.h5".
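The two steps can also be scripted instead of done by hand. The sketch below is only one possible way to do it: the helper name `build_sample_zip` is hypothetical, the file and folder names (`zipfolder.zip`, `zipfolder/2019/sample.h5`) are assumptions based on the names used elsewhere in this thread, and the dtype `">i4"` (big-endian 32-bit integer) is used as a NumPy stand-in for `h5py.h5t.STD_I32BE`.

```python
import os
import zipfile

import h5py


def build_sample_zip(workdir):
    """Create zipfolder/2019/sample.h5 under workdir and zip it.

    Hypothetical helper; returns the path of the resulting zipfolder.zip.
    """
    member = "zipfolder/2019/sample.h5"
    h5_path = os.path.join(workdir, *member.split("/"))
    os.makedirs(os.path.dirname(h5_path), exist_ok=True)

    # Same shape as in Step 1: an empty 4x6 dataset of big-endian int32.
    with h5py.File(h5_path, "w") as f:
        f.create_dataset("dset", (4, 6), dtype=">i4")

    zip_path = os.path.join(workdir, "zipfolder.zip")
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
        # arcname sets the forward-slash path stored inside the archive.
        zf.write(h5_path, arcname=member)
    return zip_path
```

Calling `zipfile.ZipFile(zip_path).namelist()` on the result is a quick way to confirm the member path that later has to be passed to `open`.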

Note: This is an HDF5 file written with h5py, not a pandas HDFStore. pandas read_hdf cannot work on such H5 files. Read the HDF5 documentation for more clarity on H5 files and HDFStore. The two have different internal structures but share the same .h5 extension. For reading H5 files, the h5py package is used.

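import zipfile

import h5py
import pandas as pd

with zipfile.ZipFile('file.zip') as z:
    for filename in z.namelist():
        # namelist() returns archive paths with forward slashes; os.path.isdir cannot look inside a zip
        if filename.endswith("2019/file.h5"):
            # z.open() accepts only 'r' or 'w', and pd.read_hdf needs a path on disk,
            # so open the member with h5py and copy the dataset into a dataframe
            with z.open(filename) as member, h5py.File(member, 'r') as hf:
                df = pd.DataFrame(hf['dset'][:])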

Hope it will help you!

Figured this out with the help of the h5py Google group: https://groups.google.com/forum/m/#!forum/h5py

import zipfile
import h5py
import pandas as pd

print(h5py.__version__)  # Make sure the version is 2.9 or above
zf = zipfile.ZipFile('zipfolder.zip')
print(zf.namelist())  # get the name of the file object
fiz = zf.open('zipfolder/2019/sample.h5')
hf = h5py.File(fiz, 'r')
print(list(hf.keys()))  # To see the datasets inside the h5 file
df = pd.DataFrame(hf['dset'][:])
df.head()

Used h5py to read the H5 files. pandas reads only HDFStore files, which hold structured dataframe formats, and doesn't read plain H5 files as of now.
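If passing the zip member straight to h5py is not an option (for example, zip members only became seekable in Python 3.7), one alternative is to read the member's bytes into memory first and hand h5py an `io.BytesIO` buffer. This is a minimal sketch, not part of the original answer; the function name `read_h5_from_zip` is hypothetical, and it assumes the file is small enough to fit in memory and that h5py is 2.9 or newer so that file-like objects are accepted.

```python
import io
import zipfile

import h5py
import pandas as pd


def read_h5_from_zip(zip_path, member, dataset):
    """Load one dataset of an .h5 zip member into a dataframe (hypothetical helper)."""
    with zipfile.ZipFile(zip_path) as zf:
        # Read the whole member into an in-memory, seekable buffer.
        buffer = io.BytesIO(zf.read(member))
    with h5py.File(buffer, "r") as hf:
        # [:] materialises the dataset as a NumPy array before the file closes.
        return pd.DataFrame(hf[dataset][:])
```

With the sample archive from the question this would be called as `read_h5_from_zip("zipfolder.zip", "zipfolder/2019/sample.h5", "dset")`.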
