
Python: Read compressed (.gz) HDF file without writing and saving uncompressed file

I have a large number of compressed HDF files, which I need to read.

file1.HDF.gz
file2.HDF.gz
file3.HDF.gz
...

I can read uncompressed HDF files with the following method:

from pyhdf.SD import SD, SDC
import os

os.system('gunzip < file1.HDF.gz >  file1.HDF')
HDF = SD('file1.HDF')

and repeat this for each file. However, this is more time-consuming than I would like.

I'm thinking it's possible that most of the time overhead comes from writing the uncompressed copy of each file to disk, and that I could speed things up if I could simply read an uncompressed version of the file into the SD function in one step, without writing it out first.

Am I correct in this thinking? And if so, is there a way to do what I want?

According to the pyhdf package documentation, this is not possible.

__init__(self, path, mode=1)
  SD constructor. Initialize an SD interface on an HDF file,
  creating the file if necessary.

There is no alternative constructor that accepts a file-like object. This is likely because pyhdf conforms to an external interface (NCSA HDF). The HDF format is also normally used for massive files that would be impractical to hold in memory all at once.

Unzipping it to a file is likely your most performant option.
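
If the intermediate write is what hurts, one option is to decompress into a temporary file on a RAM-backed filesystem and open that with SD, so the uncompressed bytes never touch a physical disk. A minimal sketch using the gzip module described below; the /dev/shm location is a Linux-specific assumption, with a fallback to the platform's default temp directory:

import gzip
import os
import shutil
import tempfile

from pyhdf.SD import SD

def open_gzipped_hdf(gz_path, tmp_dir='/dev/shm'):
    # /dev/shm is RAM-backed on most Linux systems; fall back to the
    # platform default temp directory if it does not exist
    if not os.path.isdir(tmp_dir):
        tmp_dir = None
    with tempfile.NamedTemporaryFile(suffix='.HDF', dir=tmp_dir,
                                     delete=False) as f_out:
        with gzip.open(gz_path, 'rb') as f_in:
            shutil.copyfileobj(f_in, f_out)
    # the temporary file persists past the with block (delete=False);
    # the caller must remove it when finished
    return SD(f_out.name), f_out.name

hdf, tmp_path = open_gzipped_hdf('file1.HDF.gz')
print(hdf.datasets())
hdf.end()
os.remove(tmp_path)

Whether this actually helps depends on your filesystem; on many systems the page cache already absorbs much of the cost of the intermediate write.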

If you would like to stay in Python, use the gzip module (docs):

import gzip
import shutil
# read the compressed stream and write the decompressed bytes to disk
with gzip.open('file1.HDF.gz', 'rb') as f_in, open('file1.HDF', 'wb') as f_out:
    shutil.copyfileobj(f_in, f_out)
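
Applied to the whole set of files from the question, the same pattern becomes a loop that decompresses each archive, reads it, and removes the intermediate file afterwards; the print call is a placeholder for your actual processing:

import glob
import gzip
import os
import shutil

from pyhdf.SD import SD

for gz_path in sorted(glob.glob('*.HDF.gz')):
    hdf_path = gz_path[:-3]  # 'file1.HDF.gz' -> 'file1.HDF'
    with gzip.open(gz_path, 'rb') as f_in, open(hdf_path, 'wb') as f_out:
        shutil.copyfileobj(f_in, f_out)
    hdf = SD(hdf_path)
    print(gz_path, hdf.datasets())  # placeholder for real processing
    hdf.end()
    os.remove(hdf_path)  # clean up the intermediate file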

sascha is correct that HDF's transparent compression is a better fit than gzip compression. However, if you have no control over how the HDF files are stored, the gzip Python module (docs) is what you are looking for to get the data out of them.
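
For completeness: if you do control how the files are produced, pyhdf exposes HDF's transparent (internal) compression through SDS.setcompress, so the data are stored deflate-compressed inside the HDF file itself and SD can read them directly, with no external gunzip step. A rough sketch; the dataset name, shape, and compression level here are illustrative assumptions:

import numpy as np
from pyhdf.SD import SD, SDC

# write a dataset using HDF's internal deflate (gzip-style) compression
sd = SD('file1.HDF', SDC.WRITE | SDC.CREATE)
sds = sd.create('data', SDC.FLOAT32, (100, 100))
sds.setcompress(SDC.COMP_DEFLATE, 9)  # deflate, compression level 9
sds[:] = np.zeros((100, 100), dtype=np.float32)
sds.endaccess()
sd.end()

# read it back like any ordinary HDF file
hdf = SD('file1.HDF')
data = hdf.select('data')[:]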
