How to read gzipped netCDF files in Python?

Question

I have a working Python program that reads in a number of large netCDF files using the Dataset class from the netCDF4 module. Here is a snippet of the relevant parts:

from netCDF4 import Dataset
import glob

infile_root = 'start_of_file_name_'

for infile in sorted(glob.iglob(infile_root + '*')):
   ncin = Dataset(infile,'r')
   ncin.close()

I want to modify this to read in netCDF files that are gzipped. The files themselves were gzipped after creation; they are not internally compressed (i.e., the files are *.nc.gz). If I were reading gzipped text files, the code would be:

from netCDF4 import Dataset
import glob
import gzip

infile_root = 'start_of_file_name_'

for infile in sorted(glob.iglob(infile_root + '*.gz')):
   f = gzip.open(infile, 'rb')
   file_content = f.read()
   f.close()

After googling for about half an hour and reading through the netCDF4 documentation, the only way I have found to do this for netCDF files is:

from netCDF4 import Dataset
import glob
import os

infile_root = 'start_of_file_name_'

for infile in sorted(glob.iglob(infile_root + '*.gz')):
   os.system('gzip -d ' + infile)
   ncin = Dataset(infile[:-3],'r')
   ncin.close()
   os.system('gzip ' + infile[:-3])

Is it possible to read gzipped files with Dataset directly, or at least without calling gzip through os.system?

Answer 1

Because NetCDF4-Python wraps the C NetCDF4 library, you're out of luck as far as using the gzip module to pass in a file-like object. The only option is, as suggested by @tdelaney, to use gzip to extract to a temporary file.

If you happen to have any control over the creation of these files, NetCDF version 4 files support zlib compression internally, so using gzip is superfluous. It might also be worth converting the files from version 3 to version 4 if you need to process them repeatedly.
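To illustrate the internal compression mentioned above, here is a minimal sketch of writing a variable with zlib compression enabled. The function name, the dimension name "x", and the variable name "data" are made up for the example; `zlib` and `complevel` are keyword arguments of `createVariable` (newer netCDF4 releases also accept `compression='zlib'`).

```python
def write_compressed_example(path, values, complevel=4):
    """Write a 1-D sequence of floats to a NetCDF-4 file with zlib compression.

    Hypothetical helper for illustration; names are not from the question.
    """
    import netCDF4  # imported here so the module loads even without netCDF4

    with netCDF4.Dataset(path, "w", format="NETCDF4") as nc:
        nc.createDimension("x", len(values))
        # zlib=True turns on internal deflate compression for this variable;
        # complevel ranges from 1 (fastest) to 9 (smallest).
        var = nc.createVariable("data", "f8", ("x",), zlib=True, complevel=complevel)
        var[:] = values
```

A file written this way can be opened directly with Dataset(path, 'r'), with no separate gunzip step.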

Answer 2

Reading datasets from memory is supported since netCDF4-1.2.8 (see the Changelog):

import netCDF4
import gzip

with gzip.open('test.nc.gz') as gz:
    with netCDF4.Dataset('dummy', mode='r', memory=gz.read()) as nc:
        print(nc.variables)

See the description of the memory parameter in the Dataset documentation.
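Applied to the loop from the question, the same idea might look like the sketch below. The helper names (`read_gzip_bytes`, `open_gzipped_netcdf`, `process_all`) are invented for this example, and it assumes netCDF4 1.2.8 or later. Note that each file is decompressed fully into memory, which may matter for large files.

```python
import glob
import gzip


def read_gzip_bytes(path):
    """Return the decompressed contents of a .gz file as bytes."""
    with gzip.open(path, "rb") as gz:
        return gz.read()


def open_gzipped_netcdf(path):
    """Open a gzipped netCDF file from an in-memory buffer."""
    import netCDF4  # imported lazily so read_gzip_bytes works without it

    # The answer above passes a dummy name as the first argument;
    # here the original path is passed in that position instead.
    return netCDF4.Dataset(path, mode="r", memory=read_gzip_bytes(path))


def process_all(infile_root):
    """Open every gzipped file matching the prefix, one at a time."""
    for infile in sorted(glob.glob(infile_root + "*.gz")):
        with open_gzipped_netcdf(infile) as ncin:
            print(infile, list(ncin.variables))
```

Calling process_all('start_of_file_name_') would then replace the gzip -d / gzip round trip from the question.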

Answer 3

Since I just had to solve the same problem, here is a ready-made solution:

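import gzip
import os
import shutil
import tempfile

import netCDF4

def open_netcdf(fname):
    """Open a netCDF file, decompressing it first if the name ends in .gz."""
    if fname.endswith(".gz"):
        # Decompress into a named temporary file that netCDF4 can open by path.
        with gzip.open(fname, 'rb') as infile, \
                tempfile.NamedTemporaryFile(delete=False) as tmp:
            shutil.copyfileobj(infile, tmp)
        data = netCDF4.Dataset(tmp.name)
        # Remove the directory entry right away; on POSIX systems the
        # already-open dataset stays readable until it is closed.
        os.unlink(tmp.name)
    else:
        data = netCDF4.Dataset(fname)
    return data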
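Deleting the temporary file while the dataset is still open may fail on platforms that do not allow removing open files, such as Windows. As a hedged variant (not part of the original answer), the sketch below wraps the same approach in a context manager so the temporary copy is removed only after the dataset is closed. The names `gunzip_to` and `opened_netcdf` are hypothetical.

```python
import contextlib
import gzip
import os
import shutil
import tempfile


def gunzip_to(src, dst):
    """Decompress the gzip file at src into a new file at dst."""
    with gzip.open(src, "rb") as fin, open(dst, "wb") as fout:
        shutil.copyfileobj(fin, fout)


@contextlib.contextmanager
def opened_netcdf(fname):
    """Yield an open netCDF4.Dataset, handling plain and .gz files alike."""
    import netCDF4  # imported lazily so gunzip_to works without it

    if not fname.endswith(".gz"):
        with netCDF4.Dataset(fname) as data:
            yield data
        return
    with tempfile.TemporaryDirectory() as tmpdir:
        # Strip the ".gz" suffix for the temporary file name.
        tmp_path = os.path.join(tmpdir, os.path.basename(fname)[:-3])
        gunzip_to(fname, tmp_path)
        with netCDF4.Dataset(tmp_path) as data:
            yield data
        # The dataset is closed here; the directory and its contents are
        # removed when the TemporaryDirectory block exits.
```

Usage would be: with opened_netcdf('file.nc.gz') as nc: followed by the code that reads from nc.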

3 answers

Answer 1: score 5, accepted, 2014-12-05 22:24:44

Answer 2: score 4, 2018-08-01 08:24:22

Answer 3: score 2, 2017-07-27 16:18:33
