How to incrementally save multiple variables using Python netCDF4?

I'm trying to write multiple variables (say A and B) to a single netCDF file using the Python netCDF4 module.

My function outputs a new time slice for A and a new time slice for B in each loop iteration, and I want to save these slices to file as they are produced, rather than accumulating them in RAM and saving everything in one go.

Below is my current attempt:

import numpy as np
from netCDF4 import Dataset, date2num, num2date

fout=Dataset('test.nc', 'w')

x=np.arange(10)
y=np.arange(20)
xx,yy=np.meshgrid(x,y)

# create dimensions
fout.createDimension('x', len(x))
fout.createDimension('y', len(y))
fout.createDimension('t', None)

# np.float was removed in NumPy 1.24; np.float64 is the equivalent dtype
x_ax=fout.createVariable('x', np.float64, ['x',])
x_ax[:]=x
y_ax=fout.createVariable('y', np.float64, ['y',])
y_ax[:]=y
t_ax=fout.createVariable('t', np.float64, ['t',])

# a big loop
for ii in range(10):

    # some function that outputs a slice of A and a slice of B
    var_aii=(xx*ii + yy)[None, ...]
    var_bii=(xx + yy*ii)[None, ...]

    if 'var_A' not in fout.variables.keys():
        # if 1st time writing "var_A", create new variable
        var_A=fout.createVariable('var_A', np.float64, ['t', 'y', 'x'])
        var_A[:]=var_aii
    else:
        # if variable already created, append to the end of 1st dimension
        var_A=fout.variables['var_A']
        var_A[:]=np.concatenate([var_A[:], var_aii])

    if 'var_B' not in fout.variables.keys():
        # same logic for "var_B", writing the B slice
        var_B=fout.createVariable('var_B', np.float64, ['t', 'y', 'x'])
        var_B[:]=var_bii
    else:
        var_B=fout.variables['var_B']
        var_B[:]=np.concatenate([var_B[:], var_bii])

    print('ii=', ii, 'var_A.shape=', var_A.shape, 'var_B.shape=', var_B.shape)

fout.close()

Here is the output:

ii= 0 var_A.shape= (1, 20, 10) var_B.shape= (1, 20, 10)
ii= 1 var_A.shape= (3, 20, 10) var_B.shape= (3, 20, 10)
ii= 2 var_A.shape= (5, 20, 10) var_B.shape= (5, 20, 10)
ii= 3 var_A.shape= (7, 20, 10) var_B.shape= (7, 20, 10)
ii= 4 var_A.shape= (9, 20, 10) var_B.shape= (9, 20, 10)
ii= 5 var_A.shape= (11, 20, 10) var_B.shape= (11, 20, 10)
ii= 6 var_A.shape= (13, 20, 10) var_B.shape= (13, 20, 10)
ii= 7 var_A.shape= (15, 20, 10) var_B.shape= (15, 20, 10)
ii= 8 var_A.shape= (17, 20, 10) var_B.shape= (17, 20, 10)
ii= 9 var_A.shape= (19, 20, 10) var_B.shape= (19, 20, 10)

The problem is that the time dimension t grows in steps of 2 rather than 1. I think that's because the unlimited t dimension automatically expands as more data are appended: in the ii==1 iteration, after writing var_A, the time dimension grows to a length of 2, so var_B already has a length of 2 before its own slice is appended.

I'm not using ii as the index to assign values, as in var_A[ii]=var_aii, because I feel that it's error-prone. If a conditional continue in the loop skips a few values of ii, gaps will be created.

So what's a more robust way of incrementally appending multiple variables along the time dimension?

It seems that querying the length of the current time dimension to get the insertion index is not quite enough.

I made a rough solution for appending data to existing variables in netCDF files:

def appendTime(fout, newslice, newt, varid):
    '''Append data along time dimension

    Args:
        fout (netCDF4.Dataset): opened Dataset file obj to write into.
        newslice (ndarray): new time slice data to save.
        newt (1darray): new time values of <newslice>.
        varid (str): variable id.
    '''

    newt=np.atleast_1d(newt)
    timeax=fout.variables['t']

    if varid not in fout.variables.keys():
        #-----------------Create variable-----------------
        varout=fout.createVariable(varid, np.float64, ('t','y','x'), zlib=True)
    else:
        #-------------Get existing variable---------------
        varout=fout.variables[varid]

    # look up the first new time value among those already on the time axis
    tidx=np.where(timeax[:]==newt[0])[0]
    if len(tidx)>0:
        # time point already exists
        tidx=tidx[0]
    else:
        # new time point added
        tidx=len(timeax)
    # write the time values as well, so the first record also gets its time
    timeax[tidx:tidx+len(newt)]=newt
    varout[tidx:tidx+len(newt)]=newslice

Now the code inside the loop could be:

for ii in range(10):
    # some function that outputs a slice of A and a slice of B
    var_aii=(xx*ii + yy)[None, ...]
    var_bii=(xx + yy*ii)[None, ...]
    appendTime(fout, var_aii, ii, 'var_A')
    appendTime(fout, var_bii, ii, 'var_B')
    var_A=fout.variables['var_A']
    var_B=fout.variables['var_B']
    print('ii=', ii, 'var_A.shape=', var_A.shape, 'var_B.shape=', var_B.shape)
