
How to incrementally save multiple variables using Python netCDF4?

I'm trying to write multiple variables (say A and B) to a single netCDF file using the Python netCDF4 module.

My function outputs a new time slice for A and a new time slice for B in each loop iteration, and I'm trying to save these new slices to file as they come out, rather than accumulating in RAM and saving in one go.

Below is my current attempt:

import numpy as np
from netCDF4 import Dataset, date2num, num2date

fout=Dataset('test.nc', 'w')

x=np.arange(10)
y=np.arange(20)
xx,yy=np.meshgrid(x,y)

# create dimensions
fout.createDimension('x', len(x))
fout.createDimension('y', len(y))
fout.createDimension('t', None)

# np.float was removed in NumPy 1.24; np.float64 is the same type
x_ax=fout.createVariable('x', np.float64, ['x',])
x_ax[:]=x
y_ax=fout.createVariable('y', np.float64, ['y',])
y_ax[:]=y
t_ax=fout.createVariable('t', np.float64, ['t',])

# a big loop
for ii in range(10):  

    # some function that outputs a slice of A and a slice of B
    var_aii=(xx*ii + yy)[None, ...]
    var_bii=(xx + yy*ii)[None, ...]

    if 'var_A' not in fout.variables.keys():
        # if 1st time writing "var_A", create new variable
        var_A=fout.createVariable('var_A', np.float64, ['t', 'y', 'x'])
        var_A[:]=var_aii
    else:
        # if variable already created, append to the end of 1st dimension
        var_A=fout.variables['var_A']
        var_A[:]=np.concatenate([var_A[:], var_aii])

    if 'var_B' not in fout.variables.keys():
        # same logic for "var_B", using its own slice var_bii
        var_B=fout.createVariable('var_B', np.float64, ['t', 'y', 'x'])
        var_B[:]=var_bii
    else:
        var_B=fout.variables['var_B']
        var_B[:]=np.concatenate([var_B[:], var_bii])

    print('ii=', ii, 'var_A.shape=', var_A.shape, 'var_B.shape=', var_B.shape)

fout.close()

Here is the output:

ii= 0 var_A.shape= (1, 20, 10) var_B.shape= (1, 20, 10)
ii= 1 var_A.shape= (3, 20, 10) var_B.shape= (3, 20, 10)
ii= 2 var_A.shape= (5, 20, 10) var_B.shape= (5, 20, 10)
ii= 3 var_A.shape= (7, 20, 10) var_B.shape= (7, 20, 10)
ii= 4 var_A.shape= (9, 20, 10) var_B.shape= (9, 20, 10)
ii= 5 var_A.shape= (11, 20, 10) var_B.shape= (11, 20, 10)
ii= 6 var_A.shape= (13, 20, 10) var_B.shape= (13, 20, 10)
ii= 7 var_A.shape= (15, 20, 10) var_B.shape= (15, 20, 10)
ii= 8 var_A.shape= (17, 20, 10) var_B.shape= (17, 20, 10)
ii= 9 var_A.shape= (19, 20, 10) var_B.shape= (19, 20, 10)

The problem is that the time dimension t grows in steps of 2 rather than 1. I think that's because the unlimited t dimension is shared between the variables and expands automatically as data are appended: in the ii==1 iteration, writing var_A grows the time dimension to a length of 2, so var_B already has a length of 2 before its own slice is appended.

I'm not using ii as the index to assign values, as in var_A[ii]=var_aii, because I feel that it's error-prone: if a conditional continue in the loop skips a few values of ii, gaps will be created.

So what's a more robust way of appending incrementally multiple variables along the time dimension?

It seems that querying the length of the current time dimension to get the insertion index is not quite enough, because that length has already grown by the time the second variable is written.

I made a rough solution to append data to existing variables in netCDF files:

def appendTime(fout, newslice, newt, varid):
    '''Append data along time dimension

    Args:
        fout (netCDF4.Dataset): opened Dataset file obj to write into.
        newslice (ndarray): new time slice data to save.
        newt (1darray): new time values of <newslice>.
        varid (str): variable id.
    '''

    newt=np.atleast_1d(newt)

    if varid not in fout.variables.keys():
        #-----------------Create variable-----------------
        # np.float was removed in NumPy 1.24; np.float64 is the same type
        varout=fout.createVariable(varid, np.float64, ('t','y','x'), zlib=True)
    else:
        #-----------------Existing variable-----------------
        varout=fout.variables[varid]

    #-----------------Find insertion index-----------------
    timeax=fout.variables['t']
    tlen=len(timeax)
    t0=newt[0]
    # read the stored time values into memory before comparing
    tidx=np.where(timeax[:]==t0)[0]
    if len(tidx)>0:
        # time point already exists (e.g. written for another variable)
        tidx=tidx[0]
    else:
        # new time point: append at the end
        tidx=tlen

    # write time values and data at the same index, also on first creation,
    # so that the time axis is filled from the very first slice
    timeax[tidx:tidx+len(newt)]=newt
    varout[tidx:tidx+len(newt)]=newslice

    return
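The index lookup used in appendTime can be pulled out and tried on plain arrays, without a file. This is only an illustrative sketch: `find_time_index` is a hypothetical helper name, not something provided by netCDF4.

```python
import numpy as np


def find_time_index(times, t0):
    """Return the index of t0 in times, or len(times) if t0 is not present.

    Hypothetical helper mirroring the lookup inside appendTime.
    """
    times = np.asarray(times)
    matches = np.where(times == t0)[0]
    if len(matches) > 0:
        # time point already stored: overwrite/fill at that index
        return int(matches[0])
    # time point not stored yet: append at the end
    return len(times)


print(find_time_index([0, 1, 2], 1))  # existing time point
print(find_time_index([0, 1, 2], 3))  # new time point
```

Keying the index on the time value rather than on the current length is what lets the second variable land on the same row as the first one.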

Now the code inside the loop could be:

for ii in range(10):
    # some function that outputs a slice of A and a slice of B
    var_aii=(xx*ii + yy)[None, ...]
    var_bii=(xx + yy*ii)[None, ...]
    appendTime(fout, var_aii, ii, 'var_A')
    appendTime(fout, var_bii, ii, 'var_B')
    var_A=fout.variables['var_A']
    var_B=fout.variables['var_B']
    print('ii=', ii, 'var_A.shape=', var_A.shape, 'var_B.shape=', var_B.shape)
