I'm trying to write multiple variables (say A and B) to a single netCDF file using the Python netCDF4 module.
My function outputs a new time slice for A and a new time slice for B in each loop iteration, and I'm trying to save these new slices to file as they come out, rather than accumulating them in RAM and saving in one go.
Below is my current attempt:
import numpy as np
from netCDF4 import Dataset, date2num, num2date

fout=Dataset('test.nc', 'w')
x=np.arange(10)
y=np.arange(20)
xx,yy=np.meshgrid(x,y)

# create dimensions
fout.createDimension('x', len(x))
fout.createDimension('y', len(y))
fout.createDimension('t', None)   # unlimited time dimension

# np.float was removed from recent NumPy releases; np.float64 is the same type
x_ax=fout.createVariable('x', np.float64, ['x',])
x_ax[:]=x
y_ax=fout.createVariable('y', np.float64, ['y',])
y_ax[:]=y
t_ax=fout.createVariable('t', np.float64, ['t',])

# a big loop
for ii in range(10):
    # some function that outputs a slice of A and a slice of B
    var_aii=(xx*ii + yy)[None, ...]
    var_bii=(xx + yy*ii)[None, ...]

    if 'var_A' not in fout.variables.keys():
        # if 1st time writing "var_A", create new variable
        var_A=fout.createVariable('var_A', np.float64, ['t', 'y', 'x'])
        var_A[:]=var_aii
    else:
        # if variable already created, append to the end of 1st dimension
        var_A=fout.variables['var_A']
        var_A[:]=np.concatenate([var_A[:], var_aii])

    if 'var_B' not in fout.variables.keys():
        var_B=fout.createVariable('var_B', np.float64, ['t', 'y', 'x'])
        var_B[:]=var_bii
    else:
        var_B=fout.variables['var_B']
        var_B[:]=np.concatenate([var_B[:], var_bii])

    print('ii=', ii, 'var_A.shape=', var_A.shape, 'var_B.shape=', var_B.shape)

fout.close()
Here is the output:
ii= 0 var_A.shape= (1, 20, 10) var_B.shape= (1, 20, 10)
ii= 1 var_A.shape= (3, 20, 10) var_B.shape= (3, 20, 10)
ii= 2 var_A.shape= (5, 20, 10) var_B.shape= (5, 20, 10)
ii= 3 var_A.shape= (7, 20, 10) var_B.shape= (7, 20, 10)
ii= 4 var_A.shape= (9, 20, 10) var_B.shape= (9, 20, 10)
ii= 5 var_A.shape= (11, 20, 10) var_B.shape= (11, 20, 10)
ii= 6 var_A.shape= (13, 20, 10) var_B.shape= (13, 20, 10)
ii= 7 var_A.shape= (15, 20, 10) var_B.shape= (15, 20, 10)
ii= 8 var_A.shape= (17, 20, 10) var_B.shape= (17, 20, 10)
ii= 9 var_A.shape= (19, 20, 10) var_B.shape= (19, 20, 10)
The problem is that the time dimension t grows by a step of 2, rather than 1. I think that's because the unlimited t dimension automatically expands as more data are appended, so in the ii==1 iteration, after writing var_A, the time dimension grows to a length of 2, so when appending var_B, var_B already has a length of 2 before appending.
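The bookkeeping behind this explanation can be checked with a toy model. The sketch below is plain Python and does not touch netCDF4 at all; it only assumes, as described above, that both variables share one length along t and that each "read everything, concatenate one slice, write back" step therefore adds one to that shared length. The names `t_len` and `history` are made up for the illustration.

```python
# Toy model (no netCDF4 involved): var_A and var_B share one length along
# the unlimited dimension, so each write sees the length left by the last one.
t_len = 0        # current length of the shared unlimited dimension
history = []     # shared length at the end of each loop iteration

for ii in range(10):
    for name in ('var_A', 'var_B'):
        if ii == 0:
            # first write: a single slice is assigned to the whole variable
            t_len = max(t_len, 1)
        else:
            # var[:] already holds t_len slices (some of them just fill
            # values); concatenating one more and writing it back grows
            # the shared dimension to t_len + 1
            t_len = t_len + 1
    history.append(t_len)

print(history)
```

Under these assumptions the model gives the same sequence of lengths as the shapes printed above: 1, 3, 5, and so on up to 19.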
I'm not using ii as the index to assign values, as in var_A[ii]=var_aii, because I feel that it's error-prone. If there are conditional continue statements in the loop that skip some values of ii, gaps will be created.
So what's a more robust way to incrementally append multiple variables along the time dimension?
It seems that querying the current length of the time dimension to get the insertion index is not quite enough.
I made a rough solution to append data to existing variables in netcdf files:
def appendTime(fout, newslice, newt, varid):
    '''Append data along time dimension

    Args:
        fout (netCDF4.Dataset): opened Dataset file obj to write into.
        newslice (ndarray): new time slice data to save.
        newt (1darray): new time values of <newslice>.
        varid (str): variable id.
    '''
    newt=np.atleast_1d(newt)

    if varid not in fout.variables.keys():
        #-----------------Create variable-----------------
        varout=fout.createVariable(varid, np.float64, ('t','y','x'), zlib=True)
    else:
        #-----------------Existing variable-----------------
        varout=fout.variables[varid]

    # look up the insertion index on the shared time axis
    timeax=fout.variables['t']
    tlen=len(timeax)
    t0=newt[0]
    tidx=np.where(timeax[:]==t0)[0]
    if len(tidx)>0:
        # time point already exists
        tidx=tidx[0]
    else:
        # new time point added
        tidx=tlen

    # write time values and data at the same index, so the time axis
    # is also filled the first time a variable is written
    timeax[tidx:tidx+len(newt)]=newt
    varout[tidx:tidx+len(newt)]=newslice
Now the code inside the loop could be:
for ii in range(10):
    # some function that outputs a slice of A and a slice of B
    var_aii=(xx*ii + yy)[None, ...]
    var_bii=(xx + yy*ii)[None, ...]

    appendTime(fout, var_aii, ii, 'var_A')
    appendTime(fout, var_bii, ii, 'var_B')

    var_A=fout.variables['var_A']
    var_B=fout.variables['var_B']
    print('ii=', ii, 'var_A.shape=', var_A.shape, 'var_B.shape=', var_B.shape)