
Creating a big netCDF file (>10 GB) with netCDF4 in Python

I am having problems trying to create a very big netCDF file in Python on a machine with 8 GB of RAM.

I created a very big array with numpy.memmap in order to keep this array on disk rather than in RAM, because its size exceeds the available RAM and swap space (RAM and swap = 8 GB each).
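Roughly, the on-disk array is created like this (the file name and shape below are just examples, not the real ones):

import numpy as np

# Array backed by a file on disk rather than RAM; shape is (time, latitude, longitude)
ARRAY = np.memmap('big_array.dat', dtype='float32', mode='w+',
                  shape=(50000, 61, 720))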

I created a variable in the nc file with

var = ncout.createVariable('data', ARRAY.dtype,
                           ('time', 'latitude', 'longitude'),
                           chunksizes=(5000, 61, 720))

var[:] = ARRAY[:]

When the code reaches this point, it loads the ARRAY that is saved on disk into RAM, and then I get a memory error.

How can I save such a big file?

Thanks.

The best way to read and write large NetCDF4 files is with Xarray, which reads and writes data in chunks automatically, using Dask under the hood.

import xarray as xr
ds = xr.open_dataset('my_big_input_file.nc',
                     chunks={'time': 5000, 'latitude': 61, 'longitude': 720})
ds.to_netcdf('my_big_output_file.nc', mode='w')

You can speed things up by using parallel computing with Dask.
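For example, a minimal sketch of that, reusing the chunk sizes above: defer the write with compute=False and run it under Dask's threaded scheduler with a progress bar (file names are examples).

import xarray as xr
from dask.diagnostics import ProgressBar

ds = xr.open_dataset('my_big_input_file.nc',
                     chunks={'time': 5000, 'latitude': 61, 'longitude': 720})
# compute=False returns a dask delayed object instead of writing immediately
delayed_write = ds.to_netcdf('my_big_output_file.nc', mode='w', compute=False)
with ProgressBar():
    delayed_write.compute(scheduler='threads')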

Iterating directly over an array gives you the slices along the first dimension. Using enumerate will give you both the index and the slice:

for ind, time_slice in enumerate(ARRAY):
    var[ind] = time_slice

I'm not positive whether netCDF4-python will keep the slices around in memory, though.
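For reference, here is a self-contained sketch of this slice-by-slice approach with netCDF4 and numpy.memmap; the file names, array shape, and chunk sizes are assumptions for illustration only.

import numpy as np
from netCDF4 import Dataset

nt, nlat, nlon = 50000, 61, 720

# Open the on-disk array without loading it into RAM
ARRAY = np.memmap('big_array.dat', dtype='float32', mode='r',
                  shape=(nt, nlat, nlon))

ncout = Dataset('my_big_output_file.nc', 'w')
ncout.createDimension('time', nt)
ncout.createDimension('latitude', nlat)
ncout.createDimension('longitude', nlon)
var = ncout.createVariable('data', ARRAY.dtype,
                           ('time', 'latitude', 'longitude'),
                           chunksizes=(1, nlat, nlon))  # one chunk per time slice, matching the write pattern

# Each iteration writes a single (latitude, longitude) slice
for ind, time_slice in enumerate(ARRAY):
    var[ind] = time_slice

ncout.close()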
