
Creating a big netCDF file (>10 GB) with netCDF4 in Python

I am having problems trying to create a very big netCDF file in Python on a machine with 8 GB of RAM.

I created a very big array with numpy.memmap in order to keep this array on disk rather than in RAM, because its size exceeds the available RAM and swap space (RAM and swap = 8 GB each).
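Roughly, the on-disk array is created like this (the file name and shape below are just examples, not the real ones):

import numpy as np

# Array backed by a file on disk rather than RAM; shape is (time, latitude, longitude)
ARRAY = np.memmap('big_array.dat', dtype='float32', mode='w+',
                  shape=(50000, 61, 720))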

I created a variable in the nc file with

var = ncout.createVariable('data', ARRAY.dtype,
                           ('time', 'latitude', 'longitude'),
                           chunksizes=(5000, 61, 720))

var[:] = ARRAY[:]

When the code reaches this point, it loads the ARRAY that is saved on disk into RAM, and then I get a memory error.

How can I save such a big file?

Thanks.

The best way to read and write large NetCDF4 files is with Xarray, which reads and writes data in chunks automatically, using Dask under the hood.

import xarray as xr
ds = xr.open_dataset('my_big_input_file.nc',
                     chunks={'time': 5000, 'latitude': 61, 'longitude': 720})
ds.to_netcdf('my_big_output_file.nc', mode='w')

You can speed things up by using parallel computing with Dask.
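For example, a minimal sketch of that, reusing the chunk sizes above: defer the write with compute=False and run it under Dask's threaded scheduler with a progress bar (file names are examples).

import xarray as xr
from dask.diagnostics import ProgressBar

ds = xr.open_dataset('my_big_input_file.nc',
                     chunks={'time': 5000, 'latitude': 61, 'longitude': 720})
# compute=False returns a dask delayed object instead of writing immediately
delayed_write = ds.to_netcdf('my_big_output_file.nc', mode='w', compute=False)
with ProgressBar():
    delayed_write.compute(scheduler='threads')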

Iterating directly over an array gives you the slices along the first dimension. Using enumerate will give you both the index and the slice:

for ind, time_slice in enumerate(ARRAY):
    var[ind] = time_slice

I'm not positive whether netCDF4-python will keep the slices around in memory, though.
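For reference, here is a self-contained sketch of this slice-by-slice approach with netCDF4 and numpy.memmap; the file names, array shape, and chunk sizes are assumptions for illustration only.

import numpy as np
from netCDF4 import Dataset

nt, nlat, nlon = 50000, 61, 720

# Open the on-disk array without loading it into RAM
ARRAY = np.memmap('big_array.dat', dtype='float32', mode='r',
                  shape=(nt, nlat, nlon))

ncout = Dataset('my_big_output_file.nc', 'w')
ncout.createDimension('time', nt)
ncout.createDimension('latitude', nlat)
ncout.createDimension('longitude', nlon)
var = ncout.createVariable('data', ARRAY.dtype,
                           ('time', 'latitude', 'longitude'),
                           chunksizes=(1, nlat, nlon))  # one chunk per time slice, matching the write pattern

# Each iteration writes a single (latitude, longitude) slice
for ind, time_slice in enumerate(ARRAY):
    var[ind] = time_slice

ncout.close()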
