MemoryError when saving to NetCDF4 via xarray

I would like to load my 499 NetCDF files with xarray and concatenate them; however, I seem to be stuck at saving the result.

Here's my code snippet:

import os
import xarray as xr

# collect all precipitation files in the working directory
files_xr = [f for f in os.listdir(os.getcwd()) if f.startswith("Precipitation") and f.endswith(".nc")]

# open and concatenate the files along their shared coordinates
files_xr_mer = xr.open_mfdataset(files_xr, combine='by_coords')

# record the units as a dataset attribute
files_xr_mer.attrs['units'] = 'mm'

# write the merged dataset back out as a single NetCDF file
new_filename_1 = './prec_file_testing.nc'
files_xr_mer.to_netcdf(path=new_filename_1)

Traceback (most recent call last):
MemoryError: Unable to allocate 2.80 GiB for an array with shape (29, 3601, 7199) and data type float32
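(For reference, the reported size is consistent with the shape: 29 × 3601 × 7199 float32 values × 4 bytes ≈ 3.0 × 10^9 bytes ≈ 2.80 GiB.)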

Thanks for any suggestions! I would definitely like to stay with Python, keeping NCO or CDO as a last resort!

You could try passing a value for the chunks keyword in open_mfdataset. This should enable streaming of the data, so that not everything is loaded into memory at once: https://xarray.pydata.org/en/stable/generated/xarray.open_mfdataset.html

E.g., chunks={"time": 1}, if time is one of your dimensions, will result in the data being loaded one time step at a time. There might be some interaction with the concatenation; you may have to take into account how the concatenation happens to make it (more) efficient.
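As a minimal sketch (reusing files_xr from your snippet, and assuming the files share a time dimension), the change would look something like this:

import xarray as xr

# chunks={"time": 1} makes xarray back each variable with dask arrays,
# one time step per chunk, instead of loading everything up front
files_xr_mer = xr.open_mfdataset(files_xr, combine='by_coords', chunks={"time": 1})

# to_netcdf then computes and writes the lazy chunks one at a time,
# so peak memory stays around the size of a single time step
files_xr_mer.to_netcdf(path='./prec_file_testing.nc')

With a (3601, 7199) float32 grid, one time step is roughly 100 MB, which should fit in memory comfortably.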

See also this documentation: https://xarray.pydata.org/en/stable/dask.html#chunking-and-performance
