Why is xarray introducing random numbers to a variable inside a NetCDF4 file when loading?

Problem:

I have created a NetCDF4 file. When I open it with xarray, unexpectedly high values appear in the variable of interest and the kernel keeps crashing. I do not see these high values when loading the same file into MATLAB. Could this be an incompatibility between the NetCDF4 file and xarray?

This is what I do:

I first create a NetCDF4 file including my variable of interest:

from netCDF4 import Dataset
import numpy as np
import xarray as xr
#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# Create data
data_2_save = np.zeros((6, 29947), dtype=np.float32)
data_2_save[0, 1000:27300] = np.nan
data_2_save[1, 1010:27310] = np.nan
data_2_save[2, 1050:27350] = np.nan
data_2_save[3, 1000:27300] = np.nan
data_2_save[4, 900:27300] = np.nan
data_2_save[5, 100:27300] = np.nan
# time range
t = np.arange(-2921, 27026, dtype=np.float32)
# for other dimension
d = np.arange(1, 7)
#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# create NetCDF4 file
filename = 'test.nc'
dataset = Dataset(filename, 'w', format='NETCDF4_CLASSIC')
fillvalue = 999999
#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# define dimensions
TIME_DIM = dataset.createDimension('TIME', None)
D_DIM = dataset.createDimension('D', np.size(d))
#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# create variables
TIME = dataset.createVariable('TIME', np.float32, ('TIME',)) 
D = dataset.createVariable('D', np.int32, ('D',))
VAR = dataset.createVariable('VARIABLE', np.float32, ('TIME','D'), 
                              fill_value=fillvalue)
#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# VAR
VAR.long_name = 'name'
VAR.valid_max = np.float32(np.nanmax(data_2_save))
VAR.valid_min = np.float32(np.nanmin(data_2_save))
VAR.coordinates = 'TIME D'
VAR.comment = ('A comment goes here')
# Time
time_unit_out= "days since 1950-01-01 00:00:00 UTC"
TIME.units = time_unit_out
TIME.long_name = 'analysis time'
TIME.standard_name = 'time'
TIME.valid_max = np.nanmax(t)
TIME.valid_min = np.nanmin(t)
TIME.axis = 'T'
TIME.calendar = 'gregorian'
#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# D
D.standard_name = 'D'
D.valid_max = np.int32(np.round(np.nanmax(d)))
D.valid_min = np.int32(np.round(np.nanmin(d)))
#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# Save data into NetCDF
TIME[:] = t
D[:] = d
VAR[:] = np.transpose(data_2_save)
dataset.close()  # and the file is written

I then load the file later and plot as follows:

import xarray as xr
data = xr.open_dataset('test.nc')
data.VARIABLE[:,1].plot()

Then either the kernel crashes or a plot is produced. The plot is different every time: random numbers appear alongside the values I expect (0. and NaN). These random numbers can be around 20000, greater than 1e+38, or sometimes 0. They tend to appear at the end of the variable array, where there are supposed to be NaNs. Sometimes no random numbers are introduced at all.
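
A plot can hide how many stray values there are and where they sit. As a minimal sketch, assuming the variable should contain only 0 and NaN as in the script above, a small helper can list the indices of anything else. The name `find_unexpected` and the stand-in array are hypothetical and only illustrate the check:

```python
import numpy as np


def find_unexpected(values, expected=0.0):
    """Return indices of entries that are neither NaN nor equal to `expected`."""
    values = np.asarray(values, dtype=np.float64)
    bad = ~np.isnan(values) & (values != expected)
    return np.flatnonzero(bad)


# Stand-in for one column of VARIABLE (made-up values, not read from test.nc):
# zeros, a NaN gap, and two stray numbers of the kind described above.
column = np.array([0.0, 0.0, np.nan, np.nan, 20000.0, 3.0e38, 0.0],
                  dtype=np.float32)
stray = find_unexpected(column)
print("stray indices:", stray)
print("stray values:", column[stray])
```

With the real file, `find_unexpected(data.VARIABLE[:, 1].values)` would apply the same check to the column being plotted. An empty result means that particular load was clean.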

I have tried the following:

  • running 'conda update --all'
  • experimenting with 'np.int32', 'np.float64' and 'float' as the data type when creating the variable in the NetCDF4 file
  • changing the format from 'NETCDF4_CLASSIC' to 'NETCDF4' when creating the NetCDF4 file

Versions

  • Python 3.9
  • xarray 0.20.1
  • matplotlib 3.5.1
  • netcdf4 1.5.7
  • numpy 1.21.5

I have recently reinstalled Anaconda and packages because of an issue using pip and conda to install packages.

I have tried this using Spyder and Jupyter Notebook, and it happens when using both.

I decided to make the NetCDF file using xarray instead of the netCDF4 package. The problem does not occur anymore.
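
The exact xarray code used for this workaround is not shown, so the following is only a sketch of what it might look like, reusing the dimension names, attributes and fill value from the netCDF4 script above. The function names `build_dataset` and `write_with_xarray` and the output file name are assumptions made for illustration:

```python
import numpy as np
import xarray as xr


def build_dataset():
    """Build the same 0/NaN test data as above, laid out as (TIME, D)."""
    values = np.zeros((29947, 6), dtype=np.float32)
    starts = [1000, 1010, 1050, 1000, 900, 100]
    stops = [27300, 27310, 27350, 27300, 27300, 27300]
    for col, (lo, hi) in enumerate(zip(starts, stops)):
        values[lo:hi, col] = np.nan

    t = np.arange(-2921, 27026, dtype=np.float32)
    d = np.arange(1, 7, dtype=np.int32)

    time_attrs = {
        "units": "days since 1950-01-01 00:00:00 UTC",
        "long_name": "analysis time",
        "standard_name": "time",
        "axis": "T",
        "calendar": "gregorian",
    }
    return xr.Dataset(
        data_vars={
            "VARIABLE": (("TIME", "D"), values, {"long_name": "name"}),
        },
        coords={
            "TIME": ("TIME", t, time_attrs),
            "D": ("D", d, {"standard_name": "D"}),
        },
    )


def write_with_xarray(path):
    """Write the dataset, letting xarray replace NaN with the fill value."""
    ds = build_dataset()
    ds.to_netcdf(
        path,
        format="NETCDF4_CLASSIC",
        unlimited_dims=["TIME"],
        encoding={"VARIABLE": {"_FillValue": np.float32(999999)}},
    )


ds = build_dataset()
print(ds)
```

Calling `write_with_xarray('test_xarray.nc')` would then write the file, which needs a NetCDF backend such as the netCDF4 package to be installed. Here the fill value is passed through `encoding`, so xarray substitutes it for the NaNs on write and masks it again on read.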
