python, netCDF4: need an introduction to creating an unlimited time dimension for NetCDF
Can somebody give an introduction to creating an unlimited time dimension for a NetCDF file? I tried to use data.createDimension('t', None), but when I look at t it is a NumPy array. If possible, please also explain how to assign values to it. I am using Python 2.7.
I have multiple NetCDF files (3 dimensions each), and for each one I have to calculate an array (3 dimensions). The time step between the files is 3 hours. Now I have to create a new NetCDF file containing the calculated array for each time step. My problem is that I do not know how to access the time axis so that I can assign each calculated array to its time step.
I want to assign a date to the time axis. To create the date I used datetime like this:
t_start = dt.datetime(1900,1,1)
t_delta = dt.timedelta(hours=3)
The time between two time steps is 3 hours. While looping over the files, the date for each time step is calculated like this:
t_mom = t_start + i*t_delta
t_mom_str = t_mom.strftime("%d %B %Y %H %M %S")
t_mom_var = netCDF4.stringtochar(np.array([t_mom_str]))
I have created a variable like this:
time = data.createVariable('time', np.float32, ('time'))
Now I want to assign the date to the time variable:
time[i] = t_mom_var[:]
But it does not work this way. Thanks for helping.
Using createDimension with None should work:
import netCDF4 as nc4
import numpy as np
f = nc4.Dataset('test.nc', 'w')
# Create the unlimited time dimension:
dim_t = f.createDimension('time', None)
# Create a variable `time` using the unlimited dimension:
var_t = f.createVariable('time', 'int', ('time'))
# Add some values to the variable:
var_t[:] = np.arange(10)
f.close()
This results in (ncdump -h test.nc):
netcdf test {
dimensions:
    time = UNLIMITED ; // (10 currently)
variables:
    int64 time(time) ;
}
For the updated question, here is a minimal working example of how to merge multiple files into one by adding a new unlimited dimension:
import netCDF4 as nc4
import numpy as np
# Let's quickly create 3 NetCDF files with 3 dimensions
for i in range(3):
    f = nc4.Dataset('test_{0:1d}.nc'.format(i), 'w')
    # Create the 3 dimensions
    dim_x = f.createDimension('x', 2)
    dim_y = f.createDimension('y', 3)
    dim_z = f.createDimension('z', 4)
    var_t = f.createVariable('temperature', 'double', ('x','y','z'))
    # Add some dummy data
    var_t[:,:,:] = np.random.random(2*3*4).reshape(2,3,4)
    f.close()
# Now the actual merging:
# Get the dimensions (sizes) from the first file:
f_in = nc4.Dataset('test_0.nc', 'r')
dim_size_x = f_in.dimensions['x'].size
dim_size_y = f_in.dimensions['y'].size
dim_size_z = f_in.dimensions['z'].size
dim_size_t = 3
f_in.close()
# Create new NetCDF file:
f_out = nc4.Dataset('test_merged.nc', 'w')
# Add the dimensions, including an unlimited time dimension:
dim_x = f_out.createDimension('x', dim_size_x)
dim_y = f_out.createDimension('y', dim_size_y)
dim_z = f_out.createDimension('z', dim_size_z)
dim_t = f_out.createDimension('time', None)
# Create new variable with 4 dimensions
var_t = f_out.createVariable('temperature', 'double', ('time','x','y','z'))
# Add the data, one time index per input file
for i in range(3):
    f_in = nc4.Dataset('test_{0:1d}.nc'.format(i), 'r')
    var_t[i,:,:,:] = f_in.variables['temperature'][:,:,:]
    f_in.close()
f_out.close()
@Bart is correct but didn't answer the second part of your question. You need to create a time variable dimensioned by your time dimension.
import netCDF4 as nc4
import numpy as np
import dateutil.parser
# Open a file and create the unlimited time dimension, as in the answer above.
f = nc4.Dataset('test.nc', 'w')
f.createDimension('time', None)
# Create a time variable, using the time dimension.
var_t = f.createVariable('time', 'int32', ('time',))
var_t.setncattr('units', 'seconds since 1970-01-01 00:00:00 UTC')
# Create a start time.
dt = dateutil.parser.parse("2017-05-01T00:00")
ntime = nc4.date2num(dt, var_t.units)
# Add some hours (3600 seconds each).
times = [ntime, ntime + 3600, ntime + 7200]
# Convert to a 1-D NumPy array before assigning.
times = np.array(times)
var_t[:] = times
f.close()
You can read in the NetCDF files via xarray's xr.open_dataset():
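# Get all the files as a list and open them as Datasets
import glob
import os
import numpy as np
import xarray as xr
folder = '<folder directory with files>'
ncfiles = sorted(glob.glob(os.path.join(folder, '*.nc')))
ds_l = [xr.open_dataset(i) for i in ncfiles]
# To make this a stand-alone example, I'll just create a list of Datasets too
arr = np.random.random((50, 30))  # dummy data with shape (lon, lat)
ds = xr.Dataset(data_vars={'data': (['lon', 'lat'], arr)},
                coords={'lat': np.arange(30), 'lon': np.arange(50)})
# Use copies so that each list entry can get its own time coordinate
ds_l = [ds.copy() for _ in range(5)]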
Now you can add the dates as a new coordinate (here I make the date list with pandas' pd.date_range() function):
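# List of dates
import datetime
import pandas as pd
start = datetime.datetime(1900,1,1)
end = datetime.datetime(1900,1,5)
dates = pd.date_range(start, end, freq='3H')
# Now add these dates to the datasets.
# assign_coords returns a new Dataset, so list entries that refer to the
# same object do not overwrite each other's time value.
ds_l = [ds.assign_coords(time=dates[n]) for n, ds in enumerate(ds_l)]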
Then you can concatenate along the time axis via the xr.concat() function and save the result as a NetCDF file via the Dataset's to_netcdf() method (note that the time dimension is set to unlimited):
# Then concatenate them:
ds = xr.concat(ds_l, dim='time')
# unlimited_dims takes the names of the dimensions to write as unlimited
ds.to_netcdf('mynewfile.nc', unlimited_dims=['time'])