
python, netcdf4: need an introduction to creating an unlimited time dimension for netcdf

Can somebody give an introduction to creating an unlimited time dimension for a NetCDF file? I tried to use data.createDimension('t', None), but when I look at t it is a NumPy array. If possible, please also explain how to assign values to it. I am using Python 2.7.

Edited question

I have multiple NetCDF files (3 dimensions), and for each one I have to calculate an array (3 dimensions). The time step between the files is 3 hours. Now I have to create a new NetCDF file containing the calculated array for each time step. My problem is that I do not know how to access the time axis so that I can assign each calculated array to its time step.

Edited question

I want to assign a date to the time axis. To create the date I have used datetime like this:

import datetime as dt

t_start = dt.datetime(1900,1,1)
t_delta = dt.timedelta(hours=3)

The time between two time steps is 3 hours. While looping over the files, the date for each time step is calculated like this:

t_mom = t_start + i*t_delta
t_mom_str = t_mom.strftime("%d %B %Y %H  %M  %S")
t_mom_var = netCDF4.stringtochar(np.array([t_mom_str]))

I have created a variable like this:

time = data.createVariable('time', np.float32, ('time'))

Now I want to assign the date to the time variable:

time[i] = t_mom_var[:]

But it does not work this way. Thanks for helping.

Using createDimension with None should work:

import netCDF4 as nc4
import numpy as np

f = nc4.Dataset('test.nc', 'w')

# Create the unlimited time dimension:
dim_t = f.createDimension('time', None)
# Create a variable `time` using the unlimited dimension:
var_t = f.createVariable('time', 'int', ('time'))
# Add some values to the variable:
var_t[:] = np.arange(10)
f.close()

This results in (ncdump -h test.nc):

netcdf test {
dimensions:
    time = UNLIMITED ; // (10 currently)
variables:
    int64 time(time) ;
}

For the updated question, here is a minimal working example of how to merge multiple files into one by adding a new unlimited dimension:

import netCDF4 as nc4
import numpy as np

# Let's quickly create 3 NetCDF files with 3 dimensions
for i in range(3):
    f = nc4.Dataset('test_{0:1d}.nc'.format(i), 'w')

    # Create the 3 dimensions
    dim_x = f.createDimension('x', 2)
    dim_y = f.createDimension('y', 3)
    dim_z = f.createDimension('z', 4)
    var_t = f.createVariable('temperature', 'double', ('x','y','z'))

    # Add some dummy data
    var_t[:,:,:] = np.random.random(2*3*4).reshape(2,3,4)

    f.close()

# Now the actual merging:
# Get the dimensions (sizes) from the first file:
f_in = nc4.Dataset('test_0.nc', 'r')
dim_size_x = f_in.dimensions['x'].size
dim_size_y = f_in.dimensions['y'].size
dim_size_z = f_in.dimensions['z'].size
dim_size_t = 3
f_in.close()

# Create new NetCDF file:
f_out = nc4.Dataset('test_merged.nc', 'w')

# Add the dimensions, including an unlimited time dimension:
dim_x = f_out.createDimension('x', dim_size_x)
dim_y = f_out.createDimension('y', dim_size_y)
dim_z = f_out.createDimension('z', dim_size_z)
dim_t = f_out.createDimension('time', None)

# Create new variable with 4 dimensions
var_t = f_out.createVariable('temperature', 'double', ('time','x','y','z'))

# Add the data
for i in range(3):
    f_in = nc4.Dataset('test_{0:1d}.nc'.format(i), 'r')
    var_t[i,:,:,:] = f_in.variables['temperature'][:,:,:]
    f_in.close()

f_out.close()

@Bart is correct but didn't answer the second part of your question. You need to create a time variable dimensioned by your time dimension.

import netCDF4 as nc4
import numpy as np
import dateutil.parser

# f_out is an open, writable Dataset that already has a 'time' dimension.
# Create a time variable, using the time dimension.
var_t = f_out.createVariable('time', 'int32', ('time',))
var_t.setncattr('units', 'seconds since 1970-01-01 00:00:00 UTC')
# Create a start time
t0 = dateutil.parser.parse("2017-05-01T00:00")
ntime = nc4.date2num(t0, var_t.units)
# Add some hours (3600 seconds each) and write a 1-D array
times = np.array([ntime, ntime + 3600, ntime + 7200])
var_t[:] = times

You can read in the NetCDF files via xarray's xr.open_dataset():

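# Get all the files as a list and open them as Datasets
import glob
import numpy as np
import xarray as xr

folder = '<folder directory with files>'
# Sort so that the files are in a defined (e.g. chronological by name) order
ncfiles = sorted(glob.glob(folder + '*.nc'))
ds_l = [xr.open_dataset(i) for i in ncfiles]

# To make this a stand-alone example, I'll just create a list of Datasets too
arr = np.random.rand(50, 30)  # dummy data with shape (lon, lat)
ds = xr.Dataset(data_vars={'data': (['lon', 'lat'], arr)},
                coords={'lat': np.arange(30), 'lon': np.arange(50)})
# Use independent copies; [ds]*5 would repeat the same object five times
ds_l = [ds.copy() for _ in range(5)]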

Now you can add the dates as a new coordinate (here I make the date list with pandas' pd.date_range() method):

import datetime
import pandas as pd

# List of dates
start = datetime.datetime(1900, 1, 1)
end = datetime.datetime(1900, 1, 5)
dates = pd.date_range(start, end, freq='3H')
# Now add these dates to the datasets (assign_coords returns a new Dataset)
ds_l = [ds.assign_coords(time=dates[n]) for n, ds in enumerate(ds_l)]

Then you can concatenate along the time axis via the xr.concat() method and save as a netCDF file via the Dataset's to_netcdf() method (note the setting of the time dimension to unlimited):

# Then concatenate them:
ds = xr.concat(ds_l, dim='time')
ds.to_netcdf('mynewfile.nc', unlimited_dims=['time'])

