
Extract netcdf4 variable slice by dimension name

I have a netCDF file with 4 dimensions. I want to extract a slice from the netCDF file by giving the name of one of the dimensions.

I know how to do this by position, e.g.

from netCDF4 import Dataset
hndl_nc = Dataset(path_to_nc)

# Access by slice
hndl_nc.variables['name_variable'][:,5,:,:]

Given that I know the names of the dimensions, say A, B, C, D, how do I access by dimension name instead of position?

You can use xarray's indexing capabilities to access netCDF data by dimension name.

import xarray as xr
ds = xr.open_dataset('./foo.nc')
var = ds['name_variable']
# Slice var by Dimension "A" between values 0 and 5
var_slice = var.sel(A=slice(0,5))
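To mirror the positional slice [:,5,:,:] from the question, here is a minimal sketch that builds an in-memory DataArray instead of opening a file. The dimension names A, B, C, D come from the question; the array shape and the coordinate values for B are made up for illustration. It shows that .isel selects by integer position along a named dimension, while .sel selects by coordinate value.

```python
import numpy as np
import xarray as xr

# Made-up 4-D data standing in for the netCDF variable (shape is an assumption)
data = np.arange(2 * 6 * 3 * 4).reshape(2, 6, 3, 4)
var = xr.DataArray(
    data,
    dims=("A", "B", "C", "D"),
    coords={"B": [0, 10, 20, 30, 40, 50]},  # illustrative coordinate values
)

# Select by integer position along dimension "B" (same as data[:, 5, :, :])
by_position = var.isel(B=5)

# Select by coordinate value along dimension "B" (the value 50 sits at position 5)
by_label = var.sel(B=50)
```

Neither call needs to know that B is the second axis, which is the point of indexing by name.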

It seems the closest current solution is

np.take(nc4_variable[:],dim_ids,axis=dim)

or

nc4_variable[:].take(dim_ids,axis=dim)

where dim_ids is a list or tuple of the indices you want, and dim is the axis along which you want to slice. Unfortunately, this seems to load the entire dataset first, and there doesn't seem to be a way around that; the [:] is necessary. Neglecting it in the first method loads data without the adjustments from the add_offset, _FillValue, etc. parameters; neglecting it in the second method yields an error.
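As a rough illustration of what the np.take call does, the sketch below uses a plain NumPy array in place of nc4_variable[:] and a hypothetical tuple of dimension names to look up the axis number. The names and shape are assumptions made for the example, not taken from a real file.

```python
import numpy as np

# Hypothetical dimension names and data standing in for nc4_variable[:]
dim_names = ("A", "B", "C", "D")
data = np.arange(2 * 6 * 3 * 4).reshape(2, 6, 3, 4)

# Translate the dimension name into an axis number
dim = dim_names.index("B")
dim_ids = [5]

# Both spellings pick index 5 along axis "B" and keep that axis with length 1
taken = np.take(data, dim_ids, axis=dim)
taken_method = data.take(dim_ids, axis=dim)

# Equivalent positional slice, for comparison
sliced = data[:, 5:6, :, :]
```

Because the array is fully in memory before take runs, this only saves you from hard-coding the axis position, not from reading the whole variable.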

Testing with %timeit in IPython confirms major timing differences between normal slicing and the np.take method.

Hope someone comes up with a more complete answer to this; it would be very useful for diverse datasets.

So, I might have come up with something that could qualify as a "solution".

NumPy arrays can evidently be indexed with a single sequence of iterables, e.g.

import numpy as np
a = np.reshape(range(0,16),(4,4),order='F')
# Older NumPy accepted a[ [[0,1], [1]] ]; recent releases need the list wrapped in a tuple
a = a[ tuple([[0,1], [1]]) ]

returns a equal to array([4,5]). Another example of such an index list would be [range(3), [1, 2], 3]. These sequences are unpacked in the manner of *subscripts, as if you had directly queried a[[0,1],1] instead of a[ tuple([[0,1],1]) ]. Indexing with a plain list of lists in place of the tuple was deprecated and later removed in NumPy, so build the index as a list and convert it with tuple() before indexing.

So, if you are able to query the position and length of each dimension in your netCDF variable (pretty easy with nc_fid[var].dimensions and nc_fid[var].shape), then you can simply permute a list of indices according to the location of each dimension. For example, if you have data with the dimensions time, lon, and lat, and you want all longitudes, all latitudes, and time index t=5, you can use something like

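order_want = ['lon', 'lat', 'time'] # must figure out dimension names a priori
nlon = nc_fid[var].shape[nc_fid[var].dimensions.index('lon')]
nlat = nc_fid[var].shape[nc_fid[var].dimensions.index('lat')]
# One index per entry of order_want: all longitudes, all latitudes, time index 5
ids = [ list(range(0,nlon)), list(range(0,nlat)), 5 ]
# For each dimension in the variable's own order, find its position in order_want
ids_permute = [order_want.index(n) for n in nc_fid[var].dimensions]
# Reorder the indices to match the variable's dimension order
ids_query = [ids[i] for i in ids_permute]

sliced_data = nc_fid[var][tuple(ids_query)]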

This requires no a priori knowledge of the dimension position, and does not require loading the entire variable.
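The same idea can be wrapped in a small helper. This is only a sketch: index_by_name is a hypothetical function, and a plain NumPy array plus a tuple of names stands in for the netCDF4 variable and its dimensions attribute. Dimensions that are not mentioned get slice(None), i.e. ":".

```python
import numpy as np

def index_by_name(dim_names, **selection):
    """Build an index tuple from per-dimension selections (hypothetical helper).

    dim_names: dimension names in the variable's own order.
    selection: maps a dimension name to an int, slice, or list of ints.
    """
    unknown = set(selection) - set(dim_names)
    if unknown:
        raise KeyError(f"unknown dimension(s): {sorted(unknown)}")
    # Unselected dimensions are taken in full
    return tuple(selection.get(name, slice(None)) for name in dim_names)

# Stand-in for a variable with dimensions (time, lon, lat); shape is made up
dims = ("time", "lon", "lat")
data = np.arange(10 * 4 * 3).reshape(10, 4, 3)

key = index_by_name(dims, time=5)
sliced = data[key]  # all longitudes, all latitudes, time index 5
```

With a real file one would presumably pass nc_fid[var].dimensions as dim_names and index the variable with the resulting tuple, so only the requested slab is read.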

Note that after some %timeit testing in IPython, there appears to be a special delay for all-integer indexing: list_query = [0,0,0,0] takes 80 ms, whereas list_query = [range(1),0,0,0] or even list_query = [[0,1,2,3,4,5],0,0,0] takes 1 ms. Very mysterious; anyway, evidently you should try to make sure list_query is not just a list of integers.


 