Extract a slice of a netCDF4 variable by dimension name

Question

I have a netCDF file with 4 dimensions. I want to extract a slice from the file by giving the name of one of the dimensions.

I know how to do this by position, e.g.:

from netCDF4 import Dataset
hndl_nc = Dataset(path_to_nc)

# Access by slice
hndl_nc.variables['name_variable'][:,5,:,:]

Given that I know the names of the dimensions, say A, B, C, D, how do I access the data by dimension name instead of position?

Answer 1

You can use xarray's indexing capabilities to access netCDF data by dimension name.

import xarray as xr
ds = xr.open_dataset('./foo.nc')
var = ds['name_variable']
# Slice var by Dimension "A" between values 0 and 5
var_slice = var.sel(A=slice(0,5))
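
Note that .sel selects by coordinate value, while .isel selects by integer position along a named dimension, which is the closer match to the positional slice in the question. The following is a minimal sketch of that, assuming a small in-memory DataArray (made up for illustration) in place of ds['name_variable'] from a real file:

```python
import numpy as np
import xarray as xr

# Hypothetical stand-in for ds['name_variable']: 4 dimensions named A, B, C, D.
data = np.arange(2 * 6 * 3 * 4).reshape(2, 6, 3, 4)
var = xr.DataArray(data, dims=("A", "B", "C", "D"))

# Select integer position 5 along the dimension named "B".
by_name = var.isel(B=5)

# The purely positional slice from the question, for comparison.
by_position = data[:, 5, :, :]

print(by_name.dims)
print(np.array_equal(by_name.values, by_position))
```

Because the dimension is addressed by name, the same call keeps working if the file stores the dimensions in a different order.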

Answer 2

It seems the closest current solution is

np.take(nc4_variable[:],dim_ids,axis=dim)

or

nc4_variable[:].take(dim_ids,axis=dim)

where dim_ids is a list or tuple of the indices to keep, and dim is the axis number of the dimension along which you want to slice. Unfortunately, this seems to load the entire variable first, and there doesn't seem to be a way around that; the [:] is necessary. Omitting it in the first method loads the data without the adjustments from add_offset, _FillValue, etc.; omitting it in the second method raises an error.
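
To tie this back to dimension names, the axis number can be looked up from the variable's dimensions tuple. A minimal sketch, assuming a plain NumPy array and a hand-written tuple of names as stand-ins for nc4_variable[:] and nc4_variable.dimensions:

```python
import numpy as np

# Hypothetical stand-ins: in practice these would come from
# nc4_variable.dimensions and nc4_variable[:] (which reads the whole variable).
dimensions = ("A", "B", "C", "D")
values = np.arange(2 * 6 * 3 * 4).reshape(2, 6, 3, 4)

# Resolve the dimension name to its axis number.
dim = dimensions.index("B")

# Keep indices 1 and 5 along "B".
dim_ids = [1, 5]
subset = np.take(values, dim_ids, axis=dim)

print(subset.shape)
```

This only removes the need to know the axis position; the whole variable is still read into memory, as described above.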

Testing with %timeit in IPython confirms major timing differences between normal slicing and the np.take method.

Hope someone comes up with a more complete answer to this; it would be very useful for diverse datasets.

Answer 3

So, I might have come up with something that could qualify as a "solution".

numpy arrays can evidently be indexed with a singleton list of iterables, e.g.

import numpy as np
a = np.reshape(range(0, 16), (4, 4), order='F')
# Recent NumPy needs the explicit tuple(); older versions accepted the bare list
a = a[tuple([[0, 1], [1]])]

This leaves a equal to array([4, 5]). Another example would be [[range(3), [1, 2], 3]]. Older NumPy versions unfurl these singleton lists in the manner of *subscripts, as if you had directly queried a[[0,1],1] instead of a[ [[0,1],1] ]; wrapping the list in tuple() makes that explicit.

So, if you are able to query the position and length of each dimension in your netCDF variable (pretty easy with nc_fid[var].dimensions and nc_fid[var].shape), then you can simply permute a list according to the location of each dimension. For example, if you have data of shape time by lon by lat, and you want all longitudes, all latitudes, and time index t=5, you can use something like

order_want = ['lon', 'lat', 'time'] # must figure out dimension names a priori
nlon = nc_fid[var].shape[nc_fid[var].dimensions.index('lon')]
nlat = nc_fid[var].shape[nc_fid[var].dimensions.index('lat')]
ids = [ slice(0,nlon), slice(0,nlat), 5 ] # one entry per name in order_want
# for each dimension of the variable, find its position in order_want
ids_permute = [order_want.index(n) for n in nc_fid[var].dimensions]
# reorder the indices to match the variable's own dimension order
list_query = [ids[i] for i in ids_permute]

sliced_data = nc_fid[var][tuple(list_query)]

This requires no a priori knowledge of the dimension positions, and does not require loading the entire variable.
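
The same reordering can be wrapped in a small helper. This is only a sketch: index_by_name is a hypothetical name, not part of netCDF4, and it assumes the resulting tuple of integers and slices is then passed to the variable, e.g. nc_fid[var][index_by_name(nc_fid[var].dimensions, time=5)].

```python
def index_by_name(dimensions, **selectors):
    """Build an index tuple in the variable's own dimension order.

    dimensions: tuple of dimension names, e.g. nc_fid[var].dimensions
    selectors:  dimension name -> int, slice, or list of ints;
                dimensions not mentioned are taken in full.
    """
    unknown = set(selectors) - set(dimensions)
    if unknown:
        raise KeyError(f"unknown dimension(s): {sorted(unknown)}")
    return tuple(selectors.get(name, slice(None)) for name in dimensions)


# The caller names the dimension; the helper puts the index in the right slot
# regardless of how the dimensions are ordered in the file.
print(index_by_name(("time", "lon", "lat"), time=5))
print(index_by_name(("lat", "time", "lon"), time=5, lon=slice(0, 10)))
```

Using slice(None) for unmentioned dimensions avoids having to look up their lengths first.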

Note that after some %timeit testing in IPython, it appears there is some special delay for all-integer indexing, e.g. list_query = [0,0,0,0] takes 80 ms whereas list_query = [range(1),0,0,0] or even list_query = [[0,1,2,3,4,5],0,0,0] takes 1 ms. Very mysterious; anyway, evidently you should try to make sure list_query is not just a list of integers.

3 answers

Answer 1: score 1, accepted, 2016-09-17 14:47:25
Answer 2: score 1, 2017-02-02 08:14:18
Answer 3: score 0, 2017-02-03 07:12:07
