Extract netcdf4 variable slice by dimension name
I have a netCDF file with 4 dimensions. I want to extract a slice from it by giving the name of one of the dimensions.
I know how to do this by position, e.g.:
from netCDF4 import Dataset
hndl_nc = Dataset(path_to_nc)
# Access by slice
hndl_nc.variables['name_variable'][:,5,:,:]
Given that I know the names of the dimensions, say A, B, C, D, how do I access by dimension name instead of position?
It seems the closest current solution is
np.take(nc4_variable[:],dim_ids,axis=dim)
or
nc4_variable[:].take(dim_ids,axis=dim)
where dim_ids is a list or tuple of the indices you want, and dim is the axis number of the dimension along which you want to slice (it can be looked up from the name with nc4_variable.dimensions.index(name)). Unfortunately, this seems to load the entire dataset first, and there doesn't seem to be a way around that; the [:] is necessary. Omitting it in the first method loads the data without applying the adjustments from add_offset, _FillValue, etc.; omitting it in the second method raises an error.
Testing with %timeit in IPython confirms major speed differences between normal slicing and the np.take method.
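As a minimal sketch of the np.take approach driven by a dimension name: the helper take_by_name is hypothetical (not part of NumPy or netCDF4), and a plain NumPy array stands in for nc4_variable[:], with a tuple of names standing in for nc4_variable.dimensions.

```python
import numpy as np

def take_by_name(data, dimensions, name, indices):
    """Select `indices` along the axis whose dimension is called `name`.

    `data` is an in-memory array (e.g. what nc4_variable[:] would return),
    `dimensions` is the tuple of dimension names in axis order.
    """
    axis = dimensions.index(name)  # translate the name into an axis number
    return np.take(data, indices, axis=axis)

# Stand-in data: 4 dimensions named A, B, C, D
dimensions = ("A", "B", "C", "D")
data = np.arange(2 * 6 * 3 * 4).reshape(2, 6, 3, 4)

# Same result as the positional slice data[:, 5, :, :]
subset = take_by_name(data, dimensions, "B", 5)
```

Like the original one-liners, this only works on data that has already been read into memory, so it does not avoid loading the whole variable.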
Hopefully someone comes up with a more complete answer to this; it would be very useful for diverse datasets.
So, I might have come up with something that could qualify as a "solution".
numpy arrays can evidently be indexed with a singleton list of iterables, e.g.
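import numpy as np
a = np.reshape(range(0,16),(4,4),order='F')
# older NumPy unpacked the outer list by itself; tuple() makes that explicit and also works on current versions
a = a[ tuple([[0,1], [1]]) ]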
returns a equal to array([4,5]). Another example would be [[range(3),[1, 2],3]]. These singleton lists are unfurled in the manner of *subscripts, as if you had directly queried a[[0,1],1] instead of a[ [[0,1],1] ].
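The unpacking described above can be checked on a small array. This is only a sketch: it passes the subscripts as an explicit tuple, which is the form I would expect to work regardless of NumPy version, rather than relying on a bare list being unpacked.

```python
import numpy as np

# Column-major fill, so a[i, j] == i + 4 * j
a = np.arange(16).reshape((4, 4), order='F')

subscripts = [[0, 1], [1]]          # one entry per axis

direct = a[[0, 1], 1]               # subscripts written out by hand
unpacked = a[tuple(subscripts)]     # same subscripts taken from a list
```

Both expressions pick rows 0 and 1 of column 1, i.e. the values 4 and 5.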
So, if you are able to query the position and length of each dimension in your netCDF variable (pretty easy with nc_fid[var].dimensions and nc_fid[var].shape), then you can simply permute a list of indices according to the location of each dimension. For example, if you have data of shape time by lon by lat, and you want all longitudes, all latitudes, and time index t=5, you can use something like
order_want = ['lon', 'lat', 'time'] # must figure out dimension names a priori
dims = nc_fid[var].dimensions # dimension names in the order the variable stores them
nlon = nc_fid[var].shape[dims.index('lon')]
nlat = nc_fid[var].shape[dims.index('lat')]
ids = [ range(0,nlon), range(0,nlat), 5 ] # one index per name, in order_want order
ids_permute = [order_want.index(n) for n in dims] # where each stored dimension sits in ids
list_query = [ids[i] for i in ids_permute] # indices rearranged into the variable's own order
sliced_data = nc_fid[var][list_query]
This requires no a priori knowledge of the dimension positions, and does not require loading all dimensions of the variable.
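A more compact variant of the same idea, as a sketch: build one index per dimension from keyword arguments and fill every unnamed dimension with a full slice. The helper index_by_name is hypothetical, and the tuple of names below stands in for nc_fid[var].dimensions.

```python
def index_by_name(dimensions, **selection):
    """Return an index tuple ordered like `dimensions`.

    Dimensions named in `selection` get the given int/slice/list;
    all others get slice(None), i.e. "take everything".
    """
    unknown = set(selection) - set(dimensions)
    if unknown:
        raise KeyError(f"unknown dimension name(s): {sorted(unknown)}")
    return tuple(selection.get(name, slice(None)) for name in dimensions)

# Stand-in for nc_fid[var].dimensions
dimensions = ("time", "lat", "lon")

# All latitudes, all longitudes, time index 5
index = index_by_name(dimensions, time=5)
```

Assuming the netCDF4 variable accepts a tuple of integers and slices the same way a NumPy array does, the result would be used as nc_fid[var][index_by_name(nc_fid[var].dimensions, time=5)], which should read only the requested hyperslab.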
Note that after some %timeit testing in IPython, it appears there is some special delay for all-integer indexing, e.g. list_query = [0,0,0,0] will take 80 ms whereas list_query = [range(1),0,0,0] or even list_query = [[0,1,2,3,4,5],0,0,0] will take 1 ms. Very mysterious; anyway, evidently you should try to make sure list_query is not just a list of integers.