netcdf4 extract for subset of lat lon

I would like to extract a spatial subset of a rather large netCDF file. My starting point is the code from "Loop through netcdf files and run calculations - Python or R":

import numpy as np
import netCDF4

f = netCDF4.MFDataset('/usgs/data2/rsignell/models/ncep/narr/air.2m.1989.nc')
# print variables
print(f.variables.keys())
atemp = f.variables['air']  # TODO: extract spatial subset

How do I extract just the subset of the netCDF file corresponding to a state (say, Iowa)? Iowa has the following lat/lon boundaries:

Longitude: 89° 5' W to 96° 31' W

Latitude: 40° 36' N to 43° 30' N
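To use these bounds in code, they first have to be converted to decimal degrees, with west longitudes negative (or, on a grid that runs 0..360 degrees east, wrapped into that range). A minimal sketch; the helper name `dms_to_decimal` is hypothetical:

```python
def dms_to_decimal(degrees, minutes, hemisphere):
    """Convert degrees/minutes plus a hemisphere letter to signed decimal degrees."""
    value = degrees + minutes / 60.0
    # South and west are negative in the -90..90 / -180..180 convention
    return -value if hemisphere in ("S", "W") else value


# Iowa's bounding box from the question
lon_min = dms_to_decimal(96, 31, "W")  # western edge
lon_max = dms_to_decimal(89, 5, "W")   # eastern edge
lat_min = dms_to_decimal(40, 36, "N")
lat_max = dms_to_decimal(43, 30, "N")

# The same longitudes on a 0..360 (degrees east) grid
lon_min_360 = lon_min % 360
lon_max_360 = lon_max % 360

print(lon_min, lon_max, lat_min, lat_max)
print(lon_min_360, lon_max_360)
```

Note that 5' is 5/60 of a degree, so 89° 5' W is about -89.083, not -89.5.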

Well, this is pretty easy: you have to find the indices of the upper and lower bounds in latitude and longitude. You can do this by finding the grid values that are closest to the ones you're looking for.

import numpy as np

latbounds = [40, 43]
lonbounds = [-96, -89]  # degrees east (assumes longitudes run from -180 to 180)
lats = f.variables['latitude'][:]
lons = f.variables['longitude'][:]

# latitude lower and upper index
latli = np.argmin(np.abs(lats - latbounds[0]))
latui = np.argmin(np.abs(lats - latbounds[1]))

# longitude lower and upper index
lonli = np.argmin(np.abs(lons - lonbounds[0]))
lonui = np.argmin(np.abs(lons - lonbounds[1]))

Then just subset the variable array:

# Air (time, latitude, longitude); +1 because Python slices exclude the stop index
airSubset = f.variables['air'][:, latli:latui + 1, lonli:lonui + 1]
  • Note: I'm assuming the longitude dimension variable is in degrees east and that the air variable has (time, latitude, longitude) dimensions.
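The nearest-index approach can be tried without the NARR file by running it on a synthetic grid. A minimal sketch, assuming 1-D coordinate arrays that run south to north and west to east; the grid and the random data are made up:

```python
import numpy as np

# Hypothetical 1-degree grid standing in for the file's coordinate variables
lats = np.arange(-89.0, 90.0, 1.0)    # south to north
lons = np.arange(-180.0, 180.0, 1.0)  # degrees east, -180..180
air = np.random.default_rng(0).random((4, lats.size, lons.size))  # (time, lat, lon)

latbounds = [40, 43]
lonbounds = [-96, -89]

# Index of the grid value closest to each bound
latli = np.argmin(np.abs(lats - latbounds[0]))
latui = np.argmin(np.abs(lats - latbounds[1]))
lonli = np.argmin(np.abs(lons - lonbounds[0]))
lonui = np.argmin(np.abs(lons - lonbounds[1]))

# Python slices exclude the stop index, so add 1 to keep the upper bound
air_subset = air[:, latli:latui + 1, lonli:lonui + 1]
print(lats[latli:latui + 1], lons[lonli:lonui + 1], air_subset.shape)
```

If the axes are stored in the opposite order (for example latitude north to south), the lower index comes from the other bound; the slice must still go from the smaller index to the larger one.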

Favo's answer works (I assume; I haven't checked). A more direct and idiomatic way is to use numpy's where function to find the necessary indices.

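import numpy as np

lats = f.variables['latitude'][:]
lons = f.variables['longitude'][:]
lat_bnds, lon_bnds = [40, 43], [-96, -89]

# np.where returns a tuple of index arrays; [0] takes the array for this 1-D axis
lat_inds = np.where((lats > lat_bnds[0]) & (lats < lat_bnds[1]))[0]
lon_inds = np.where((lons > lon_bnds[0]) & (lons < lon_bnds[1]))[0]

# netCDF4 applies integer index arrays independently along each dimension
air_subset = f.variables['air'][:, lat_inds, lon_inds]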
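The same np.where approach can be checked on synthetic arrays. A minimal sketch; the grid and data are made up. One caveat: a netCDF4 Variable applies integer index arrays independently along each dimension, but a plain NumPy array pairs them element-wise, so once the data are in memory the two index arrays have to be broadcast against each other:

```python
import numpy as np

# Hypothetical 1-degree grid and data standing in for the netCDF variables
lats = np.arange(-89.0, 90.0, 1.0)
lons = np.arange(-180.0, 180.0, 1.0)
air = np.arange(4 * lats.size * lons.size, dtype=float).reshape(4, lats.size, lons.size)

lat_bnds, lon_bnds = [40, 43], [-96, -89]

# np.where returns a tuple with one index array per dimension; [0] unpacks it
lat_inds = np.where((lats > lat_bnds[0]) & (lats < lat_bnds[1]))[0]
lon_inds = np.where((lons > lon_bnds[0]) & (lons < lon_bnds[1]))[0]

# Plain NumPy pairs integer index arrays element-wise; giving lat_inds a
# trailing axis broadcasts the two arrays so every (lat, lon) pair is selected
air_subset = air[:, lat_inds[:, np.newaxis], lon_inds]
print(air_subset.shape)
```

Without the np.newaxis, air[:, lat_inds, lon_inds] on a NumPy array raises an IndexError here, because the two index arrays (2 and 6 elements) cannot be paired.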

If you like pandas, then you should consider checking out xarray.

import xarray as xr

ds = xr.open_dataset('http://geoport.whoi.edu/thredds/dodsC/usgs/data2/rsignell/models/ncep/narr/air.2m.1980.nc')
lat_bnds, lon_bnds = [40, 43], [-96, -89]
# slice bounds must follow the coordinate order (reverse them if lat is stored north to south)
ds_subset = ds.sel(lat=slice(*lat_bnds), lon=slice(*lon_bnds))
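xarray's label-based slicing follows the order of the coordinate values, which matters for files whose latitude runs north to south: an ascending slice on a descending axis silently returns nothing. A minimal sketch on a made-up in-memory dataset (the lat/lon/air names mirror the answer above; the grid itself is hypothetical):

```python
import numpy as np
import xarray as xr

# Hypothetical dataset with latitude stored north to south
lat = np.arange(89.0, -90.0, -1.0)
lon = np.arange(-180.0, 180.0, 1.0)
ds = xr.Dataset(
    {"air": (("time", "lat", "lon"), np.zeros((2, lat.size, lon.size)))},
    coords={"time": [0, 1], "lat": lat, "lon": lon},
)

lat_bnds, lon_bnds = [40, 43], [-96, -89]

# Pick the slice direction from the stored coordinate order
lat_vals = ds["lat"].values
if lat_vals[0] < lat_vals[-1]:
    lat_slice = slice(lat_bnds[0], lat_bnds[1])
else:  # descending latitude: give the northern bound first
    lat_slice = slice(lat_bnds[1], lat_bnds[0])

subset = ds.sel(lat=lat_slice, lon=slice(*lon_bnds))
print(subset["air"].shape)
```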

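Note that this can be done faster on the command line using NCO's ncks (the decimal points make NCO treat the values as coordinates rather than indices):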

ncks -O -v air -d latitude,40.,43. -d longitude,-96.,-89. infile.nc subset_infile.nc

To mirror the response from N1B4, you can also do it in one line with the Climate Data Operators (cdo):

cdo sellonlatbox,-96.5,-89,40,43 in.nc out.nc

Thus, to loop over a set of files, I would do this in a Bash script, using cdo to process each file and then calling your Python script:


# pick up a list of files (I'm presuming the loop is over the years)
for file in /usgs/data2/rsignell/models/ncep/narr/air.2m.*.nc ; do
   # extract the location, I haven't used your exact lat/lons
   cdo sellonlatbox,-96.5,-89,40,43 "$file" iowa.nc

   # Call your python or R script here to process file iowa.nc
   python script   # placeholder for your own script
done

I always try to do my file processing "offline", as I find it less prone to error. cdo is an alternative to ncks; I'm not saying it is better, I just find its commands easier to remember. NCO in general is more powerful, and I resort to it when cdo can't perform the task I want to carry out.
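If you would rather drive the same loop from Python, here is a minimal sketch, assuming cdo is on your PATH. The function names and the `<input>.iowa.nc` output naming are hypothetical; giving each output its own name avoids overwriting one file per iteration:

```python
import glob
import os
import subprocess


def build_cdo_commands(paths, box="-96.5,-89,40,43"):
    """Return one `cdo sellonlatbox` command (as an argument list) per input file."""
    commands = []
    for path in paths:
        root, ext = os.path.splitext(path)
        # Hypothetical naming scheme: air.2m.1989.nc -> air.2m.1989.iowa.nc
        commands.append(["cdo", f"sellonlatbox,{box}", path, f"{root}.iowa{ext}"])
    return commands


def run_all(pattern):
    """Run cdo on every file matching a glob pattern (requires cdo on PATH)."""
    for cmd in build_cdo_commands(sorted(glob.glob(pattern))):
        # check=True raises CalledProcessError if cdo fails on a file
        subprocess.run(cmd, check=True)
```

Usage would be something like run_all('/usgs/data2/rsignell/models/ncep/narr/air.2m.*.nc'), after which your analysis script can read the *.iowa.nc files.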

A small change needs to be made to the lonbounds part (the data are in degrees east): the longitude values in the data range from 0 to 359, so negative numbers will not work in this case. The calculation of latli and latui also needs to be swapped, because the latitude values run from north to south (89 to -89).

import numpy as np

latbounds = [40, 43]
lonbounds = [260, 270]  # degrees east (0..360 convention)
lats = f.variables['latitude'][:]
lons = f.variables['longitude'][:]

# latitude lower and upper index; latitudes run north to south,
# so the northern bound gives the lower index
latli = np.argmin(np.abs(lats - latbounds[1]))
latui = np.argmin(np.abs(lats - latbounds[0]))

# longitude lower and upper index
lonli = np.argmin(np.abs(lons - lonbounds[0]))
lonui = np.argmin(np.abs(lons - lonbounds[1]))
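Both adjustments can be folded into one helper so the same code works on either grid layout. A minimal sketch, assuming 1-D latitude/longitude arrays; the function name bbox_indices is hypothetical, the 0..360 check (lons.min() >= 0) is a heuristic, and boxes that straddle the 0/360 seam are not handled:

```python
import numpy as np


def bbox_indices(lats, lons, lat_bounds, lon_bounds):
    """Return (lat_slice, lon_slice) covering a lat/lon box on a 1-D grid.

    Works whether latitude runs south-to-north or north-to-south, and wraps
    negative (west) longitudes when the grid appears to use 0..360.
    """
    lon_bounds = np.asarray(lon_bounds, dtype=float)
    if lons.min() >= 0:  # heuristic: grid is in 0..360 degrees east
        lon_bounds = lon_bounds % 360

    def nearest(values, target):
        return int(np.argmin(np.abs(values - target)))

    # Sort each index pair so the slice always increases, whatever the axis order
    lat_i = sorted(nearest(lats, b) for b in lat_bounds)
    lon_i = sorted(nearest(lons, b) for b in lon_bounds)
    return slice(lat_i[0], lat_i[1] + 1), slice(lon_i[0], lon_i[1] + 1)


# Hypothetical grid matching the description above: 89..-89 N, 0..359 E
lats = np.arange(89.0, -90.0, -1.0)
lons = np.arange(0.0, 360.0, 1.0)
lat_sl, lon_sl = bbox_indices(lats, lons, [40, 43], [-96, -89])
print(lats[lat_sl], lons[lon_sl])
```

The returned slices can be used directly, e.g. f.variables['air'][:, lat_sl, lon_sl].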

If you are working on Linux or macOS, this can be handled very easily using nctoolkit ( https://nctoolkit.readthedocs.io/en/latest/ ):

import nctoolkit as nc
data = nc.open_data('/usgs/data2/rsignell/models/ncep/narr/air.2m.1989.nc')
# 96° 31' W = -(96 + 31/60) and 89° 5' W = -(89 + 5/60)
data.crop(lon = [-(96 + 31/60), -(89 + 5/60)], lat = [40 + 36/60, 43 + 30/60])

