Loop through netCDF files and run calculations - Python or R

This is my first time using netCDF and I'm trying to wrap my head around working with it.

I have multiple version 3 netCDF files (NOAA NARR air.2m daily averages for an entire year). Each file covers one year between 1979 and 2012. They are 349 x 277 grids with approximately 32 km resolution. Data was downloaded from here.

The time dimension is in hours since 1/1/1800, and my variable of interest is air. I need to calculate accumulated days with a temperature < 0. For example:

    Day 1 = +4 degrees, accumulated days = 0
    Day 2 = -1 degrees, accumulated days = 1
    Day 3 = -2 degrees, accumulated days = 2
    Day 4 = -4 degrees, accumulated days = 3
    Day 5 = +2 degrees, accumulated days = 0
    Day 6 = -3 degrees, accumulated days = 1
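The counting rule in the example above can be written as a short function: the counter goes up by one on each day below 0 and resets to 0 on any other day. This is a minimal sketch; running_cold_days is a hypothetical name, and it assumes the temperatures are already in degrees Celsius (the NARR air values are in Kelvin, so subtract 273.15 first).

```python
def running_cold_days(temps, threshold=0.0):
    """Return the running count of consecutive days below `threshold`.

    The count increments on every day colder than the threshold and
    resets to 0 as soon as a day is at or above it.
    """
    counts = []
    count = 0
    for t in temps:
        count = count + 1 if t < threshold else 0
        counts.append(count)
    return counts


# The six days from the example above (degrees C)
print(running_cold_days([4, -1, -2, -4, 2, -3]))  # [0, 1, 2, 3, 0, 1]
```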

I need to store this data in a new netCDF file. I am familiar with Python and somewhat with R. What is the best way to loop through each day, check the previous day's value, and, based on that, output a value to a new netCDF file with exactly the same dimensions and variable? Or perhaps I should just add another variable with the output I'm looking for to the original netCDF file.
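Because each day's count depends only on the previous day's count, one way to do this for the whole grid is to loop over the time axis and update a 2-D counter array with NumPy, instead of looping over every grid cell. A sketch, assuming the data are a (time, y, x) array in Kelvin like the NARR air variable; running_cold_days_grid is a hypothetical helper and the small demo array only stands in for real data. The result has the same (time, y, x) shape as air, so it could be written to a new variable with the same dimensions.

```python
import numpy as np


def running_cold_days_grid(air_kelvin, count=None, threshold_k=273.15):
    """Running count of consecutive days below threshold_k per grid cell.

    air_kelvin: array shaped (time, y, x), e.g. one year of NARR air.
    count: (y, x) counter carried over from a previous chunk, or None.
    Returns (running_counts, final_count); running_counts has the same
    (time, y, x) shape as air_kelvin.
    """
    ntimes, ny, nx = air_kelvin.shape
    if count is None:
        count = np.zeros((ny, nx), dtype=np.int32)
    out = np.zeros((ntimes, ny, nx), dtype=np.int32)
    for i in range(ntimes):
        # Cold: previous day's count + 1; otherwise reset to 0
        count = np.where(air_kelvin[i] < threshold_k, count + 1, 0)
        out[i] = count
    return out, count


# The six days from the question at one grid cell, converted to Kelvin
demo = (np.array([4, -1, -2, -4, 2, -3], dtype=float) + 273.15).reshape(6, 1, 1)
counts, _ = running_cold_days_grid(demo)
print(counts[:, 0, 0])  # [0 1 2 3 0 1]

# Split the same days into two chunks (e.g. two yearly files) and carry
# the counter over; without the carry, day 4 would restart at 1
first, carry = running_cold_days_grid(demo[:3])
second, _ = running_cold_days_grid(demo[3:], count=carry)
print(np.concatenate([first, second])[:, 0, 0])  # [0 1 2 3 0 1]
```

Returning the final counter is what allows the yearly files to be processed one at a time: passing it in as count for the next file keeps a cold spell that spans December 31 to January 1 from being cut in two. If netCDF4 returns a masked array, np.ma.filled(air, np.nan) beforehand makes missing cells count as not cold, since NaN < threshold is False.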

Is it best to leave all the files separate or combine them? I combined them with ncrcat and it worked fine, but the combined file is 2.3 GB.

Thanks for the input.

My current progress in Python:

import numpy
import netCDF4

# (change to my working directory first)
f = netCDF4.Dataset('air7912.nc', 'r')
for a in f.variables:
    print(a)

# Output:
#   lat
#   long
#   x
#   y
#   Lambert_Conformal
#   time
#   time_bnds
#   air

print(f.variables['air'][1, 1, 1])
# Output:
#   298.37473

To help me understand this better: what type of data structure am I working with? Is ['air'] a key in the above example, and are [1, 1, 1] also keys used to get the value 298.37473? How can I then loop through [1, 1, 1]?
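On the data-structure question: in netCDF4-python, f.variables behaves like a dictionary that maps variable names to Variable objects, so 'air' is a key. [1, 1, 1] is not a key; it is a NumPy-style index into the (time, y, x) array, i.e. time step 1, row 1, column 1 (indices start at 0). Indexing a Variable reads only the requested slice from disk. A sketch of the same ideas with plain NumPy, where variables and air are hypothetical stand-ins for f.variables and f.variables['air'] so it runs without the data files:

```python
import numpy as np

# Stand-in for f.variables['air']: 2 time steps on a 3 x 4 grid (time, y, x)
air = np.arange(2 * 3 * 4, dtype=float).reshape(2, 3, 4)

# Stand-in for f.variables: a dictionary of name -> variable
variables = {"air": air}

a = variables["air"]   # 'air' is a dictionary key
print(a.shape)         # (2, 3, 4)
print(a[1, 1, 1])      # [1, 1, 1] are positional indices (time, y, x) -> 17.0

# Loop over the time axis, taking a whole 2-D field for each day
for t in range(a.shape[0]):
    field = a[t, :, :]
    print(t, field.mean())

# The full time series for a single grid cell (y=1, x=1)
series = a[:, 1, 1]
print(series)          # [ 5. 17.]
```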

You can use the very nice MFDataset feature in netCDF4 to treat a bunch of files as one aggregated file, without needing ncrcat. (This works here because the files are netCDF version 3; MFDataset does not support files in the full NETCDF4 format.) So your code would look like this:

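import numpy as np
import matplotlib.pyplot as plt
import netCDF4

f = netCDF4.MFDataset('/usgs/data2/rsignell/models/ncep/narr/air.2m.19??.nc')
# print variables
print(f.variables.keys())

atemp = f.variables['air']
print(atemp)

ntimes, ny, nx = atemp.shape
cold_days = np.zeros((ny, nx), dtype=int)

for i in range(ntimes):
    # air is in Kelvin: count the grid cells below 0 degC on day i
    # (filled() turns any masked/missing cells into False)
    cold_days += np.ma.filled(atemp[i, :, :] < 273.15, False)

plt.pcolormesh(cold_days)
plt.colorbar()
plt.show()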

[Image: plot of the resulting cold_days counts]
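The loop above reads one day at a time, which keeps memory use low. If a year of data fits in memory (one NARR year is about 365 x 277 x 349 values, roughly 140 MB at 4 bytes per value), the same total can be computed in one call by summing the boolean mask along the time axis. A sketch with a small random array standing in for atemp[:], comparing the two approaches:

```python
import numpy as np

rng = np.random.default_rng(0)
# Stand-in for atemp[:]: 30 days on a 4 x 5 grid, in Kelvin
atemp = 273.15 + rng.normal(0.0, 5.0, size=(30, 4, 5))

# Loop version: add one day's cold mask at a time
cold_days_loop = np.zeros(atemp.shape[1:], dtype=int)
for i in range(atemp.shape[0]):
    cold_days_loop += atemp[i, :, :] < 273.15

# Vectorized version: count True values along the time axis
cold_days_vec = np.sum(atemp < 273.15, axis=0)

print(np.array_equal(cold_days_loop, cold_days_vec))  # True
```

If atemp[:] comes back as a masked array, np.ma.filled(atemp[:] < 273.15, False) treats missing cells as not cold before summing.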

And here's one way to write the file (there might be easier ways):

# create NetCDF file
nco = netCDF4.Dataset('/usgs/data2/notebook/cold_days.nc','w',clobber=True)
nco.createDimension('x',nx)
nco.createDimension('y',ny)

cold_days_v = nco.createVariable('cold_days', 'i4',  ( 'y', 'x'))
cold_days_v.units='days'
cold_days_v.long_name='total number of days below 0 degC'
cold_days_v.grid_mapping = 'Lambert_Conformal'

lono = nco.createVariable('lon','f4',('y','x'))
lato = nco.createVariable('lat','f4',('y','x'))
xo = nco.createVariable('x','f4',('x',))
yo = nco.createVariable('y','f4',('y',))
lco = nco.createVariable('Lambert_Conformal','i4')

# copy all the variable attributes from original file
# (_FillValue can only be set when a variable is created, so skip it)
for var in ['lon','lat','x','y','Lambert_Conformal']:
    for att in f.variables[var].ncattrs():
        if att != '_FillValue':
            setattr(nco.variables[var],att,getattr(f.variables[var],att))

# copy variable data for lon,lat,x and y
lono[:]=f.variables['lon'][:]
lato[:]=f.variables['lat'][:]
xo[:]=f.variables['x'][:]
yo[:]=f.variables['y'][:]

#  write the cold_days data
cold_days_v[:,:]=cold_days

# copy Global attributes from original file
for att in f.ncattrs():
    setattr(nco,att,getattr(f,att))

nco.Conventions='CF-1.6'
nco.close()

If I look at the resulting file in the Unidata NetCDF-Java Tools-UI GUI, it seems to be okay:

[Image: screenshot of the resulting file in Tools-UI]

Also note that here I only downloaded two of the datasets for testing, so I used

f = netCDF4.MFDataset('/usgs/data2/rsignell/models/ncep/narr/air.2m.19??.nc')

as an example. For all the data, you could use

f = netCDF4.MFDataset('/usgs/data2/rsignell/models/ncep/narr/air.2m.????.nc')

or

f = netCDF4.MFDataset('/usgs/data2/rsignell/models/ncep/narr/air.2m.*.nc')
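The three patterns differ only in which file names they match: ? matches exactly one character and * matches any run of characters. A quick check with Python's fnmatch module, using hypothetical file names that follow the air.2m.YYYY.nc naming above:

```python
from fnmatch import fnmatch

# Hypothetical file names following the air.2m.YYYY.nc naming used above
names = ["air.2m.1979.nc", "air.2m.1999.nc", "air.2m.2012.nc",
         "air.2m.1979_numdays.nc"]
patterns = ["air.2m.19??.nc", "air.2m.????.nc", "air.2m.*.nc"]

# '?' matches exactly one character, '*' matches any run of characters
matches = {p: [n for n in names if fnmatch(n, p)] for p in patterns}
for p in patterns:
    print(p, matches[p])
```

So 19?? only picks up 1979 to 1999, ???? picks up every four-digit year, and * would also pick up any derived files saved in the same directory, which would then be included in the aggregation.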

Here is an R solution.

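library(raster)

outfile <- "data/air.colddays.nc"

# only pick up files ending in .nc, and never re-read the output file
infiles <- setdiff(list.files("data", pattern = "\\.nc$", full.names = TRUE), outfile)

r <- raster::stack(infiles)
# air is in Kelvin: count the days below 0 degC in each grid cell
r <- sum((r - 273.15) < 0)

plot(r)

# save the result (writing netCDF requires the ncdf4 package)
writeRaster(r, outfile, format = "CDF", overwrite = TRUE)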

[Image: output of plot(r)]

I know this is rather late for this thread from 2013, but I just want to point out that the accepted solution doesn't answer the exact question posed. The question seems to ask for the length of each continuous period of temperatures below zero (note that in the question the counter resets when the temperature rises above zero), which can be important for climate applications (e.g. farming), whereas the accepted solution only gives the total number of days in a year with temperatures below zero. If the total is really what mkmitchell wants (the answer has been accepted), then it can be done from the command line with cdo, without having to worry about netCDF input/output:

 cdo timsum -lec,273.15 in.nc out.nc

So a looped script would be:

# pick up all the netcdf files in a directory
for file in *.nc ; do
    # I use 273.15 as, from the question, T seems to be in Kelvin
    # (lec,273.15 gives 1 where T <= 273.15 and 0 elsewhere; timsum adds these up)
    cdo timsum -lec,273.15 "$file" "${file%.nc}_numdays.nc"
done

If you then want the total over the whole period, you can cat the _numdays files instead, which are much smaller:

cdo cat *_numdays.nc total.nc 
cdo timsum total.nc total_below_zero.nc 

But again, the question seems to want accumulated days per event, which is different and is not provided by the accepted answer.
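To get accumulated days per event rather than a yearly total, one option is to split the series into runs of consecutive cold days and record the length of each run. A sketch for a single grid cell using itertools.groupby; cold_spell_lengths is a hypothetical helper, and the temperatures are assumed to be in Kelvin as in the scripts above:

```python
from itertools import groupby


def cold_spell_lengths(temps_k, threshold_k=273.15):
    """Lengths of each run of consecutive days below threshold_k."""
    return [
        sum(1 for _ in run)
        for is_cold, run in groupby(temps_k, key=lambda t: t < threshold_k)
        if is_cold
    ]


# The six days from the question (+4, -1, -2, -4, +2, -3 degC) in Kelvin
days_k = [t + 273.15 for t in [4, -1, -2, -4, 2, -3]]
print(cold_spell_lengths(days_k))  # [3, 1]
```

These event lengths are the last value of each run in the question's running counter (3, then 1).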
