
Python Reading Multiple NetCDF Rainfall files of variable size

The problem I have is that the Australian Bureau of Meteorology has supplied me with rainfall data files containing records, logged every 30 minutes, for all active gauges. The catch is that one day consists of 48 separate 30-minute files. I want to create a time series for a particular gauge, which means reading all 48 files and searching for the gauge ID, while making sure the script does not fail if a gauge recorded nothing during a given 30-minute interval. Here are links showing the file format:

https://dl.dropboxusercontent.com/u/15223371/14/gauge_30min_20100214_000000.nc

https://dl.dropboxusercontent.com/u/15223371/14/gauge_30min_20100214_003000.nc

https://dl.dropboxusercontent.com/u/15223371/14/gauge_30min_20100214_010000.nc

This is what I have tried so far:

"""
This script is used to read a directory of raingauge data from a Data Directory





"""
from anuga.file.netcdf import NetCDFFile
from anuga.config import netcdf_mode_r, netcdf_mode_w, netcdf_mode_a, \
                            netcdf_float
import os
import glob
from easygui import *
import string
import numpy
"""
print 'Default file Extension...'
msg="Enter 3 letter extension."
title = "Enter the 3 letter file extension to search for in DIR "
default = "csv"
file_extension = enterbox(msg,title,default)
"""


print 'Present Directory Open...'
title = "Select Directory to Read Multiple rainfall .nc files"
msg = "This is a test of the diropenbox.\n\nPick the directory that you wish to open."
d = diropenbox(msg, title)
fromdir = d

filtered_list = glob.glob(os.path.join(fromdir, '*.nc'))
filtered_list.sort()

nf = len(filtered_list)
print nf

import numpy

rain = numpy.zeros(nf,'float')
t = numpy.arange(nf)

Stn_Loc_File='Station_Location.csv'
outfid = open(Stn_Loc_File, 'w')

prec = numpy.zeros((nf,1752),numpy.float)

gauge_id_list = ['570002','570021','570025','570028','570030','570032','570031','570035','570036',
                 '570047','570772','570781','570910','570903','570916','570931','570943','570965',
                 '570968','570983','570986','70214','70217','70349','70351']
"""
title = "Select Gauge to plot"
msg = "Select Gauge"
gauge_id = int(choicebox(msg=msg,title=title, choices=gauge_id_list))
"""
#for gauge_id in gauge_id_list:
#    gauge_id = int(gauge_id)
try:    

    for i, infile in enumerate(filtered_list):

        infilenet = NetCDFFile(infile, netcdf_mode_r)
        print infilenet.variables
        raw_input('Hold.... check variables...')
        stn_lats = infilenet.variables['latitude']
        stn_longs = infilenet.variables['longitude']
        stn_ids = infilenet.variables['station_id']
        stn_rain = infilenet.variables['precipitation']

        print stn_ids.shape
        #print stn_lats.shape
        #print stn_longs.shape
        #print infile.dimensions
        stn_ids = numpy.array(stn_ids)

        l_id = numpy.where(stn_ids == gauge_id)
        if stn_ids in gauge_id_list:
            try:
                l_id = l_id[0][0]
                rain[i] = stn_rain[l_id]
            except:
                rain[i] = numpy.nan
    print 'End for i...'            
    #print rain

    import pylab as pl

    pl.bar(t,rain)
    pl.title('Rain Gauge data')
    pl.xlabel('time steps')
    pl.ylabel('rainfall (mm)')
    pl.show()
except:
    pass 
raw_input('END....')

OK, the data you were given comes in a more complicated format than it really needs to. They could easily have packed the whole day into a single netCDF file. Indeed, one option for solving this is to use e.g. the NCO command-line tools to merge all the files into a single file with a time dimension.
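As a rough illustration of that merge idea in Python rather than NCO (a sketch only, assuming the netCDF4 package is installed and, importantly, that every 30-minute file lists the same stations in the same order -- which may not hold for these files, since each one only contains the gauges that were active):

import glob
import numpy as np
from netCDF4 import Dataset   # assumption: the netCDF4 package is available

infiles = sorted(glob.glob('gauge_30min_*.nc'))

# Take the station list from the first file to define the station dimension
# (assumption: all files share the same stations in the same order).
first = Dataset(infiles[0], 'r')
station_ids = first.variables['station_id'][:]
first.close()

# 'gauge_30min_merged.nc' is just an illustrative output name.
out = Dataset('gauge_30min_merged.nc', 'w')
out.createDimension('time', None)                  # unlimited time dimension
out.createDimension('station', len(station_ids))
stid_var = out.createVariable('station_id', 'i4', ('station',))
precip_var = out.createVariable('precipitation', 'f4', ('time', 'station'))
stid_var[:] = station_ids

# Append each 30-minute file as one time step of the merged file.
for t, path in enumerate(infiles):
    nc = Dataset(path, 'r')
    precip_var[t, :] = nc.variables['precipitation'][:]
    nc.close()

out.close()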

But here is a solution using the scipy netcdf module. I believe it is deprecated -- I myself prefer the netCDF4 library. The main approach is: pre-fill the output data structure with np.nan values; iterate over your input files and retrieve the precipitation and station ids; for each station id you are interested in, retrieve its index and then the precipitation at that index; add it to the output structure. (I did not do the work of extracting the time stamps -- that part is up to you.)

import glob
import numpy as np
from scipy.io import netcdf

# load data file names 
stationdata = glob.glob('gauge*.nc')
stationdata.sort()
# initialize np arrays of integer gauging station ids
gauge_id_list = ['570002','570021','570025','570028','570030','570032','570031','570035','570036',
                 '570047','570772','570781','570910','570903','570916','570931','570943','570965',
                 '570968','570983','570986','70214','70217','70349','70351']
gauge_ids = np.array(gauge_id_list).astype('int32')
ngauges = len(gauge_ids)
ntimesteps = 48
# initialize output dictionary
dtypes = zip(gauge_id_list, ['float32']*ngauges)
timeseries_per_station = np.empty((ntimesteps,))
timeseries_per_station.fill(np.nan)
timeseries_per_station = timeseries_per_station.astype(dtypes)

# Instead of using the index, you could extract the datetime stamp 
for timestep, datafile in enumerate(stationdata):
    data = netcdf.NetCDFFile(datafile, 'r')
    precip = data.variables['precip'].data
    stid = data.variables['stid'].data
    # create np array of indices of the gaugeid present in file
    idx = np.where(np.in1d(stid, gauge_ids))[0]
    for i in idx:
        timeseries_per_station[str(stid[i])][timestep] = precip[i]
    data.close()

np.set_printoptions(precision=1)
for gauge_id in gauge_id_list:
    print "Station %s:" % gauge_id
    print timeseries_per_station[gauge_id]

The output looks like this:

Station 570002:
[ 1.9  0.3  0.   nan  nan  nan  nan  nan  nan  nan  nan  nan  nan  nan  nan
  nan  nan  nan  nan  nan  nan  nan  nan  nan  nan  nan  nan  nan  nan  nan
  nan  nan  nan  nan  nan  nan  nan  nan  nan  nan  nan  nan  nan  nan  nan
  nan  nan  nan]
Station 570021:
[  0.   0.   0.  nan  nan  nan  nan  nan  nan  nan  nan  nan  nan  nan  nan
  nan  nan  nan  nan  nan  nan  nan  nan  nan  nan  nan  nan  nan  nan  nan
  nan  nan  nan  nan  nan  nan  nan  nan  nan  nan  nan  nan  nan  nan  nan
  nan  nan  nan]
...

(Obviously, with only the three files available.)

EDIT: The OP pointed out that the code did not run without errors for him, because his variable names are 'precipitation' and 'station_id'. The code runs for me on the files he posted. Obviously, he should use whatever variable names are used in the files supplied to him. Since they appear to be files produced specifically for his use, it is conceivable that their authors were inconsistent in naming the variables.
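One way to guard against that kind of naming inconsistency is to look the variables up by a list of candidate names instead of hard-coding a single name. A minimal sketch, using the same scipy netcdf module as above (the candidate names are just the two spellings mentioned here -- adjust them to whatever your files actually contain):

from scipy.io import netcdf  # newer scipy spells this class netcdf_file

def pick_variable(variables, candidates):
    # Return the first variable whose name matches one of `candidates`.
    for name in candidates:
        if name in variables:
            return variables[name]
    raise KeyError('none of %r found, file contains %r'
                   % (candidates, list(variables.keys())))

data = netcdf.NetCDFFile('gauge_30min_20100214_000000.nc', 'r')
precip = pick_variable(data.variables, ['precip', 'precipitation']).data
stid = pick_variable(data.variables, ['stid', 'station_id']).data
data.close()

The same lookup works unchanged with the netCDF4 library, whose Dataset objects expose an equivalent variables mapping.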
