Average of different layers across several netCDF files with R

I have 15 netCDF files (.nc), one for each year from 2000 to 2014. Each nc file holds hourly data for one variable in 8760 layers. The 3 dimensions are: time (size 8760), latitude (size 90) and longitude (size 180) (2° resolution).

I want to compute the average of my variable between 8am and 7pm, from April to September, over the period 2000-2014.

For one .nc file, this corresponds to the average over:

  • time layers 2169 (i.e. 01/04/2000 8am) to 2180 (i.e. 01/04/2000 7pm), that is i = 2169 to i+11,
  • then 2193 (i.e. 02/04/2000 8am) to 2204 (i.e. 02/04/2000 7pm), that is i+24 to i+35,
  • etc.,
  • ... up to 6537 (i.e. 30/09/2000 8am) to 6548 (i.e. 30/09/2000 7pm).
  • And then the average over all the .nc files.
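
The layer numbers in this list can be checked with a small base R sketch. It assumes that layer 1 is 00:00 on 1 January and that every year has 365 days (no leap day); the variable names are illustrative.

```r
# Assumptions: layer 1 = 00:00 on 1 January, 365-day year (no leap day).
days_in_month <- c(31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31)

# Month and hour of day for each of the 8760 hourly layers
month_of_layer <- rep(rep(1:12, times = days_in_month), each = 24)
hour_of_layer  <- rep(0:23, times = sum(days_in_month))

# Layers from April to September, 8am to 7pm inclusive
sel <- which(month_of_layer %in% 4:9 & hour_of_layer %in% 8:19)

range(sel[1:12])    # layers for 1 April
range(sel[13:24])   # layers for 2 April
range(sel)          # first and last selected layer
length(sel)         # number of selected layers per year
```

Under these assumptions the selection runs from layer 2169 to layer 6548 and holds 183 days x 12 hours = 2196 layers per year.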

The result should be a single .nc file with 3 dimensions: time (only one value, the average), latitude (size 90) and longitude (size 180) (2° resolution).

Then I can draw the map of the variable averaged over 2000-2014 (April to September, from 8am to 7pm). I am able to read each nc file and draw a map for each hour of each nc file, but I have no idea how to compute the required mean. If anybody can help me, that would be great.

Name of my variable: dname <- "sfvmro3"

Here is my code, as a first attempt at reading the data:

library(ncdf4)  # nc_open, ncvar_get, ncatt_get
library(chron)  # chron

ncin <- nc_open("sfvmro3_hourly_2000.nc")
print(ncin)

lon <- ncvar_get(ncin, "lon")
lon[lon > 180] <- lon[lon > 180] - 360
nlon <- dim(lon)
head(lon)

lat <- ncvar_get(ncin, "lat", verbose = FALSE)
nlat <- dim(lat)
head(lat)

print(c(nlon, nlat))

t <- ncvar_get(ncin, "time")
tunits <- ncatt_get(ncin, "time", "units")
nt <- dim(t)

dname <- "sfvmro3"
var.array <- ncvar_get(ncin, dname)
dlname <- ncatt_get(ncin, dname, "long_name")
dunits <- ncatt_get(ncin, dname, "units")
fillvalue <- ncatt_get(ncin, dname, "_FillValue")
# mask fill values before rescaling; after rescaling the comparison would never match
var.array[var.array == fillvalue$value] <- NA
var.array <- var.array * 10^9  # from mol.mol-1 to ppb
dim(var.array)

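# parse the origin date out of the time units string ("days since 1850-01-01")
tustr <- strsplit(tunits$value, " ")
tdstr <- strsplit(unlist(tustr)[3], "-")
tyear <- as.integer(unlist(tdstr)[1])
tmonth <- as.integer(unlist(tdstr)[2])
tday <- as.integer(unlist(tdstr)[3])
# note: chron counts leap days, while the file declares calendar: noleap
time_chron <- chron(t, origin = c(tmonth, tday, tyear))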

Here are the details of one of the yearly .nc files:

 4 variables (excluding dimension variables):
    double time_bnds[bnds,time]   
    double lat_bnds[bnds,lat]   
    double lon_bnds[bnds,lon]   
    float sfvmro3[lon,lat,time]   
        standard_name: mole_fraction_of_ozone_in_air
        long_name: Ozone Volume Mixing Ratio in the Lowest Model Layer
        units: mole mole-1
        original_name: O_x
        original_units: 1
        history: 2016-04-22T05:20:31Z altered by CMOR: Converted units from '1' to 'mole mole-1'.
        cell_methods: time: point (interval: 30 minutes)
        cell_measures: area: areacella
        missing_value: 1.00000002004088e+20
        _FillValue: 1.00000002004088e+20
        associated_files: ...

 4 dimensions:
    time  Size:8760   *** is unlimited ***
        bounds: time_bnds
        units: days since 1850-01-01
        calendar: noleap
        axis: T
        long_name: time
        standard_name: time
    lat  Size:90
        bounds: lat_bnds
        units: degrees_north
        axis: Y
        long_name: latitude
        standard_name: latitude
    lon  Size:180
        bounds: lon_bnds
        units: degrees_east
        axis: X
        long_name: longitude
        standard_name: longitude
    bnds  Size:2

26 global attributes:
    institution: aaaa
    institute_id: aaaa
    experiment_id: aaaa
    source: aaaa
    model_id: aaaa
    forcing: HG, SA, S
    parent_experiment_id: N/A
    parent_experiment_rip: N/A
    branch_time: 0
    contact: aaa
    history: aaa
    initialization_method: 1
    physics_version: 1
    tracking_id: aaa
    product: output
    experiment: aaa
    frequency: hr
    creation_date: 2016-04-22T05:20:31Z
    Conventions: aaa
    project_id: aaa
    table_id:aaa
    title: aaaa
    parent_experiment: N/A
    modeling_realm: aaa
    realization: 1
    cmor_version: 2.7.1
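
Because the header reports calendar: noleap with units of days since 1850-01-01, a date class that counts leap days can be off by the number of leap days skipped since the origin. A minimal sketch of decoding the time values by hand follows, assuming the origin is 00:00 on 1 January and that time stamps fall on whole hours; decode_noleap is a hypothetical helper, not a function from ncdf4 or chron.

```r
# Hypothetical helper: decode "days since <origin_year>-01-01" on a 365-day calendar.
decode_noleap <- function(t_days, origin_year = 1850) {
  days_in_month <- c(31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31)
  month_of_day  <- rep(1:12, times = days_in_month)  # month for each day of year
  day_of_month  <- sequence(days_in_month)           # day in month for each day of year

  hours     <- round(t_days * 24)    # whole hours since the origin
  day_index <- hours %/% 24          # whole days since the origin
  doy       <- day_index %% 365 + 1  # day of year, 1..365

  data.frame(
    year  = origin_year + day_index %/% 365,
    month = month_of_day[doy],
    day   = day_of_month[doy],
    hour  = hours %% 24
  )
}

# On this calendar, 1 January 2000 00:00 is 150 * 365 = 54750 days after the origin
decode_noleap(54750 + c(0, 2168, 6547) / 24)
```

The month and hour columns can then be used directly to select April to September, 8am to 7pm.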

I know two different possible solutions to your problem. One is to take the average for each .nc file and then take a weighted average of those; the other is to build one really large array and average over that array.

  • First possible solution

Each .nc file that you read will give you an array: array1, array2 and so on. For each array you will also have a time series associated with its time dimension, meaning that time_serie1 holds all the times for array1 in POSIXct format. So first you have to build that vector. Once you have it, you can build a logical index of the times you want to include in the average. For this I would use the lubridate package, but it is not necessary.

library(lubridate)  # month(), hour()

index1 <- month(time_serie1) < 10 & month(time_serie1) > 3  # index from April to September
index1 <- index1 & hour(time_serie1) <= 19 & hour(time_serie1) >= 8  # then add the hour restriction (8am to 7pm)
mean1 <- apply(array1[, , index1], 1:2, mean)  # mean over time for each lon/lat cell

This code gives you a 2D array with the mean for the first year. You can put your arrays and time series into lists and loop over them. You will then have, for each year, a 2D array of that year's mean, and you can average these arrays. I said "weighted" average because, if the selected period included February, the yearly means would be computed from different numbers of days, and each mean would then have to be weighted by the amount of data behind it. For your example this is not necessary.
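
The loop described above might be sketched as follows. Small synthetic arrays stand in for what ncvar_get would return for each yearly file, and the month and hour of each layer are built by position rather than from POSIXct values (layer 1 is assumed to be 00:00 on 1 January, with 365-day years). All names are illustrative.

```r
set.seed(42)
nlon <- 4; nlat <- 3   # tiny grid for illustration (the real files are 180 x 90)

# Month and hour of each hourly layer in a 365-day year
days_in_month  <- c(31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31)
month_of_layer <- rep(rep(1:12, times = days_in_month), each = 24)
hour_of_layer  <- rep(0:23, times = 365)
index <- month_of_layer %in% 4:9 & hour_of_layer %in% 8:19

# Synthetic stand-ins for the yearly arrays, dimensions [lon, lat, time]
arrays <- lapply(2000:2002, function(year) {
  array(rnorm(nlon * nlat * 8760), dim = c(nlon, nlat, 8760))
})

# One 2D mean per year; rowMeans(dims = 2) averages over the third dimension
yearly_means <- lapply(arrays, function(a) {
  rowMeans(a[, , index], dims = 2, na.rm = TRUE)
})

# Every year contributes the same number of layers, so a plain average suffices
overall_mean <- Reduce(`+`, yearly_means) / length(yearly_means)
dim(overall_mean)
```

With real data, each file could be read and reduced inside the loop so that only one year is held in memory at a time. If some cells have missing values in some years, the per-year counts differ and the plain average is no longer exactly the pooled mean.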

  • Second possible solution

This solution is almost the same as the first one, but I like it more. You can merge all your arrays into one big array, in order, so that the time index is increasing; I will call this array BigArray. Then merge the time series associated with each array; I will call the result BigTime. Then look for the indexes you want to average over, and it is done. The big advantage is that you don't have to loop over data held in a list, and you don't have to care about February changing length.

Index <- month(BigTime) < 10 & month(BigTime) > 3  # index from April to September
Index <- Index & hour(BigTime) <= 19 & hour(BigTime) >= 8  # then add the hour restriction (8am to 7pm)
Mean <- apply(BigArray[, , Index], 1:2, mean)  # mean over time for each lon/lat cell

And then you have the mean of your values.
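
A sketch of the merge, again with synthetic stand-ins for the yearly arrays. Because time is the last dimension and R stores arrays in column-major order, concatenating the underlying vectors stacks the years along time; abind(..., along = 3) from the abind package would be an alternative. The names big_array, big_month and big_hour are illustrative, and layer 1 of each year is assumed to be 00:00 on 1 January in a 365-day year.

```r
set.seed(1)
nlon <- 4; nlat <- 3; nt <- 8760   # tiny grid for illustration

days_in_month  <- c(31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31)
month_of_layer <- rep(rep(1:12, times = days_in_month), each = 24)
hour_of_layer  <- rep(0:23, times = 365)

# Synthetic stand-ins for the yearly arrays, dimensions [lon, lat, time]
years  <- 2000:2001
arrays <- lapply(years, function(year) {
  array(rnorm(nlon * nlat * nt), dim = c(nlon, nlat, nt))
})

# Stack the years along time, and repeat the month/hour vectors to match
big_array <- array(unlist(arrays), dim = c(nlon, nlat, nt * length(years)))
big_month <- rep(month_of_layer, times = length(years))
big_hour  <- rep(hour_of_layer,  times = length(years))

# A single index over the whole period, then one mean over time
index    <- big_month > 3 & big_month < 10 & big_hour >= 8 & big_hour <= 19
big_mean <- rowMeans(big_array[, , index], dims = 2, na.rm = TRUE)
dim(big_mean)
```

At full size, one year of 180 x 90 x 8760 values stored as doubles takes about 1.1 GB, so 15 merged years would need roughly 17 GB. Subsetting each year to the selected layers before merging cuts this to about a quarter.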

Both approaches build a 2D array; if you want a 3D array in which one dimension (time) has only one value, just add that dimension. If you want to look for more information, taking the mean over specific time values is normally called the composite technique in meteorology.
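
Adding the time dimension and writing the result to a file could look like the sketch below, using ncdf4. The coordinate vectors, the single time value and the output path are placeholders; in practice lon and lat would be the vectors read from an input file, and Mean would be the 2D array of means.

```r
library(ncdf4)

# Placeholders standing in for the real coordinates and the 2D mean
lon  <- seq(0, 358, by = 2)     # 180 values
lat  <- seq(-89, 89, by = 2)    # 90 values
Mean <- matrix(runif(length(lon) * length(lat)), nrow = length(lon))

# Add a time dimension of length one: [lon, lat] -> [lon, lat, time]
Mean3d <- array(Mean, dim = c(dim(Mean), 1))

# Define the dimensions and the variable
londim  <- ncdim_def("lon", "degrees_east", lon)
latdim  <- ncdim_def("lat", "degrees_north", lat)
timedim <- ncdim_def("time", "days since 1850-01-01", 0)  # placeholder time value
var_def <- ncvar_def("sfvmro3", "ppb", list(londim, latdim, timedim),
                     missval = 1e20, prec = "float")

# Create the file, write the data and close it
out_file <- tempfile(fileext = ".nc")   # placeholder path
ncout <- nc_create(out_file, var_def)
ncvar_put(ncout, var_def, Mean3d)
nc_close(ncout)
```

The file can then be reopened with nc_open and mapped in the same way as the hourly layers.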

I hope this solves your problem.
