
Nested for loops with NetCDFs in R

I'm very thankful to everybody who is willing to help me with a problem I'm really stuck on. A warning in advance: it is a complex topic, and I'll try my best to explain what I intend to do with my code. It's about climate data in NetCDF files that contain monthly temperature (tas) and precipitation (pr) data for the periods 1971 to 2000 and 2071 to 2100. The nc files of the historical period contain approx. 440x400 grid points (a map of Europe). The nc files of the future period contain a single 1x1 grid point (for a city of interest). Each grid point has 360 temperature or precipitation values (depending on the model), one value for each month of the 30-year period. In other words, each grid point has a distribution of 360 values.

Now I want to iteratively calculate the statistical difference between the distribution of the single city grid point (2071-2100) and the distribution of each European grid point (1971-2000). I should end up with one averaged absolute distance per European grid point. The idea is to find the grid point in the European raster whose temperature or precipitation distribution is most similar to the future distribution of the city of interest. I have to do this calculation for 30 different climate models.
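The loop in the question computes abs(sum(rcp.tas - hist.tas[i,j,])/360). That is the absolute value of the mean monthly difference, so warmer and colder months can cancel each other out. If "averaged absolute distance" is meant as the mean of the absolute monthly differences, that is a different quantity. Below is a minimal sketch comparing the two readings on made-up toy series. The function names are hypothetical.

```r
# Two readings of "averaged absolute distance" between the city series and one
# grid-point series (each of length 360, one value per month).

# What the question's loop computes: |mean(city - grid)|
abs_mean_diff <- function(city, grid) abs(sum(city - grid) / length(city))

# Alternative reading: mean(|city - grid|)
mean_abs_diff <- function(city, grid) mean(abs(city - grid))

# Made-up toy data: the city is 1 degree warmer in odd months, 1 degree colder in even months
city <- rep(c(11, 9), 180)
grid <- rep(10, 360)

abs_mean_diff(city, grid)  # 0: the +1 and -1 differences cancel out
mean_abs_diff(city, grid)  # 1
```

Both measures only compare average levels, not the full shape of the two 360-value distributions.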

library(ncdf4)          # nc_open(), ncvar_get()
library(weathermetrics) # kelvin.to.celsius()

# List filenames of the directory
hist.files <- list.files("/historical", full.names = TRUE)
rcp.files  <- list.files("/rcp", full.names = TRUE)

# Create array for the desired 'similarity indices'. One matrix per climate model run.
sim.array <- array(NA, dim = c(440, 400, 30))

# Looping through the models of the period 1971-2000. Some contain precipitation data, others temperature (see if...else)
for (k in 1:length(hist.files)) {
  hist.data <- nc_open(hist.files[k])
  if (grepl("pr", hist.data$filename)) {
    hist.tas <- ncvar_get(hist.data, "pr")
  } else {
    hist.tas <- ncvar_get(hist.data, "tas")
    hist.tas <- kelvin.to.celsius(hist.tas, round = 2)
  }

  # Looping through the models of the 2071-2100 period (city). Some contain precipitation data, others temperature (see if...else)
  for (r in 1:length(rcp.files)) {
    rcp.data <- nc_open(rcp.files[r])
    if (grepl("pr", rcp.data$filename)) {
      rcp.tas <- ncvar_get(rcp.data, "pr")
    } else {
      rcp.tas <- ncvar_get(rcp.data, "tas")
      rcp.tas <- kelvin.to.celsius(rcp.tas, round = 2)
    }

    # This if statement is there because hist contains more models than rcp,
    # and I want to use only the models contained in both of them.
    if (hist.data %in% rcp.data) {

      # Looping through the grid points of 'hist' model k, then calculating for each
      # grid point a difference value (always against the one grid point of 'rcp').
      # My idea with the break statement was to loop nrow and ncol the same number
      # of times, but I'm not sure break does what I intended.
      for (i in 1:nrow(hist.tas)) {
        for (j in 1:ncol(hist.tas)) {
          sim.array[i, j, k] <- abs(sum(rcp.tas - hist.tas[i, j, ]) / 360)
          break
        }
        print(sim.array[i, j, k])
      }
    }
  }
}
sim.array[1, 1, 1]

Well, I get an array full of NAs. There is no error message, but something is going wrong! Can anyone spot the error? I appreciate any help. Thanks a lot in advance!

Update: Your suggestions look like a sound solution! I haven't had time to apply them yet, but I will later! I had been thinking about vectorization, but didn't manage to turn the 3-dimensional arrays into vectors without ending up with messy code full of different vectors... Nor did I know how to remove the models that don't match between hist and rcp. With intersect and %in% I found the indices of the non-matching files... but there must be a better way than noting all these indices by hand for deletion, isn't there? Please have a look at some of the hist file names:

> hist.files.tas <- list.files("/historical", full.names = TRUE, pattern = "tas")
> hist.files.tas
 [1] "/historical/tas_CNRM-CERFACS-CNRM-CM5_CLMcom-CCLM4-8-17_r1i1p1.nc"   
 [2] "/historical/tas_CNRM-CERFACS-CNRM-CM5_CNRM-ALADIN53_r1i1p1.nc"       
 [3] "/historical/tas_CNRM-CERFACS-CNRM-CM5_RMIB-UGent-ALARO-0_r1i1p1.nc"  
 [4] "/historical/tas_CNRM-CERFACS-CNRM-CM5_SMHI-RCA4_r1i1p1.nc"           
 [5] "/historical/tas_ICHEC-EC-EARTH_CLMcom-CCLM4-8-17_r12i1p1.nc"         
 [6] "/historical/tas_ICHEC-EC-EARTH_DMI-HIRHAM5_r3i1p1.nc"                
 [7] "/historical/tas_ICHEC-EC-EARTH_KNMI-RACMO22E_r12i1p1.nc"             
 [8] "/historical/tas_ICHEC-EC-EARTH_KNMI-RACMO22E_r1i1p1.nc"              
 [9] "/historical/tas_ICHEC-EC-EARTH_SMHI-RCA4_r12i1p1.nc"                 
[10] "/historical/tas_IPSL-IPSL-CM5A-MR_INERIS-WRF331F_r1i1p1.nc"          
[11] "/historical/tas_IPSL-IPSL-CM5A-MR_SMHI-RCA4_r1i1p1.nc"               
[12] "/historical/tas_MOHC-HadGEM2-ES_CLMcom-CCLM4-8-17_r1i1p1.nc"         
[13] "/historical/tas_MOHC-HadGEM2-ES_KNMI-RACMO22E_r1i1p1.nc"             
[14] "/historical/tas_MOHC-HadGEM2-ES_SMHI-RCA4_r1i1p1.nc"   

There are more models with the variables tasmax and tasmin. In total, hist has 71 files and rcp only 30. Could you give me an example of how to write automated code that removes the hist files that do not match? Thanks a lot!

It seems to me that the line below makes no sense and is always FALSE:

if (hist.data %in% rcp.data)

So nothing happens with sim.array.
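Even if that condition were TRUE, the break inside the j loop would cause a second problem. In R, break leaves only the innermost for loop, and here it does so right after the first iteration, so only column 1 of each row would ever be filled. Here is a toy sketch using a made-up matrix to record which cells the loop structure actually visits:

```r
# Record which (i, j) cells the question's loop structure actually visits
visited <- matrix(FALSE, nrow = 3, ncol = 4)
for (i in 1:nrow(visited)) {
  for (j in 1:ncol(visited)) {
    visited[i, j] <- TRUE
    break  # exits the j loop right after j = 1; the i loop carries on
  }
}
visited  # only the first column is TRUE
```

Dropping the break lets the nested loops visit all nrow x ncol cells. No break is needed to make them run "the same number of times".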

I would start by doing something like this:

hist.files.pr <- list.files("/historical", full.names = TRUE, pattern="pr")
hist.files.tas <- list.files("/historical", full.names = TRUE, pattern="tas")
rcp.files.pr <- list.files("/rcp", full.names = TRUE, pattern="pr")
rcp.files.tas <- list.files("/rcp", full.names = TRUE, pattern="tas")
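There is one caveat, since the question also mentions tasmax and tasmin files. The pattern argument of list.files() is a regular expression that is matched anywhere in the file name, so pattern = "tas" also picks up the tasmax_ and tasmin_ files. Anchoring the pattern to the variable prefix avoids this. Below is a sketch on a few file names modelled on the question's naming scheme. The pr, tasmax and tasmin names are made up.

```r
files <- c("pr_ICHEC-EC-EARTH_SMHI-RCA4_r12i1p1.nc",
           "tas_ICHEC-EC-EARTH_SMHI-RCA4_r12i1p1.nc",
           "tasmax_ICHEC-EC-EARTH_SMHI-RCA4_r12i1p1.nc",
           "tasmin_ICHEC-EC-EARTH_SMHI-RCA4_r12i1p1.nc")

# Unanchored: matches "tas" anywhere, including tasmax and tasmin
loose <- files[grepl("tas", files)]
# Anchored: only names that start with "tas_" (use "^pr_" analogously)
strict <- files[grepl("^tas_", files)]

# The same anchored regex can be passed to list.files(), e.g.
# list.files("/historical", full.names = TRUE, pattern = "^tas_")
```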

At this point you can remove the files from "hist" for models that are not in "rcp":

hist.files.tas <- c(
    "/historical/tas_CNRM-CERFACS-CNRM-CM5_CLMcom-CCLM4-8-17_r1i1p1.nc",
    "/historical/tas_CNRM-CERFACS-CNRM-CM5_CNRM-ALADIN53_r1i1p1.nc",
    "/historical/tas_CNRM-CERFACS-CNRM-CM5_RMIB-UGent-ALARO-0_r1i1p1.nc",
    "/historical/tas_CNRM-CERFACS-CNRM-CM5_SMHI-RCA4_r1i1p1.nc",
    "/historical/tas_ICHEC-EC-EARTH_CLMcom-CCLM4-8-17_r12i1p1.nc",
    "/historical/tas_ICHEC-EC-EARTH_DMI-HIRHAM5_r3i1p1.nc",
    "/historical/tas_ICHEC-EC-EARTH_KNMI-RACMO22E_r12i1p1.nc",
    "/historical/tas_ICHEC-EC-EARTH_KNMI-RACMO22E_r1i1p1.nc",
    "/historical/tas_ICHEC-EC-EARTH_SMHI-RCA4_r12i1p1.nc",
    "/historical/tas_IPSL-IPSL-CM5A-MR_INERIS-WRF331F_r1i1p1.nc",
    "/historical/tas_IPSL-IPSL-CM5A-MR_SMHI-RCA4_r1i1p1.nc",
    "/historical/tas_MOHC-HadGEM2-ES_CLMcom-CCLM4-8-17_r1i1p1.nc",
    "/historical/tas_MOHC-HadGEM2-ES_KNMI-RACMO22E_r1i1p1.nc",
    "/historical/tas_MOHC-HadGEM2-ES_SMHI-RCA4_r1i1p1.nc"
)

# In this example, the future files are a subset of the hist files; that should be OK as long as their file names are structured the same way

rcp.files.tas <- hist.files.tas[1:7]

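# Extract the model name: the second "_"-separated field of each file name
getModels <- function(ff) {
    base <- basename(ff)
    s <- strsplit(base, "_")
    sapply(s, function(i) i[[2]])
}

# Return the hist files whose model also occurs among the future files
getHistModels <- function(hist, fut) {
    h <- getModels(hist)
    uh <- unique(h)
    uf <- unique(getModels(fut))
    uhf <- uh[uh %in% uf]
    hist[h %in% uhf]
}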


hist.files.tas.selected <- getHistModels(hist.files.tas, rcp.files.tas)
# hist.files.pr.selected <- getHistModels(hist.files.pr, rcp.files.pr)
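With file names like the ones above, getModels() returns the second underscore-separated field. That field is the driving global model (e.g. ICHEC-EC-EARTH), not the regional model or the ensemble member. As a result, getHistModels() keeps every hist run driven by a GCM that occurs in rcp. If a hist run should be kept only when the same GCM, RCM and member exist in rcp, one option is to key on the whole file name minus its variable prefix.

The sketch below assumes the var_GCM_RCM_member.nc naming shown in the question. The helpers modelKey and selectMatching and the future file are hypothetical. Because the variable prefix is dropped, apply the helpers separately for each variable.

```r
# Key = file name without directory and without the leading "<variable>_" part
modelKey <- function(ff) sub("^[^_]+_", "", basename(ff))

# Keep the hist files whose GCM_RCM_member key also occurs among the future files
selectMatching <- function(hist, fut) hist[modelKey(hist) %in% modelKey(fut)]

h.files <- c("/historical/tas_ICHEC-EC-EARTH_DMI-HIRHAM5_r3i1p1.nc",
             "/historical/tas_ICHEC-EC-EARTH_SMHI-RCA4_r12i1p1.nc",
             "/historical/tas_MOHC-HadGEM2-ES_SMHI-RCA4_r1i1p1.nc")
f.files <- "/rcp/tas_ICHEC-EC-EARTH_SMHI-RCA4_r12i1p1.nc"  # made-up future file

selectMatching(h.files, f.files)  # keeps only the ICHEC-EC-EARTH / SMHI-RCA4 / r12i1p1 run
```

Keying on the GCM alone, as getModels() does, would keep both ICHEC-EC-EARTH runs here.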

The double loop (k, r) could probably be avoided by doing something like this:

library(raster)
# values() returns a matrix with one row per grid cell and one column per layer
his.pr  <- values(stack(hist.files.pr.selected, varname = "pr"))
his.tas <- values(stack(hist.files.tas.selected, varname = "tas"))
rcp.pr  <- values(stack(rcp.files.pr, varname = "pr"))
rcp.tas <- values(stack(rcp.files.tas, varname = "tas"))

And the (i, j) loop over the rows and cols can probably be avoided too. R is vectorized; that is, you can do things like (1:10) - 2.
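For the question's measure, vectorization is straightforward. The expression abs(sum(rcp - h)/360) equals abs(mean(rcp) - mean(h)), and rowMeans(x, dims = 2) averages a 3-d array over its third dimension in one call. The sketch below assumes, as the question's hist.tas[i,j,] indexing implies, that ncvar_get() returns the hist data as lon x lat x time. It uses random toy data in place of the real arrays, shrinks the grid from 440x400 to keep things small, and checks the result against the question's explicit loop.

```r
set.seed(42)
nx <- 4; ny <- 3; nt <- 360   # toy grid; the real hist grid is 440 x 400
hist.tas <- array(rnorm(nx * ny * nt, mean = 10), dim = c(nx, ny, nt))  # made-up hist model
rcp.tas  <- rnorm(nt, mean = 12)                                        # made-up city series

# |mean(city) - mean(grid point)| for all grid points at once -> nx x ny matrix
sim <- abs(mean(rcp.tas) - rowMeans(hist.tas, dims = 2))

# Mean of absolute monthly differences, if that reading is intended:
# sweep() subtracts rcp.tas[t] from every hist.tas[, , t]
sim.abs <- rowMeans(abs(sweep(hist.tas, 3, rcp.tas)), dims = 2)

# Same result as the explicit (i, j) loop from the question (without the break)
loop <- matrix(NA_real_, nx, ny)
for (i in 1:nx) for (j in 1:ny) loop[i, j] <- abs(sum(rcp.tas - hist.tas[i, j, ]) / nt)
stopifnot(isTRUE(all.equal(sim, loop)))

# Row/column index of the grid point(s) most similar to the city
best <- which(sim == min(sim), arr.ind = TRUE)

# Inside a loop over models k, one would then store: sim.array[, , k] <- sim
```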

Either way, your code is very hard to read with all these nested loops. If you actually need them, it would be better to call functions. For more help, provide some example data instead of files that we do not have, or make a few files available.

As there are actually two more variables in my dataset besides "tas" and "pr" ("tasmax" and "tasmin"), Robert's approach would have meant too much writing in my case. So I tried another way, which finally worked out, although it doesn't list the files of each variable separately (a disadvantage, yes!).

List and match the historical and rcp files:

To match the files I need the bare file names without the directory. Otherwise hist %in% rcp is always FALSE (as Robert pointed out), because the directory parts of the paths differ.

hist <- list.files("/historical")
rcp <- list.files("/rcp26")

no.match.h <- which(!hist %in% rcp)
no.match.r <- which(!rcp %in% hist)

Because nc_open needs the file name including the directory, I have to create a corresponding list of full paths and remove the non-matching files:

hist.files <- list.files("/data/scratch/lorchdav/cordex_eur/monmean/historical", full.names = TRUE)
rcp.files <- list.files("/data/scratch/lorchdav/cordex_ber_mean/rcp26", full.names = TRUE)

hist.files.cl <- hist.files[-no.match.h]
hist.files.cl

rcp.files.cl <- rcp.files[-no.match.r]
rcp.files.cl
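The negative indexing above has a pitfall. If no.match.r happens to be empty (every rcp file has a counterpart in hist), which() returns integer(0), and rcp.files[-integer(0)] selects nothing instead of everything. Logical indexing avoids this. In addition, basename() strips the directory, so the file lists don't have to be built twice. Here is a sketch with made-up paths:

```r
# Made-up full paths standing in for the two list.files(..., full.names = TRUE) results
hist.files <- c("/historical/tas_A_X_r1i1p1.nc", "/historical/tas_B_Y_r1i1p1.nc")
rcp.files  <- c("/rcp26/tas_A_X_r1i1p1.nc")

# Full paths never match across directories; bare file names do
full.match <- hist.files %in% rcp.files                      # FALSE FALSE
name.match <- basename(hist.files) %in% basename(rcp.files)  # TRUE FALSE

# The pitfall: an empty "no match" index removes everything
no.match.r <- which(!basename(rcp.files) %in% basename(hist.files))  # integer(0)
rcp.files[-no.match.r]                                               # character(0)

# Logical indexing keeps exactly the files that have a counterpart
hist.files.cl <- hist.files[basename(hist.files) %in% basename(rcp.files)]
rcp.files.cl  <- rcp.files[basename(rcp.files) %in% basename(hist.files)]
```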

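Notice: The technical posts on this site are published under the CC BY-SA 4.0 license. If you republish them, please credit this site's URL or the original source. For any questions, contact: yoyou2525@163.com.

© 2020-2024 STACKOOM.COM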