Nested for loops with NetCDFs in R

Question

I'm very very thankful to everybody who is wanting to help me with a problem I really got stuck with. But in advance: it is a comlex topic and I try my best to explain what I'm intending to do with my code. It's about climate data in NetCDF files, that contain monthly temperature (tas) and precipitation (pr) data for the time periods 1971 to 2000 and 2071 to 2100. The nc-files of the historical period contains approx. 440x400 grid points (Map of Europe). The nc-files of the future period contain 1x1 grid point (for a City of interest). Each grid point has 360 temperature or precipitation values (depending on the model), one value for each month of the 30 year periods. In other words: each grid point has a distribution of 360 points. Now, I want to iteratively calculate the statistical difference between the distribution of the single city grid point (2071-2100) with each Europe (1971-2000) grid point's distribution. I shall obtain one averaged absolute distance per Europe grid point. The idea is to find in the European grid raster the grid point whose temp or precipitation distribution is the most similar to the distribution of the city of interest in the future. I must conduct that calculation for 30 different climate models.

# List filenames of the directory

hist.files <- list.files("/historical", full.names = TRUE)
rcp.files <- list.files("/rcp", full.names = TRUE)

#Create array for desired ‘similarity indices’. One matrix per climate model run.

sim.array <- array(NA, dim = c(440,400,30))

#Looping through the models of the period 1971-2000. Some containing precipitation data others temperature (see if…else) 

for(k in 1:length(hist.files))   {
        hist.data <- nc_open(hist.files[k])   

   if(grepl("pr", hist.data$filename)){
    hist.tas <- ncvar_get(hist.data, "pr")
        }else{
    hist.tas <- ncvar_get(hist.data, "tas") 
    hist.tas <- kelvin.to.celsius(hist.tas, round=2)
   }

#Looping through the models of the 2071 to 2100 period (city). Some containing precipitation data others temperature (see if…else)

for(r in 1:length(rcp.files)) {
    rcp.data <- nc_open(rcp.files[r])
    if(grepl("pr", rcp.data$filename)){
    rcp.tas <- ncvar_get(rcp.data, "pr") 
        }else{
    rcp.tas <- ncvar_get(rcp.data, "tas")
    rcp.tas <- kelvin.to.celsius(rcp.tas, round=2)
        }

#This if statement because hist contains more models than rcp and I want to exclusively use the models contained in both of them.  

if(hist.data %in% rcp.data) {  

#Looping through the grid points of ‘hist’ model k. Lastly the function that calculates for each grid point of the model a difference value (always to the one grid point of ‘rcp’). My idea of the break statement was to loop nrow and ncol the same times, but I’m not sure if break does what I intended to.       

for(i in 1:nrow(hist.tas)) { 
       for(j in 1:ncol(hist.tas)) {
    sim.array[i,j,k] <- abs(sum(rcp.tas - hist.tas[i,j,])/360)
break
    }
  print(sim.array[i,j,k])
  }
 }
}   
}
sim.array[1,1,1]

Well, I obtain an array full of NAs. There is no Error message, but something is going wrong! Someone who can find an error? I appreciate any help. Thank you a lot in advance!

Update: Your suggestions seem to be a sound solution! Until now I hadn't the time to apply them, but I will do later! I have been thinking about vectorization, but did not manage to make vectors out of 3 dimensional arrays without having a messy code full of different vectors in the end...I neither knew how to remove the models that do not match hist and rcp. With intersect and %in% I knew the index of the not matching files...but there must be a better way than noting by hand all these indexes for deletion, isn't? Please have a look at some of the hist file names:

> hist.files.tas <- list.files("/historical", full.names = TRUE, pattern = "tas")
> hist.files.tas
 [1] "/historical/tas_CNRM-CERFACS-CNRM-CM5_CLMcom-CCLM4-8-17_r1i1p1.nc"   
 [2] "/historical/tas_CNRM-CERFACS-CNRM-CM5_CNRM-ALADIN53_r1i1p1.nc"       
 [3] "/historical/tas_CNRM-CERFACS-CNRM-CM5_RMIB-UGent-ALARO-0_r1i1p1.nc"  
 [4] "/historical/tas_CNRM-CERFACS-CNRM-CM5_SMHI-RCA4_r1i1p1.nc"           
 [5] "/historical/tas_ICHEC-EC-EARTH_CLMcom-CCLM4-8-17_r12i1p1.nc"         
 [6] "/historical/tas_ICHEC-EC-EARTH_DMI-HIRHAM5_r3i1p1.nc"                
 [7] "/historical/tas_ICHEC-EC-EARTH_KNMI-RACMO22E_r12i1p1.nc"             
 [8] "/historical/tas_ICHEC-EC-EARTH_KNMI-RACMO22E_r1i1p1.nc"              
 [9] "/historical/tas_ICHEC-EC-EARTH_SMHI-RCA4_r12i1p1.nc"                 
[10] "/historical/tas_IPSL-IPSL-CM5A-MR_INERIS-WRF331F_r1i1p1.nc"          
[11] "/historical/tas_IPSL-IPSL-CM5A-MR_SMHI-RCA4_r1i1p1.nc"               
[12] "/historical/tas_MOHC-HadGEM2-ES_CLMcom-CCLM4-8-17_r1i1p1.nc"         
[13] "/historical/tas_MOHC-HadGEM2-ES_KNMI-RACMO22E_r1i1p1.nc"             
[14] "/historical/tas_MOHC-HadGEM2-ES_SMHI-RCA4_r1i1p1.nc"

There are more models with variables tasmax and tasmin. In total hist has 71 files and rcp only 30. Could you give me an example of how to write an automated code to delete the hist files that do not match? Thank you a lot!

Answer 1

It seems to me that the below makes no sense, and is always FALSE:

if (hist.data %in% rcp.data)

So nothing happens with sim_array

I would start by doing something like this:

hist.files.pr <- list.files("/historical", full.names = TRUE, pattern="pr")
hist.files.tas <- list.files("/historical", full.names = TRUE, pattern="tas")
rcp.files.pr <- list.files("/rcp", full.names = TRUE, pattern="pr")
rcp.files.tas <- list.files("/rcp", full.names = TRUE, pattern="tas")

At this point you can remove the files from "hist" for models that are not in "rcp"

hist.files.tas <- c( "/historical/tas_CNRM-CERFACS-CNRM-CM5_CLMcom-CCLM4-8-17_r1i1p1.nc", "/historical/tas_CNRM-CERFACS-CNRM-CM5_CNRM-ALADIN53_r1i1p1.nc", "/historical/tas_CNRM-CERFACS-CNRM-CM5_RMIB-UGent-ALARO-0_r1i1p1.nc", "/historical/tas_CNRM-CERFACS-CNRM-CM5_SMHI-RCA4_r1i1p1.nc", "/historical/tas_ICHEC-EC-EARTH_CLMcom-CCLM4-8-17_r12i1p1.nc", "/historical/tas_ICHEC-EC-EARTH_DMI-HIRHAM5_r3i1p1.nc", "/historical/tas_ICHEC-EC-EARTH_KNMI-RACMO22E_r12i1p1.nc", "/historical/tas_ICHEC-EC-EARTH_KNMI-RACMO22E_r1i1p1.nc", "/historical/tas_ICHEC-EC-EARTH_SMHI-RCA4_r12i1p1.nc", "/historical/tas_IPSL-IPSL-CM5A-MR_INERIS-WRF331F_r1i1p1.nc", "/historical/tas_IPSL-IPSL-CM5A-MR_SMHI-RCA4_r1i1p1.nc", "/historical/tas_MOHC-HadGEM2-ES_CLMcom-CCLM4-8-17_r1i1p1.nc", "/historical/tas_MOHC-HadGEM2-ES_KNMI-RACMO22E_r1i1p1.nc", "/historical/tas_MOHC-HadGEM2-ES_SMHI-RCA4_r1i1p1.nc")

# in this example, fut files is a subset of hist files; that should be OK if their filename structure is the same

rcp.files.tas <- hist.files.tas[1:7]

getModels <- function(ff) {
    base <- basename(ff)
    s <- strsplit(base, "_")
    sapply(s, function(i) i[[2]])
}

getHistModels <- function(hist, fut) {
    h <- getModels(hist)
    uh <- unique(h)
    uf <- unique(getModels(fut))
    uhf <- uh[uh %in% uf]
    hist[h %in% uhf]
}


hist.files.tas.selected <- getHistModels(hist.files.tas, rcp.files.tas)
# hist.files.pr.selected <- getHistModels(hist.files.pr, rcp.files.pr)

The double loop (k, r) could probably be avoided by doing something like this:

library(raster)
his.pr <- values(stack(hist.files.pr.selected, var="pr")))
his.tas <- values(stack(hist.files.tas.selected, var="tas"))
rcp.pr <- values(stack(hist.files.pr, var="pr"))
rcp.tas <- values(stack(hist.files.tas, var="tas"))

And the (i, j) loop over the rows and cols can probably be avoided too. R is vectorized. That is, you can do things like (1:10) - 2 .

Either way, your code is very hard to read with all these nested loops. If you actually need them, it would be better to call functions. For more help, provide some example data instead of files that we do not have, or make a few files available.

Answer 2

As there actually are two more variables "tasmax" and "tasmin" besides "tas" and "pr" in my dataset Robert's approach would have been to much writing for my case. Thus, I tried another way, that finally worked out, although it doesn't list the files of each variable separately (a disadvantage, yes!).

List and match files of historical and rcp:

To match the files I need the pure names of the files without directory, otherwise which(!hist %in% rcp) will always be FALSE (as shown by Robert).

hist <- list.files("/historical") rcp <- list.files("/rcp26")

no.match.h <- which(!hist %in% rcp) no.match.r <- which(!rcp %in% hist)

As I need for nc_open the filename including directory I must create an according file list and subtract the non-matching files

hist.files <- list.files("/data/scratch/lorchdav/cordex_eur/monmean/historical", full.names = TRUE) rcp.files <- list.files("/data/scratch/lorchdav/cordex_ber_mean/rcp26", full.names = TRUE)

hist.files.cl <- hist.files[-no.match.h] hist.files.cl

rcp.files.cl <- rcp.files[-no.match.r] rcp.files.cl

Nested for loops with NetCDFs in R

Question

2 answers

solution1
1 2018-02-20 04:34:57

solution2
0 2018-03-02 11:11:36

Nested for loops with NetCDFs in R

Question

2 answers

solution1 1 2018-02-20 04:34:57

solution2 0 2018-03-02 11:11:36

solution1
1 2018-02-20 04:34:57

solution2
0 2018-03-02 11:11:36