
Conditional simulation (with Kriging) in R with parallelization?

I am using the gstat package in R to generate sequential Gaussian simulations. My PC has 4 cores, and I tried to parallelize the krige() function with the parallel package, following the script Guzmán provided in answer to the question "How to achieve parallel Kriging in R to speed up the process?".

The resulting simulations, however, differ from the ones produced on a single core (no parallelization). It looks like a geometry problem, but I can't figure out how to fix it.

Below is an example (using 4 cores) that generates 2 simulations. After running the code, you will see that the simulated maps from the parallel run show artifacts (such as vertical lines) and differ from the ones produced on a single core.

The code needs the packages gstat, sp, raster, parallel and spatstat, and it also calls maptools::spRbind(), so maptools must be installed as well. If any of the library() calls fail, run install.packages() for that package first.

library(gstat)
library(sp)
library(raster)
library(parallel)
library(spatstat)

# create a regular grid
nx=100 # number of columns
ny=100 # number of rows
srgr <- expand.grid(1:nx, ny:1) # x spans the columns, y spans the rows
names(srgr) <- c('x','y')
gridded(srgr)<-~x+y

# generate a spatial process (unconditional simulation)
g<-gstat(formula=z~x+y, locations=~x+y, dummy=T, beta=15, model=vgm(psill=3, range=10, nugget=0,model='Exp'), nmax=20)
sim <- predict(g, newdata=srgr, nsim=1)
r<-raster(sim)

# generate sample data (Poisson process)  
int<-0.02
rpp<-rpoispp(int,win=owin(c(0,nx),c(0,ny)))
df<-as.data.frame(rpp)
coordinates(df)<-~x+y 

# assign raster values to sample data
dfpp <-raster::extract(r,df,df=TRUE)
smp<-cbind(coordinates(df),dfpp)
smp<-smp[complete.cases(smp), ]
coordinates(smp)<-~x+y

# fit variogram to sample data
vs <- variogram(sim1~1, data=smp)
m <- fit.variogram(vs, vgm("Exp"))
plot(vs, model = m)

# generate 2 conditional simulations with one core processor
one <- krige(formula = sim1~1, locations = smp, newdata = srgr, model = m,nmax=12,nsim=2)

# plot simulation 1 and 2: statistics (min, max) are ok, simulations are also ok.
spplot(one["sim1"], main = "conditional simulation")
spplot(one["sim2"], main = "conditional simulation")

# generate 2 conditional simulations with parallel processing
no_cores<-detectCores()
cl<-makeCluster(no_cores)
parts <- split(x = 1:length(srgr), f = 1:no_cores)
clusterExport(cl = cl, varlist = c("smp", "srgr", "parts","m"), envir = .GlobalEnv)
clusterEvalQ(cl = cl, expr = c(library('sp'), library('gstat')))
par <- parLapply(cl = cl, X = 1:no_cores, fun = function(x) krige(formula=sim1~1, locations=smp, model=m, newdata=srgr[parts[[x]],],  nmax=12, nsim=2))
stopCluster(cl)

# merge all parts    
mergep <- maptools::spRbind(par[[1]], par[[2]])
mergep <- maptools::spRbind(mergep, par[[3]])
mergep <- maptools::spRbind(mergep, par[[4]])

# create SpatialPixelsDataFrame from mergep
mergep <- SpatialPixelsDataFrame(points = mergep, data = mergep@data)

# plot mergep: statistics (min, max) are ok, but the simulated maps show "vertical lines", and I don't understand why.
spplot(mergep[1], main = "conditional simulation")
spplot(mergep[2], main = "conditional simulation")

I have tried your code and I think the problem lies with the way you split the work:

parts <- split(x = 1:length(srgr), f = 1:no_cores)

On my dual-core machine that meant that all odd indices in srgr were handled by one process and all even indices were handled by the other. This is probably the source of the vertical artifacts you are seeing.
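To make the interleaving concrete, here is a small sketch for the 4-core case. It uses a plain data frame called grid (a hypothetical stand-in for srgr) and assumes the cells are ordered as expand.grid() produces them, with x varying fastest. Under that assumption, and because the 100 columns divide evenly by 4, each worker ends up with every fourth column of the grid, which would match the vertical stripes.

```r
# Stand-in for srgr: a 100 x 100 grid as a plain data frame (x varies fastest).
nx <- 100
ny <- 100
no_cores <- 4
grid <- expand.grid(x = 1:nx, y = ny:1)

# Same splitting as in the question: the factor 1:no_cores is recycled,
# so index i goes to worker ((i - 1) %% no_cores) + 1.
parts <- split(x = seq_len(nrow(grid)), f = 1:no_cores)
head(parts[[1]])

# Which grid columns does each worker simulate?
cols_by_worker <- lapply(parts, function(idx) sort(unique(grid$x[idx])))
head(cols_by_worker[[1]])    # columns handled by worker 1
length(cols_by_worker[[1]])  # number of distinct columns for worker 1
```

Each worker runs its own independent simulation, so neighbouring columns come from different realizations.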

A better approach is to split the data into consecutive chunks, like this:

parts <- parallel::splitIndices(length(srgr), no_cores)

Using this splitting with the rest of your code I get results that look comparable to the sequential ones. At least to my untrained eyes ...
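As a quick check of what splitIndices() returns, the sketch below uses 10000 cells and 4 workers (the sizes from the question, hard-coded here as an assumption so it runs without srgr). Every chunk is a run of consecutive indices, and together the chunks cover each cell exactly once.

```r
library(parallel)

n_cells <- 10000  # assumed: number of cells in the 100 x 100 grid
no_cores <- 4     # assumed: number of workers

parts <- splitIndices(n_cells, no_cores)

# First and last index of each chunk, one column per worker.
sapply(parts, range)

# Every chunk consists of consecutive indices ...
contiguous <- vapply(parts, function(idx) all(diff(idx) == 1), logical(1))
# ... and the chunks cover all cells exactly once, in order.
covers_all <- all(unlist(parts) == seq_len(n_cells))
```

Note that each worker still simulates its chunk without seeing the values simulated in the other chunks, so the merged map is not guaranteed to be statistically identical to a single-core run, even if it looks similar.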


Original answer, which addresses only a minor effect. It still might make sense to fix the seed, using set.seed() for sequential processing and clusterSetRNGStream() for parallel processing.

From what I have read, simulation with krige() requires drawing random numbers, and these random numbers will differ under parallel processing. See section 6 of the parallel vignette ( vignette("parallel") ) for more details.
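A minimal sketch of both seeding approaches follows. It uses runif() and rnorm() as stand-ins for the random draws inside krige(), a 2-worker cluster, and an arbitrary seed of 42; all of these are assumptions for illustration only.

```r
library(parallel)

# Sequential case: set.seed() makes repeated runs give the same draws.
set.seed(42)
seq_a <- rnorm(3)
set.seed(42)
seq_b <- rnorm(3)
identical(seq_a, seq_b)

# Parallel case: clusterSetRNGStream() gives each worker its own
# reproducible L'Ecuyer random number stream.
cl <- makeCluster(2)
clusterSetRNGStream(cl, iseed = 42)
par_a <- parSapply(cl, 1:2, function(i) runif(1))
clusterSetRNGStream(cl, iseed = 42)
par_b <- parSapply(cl, 1:2, function(i) runif(1))
stopCluster(cl)
identical(par_a, par_b)
```

This makes each setup reproducible on its own. It does not make the parallel run reproduce the sequential one, because the workers draw from different streams than a single process would.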
