I have to smooth a large time series and I'm using the movingFun function from the 'raster' package. I tested few options based on previous posts (see my options below). The first 2 work, but are very slow when using the real data (all MOD13Q1 time series for all of Australia). So I attempted option 3 and failed. I'd appreciate if someone could help to point what's wrong in that function. I have access to memory, I'm using an RStudio Server that has 700GB ram, still, I'm not sure what'd be the best approach to do this job. Thanks in advance.
a) using movingFun and overlay
library(raster)
r <- raster(ncol=10, nrow=10)
r[] <- runif(ncell(r))
s <- brick(r,r*r,r+2,r^5,r*3,r*5)
ptm <- proc.time()
v <- overlay(s, fun=function(x) movingFun(x, fun=mean, n=3, na.rm=TRUE, circular=TRUE)) #works
proc.time() - ptm
user system elapsed
0.140 0.016 0.982
b) creating a function and using clusterR. I thought this would be faster than (a).
fun1 = function(x) {overlay(x, fun=function(x) movingFun(x, fun=mean, n=6, na.rm=TRUE, circular=TRUE))}
beginCluster(4)
ptm <- proc.time()
v = clusterR(s, fun1, progress = "text")
proc.time() - ptm
endCluster()
user system elapsed
0.124 0.012 4.069
c) I found this document written by Robert J. Hijmans and I tried (and failed) to write a function as described in the vignettes. I can't fully follow all the steps in that function, that's why is failing.
smooth.fun <- function(x, filename='', smooth_n ='',...) { #x could be a stack or list of rasters
out <- brick(x)
big <- ! canProcessInMemory(out, 3)
filename <- trim(filename)
if (big & filename == '') {
filename <- rasterTmpFile()
}
if (filename != '') {
out <- writeStart(out, filename, ...)
todisk <- TRUE
} else {
vv <- matrix(ncol=nrow(out), nrow=ncol(out))
todisk <- FALSE
}
bs <- blockSize(out)
pb <- pbCreate(bs$n)
if (todisk) {
for (i in 1:bs$n) {
v <- getValues(out, row=bs$row[i], nrows=bs$nrows[i] )
v <- movingFun(v, fun=mean, n=smooth_n, na.rm=TRUE, circular=TRUE)
out <- writeValues(out, v, bs$row[i])
pbStep(pb, i)
}
out <- writeStop(out)
} else {
for (i in 1:bs$n) {
v <- getValues(out, row=bs$row[i], nrows=bs$nrows[i] )
v <- movingFun(v, fun=mean, n=smooth_n, na.rm=TRUE, circular=TRUE)
cols <- bs$row[i]:(bs$row[i]+bs$nrows[i]-1)
vv[,cols] <- matrix(v, nrow=out@ncols)
pbStep(pb, i)
}
out <- setValues(out, as.vector(vv))
}
pbClose(pb)
return(out)
}
s <- smooth.fun(s, filename='test.tif', smooth_n = 6, format='GTiff', overwrite=TRUE)
Error in .local(.Object, ...) :
`/path-to-dir/test.tif' does not exist in the file system,
and is not recognised as a supported dataset name.
This is the solution I found, thanks to my colleague. It computes each year (of 23 files) in 20 minutes. There may be things to improve, but at this stage, I'm happy I can do the job in only 20 min per year.
So here I run 5 years simultaneously using foreach
package. Then the for
loop creates an array with 6 files at the time; remember that I needed a 3-months-moving-window, in the MOD13Q1 16-days dataset, that's 6 files. Then the loop calculates mean values on the array using ColMeans
, creates an empty raster, assigns the mean values to the new raster and saves it. Note that we recreated the circular
option of the movingFun
function. So, the 1st date's mean is done based on the last dates of that same year.
require(raster)
require(rgdal)
library(foreach)
library(doParallel)
rasterOptions(maxmemory = 3e10, chunksize = 2e10)
ip <- "directory-with-grids"
op <- "directory-where-resuls-are-being-saved"
years = c(2000:2017)
k <- 6 # moving window size
k2 <- floor((k-1)/2)
slot <- 0
# determine clusters
cl <- makeCluster(5, outfile = "") # make worker prints visible
registerDoParallel(cl)
foreach(j = 1:length(years), .packages=c("raster")) %dopar% {
ip1 = paste(ip, years[j],sep='/')
ndvi.files <- list.files(ip1, pattern = 'ndvi.*tif$',full.names = T)
nfiles <- length(ndvi.files)
for (n in (1-(k-1)):nfiles) {
i <- (n + k2 - 1) %% nfiles + 1
print(ndvi.files[i])
r <- raster(ndvi.files[i])
if (slot == 0) {
win <- matrix(data = NA, nrow = k, ncol = r@nrows * r@ncols)
}
slot <- slot %% k + 1
win[slot,] <- getValues(r)
if (n > 0) {
o <- raster(extent(c(xx,xx,xx ,xx))); res(o)=c(xx,xx) # your extent and resolution
crs(o) <-'+proj=longlat +datum=WGS84 +no_defs +ellps=WGS84 +towgs84=0,0,0'
o[] <- colMeans(win)
o[o<0] <- NA
# write out m as the nth raster
fname = paste(names(r),'smoothed',sep='_')
out.dir = file.path(op, paste(years[j], sep='/'))
dir.create(out.dir,showWarnings = FALSE)
out.path = file.path(out.dir, fname)
writeRaster(o, out.path, format="Geotiff", overwrite=T, datatype='INT2S')
}
}
}
stopCluster(cl)
The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.