What is the most efficient way to use movingFun on a large raster time series?

Question

I have to smooth a large time series and I'm using the movingFun function from the 'raster' package. I tested a few options based on previous posts (see below). The first two work, but are very slow on the real data (the full MOD13Q1 time series for all of Australia). So I attempted option 3 and failed. I'd appreciate it if someone could point out what's wrong in that function. Memory is not a constraint: I'm using an RStudio Server with 700 GB of RAM. Still, I'm not sure what the best approach for this job would be. Thanks in advance.

a) using movingFun and overlay

library(raster)
r <- raster(ncol=10, nrow=10)
r[] <- runif(ncell(r))
s <- brick(r,r*r,r+2,r^5,r*3,r*5)
ptm <- proc.time()
v <- overlay(s, fun=function(x) movingFun(x, fun=mean, n=3, na.rm=TRUE, circular=TRUE)) #works
proc.time() - ptm

   user  system elapsed 
  0.140   0.016   0.982

b) creating a function and using clusterR. I thought this would be faster than (a).

fun1 = function(x) {overlay(x, fun=function(x) movingFun(x, fun=mean, n=6, na.rm=TRUE, circular=TRUE))}

beginCluster(4)
ptm <- proc.time()
v = clusterR(s, fun1, progress = "text")
proc.time() - ptm
endCluster()
   user  system elapsed 
  0.124   0.012   4.069

c) I found this document written by Robert J. Hijmans and I tried (and failed) to write a function as described in the vignettes. I can't fully follow all the steps in that function, which is probably why it is failing.

smooth.fun <- function(x, filename='', smooth_n ='',...) { #x could be a stack or list of rasters
  out <- brick(x)
  big <- ! canProcessInMemory(out, 3)
  filename <- trim(filename)
  if (big & filename == '') {
    filename <- rasterTmpFile()
  }
  if (filename != '') {
    out <- writeStart(out, filename, ...)
    todisk <- TRUE
  } else {
    vv <- matrix(ncol=nrow(out), nrow=ncol(out))
    todisk <- FALSE
  }

  bs <- blockSize(out)
  pb <- pbCreate(bs$n)

  if (todisk) {
    for (i in 1:bs$n) {
      v <- getValues(out, row=bs$row[i], nrows=bs$nrows[i] )
      v <- movingFun(v, fun=mean, n=smooth_n, na.rm=TRUE, circular=TRUE)
      out <- writeValues(out, v, bs$row[i])
      pbStep(pb, i)
    }
    out <- writeStop(out)
  } else {
    for (i in 1:bs$n) {
      v <- getValues(out, row=bs$row[i], nrows=bs$nrows[i] )
      v <- movingFun(v, fun=mean, n=smooth_n, na.rm=TRUE, circular=TRUE)
      cols <- bs$row[i]:(bs$row[i]+bs$nrows[i]-1)
      vv[,cols] <- matrix(v, nrow=out@ncols)
      pbStep(pb, i)
    }
    out <- setValues(out, as.vector(vv))
  }
  pbClose(pb)
  return(out)
}

s <- smooth.fun(s, filename='test.tif', smooth_n = 6, format='GTiff', overwrite=TRUE)

 Error in .local(.Object, ...) : 
  `/path-to-dir/test.tif' does not exist in the file system,
and is not recognised as a supported dataset name.
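
A likely cause of this error, offered as a reading of the code rather than a confirmed diagnosis: the loop calls getValues() on out, the brick that writeStart() has just opened for writing, instead of on the input x, so raster tries to read test.tif before it exists. In addition, movingFun() works on a single vector, while getValues() on a multi-layer object returns a matrix with one row per cell and one column per layer, so the smoothing has to be applied row by row. The sketch below makes both changes. The name smooth_fun and the default smooth_n = 3 are illustrative choices, not part of the original post.

```r
library(raster)

# Sketch: smooth every cell's time series with a circular moving mean.
# x is assumed to be a RasterStack or RasterBrick (one layer per date).
smooth_fun <- function(x, filename = "", smooth_n = 3, ...) {
  out <- brick(x, values = FALSE)        # empty brick with the geometry of x
  filename <- trim(filename)
  if (!canProcessInMemory(out, 3) && filename == "") {
    filename <- rasterTmpFile()
  }

  # v has one row per cell and one column per layer; smooth each row
  smooth_block <- function(v) {
    t(apply(v, 1, movingFun, n = smooth_n, fun = mean,
            circular = TRUE, na.rm = TRUE))
  }

  if (filename == "") {
    # small enough: process all cells at once in memory
    return(setValues(out, smooth_block(getValues(x))))
  }

  out <- writeStart(out, filename, ...)
  bs <- blockSize(out)
  for (i in seq_len(bs$n)) {
    # read from the input x, not from the file being written
    v <- getValues(x, row = bs$row[i], nrows = bs$nrows[i])
    out <- writeValues(out, smooth_block(v), bs$row[i])
  }
  writeStop(out)
}

# Usage on the toy brick from option (a)
r <- raster(ncol = 10, nrow = 10)
r[] <- runif(ncell(r))
s <- brick(r, r * r, r + 2, r^5, r * 3, r * 5)
sm <- smooth_fun(s, smooth_n = 3)
```

Passing filename = 'test.tif', format = 'GTiff', overwrite = TRUE takes the block-wise branch instead. The row-wise apply() is still an R-level loop over cells, so this is unlikely to be faster than option (a) on its own; it mainly shows how the read and write steps fit together.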

Answer 1

This is the solution I found, thanks to my colleague. It processes each year (23 files) in 20 minutes. There may be things to improve, but at this stage I'm happy that I can do the job in only 20 min per year.

So here I run 5 years simultaneously using the foreach package. The for loop then fills an array with 6 files at a time; remember that I needed a 3-month moving window, which in the 16-day MOD13Q1 dataset is 6 files. The loop calculates mean values over the array using colMeans, creates an empty raster, assigns the mean values to the new raster and saves it. Note that we recreated the circular option of the movingFun function, so the mean for the first date is computed using the last dates of that same year.

require(raster)
require(rgdal)
library(foreach)
library(doParallel)

rasterOptions(maxmemory = 3e10, chunksize = 2e10)

ip <- "directory-with-grids"
op <- "directory-where-results-are-being-saved"

years <- 2000:2017

k <- 6                    # moving window size (6 x 16-day files, about 3 months)
k2 <- floor((k - 1) / 2)  # how far the file being read runs ahead of the output date

# set up the cluster: 5 workers, so 5 years are processed at the same time
cl <- makeCluster(5, outfile = "") # outfile = "" makes worker prints visible
registerDoParallel(cl)

foreach(j = seq_along(years), .packages = c("raster")) %dopar% {
  ip1 <- file.path(ip, years[j])
  ndvi.files <- list.files(ip1, pattern = 'ndvi.*tif$', full.names = TRUE)
  nfiles <- length(ndvi.files)
  out.dir <- file.path(op, years[j])
  dir.create(out.dir, showWarnings = FALSE)
  slot <- 0   # reset the buffer position for every year

  # n is the output date; the first k - 1 values of n (all <= 0) only pre-fill the window
  for (n in (1 - (k - 1)):nfiles) {
    i <- (n + k2 - 1) %% nfiles + 1   # file to read, wrapping around the year (circular)
    print(ndvi.files[i])
    r <- raster(ndvi.files[i])
    if (slot == 0) {
      # one row per file in the window, one column per cell
      win <- matrix(data = NA, nrow = k, ncol = ncell(r))
    }
    slot <- slot %% k + 1             # overwrite the oldest row of the window
    win[slot, ] <- getValues(r)
    if (n > 0) {
      o <- raster(extent(c(xx, xx, xx, xx))); res(o) <- c(xx, xx) # your extent and resolution
      crs(o) <- '+proj=longlat +datum=WGS84 +no_defs +ellps=WGS84 +towgs84=0,0,0'
      o[] <- colMeans(win)
      o[o < 0] <- NA
      # write out o as the nth raster
      # (names(r) is the name of the file just read, i.e. file i, not file n)
      fname <- paste(names(r), 'smoothed', sep = '_')
      out.path <- file.path(out.dir, fname)
      writeRaster(o, out.path, format = "GTiff", overwrite = TRUE, datatype = 'INT2S')
    }
  }
}

stopCluster(cl)
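
The window bookkeeping in the loop above can be checked without any raster files. The following base-R sketch applies the same index arithmetic to a small matrix with one row per date and one column per cell. The function name circular_window_mean and the toy data are made up for illustration. With k = 6, the value for date n averages dates n-3 to n+2, wrapping around the ends of the year; it is worth comparing this against movingFun(..., circular = TRUE) on a small vector before assuming the two agree for even window sizes.

```r
# Circular moving mean over the rows of x (rows = dates, columns = cells),
# using a k-row buffer that is overwritten in rotation, as in the loop above.
circular_window_mean <- function(x, k) {
  nfiles <- nrow(x)
  k2 <- floor((k - 1) / 2)
  win <- matrix(NA_real_, nrow = k, ncol = ncol(x))
  out <- matrix(NA_real_, nrow = nfiles, ncol = ncol(x))
  slot <- 0
  for (n in (2 - k):nfiles) {
    i <- (n + k2 - 1) %% nfiles + 1   # date to read, wrapping around
    slot <- slot %% k + 1             # position of the oldest row in the buffer
    win[slot, ] <- x[i, ]
    if (n > 0) {
      out[n, ] <- colMeans(win)       # buffer is full once n reaches 1
    }
  }
  out
}

# Toy data: 8 dates, 2 cells
x <- matrix(c(1:8, 10 * (1:8)), ncol = 2)
sm <- circular_window_mean(x, k = 6)
sm[1, ]   # mean of dates 6, 7, 8, 1, 2, 3 for each cell
```

Only k rows are held in memory at any time, which is what keeps the per-year memory footprint at k times the number of cells regardless of how many dates a year contains.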

Answer 1
1 2018-06-08 04:06:17

在大型栅格时间序列中使用movingFun的最有效方法是什么？

问题描述

1 个解决方案

解决方案1 1 2018-06-08 04:06:17

解决方案1
1 2018-06-08 04:06:17