What's the most efficient way to use movingFun on large raster time series?

I have to smooth a large time series and I'm using the movingFun function from the 'raster' package. I tested a few options based on previous posts (see below). The first two work, but they are very slow on the real data (the full MOD13Q1 time series for all of Australia). So I attempted option (c), which failed. I'd appreciate it if someone could point out what's wrong in that function. Memory is not a constraint: I'm using an RStudio Server with 700 GB of RAM. Still, I'm not sure what the best approach for this job would be. Thanks in advance.

a) Using movingFun and overlay

library(raster)
r <- raster(ncol=10, nrow=10)
r[] <- runif(ncell(r))
s <- brick(r,r*r,r+2,r^5,r*3,r*5)
ptm <- proc.time()
v <- overlay(s, fun=function(x) movingFun(x, fun=mean, n=3, na.rm=TRUE, circular=TRUE)) #works
proc.time() - ptm

   user  system elapsed 
  0.140   0.016   0.982

b) Creating a function and using clusterR. I thought this would be faster than (a).

fun1 = function(x) {overlay(x, fun=function(x) movingFun(x, fun=mean, n=6, na.rm=TRUE, circular=TRUE))}

beginCluster(4)
ptm <- proc.time()
v = clusterR(s, fun1, progress = "text")
proc.time() - ptm
endCluster()

   user  system elapsed 
  0.124   0.012   4.069 

c) I found this document written by Robert J. Hijmans and tried (and failed) to write a function as described in the vignette. I can't fully follow all the steps in that function, which is probably why it fails.

smooth.fun <- function(x, filename='', smooth_n ='',...) { #x could be a stack or list of rasters
  out <- brick(x)
  big <- ! canProcessInMemory(out, 3)
  filename <- trim(filename)
  if (big & filename == '') {
    filename <- rasterTmpFile()
  }
  if (filename != '') {
    out <- writeStart(out, filename, ...)
    todisk <- TRUE
  } else {
    vv <- matrix(ncol=nrow(out), nrow=ncol(out))
    todisk <- FALSE
  }

  bs <- blockSize(out)
  pb <- pbCreate(bs$n)

  if (todisk) {
    for (i in 1:bs$n) {
      v <- getValues(out, row=bs$row[i], nrows=bs$nrows[i] )
      v <- movingFun(v, fun=mean, n=smooth_n, na.rm=TRUE, circular=TRUE)
      out <- writeValues(out, v, bs$row[i])
      pbStep(pb, i)
    }
    out <- writeStop(out)
  } else {
    for (i in 1:bs$n) {
      v <- getValues(out, row=bs$row[i], nrows=bs$nrows[i] )
      v <- movingFun(v, fun=mean, n=smooth_n, na.rm=TRUE, circular=TRUE)
      cols <- bs$row[i]:(bs$row[i]+bs$nrows[i]-1)
      vv[,cols] <- matrix(v, nrow=out@ncols)
      pbStep(pb, i)
    }
    out <- setValues(out, as.vector(vv))
  }
  pbClose(pb)
  return(out)
}

s <- smooth.fun(s, filename='test.tif', smooth_n = 6, format='GTiff', overwrite=TRUE)

 Error in .local(.Object, ...) : 
  `/path-to-dir/test.tif' does not exist in the file system,
and is not recognised as a supported dataset name.
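One likely cause of this error is that the loop reads from `out` after `writeStart()` has pointed it at a file that has not been written yet; the values should be read from the input `x` instead. In addition, `movingFun` expects a single vector, so it has to be applied to each cell's series rather than to the whole block matrix. The following is a minimal sketch of a block-wise version under those assumptions, not a tested replacement for large data. The name `smooth_blocks` is hypothetical, and it only handles the write-to-disk case.

```r
library(raster)

# Hypothetical rewrite of smooth.fun: read blocks from `x`, write them to `out`.
smooth_blocks <- function(x, filename, smooth_n = 3, ...) {
  # Empty template with the same geometry and number of layers as x
  out <- brick(x, values = FALSE)
  out <- writeStart(out, filename, ...)
  bs <- blockSize(out)
  for (i in seq_len(bs$n)) {
    # Matrix with one row per cell and one column per layer (date)
    v <- getValues(x, row = bs$row[i], nrows = bs$nrows[i])
    # Smooth each cell's series; apply() returns layers x cells, so transpose
    v <- t(apply(v, 1, function(ts) {
      movingFun(ts, n = smooth_n, fun = mean, na.rm = TRUE, circular = TRUE)
    }))
    out <- writeValues(out, v, bs$row[i])
  }
  writeStop(out)
}
```

It could be called the same way as the original attempt, for example `smooth_blocks(s, filename = 'test.tif', smooth_n = 6, format = 'GTiff', overwrite = TRUE)`.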

This is the solution I found, thanks to my colleague. It processes each year (23 files) in 20 minutes. There may be things to improve, but at this stage I'm happy that the job takes only 20 minutes per year.

Here I run 5 years simultaneously using the foreach package. The for loop then fills an array that holds 6 files at a time; remember that I needed a 3-month moving window, which in the 16-day MOD13Q1 dataset is 6 files. The loop calculates mean values over the array using colMeans, creates an empty raster, assigns the mean values to it and saves it. Note that we recreated the circular option of the movingFun function, so the mean for the first date also uses the last dates of that same year.

require(raster)
require(rgdal)
library(foreach)
library(doParallel)

rasterOptions(maxmemory = 3e10, chunksize = 2e10)

ip <- "directory-with-grids"
op <- "directory-where-results-are-being-saved"

years = c(2000:2017)   

k <- 6    # moving window size
k2 <- floor((k-1)/2)
slot <- 0

# determine clusters
cl <- makeCluster(5, outfile = "") # make worker prints visible
registerDoParallel(cl)

foreach(j = 1:length(years), .packages=c("raster")) %dopar% {
  ip1 = paste(ip, years[j],sep='/')
  ndvi.files <- list.files(ip1, pattern = 'ndvi.*tif$',full.names = T) 
  nfiles <- length(ndvi.files)

  for (n in (1-(k-1)):nfiles) {
    i <- (n + k2 - 1) %% nfiles + 1
    print(ndvi.files[i])
    r <- raster(ndvi.files[i])
    if (slot == 0) {
      win <- matrix(data = NA, nrow = k, ncol = r@nrows * r@ncols)
    }
    slot <- slot %% k + 1
    win[slot,] <- getValues(r)
    if (n > 0) {
      o <- raster(extent(c(xx,xx,xx ,xx))); res(o)=c(xx,xx) # your extent and resolution
      crs(o) <-'+proj=longlat +datum=WGS84 +no_defs +ellps=WGS84 +towgs84=0,0,0'
      o[] <- colMeans(win)
      o[o<0] <- NA
      # write out o, the smoothed raster for the current window
      fname = paste(names(r),'smoothed',sep='_')
      out.dir =  file.path(op, paste(years[j], sep='/'))
      dir.create(out.dir,showWarnings = FALSE)
      out.path = file.path(out.dir, fname)
      writeRaster(o, out.path, format="GTiff", overwrite=TRUE,  datatype='INT2S')
    }
  }
}

stopCluster(cl)
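
The ring-buffer indexing in the loop above can be checked on a small matrix without any raster files. The sketch below uses only base R and assumes one row per date and one column per cell; the helper name `circular_window_means` is hypothetical. It reuses the same `k`, `k2`, `slot` and `i` arithmetic as the loop, so row `n` of the result is the mean over dates `n-3` to `n+2` (wrapping around the year) when `k = 6`.

```r
# Hypothetical helper: circular moving mean over the rows (dates) of a matrix
# with one row per date and one column per cell.
circular_window_means <- function(m, k = 6) {
  nfiles <- nrow(m)
  k2 <- floor((k - 1) / 2)
  win <- matrix(NA_real_, nrow = k, ncol = ncol(m))       # ring buffer of k dates
  out <- matrix(NA_real_, nrow = nfiles, ncol = ncol(m))  # one smoothed row per date
  slot <- 0
  for (n in (1 - (k - 1)):nfiles) {
    i <- (n + k2 - 1) %% nfiles + 1  # date to load, wrapping around the year
    slot <- slot %% k + 1            # overwrite the oldest slot in the buffer
    win[slot, ] <- m[i, ]
    if (n > 0) {
      out[n, ] <- colMeans(win)      # buffer is full once n reaches 1
    }
  }
  out
}

# Toy series: 8 dates, 1 cell
m <- matrix(1:8, ncol = 1)
res <- circular_window_means(m, k = 6)
res[1, 1]  # mean of dates 6, 7, 8, 1, 2, 3, i.e. 4.5
```

Because only one row of the buffer is replaced per step, each input file is read once per window position rather than six times.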


