
Spatial rolling functions (min, max, mean)

I'm currently working on a project where I need to calculate the rolling minimum over a spatial window of 30 meters (a square centered on each point). For each point in my data frame I have the X and Y coordinates and the variable Z whose rolling minimum I'm trying to compute.

So far I have accomplished this using for loops with conditionals and data.table filtering. This takes some time, especially when the data sets have over a million points. I would really appreciate any tips on how to improve the performance of this code.

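d = 1          # half-width of the square window
attach(data)   # makes POINT_X, POINT_Y and Z visible as plain vectors
#### OPTION 1 - CONDITIONAL ####
op1 = NULL
for (i in 1:nrow(data)) {
  # Points outside the window are replaced by Z[i], so they cannot pull the result below the window minimum
  op1[i]<-
    min(
      ifelse(POINT_X>=POINT_X[i]-d,
           ifelse(POINT_X<=POINT_X[i]+d,
                  ifelse(POINT_Y>=POINT_Y[i]-d,
                         ifelse(POINT_Y<=POINT_Y[i]+d, Z, Z[i]),Z[i]),Z[i]),Z[i]), na.rm = TRUE)}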

#### OPTION 2 - SUBSET ####
library(data.table)
setDT(data)
local_min = function(i){
  x = POINT_X[i]
  y = POINT_Y[i]
  # Keep only the points inside the square window around point i
  base = data[POINT_X %inrange% c(x-d,x+d)&
                POINT_Y %inrange% c(y-d,y+d)]
  local_min = min(base$Z, na.rm=TRUE)
  return(local_min)}
op2 = NULL
for (i in 1:nrow(data)) {
    op2[i]<- local_min(i)}

I've tried other alternatives, but the most common rolling statistic functions in R are based on index windows rather than on the values of other variables. Here's some data to try the code above with d=1. I would be really grateful if you could help me improve this process.

data = data.frame(POINT_X=rep(1:5, each =5),
                  POINT_Y=rep(1:5,5),
                  Z=1:25)

The desired output should look like this:

> op1
 [1]  1  1  2  3  4  1  1  2  3  4  6  6  7  8  9 11 11 12 13 14 16 16 17 18 19

I think it's important to note that option 1 is currently faster than option 2. Thanks in advance for your attention. :)

You could use a non-equi join:

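library(data.table)
setDT(data)  # the join below requires a data.table

d = 1

# Bounds of the square window around each point
data[,`:=`(xmin = POINT_X-d,
           xmax = POINT_X+d,
           ymin = POINT_Y-d,
           ymax = POINT_Y+d)]

# Self-join: for each window, collect every point that falls inside it.
# In the result, POINT_X and POINT_Y hold the window bounds (xmin, ymin),
# so grouping by them gives one minimum per window.
data[data,on=.(POINT_X >= xmin,
               POINT_X <= xmax,
               POINT_Y >= ymin,
               POINT_Y <= ymax)][
     ,.(rollmin=min(Z)),by=.(POINT_X,POINT_Y)][
     ,rollmin]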

#[1]  1  1  2  3  4  1  1  2  3  4  6  6  7  8  9 11 11 12 13 14 16 16 17 18 19
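
Since the title also asks about the maximum and the mean, here is a minimal sketch of how the same non-equi join might be extended to return all three statistics at once. It is a variation on the approach above, not the answerer's code: the table name `win`, the result name `res`, and the output columns `rollmax` and `rollmean` are assumptions made for illustration. It uses `by = .EACHI` so that one result row is produced per window in the original row order, which should also work when two points share the same coordinates.

```r
library(data.table)

# Sample data from the question
data <- data.table(POINT_X = rep(1:5, each = 5),
                   POINT_Y = rep(1:5, 5),
                   Z       = 1:25)
d <- 1  # half-width of the square window

# Window bounds for every point, kept in a separate table
# (`win` is a hypothetical name used only in this sketch)
win <- data[, .(xmin = POINT_X - d, xmax = POINT_X + d,
                ymin = POINT_Y - d, ymax = POINT_Y + d)]

# Non-equi join: for each row of `win`, match all points inside the window.
# by = .EACHI evaluates the summary once per row of `win`.
res <- data[win,
            on = .(POINT_X >= xmin, POINT_X <= xmax,
                   POINT_Y >= ymin, POINT_Y <= ymax),
            .(rollmin  = min(Z, na.rm = TRUE),
              rollmax  = max(Z, na.rm = TRUE),
              rollmean = mean(Z, na.rm = TRUE)),
            by = .EACHI]

# Attach the rolling statistics to the original points
data[, `:=`(rollmin  = res$rollmin,
            rollmax  = res$rollmax,
            rollmean = res$rollmean)]
```

On the sample data, `res$rollmin` should reproduce the `op1` vector shown in the question. Memory use grows with the number of matched point pairs, so for millions of points with dense windows it may be worth processing the data in spatial chunks.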
