
Find points in vector where change is greater than threshold

I want to find positions in a vector where the value differs by more than some threshold from an earlier point in the vector. The first change-point should be measured relative to the first value in the vector; each subsequent change-point should be measured relative to the previous change-point.
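To make this definition concrete, here is a sketch of a checker that tests a candidate vector of change-points against it directly, without re-running any search. The function name is_valid_changepoints is hypothetical. For example, with x = c(0, 3, 6, 2, 12, 8) and a threshold of 5, the change-points are 3 (|6 - 0| > 5) and then 5 (|12 - 6| > 5); index 4 is not one, because |2 - 6| <= 5.

```r
# Hypothetical helper (not part of the question): checks a candidate set of
# change-points `cp` against the definition, without re-running the search.
is_valid_changepoints <- function(x, cp, mindiff) {
  n <- length(x)
  # change-points must be strictly increasing indices after the first element
  if (length(cp) > 0 &&
      (is.unsorted(cp, strictly = TRUE) || cp[1] <= 1 || cp[length(cp)] > n)) {
    return(FALSE)
  }
  ref <- 1L                    # the first reference point is x[1]
  bounds <- c(cp, n + 1L)      # each segment ends just before the next change-point
  for (seg_end in bounds) {
    # points strictly between the reference and seg_end must stay within mindiff
    if (seg_end - 1L > ref &&
        any(abs(x[(ref + 1L):(seg_end - 1L)] - x[ref]) > mindiff)) {
      return(FALSE)
    }
    if (seg_end <= n) {
      # the change-point itself must differ from the reference by more than mindiff
      if (abs(x[seg_end] - x[ref]) <= mindiff) return(FALSE)
      ref <- seg_end           # later change-points are measured from this one
    }
  }
  TRUE
}

x_small <- c(0, 3, 6, 2, 12, 8)
is_valid_changepoints(x_small, c(3, 5), 5)
# [1] TRUE
is_valid_changepoints(x_small, c(3), 5)   # misses the jump from 6 to 12
# [1] FALSE
```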

I can do this using a for loop, but I wonder if there is a more idiomatic and faster vectorised solution.

Minimal example:

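set.seed(123)
x = cumsum(rnorm(500))          # a random walk to search

mindiff = 5.0                   # threshold
start = x[1]                    # the first reference value is the first element
changepoints = integer()

for (i in seq_along(x)) {
  if (abs(x[i] - start) > mindiff) {
    changepoints = c(changepoints, i)   # record the change-point
    start = x[i]                        # later changes are measured from here
  }
}

plot(x, type = 'l')
points(changepoints, x[changepoints], col = 'red')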
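One cost in the loop above is that changepoints is grown with c() on every hit, which copies the vector each time. A sketch that preallocates the result instead is shown here; the function name find_changepoints is hypothetical. It keeps the same interpreted loop and should return the same integer indices, so I would expect the gain to be modest unless there are many change-points.

```r
# Hypothetical function name; same logic as the loop above, but the result
# buffer is allocated once instead of being grown with c() on every hit.
find_changepoints <- function(x, mindiff) {
  out <- integer(length(x))  # upper bound: every index could be a change-point
  n_found <- 0L
  start <- x[1]
  for (i in seq_along(x)) {
    if (abs(x[i] - start) > mindiff) {
      n_found <- n_found + 1L
      out[n_found] <- i
      start <- x[i]
    }
  }
  out[seq_len(n_found)]      # drop the unused tail of the buffer
}

x_small <- c(0, 3, 6, 2, 12, 8)
find_changepoints(x_small, 5)
# [1] 3 5
```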


Implementing the same code in Rcpp can help with speed.

library(Rcpp)
cppFunction(
  "IntegerVector foo(NumericVector vect, double difference){
    int start = 0;                        // index of the current reference point
    IntegerVector changepoints;
    for (int i = 0; i < vect.size(); i++){
      // same test as abs(x[i] - start) > mindiff in the R loop
      if (std::abs(vect[i] - vect[start]) > difference){
        changepoints.push_back(i + 1);    // +1 converts to R's 1-based indexing
        start = i;                        // later change-points are measured from here
      }
    }
    return changepoints;
  }"
)

foo(vect = x, difference = mindiff)
# [1]  17  25  56  98 108 144 288 297 307 312 403 470 487

identical(foo(vect = x, difference = mindiff), changepoints)
#[1] TRUE

Benchmarking

#DATA
set.seed(123)
x = cumsum(rnorm(1e5))
mindiff = 5.0

library(microbenchmark)
microbenchmark(
  baseR = {
    start = x[1]
    changepoints = integer()
    for (i in seq_along(x)) {
      if (abs(x[i] - start) > mindiff) {
        changepoints = c(changepoints, i)
        start = x[i]
      }
    }
  },
  Rcpp = foo(vect = x, difference = mindiff)
)
#Unit: milliseconds
#  expr        min        lq      mean    median        uq      max neval cld
# baseR 117.194668 123.07353 125.98741 125.56882 127.78463 139.5318   100   b
#  Rcpp   7.907011  11.93539  14.47328  12.16848  12.38791 263.2796   100  a 

Here is a solution using only base R's Reduce. With the argument accumulate = TRUE, Reduce returns the result of every call to the function; in our case, that is the value of start at each iteration of the for loop. Once we have this vector, we only need to find the indices where the value changes:

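# Track the reference ("start") value after each element
r <- Reduce(function(a, e) {
  if (abs(e - a) > mindiff)
    e   # new change-point: it becomes the reference
  else
    a   # keep the current reference
}, x, accumulate = TRUE)

# Get the indices using diff
# Earlier version (its output also includes index 1):
# changepoints <- head(cumsum(c(1, rle(r)$lengths)), -1)
changepoints <- which(diff(r) != 0) + 1L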

EDIT: I have updated the answer using @Eric Watt's comment.
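To see what Reduce is accumulating, here is the same idea on a tiny vector (hypothetical data, chosen so the result can be checked by hand with a threshold of 5; mindiff_small is an illustrative name). Reduce still calls an R function once per element, so I would not expect it to be much faster than the for loop; its advantage is mainly concision.

```r
# Tiny hand-checkable input (hypothetical data, not from the question)
x_small <- c(0, 3, 6, 2, 12, 8)
mindiff_small <- 5

# Reference ("start") value after each element
r_small <- Reduce(function(a, e) if (abs(e - a) > mindiff_small) e else a,
                  x_small, accumulate = TRUE)
r_small
# [1]  0  0  6  6 12 12

# The reference value changes exactly at the change-points
which(diff(r_small) != 0) + 1L
# [1] 3 5
```

Adding 1L rather than 1 keeps the result an integer vector, so it can be compared with identical() against the loop's output.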

For completeness, using recursion we can get an answer in which each step uses only vectorised R functions. However, this will not work when there are many change-points, because each one adds another level of recursion. For example, with the OP's benchmark data (length(x) == 1e5) we get an "evaluation nested too deeply" error.

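f.recurs = function(x, mindiff, i = 1) {
  N = length(x)
  # first index at or after i whose value differs from x[i] by more than mindiff
  next.i = i + which(abs(x[i:N] - x[i]) > mindiff)[1] - 1L
  if (!is.na(next.i)) c(next.i, f.recurs(x, mindiff, next.i))  # recurse from the new reference
  else NULL                                                     # no further change-points
}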

f.recurs(x, 5.0)
# [1]  17  25  56  98 108 144 288 297 307 312 403 470 487
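If the recursion depth is the only obstacle, the same jump-to-the-next-index idea can be written with a loop instead. A sketch, with the hypothetical name f.iter: it should return the same indices as f.recurs (as integers) without any nesting limit. Each step rescans the rest of x, so the work grows roughly with the number of change-points times length(x); on long inputs I would expect it to be slower than the Rcpp version.

```r
# Hypothetical iterative rewrite of f.recurs: a repeat loop replaces the
# recursion, so there is no nesting-depth limit.
f.iter <- function(x, mindiff) {
  N <- length(x)
  out <- integer(0)
  i <- 1L
  repeat {
    # first index at or after i whose value differs from x[i] by more than mindiff
    next.i <- i + which(abs(x[i:N] - x[i]) > mindiff)[1] - 1L
    if (is.na(next.i)) break   # no further change-points
    out <- c(out, next.i)
    i <- next.i                # later change-points are measured from here
  }
  out
}

f.iter(c(0, 3, 6, 2, 12, 8), 5)
# [1] 3 5
```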

Statement: the technical posts on this site are licensed under CC BY-SA 4.0. If you republish them, please credit this site's URL or the original address.

 
© 2020-2024 STACKOOM.COM