find points in vector where change is greater than threshold

Question

I want to find positions in a vector where the value differs by more than some threshold value from an earlier point in the vector. The first change-point should be measured relative to the first value in the vector. Subsequent change-points should be measured relative to the previous change-point.

I can do this using a for loop, but I wonder if there is a more idiomatic and faster vectorised soultion.

Minimal example:

set.seed(123)
x = cumsum(rnorm(500))

mindiff = 5.0
start = x[1]
changepoints = integer()

for (i in 1:length(x)) {
  if (abs(x[i] - start) > mindiff) {
    changepoints = c(changepoints, i)
    start = x[i]
  }
}

plot(x, type = 'l')
points(changepoints, x[changepoints], col='red')

Answer 1

Implementing the same code in Rcpp can help with speed.

library(Rcpp)
cppFunction(
  "IntegerVector foo(NumericVector vect, double difference){
    int start = 0;
    IntegerVector changepoints;
    for (int i = 0; i < vect.size(); i++){
      if((vect[i] - vect[start]) > difference || (vect[start] - vect[i]) > difference){
        changepoints.push_back (i+1);
        start = i;        
      }
    }
    return(changepoints);
  }"
  )

foo(vect = x, difference = mindiff)
# [1]  17  25  56  98 108 144 288 297 307 312 403 470 487

identical(foo(vect = x, difference = mindiff), changepoints)
#[1] TRUE

Benchmarking

#DATA
set.seed(123)
x = cumsum(rnorm(1e5))
mindiff = 5.0

library(microbenchmark)
microbenchmark(baseR = {start = x[1]
changepoints = integer()

for (i in 1:length(x)) {
    if (abs(x[i] - start) > mindiff) {
        changepoints = c(changepoints, i)
        start = x[i]
    }
}}, Rcpp = foo(vect = x, difference = mindiff))
#Unit: milliseconds
#  expr        min        lq      mean    median        uq      max neval cld
# baseR 117.194668 123.07353 125.98741 125.56882 127.78463 139.5318   100   b
#  Rcpp   7.907011  11.93539  14.47328  12.16848  12.38791 263.2796   100  a

Answer 2

Here is a solution just using baseR Reduce . Using the argument accumulate = TRUE , reduce returns the result of every call to the function. In our case it will represent start value of the solution using the for loop. Once you have this vector, we only need to find the indexes where the value changes:

#Find the changepoints
r <- Reduce(function(a,e) {
  if (abs(e - a) > mindiff)
    e
  else 
    a
  }, x, accumulate =T)

# Get the indexes using diff
# changepoints <- head(cumsum(c(1,rle(r)$lengths)),-1)
changepoints <- which(!diff(r) == 0) + 1

EDIT : I have updated the answer using @Eric Watt's comment.

Answer 3

For completeness, using recursion, we can get an answer that uses only R vectorised functions. However, this will not work on large vectors of results. Eg In the OP example we get an "evaluation nested too deeply" error when length(x) == 1e5

N = length(x)
f.recurs = function(x, mindiff, i=1) {
  next.i = i + which(abs(x[i:N]-x[i]) > mindiff)[1] - 1L
  if (!is.na(next.i)) c(next.i, f.recurs(x, mindiff, next.i))
  else NULL
}

f.recurs(x, 5.0)
# [1]  17  25  56  98 108 144 288 297 307 312 403 470 487

find points in vector where change is greater than threshold

Question

3 answers

solution1
3 ACCPTED 2017-08-24 18:21:30

solution2
3 2017-08-24 20:45:42

solution3
0 2017-08-26 18:20:49

find points in vector where change is greater than threshold

Question

3 answers

solution1 3 ACCPTED 2017-08-24 18:21:30

solution2 3 2017-08-24 20:45:42

solution3 0 2017-08-26 18:20:49

solution1
3 ACCPTED 2017-08-24 18:21:30

solution2
3 2017-08-24 20:45:42

solution3
0 2017-08-26 18:20:49