
Find points in a vector where the change is greater than a threshold

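I want to find the positions in a vector where the value differs from an earlier point in the vector by more than some threshold. The first changepoint should be measured relative to the first value in the vector. Each subsequent changepoint should be measured relative to the previous changepoint.

I can do this with a for loop, but I wonder whether there is a more idiomatic and faster vectorized solution.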

Minimal example:

set.seed(123)
x = cumsum(rnorm(500))

mindiff = 5.0
start = x[1]
changepoints = integer()

for (i in seq_along(x)) {
  if (abs(x[i] - start) > mindiff) {
    changepoints = c(changepoints, i)
    start = x[i]
  }
}

plot(x, type = 'l')
points(changepoints, x[changepoints], col='red')
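
To make the rule concrete, here is a small hand-checkable case. It wraps the same loop in a helper; the name `find_changepoints` and the vector `toy` are made up for illustration and are not part of the original code.

```r
# Hypothetical helper wrapping the loop above (name chosen for illustration).
find_changepoints <- function(x, mindiff) {
  changepoints <- integer()
  if (length(x) == 0) return(changepoints)
  start <- x[1]                       # reference value: first element
  for (i in seq_along(x)) {
    if (abs(x[i] - start) > mindiff) {
      changepoints <- c(changepoints, i)
      start <- x[i]                   # later changes are measured from here
    }
  }
  changepoints
}

toy <- c(0, 2, 6, 7, 12, 11, 5)
find_changepoints(toy, mindiff = 5)
# [1] 3 5 7
```

Index 3 is flagged because 6 differs from the first value 0 by more than 5. Index 5 is flagged relative to 6 (not 0), and index 7 relative to 12.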


Implementing the same loop in Rcpp improves the speed.

library(Rcpp)
cppFunction(
  "IntegerVector foo(NumericVector vect, double difference){
    int start = 0;
    IntegerVector changepoints;
    for (int i = 0; i < vect.size(); i++){
      if((vect[i] - vect[start]) > difference || (vect[start] - vect[i]) > difference){
        changepoints.push_back (i+1);
        start = i;        
      }
    }
    return(changepoints);
  }"
  )

foo(vect = x, difference = mindiff)
# [1]  17  25  56  98 108 144 288 297 307 312 403 470 487

identical(foo(vect = x, difference = mindiff), changepoints)
#[1] TRUE

Benchmark

#DATA
set.seed(123)
x = cumsum(rnorm(1e5))
mindiff = 5.0

library(microbenchmark)
microbenchmark(baseR = {start = x[1]
changepoints = integer()

for (i in seq_along(x)) {
    if (abs(x[i] - start) > mindiff) {
        changepoints = c(changepoints, i)
        start = x[i]
    }
}}, Rcpp = foo(vect = x, difference = mindiff))
#Unit: milliseconds
#  expr        min        lq      mean    median        uq      max neval cld
# baseR 117.194668 123.07353 125.98741 125.56882 127.78463 139.5318   100   b
#  Rcpp   7.907011  11.93539  14.47328  12.16848  12.38791 263.2796   100  a 

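Here is a solution using base R's Reduce. With accumulate = TRUE, Reduce returns the result of each successive call to the function. In our case, those results are the values that start takes in the for-loop solution. Once you have this vector, you only need to find the indices where the value changes: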

#Find the changepoints
r <- Reduce(function(a,e) {
  if (abs(e - a) > mindiff)
    e
  else 
    a
  }, x, accumulate = TRUE)

# Get the indexes using diff
# changepoints <- head(cumsum(c(1,rle(r)$lengths)),-1)
changepoints <- which(diff(r) != 0) + 1L
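
To see what the accumulated vector looks like, here is a small sketch on a toy input (the vector `toy` is invented for illustration). Each element of `r` is the reference value in force at that position, so a changepoint is wherever `r` jumps.

```r
toy <- c(0, 2, 6, 7, 12, 11, 5)   # illustrative data, not from the question
mindiff <- 5

# Carry the current reference value forward; replace it when the
# new element differs from it by more than mindiff.
r <- Reduce(function(a, e) if (abs(e - a) > mindiff) e else a,
            toy, accumulate = TRUE)
r
# [1]  0  0  6  6 12 12  5

which(diff(r) != 0) + 1L
# [1] 3 5 7
```

A new reference value always differs from the old one by more than mindiff, so with a positive threshold `diff(r)` is non-zero exactly at the changepoints.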

Edit: I have updated the answer based on @Eric Watt's comment.

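For completeness: using recursion, we can get the answer with only R's vectorized functions. However, this does not work when the result vector is large. For example, in the OP's example, when length(x) == 1e5, we get the "evaluation nested too deeply" error.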

N = length(x)
f.recurs = function(x, mindiff, i=1) {
  next.i = i + which(abs(x[i:N]-x[i]) > mindiff)[1] - 1L
  if (!is.na(next.i)) c(next.i, f.recurs(x, mindiff, next.i))
  else NULL
}

f.recurs(x, 5.0)
# [1]  17  25  56  98 108 144 288 297 307 312 403 470 487
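
One way around the recursion limit is to keep the same "jump to the next crossing" step but drive it with a while loop. This is only a sketch: `f.iter` and `toy` are names invented here, and because every step rescans the rest of the vector, it may well be slower than the plain for loop when there are many changepoints.

```r
# Iterative variant of the recursive idea (hypothetical name f.iter).
f.iter <- function(x, mindiff) {
  n <- length(x)
  out <- integer()
  i <- 1L
  while (i < n) {
    # Position, within the remaining slice, of the first value that is
    # more than mindiff away from the current reference x[i].
    k <- which(abs(x[(i + 1L):n] - x[i]) > mindiff)[1]
    if (is.na(k)) break             # no further changepoint
    i <- i + k                      # convert slice position to index in x
    out <- c(out, i)
  }
  out
}

toy <- c(0, 2, 6, 7, 12, 11, 5)
f.iter(toy, 5.0)
# [1] 3 5 7
```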

