Find points in a vector where the change is greater than a threshold
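I want to find the positions in a vector where the value differs from an earlier point in the vector by more than some threshold. The first change point should be measured relative to the first value in the vector. Subsequent change points should be measured relative to the previous change point.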
I can do this with a for loop, but I would like to know whether there is a more idiomatic and faster vectorized solution.
Minimal example:
set.seed(123)
x = cumsum(rnorm(500))
mindiff = 5.0
start = x[1]
changepoints = integer()
for (i in 1:length(x)) {
    if (abs(x[i] - start) > mindiff) {
        changepoints = c(changepoints, i)
        start = x[i]
    }
}
plot(x, type = 'l')
points(changepoints, x[changepoints], col='red')
Implementing the same code in Rcpp can improve the speed.
library(Rcpp)
cppFunction(
"IntegerVector foo(NumericVector vect, double difference){
    int start = 0;
    IntegerVector changepoints;
    for (int i = 0; i < vect.size(); i++){
        // compare against the value at the most recent change point
        if((vect[i] - vect[start]) > difference || (vect[start] - vect[i]) > difference){
            changepoints.push_back(i + 1);  // convert to a 1-based R index
            start = i;
        }
    }
    return(changepoints);
}"
)
foo(vect = x, difference = mindiff)
# [1] 17 25 56 98 108 144 288 297 307 312 403 470 487
identical(foo(vect = x, difference = mindiff), changepoints)
#[1] TRUE
Benchmark
#DATA
set.seed(123)
x = cumsum(rnorm(1e5))
mindiff = 5.0
library(microbenchmark)
microbenchmark(
    baseR = {
        start = x[1]
        changepoints = integer()
        for (i in 1:length(x)) {
            if (abs(x[i] - start) > mindiff) {
                changepoints = c(changepoints, i)
                start = x[i]
            }
        }
    },
    Rcpp = foo(vect = x, difference = mindiff)
)
# Unit: milliseconds
#   expr        min        lq      mean    median        uq      max neval cld
#  baseR 117.194668 123.07353 125.98741 125.56882 127.78463 139.5318   100   b
#   Rcpp   7.907011  11.93539  14.47328  12.16848  12.38791 263.2796   100  a
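The base R loop above grows `changepoints` with `c()` each time a change point is found. As a rough sketch (not part of the original answer and not benchmarked here), the same logic can write into a preallocated vector instead. The function name `find_changepoints_loop` and the toy input are assumptions made for illustration.

```r
# Hypothetical helper: same algorithm as the for loop in the question,
# but the output vector is preallocated instead of grown with c().
find_changepoints_loop <- function(x, mindiff) {
  n <- length(x)
  if (n == 0L) return(integer(0))
  out <- integer(n)  # worst case: every element is a change point
  k <- 0L            # number of change points found so far
  start <- x[1]      # current reference value
  for (i in seq_along(x)) {
    if (abs(x[i] - start) > mindiff) {
      k <- k + 1L
      out[k] <- i
      start <- x[i]  # later comparisons are made against this value
    }
  }
  out[seq_len(k)]    # drop the unused tail
}

# Toy input chosen for illustration
find_changepoints_loop(c(0, 2, 6, 7, 12, 5), 5)
# [1] 3 5 6
```

Whether this closes much of the gap to the Rcpp version would need to be measured on your own data.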
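Here is a solution using base R's Reduce. With the argument accumulate = TRUE, Reduce returns the result of every call to the function. In our case, that is the value that start takes at each step of the for-loop solution. Once you have this vector, you only need to find the indices where its value changes: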
# Find the changepoints: r holds the reference value (start) at every position
r <- Reduce(function(a, e) {
    if (abs(e - a) > mindiff)
        e
    else
        a
}, x, accumulate = TRUE)
# Get the indexes using diff
# changepoints <- head(cumsum(c(1,rle(r)$lengths)),-1)
changepoints <- which(diff(r) != 0) + 1L
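To make the role of the accumulated vector concrete, here is a small worked example on a toy input. The names `x_toy`, `mindiff_toy`, `r_toy` and `cp_toy` are made up for illustration and do not appear in the original answer.

```r
# Toy input chosen for illustration (not from the original post)
x_toy <- c(0, 2, 6, 7, 12, 5)
mindiff_toy <- 5

# Running reference value: replaced whenever the current element is
# more than mindiff_toy away from it
r_toy <- Reduce(function(a, e) if (abs(e - a) > mindiff_toy) e else a,
                x_toy, accumulate = TRUE)
r_toy
# [1]  0  0  6  6 12  5

# A change point is any index where the reference value changed
cp_toy <- which(diff(r_toy) != 0) + 1L
cp_toy
# [1] 3 5 6
```

The reference value only ever changes to an element that differs from it by more than the threshold, so with a non-negative threshold a non-zero `diff` marks exactly the change points.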
Edit: I have updated the answer using @Eric Watt's comment.
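For completeness: using recursion, we can get an answer that relies only on R's vectorized functions. However, this does not work when the result vector is large. For example, with the OP's example and length(x) == 1e5, we get an "evaluation nested too deeply" error.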
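f.recurs = function(x, mindiff, i = 1) {
    N = length(x)  # computed inside the function so it does not depend on a global
    # index of the first later value that differs from x[i] by more than mindiff
    next.i = i + which(abs(x[i:N] - x[i]) > mindiff)[1] - 1L
    if (!is.na(next.i)) c(next.i, f.recurs(x, mindiff, next.i))
    else NULL
}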
f.recurs(x, 5.0)
# [1] 17 25 56 98 108 144 288 297 307 312 403 470 487
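The same jump-to-the-next-change-point idea can also be written with a while loop, which avoids the recursion depth limit. This is only a sketch under that assumption, not something from the original answer; `find_changepoints_iter` is a hypothetical name.

```r
# Hypothetical iterative variant of the recursive approach above
find_changepoints_iter <- function(x, mindiff) {
  n <- length(x)
  out <- integer(0)
  i <- 1L  # index of the current reference value
  while (i < n) {
    # offset (relative to i) of the first later value that differs
    # from x[i] by more than mindiff; NA if there is none
    k <- which(abs(x[(i + 1L):n] - x[i]) > mindiff)[1]
    if (is.na(k)) break
    i <- i + k
    out <- c(out, i)
  }
  out
}

# Toy input chosen for illustration
find_changepoints_iter(c(0, 2, 6, 7, 12, 5), 5)
# [1] 3 5 6
```

Each step still scans the rest of the vector, so the cost grows with both the vector length and the number of change points. It may be slower than the plain loop when change points are frequent.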