How to get rid of multiple outliers in a timeseries in R?
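I am using the "outliers" package to remove some unwanted values, but it seems that the rm.outlier() function does not replace all of the outliers at once. Presumably rm.outlier() does not work recursively, so I basically have to call it many times to replace every outlier. Here is a reproducible example of the problem I am running into: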
require(outliers)
# creating a timeseries:
set.seed(12345)
y = rnorm(10000)
# inserting some outliers:
y[4000:4500] = -11
y[4501:5000] = -10
y[5001:5100] = -9
y[5101:5200] = -8
y[5201:5300] = -7
y[5301:5400] = -6
y[5401:5500] = -5
# plotting the timeseries + outliers:
plot(y, type="l", col="black", lwd=6, xlab="Time", ylab="w'")
# trying to get rid of some outliers by replacing them by the series mean value:
new.y = outliers::rm.outlier(y, fill=TRUE, median=FALSE)
new.y = outliers::rm.outlier(new.y, fill=TRUE, median=FALSE)
# plotting the new timeseries "after removing the outliers":
lines(new.y, col="red")
# inserting a legend:
legend("bottomleft", c("raw", "new series"), col=c("black","red"), lty=c(1,1), horiz=FALSE, bty="n")
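Does anyone know how to improve the code above so that all of the outliers are replaced by the mean value?
The best idea I can come up with is to use a for loop, keeping track of the outliers as you find them.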
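library(outliers)
plot(y, type="l", col="black", lwd=6, xlab="Time", ylab="w'")
maxIter <- 100                        # upper bound on the number of passes
outlierQ <- rep(FALSE, length(y))     # TRUE where an outlier has been found
for (i in 1:maxIter) {
  # flag the value(s) that outlier() reports for the current series
  bad <- outlier(y, logical = TRUE)
  if (!any(bad)) break
  outlierQ[bad] <- TRUE
  # temporary fill so the next pass looks for the next candidate
  y[bad] <- mean(y[!bad])
}
# final fill: mean of the points that were never flagged
y[outlierQ] <- mean(y[!outlierQ])
lines(y, col="blue")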
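As an alternative sketch that avoids the loop, and assuming a median/MAD rule is acceptable for this series, all points can be flagged in a single pass with base R only. This is a different technique from rm.outlier(), which, as the question observes, does not replace every outlier in one call. The helper name replace_outliers and the cutoff k = 3.5 are illustrative assumptions and are not part of the outliers package.

```r
# Hypothetical helper (not from the 'outliers' package): flag every point whose
# distance from the median exceeds k scaled MADs, then fill them all in one pass.
replace_outliers <- function(x, k = 3.5) {
  centre <- median(x)
  spread <- mad(x)                      # stats::mad, scaled by 1.4826 by default
  bad <- abs(x - centre) > k * spread   # logical vector of flagged positions
  x[bad] <- mean(x[!bad])               # fill with the mean of the retained points
  list(values = x, flagged = bad)
}

# Same synthetic series as in the question
set.seed(12345)
y <- rnorm(10000)
y[4000:4500] <- -11
y[4501:5000] <- -10
y[5001:5100] <- -9
y[5101:5200] <- -8
y[5201:5300] <- -7
y[5301:5400] <- -6
y[5401:5500] <- -5

res <- replace_outliers(y)
new.y <- res$values                     # cleaned series
n.flagged <- sum(res$flagged)           # how many points were replaced
```

The cleaned series can be overlaid on the original plot with lines(new.y, col="darkgreen"). Since about 15% of this series is contaminated, the median and MAD are themselves shifted somewhat, so the cutoff k may need tuning for other data.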