
How to get rid of multiple outliers in a timeseries in R?

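I am using the "outliers" package to remove some unwanted values, but it seems that rm.outlier() does not replace all outliers in one go. It probably does not apply the replacement recursively, so I basically have to call the function many times to replace every outlier. Here is a reproducible example of the problem: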

library(outliers)

# Create a time series:
set.seed(12345)
y <- rnorm(10000)

# Insert some outliers:
y[4000:4500] <- -11
y[4501:5000] <- -10
y[5001:5100] <- -9
y[5101:5200] <- -8
y[5201:5300] <- -7
y[5301:5400] <- -6
y[5401:5500] <- -5

# Plot the time series with the outliers:
plot(y, type = "l", col = "black", lwd = 6, xlab = "Time", ylab = "w'")

# Try to get rid of some outliers by replacing them with the series mean:
new.y <- outliers::rm.outlier(y, fill = TRUE, median = FALSE)
new.y <- outliers::rm.outlier(new.y, fill = TRUE, median = FALSE)

# Plot the new series "after removing the outliers":
lines(new.y, col = "red")

# Add a legend:
legend("bottomleft", c("raw", "new series"), col = c("black", "red"), lty = c(1, 1), horiz = FALSE, bty = "n")

Does anyone know how to improve the code above so that all outliers are replaced by the mean value?

The best idea I could come up with is to use a for loop and keep track of the outliers as they are found.

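# Assumes library(outliers) is loaded and y is the series from the question.
plot(y, type = "l", col = "black", lwd = 6, xlab = "Time", ylab = "w'")

maxIter <- 100
outlierQ <- rep(FALSE, length(y))  # TRUE where a value was ever flagged

for (i in seq_len(maxIter)) {
  bad <- outlier(y, logical = TRUE)  # flags the value(s) farthest from the mean
  if (!any(bad)) break
  outlierQ[bad] <- TRUE
  y[bad] <- mean(y[!bad])            # temporary fill so the next pass sees the next level
}

# Replace every flagged value with the mean of the values that were never flagged:
y[outlierQ] <- mean(y[!outlierQ])

lines(y, col = "blue")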
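
One caveat about the loop: as far as I can tell, outlier() always flags whichever value is currently farthest from the mean, so the break condition is unlikely to fire and the loop keeps replacing values until maxIter is reached, including some ordinary extremes of the clean data. If that matters, a different technique is to flag everything in a single pass with a robust threshold. The following is only a minimal sketch using base R's median() and mad(). The function name replace_outliers_mad and the cutoff of 3 are assumptions made for illustration, not part of the outliers package, and the right cutoff depends on the data.

```r
# Sketch: flag every point far from the median in one pass (base R only).
# replace_outliers_mad and the default cutoff of 3 are illustrative
# assumptions; they do not come from the outliers package.
replace_outliers_mad <- function(x, cutoff = 3) {
  center <- median(x, na.rm = TRUE)
  spread <- mad(x, center = center, na.rm = TRUE)  # scaled median absolute deviation
  if (spread == 0) stop("MAD is zero; this rule cannot separate outliers")
  # A point is flagged when its robust z-score exceeds the cutoff
  flagged <- !is.na(x) & abs(x - center) / spread > cutoff
  # Fill flagged points with the mean of the points that were kept
  x[flagged] <- mean(x[!flagged], na.rm = TRUE)
  list(values = x, flagged = flagged)
}

# Same synthetic series as in the question
set.seed(12345)
y <- rnorm(10000)
y[4000:4500] <- -11
y[4501:5000] <- -10
y[5001:5100] <- -9
y[5101:5200] <- -8
y[5201:5300] <- -7
y[5301:5400] <- -6
y[5401:5500] <- -5

res <- replace_outliers_mad(y)
sum(res$flagged)   # how many values were replaced
range(res$values)  # range of the cleaned series
```

To compare with the plots above, the cleaned series can be overlaid with lines(res$values, col = "green"). A lower cutoff removes more of the genuine tails of the data, a higher one may leave the mildest inserted level (-5) in place, so it is worth checking sum(res$flagged) against what you expect.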

