I have a time series containing NAs and some sudden jumps like this:
input <- c(1:5, NA, 6:7, 0, 9:12)
Here 7, 0, 9 counts as a jump, and the 0 should be replaced by NA.
I would like to take the very first value at which a sudden jump occurs (what qualifies as a jump is a set threshold; in the example, a change > 1) and set it to NA.
The output for the example should look like this:
output <- c(1:5, NA, 6:7, NA, 9:12)
I only want to set the outliers to NA; I do not want to overwrite the remaining values. The jump can be either negative or positive.
Problems I encountered:
Both of them resulted in more NAs than necessary, and I am trying to keep as much of the original data as possible.
Any ideas? I have been stuck for a while. Thanks in advance!
There are three similar situations, each of which needs a different amount of exception handling:
If the pattern always returns to increments of 1 apart from a few interruptions, I would create vector_check, which represents the perfect vector. Everything in input that deviates from it should be set to NA:
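# ideal series: starts at the first non-missing value and rises by 1
# per non-missing observation (min() would pick up the outlier 0)
start <- input[which(!is.na(input))[1]]
vector_check <- start + cumsum(!is.na(input)) - 1
inds <- which(vector_check != input)  # which() skips NA comparisons
input[inds] <- NA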
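The same check can be written as a small function with a configurable step size. This is only a sketch under stated assumptions: flag_off_sequence() is a hypothetical name, and it assumes the clean series grows by exactly step per non-missing observation and that the first non-missing value is not itself an outlier.

```r
# Hypothetical helper (not from any package): set every value that
# deviates from an ideal sequence to NA.
# Assumes the clean series grows by `step` per non-missing observation
# and that the first non-missing value is correct.
flag_off_sequence <- function(x, step = 1) {
  start <- x[which(!is.na(x))[1]]
  # NA positions do not advance the ideal sequence
  ideal <- start + step * (cumsum(!is.na(x)) - 1)
  # which() drops the NA comparisons, so existing NAs stay untouched
  x[which(x != ideal)] <- NA
  x
}

input <- c(1:5, NA, 6:7, 0, 9:12)
print(flag_off_sequence(input))
print(flag_off_sequence(c(2, 4, 6, 50, 10, 12), step = 2))
```

In the second call the ideal sequence is 2, 4, 6, 8, 10, 12, so only the 50 is set to NA.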
If the pattern is less predictable and you basically want to look for an 'irregular' pattern, the situation gets more complicated. One possible solution is a while-loop that checks which increments are larger than 2 (or whatever value seems sensible) and replaces the problematic location bump_ind with NA. Here I assume that an outlier creates two large increments: one because the value suddenly drops (or rises) and one because it rises back up (or drops back down) to its old level. This process repeats until no problematic locations remain:
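bump_ind <- rep(0, 3)  # dummy value so the loop runs at least once
while (length(bump_ind) > 1) {
  # indices i where |input[i + 1] - input[i]| exceeds the threshold
  bump_ind <- which(abs(diff(input)) > 2)
  # a one-point outlier sits between two consecutive large increments
  if (length(bump_ind) > 1) input[bump_ind[2]] <- NA
}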
input
# [1] 1 2 3 4 5 NA 6 7 NA 9 10 11 12
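To see why the loop targets bump_ind[2], it can help to inspect the increments directly. A minimal sketch on the example series; note that diff(input)[i] compares input[i] with input[i + 1], so a one-point spike produces two adjacent large increments.

```r
input <- c(1:5, NA, 6:7, 0, 9:12)

# diff(input)[i] is input[i + 1] - input[i]
steps <- abs(diff(input))
print(steps)

# the isolated 0 causes two consecutive large increments
bump_ind <- which(steps > 2)
print(bump_ind)

# the outlier is the value shared by both increments
outlier <- bump_ind[1] + 1
print(outlier)
```

Because which() ignores the NA increments around the existing gap, that gap is never mistaken for a jump.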
A third option, since your real sensor data shows that the series does not have to jump back to the previous level:
input <- c(20.2,20.2,20.2,20.2,20.1,20.2,20.2,20.1,20.2, 20.2,20.2,20.2,17.7,
18.9,19.3,19.4,19.4,19.4,19.5,19.5,19.5)
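bump_ind <- rep(0, 3)  # dummy value so the loop runs at least once
while (length(bump_ind) > 1) {
  bump_ind <- which(abs(diff(input)) > 2)
  if (length(bump_ind) == 0) break  # nothing left to fix
  bump_ind <- head(bump_ind, 2)     # only look at the first two jumps
  if (length(bump_ind) == 1 || diff(bump_ind) > 1) {
    # a single jump, or two jumps far apart: the level shifts,
    # so only the first value after the jump is set to NA
    input[bump_ind[1] + 1] <- NA
  } else {
    # two adjacent jumps: a one-point spike between them
    input[bump_ind[2]] <- NA
  }
}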
input
# [1] 20.2 20.2 20.2 20.2 20.1 20.2 20.2 20.1 20.2 20.2 20.2 20.2 NA 18.9 19.3
# [16] 19.4 19.4 19.4 19.5 19.5 19.5
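The same idea can be packaged as a function so it can be run on both example series. This is a sketch, not the answer's exact code: remove_jumps() and its threshold argument are hypothetical names, and unlike the loop above it keeps going until no jump is left instead of stopping after a single level shift.

```r
# Hypothetical wrapper: `threshold` is the largest absolute step between
# neighbours that still counts as normal.
remove_jumps <- function(x, threshold = 2) {
  repeat {
    bump_ind <- which(abs(diff(x)) > threshold)
    if (length(bump_ind) == 0) break
    if (length(bump_ind) == 1 || diff(bump_ind[1:2]) > 1) {
      # lone jump or jumps far apart: treat it as a level shift and
      # blank only the first value after the jump
      x[bump_ind[1] + 1] <- NA
    } else {
      # two adjacent jumps: a one-point spike sitting between them
      x[bump_ind[2]] <- NA
    }
  }
  x
}

spiky <- c(1:5, NA, 6:7, 0, 9:12)
sensor <- c(20.2, 20.2, 20.2, 20.2, 20.1, 20.2, 20.2, 20.1, 20.2,
            20.2, 20.2, 20.2, 17.7, 18.9, 19.3, 19.4, 19.4, 19.4,
            19.5, 19.5, 19.5)

print(remove_jumps(spiky))
print(remove_jumps(sensor))
```

Each pass turns one non-missing value into NA, so the loop always terminates.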
This may be a more robust solution, since you can adapt the linear model below to your data if necessary:
Your data:
input <- c(1:5, NA, 6:7,0,9:12)
An index for each observation:
x <- seq_along(input)
Select a threshold value for the residuals of a linear model:
threshold <- 2
Fit a linear model to your data, compute the residuals, and select the outliers:
select <- abs(predict(lm(input ~ x), newdata = data.frame(x = x)) - input) >= threshold
Replace the outliers with NA:
input[select] <- NA
input
[1] 1 2 3 4 5 NA 6 7 NA 9 10 11 12
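Before settling on a threshold, it can help to look at every residual to see how far the outliers stand apart from the rest. A minimal sketch on the first example; it relies on the default na.action (na.omit), which drops the NA row when fitting, while predict() with newdata still returns one value per position.

```r
input <- c(1:5, NA, 6:7, 0, 9:12)
x <- seq_along(input)

# the NA row is dropped when fitting (default na.action = na.omit)
fit <- lm(input ~ x)

# absolute residual at every position; NA where input is NA
resid_abs <- abs(predict(fit, newdata = data.frame(x = x)) - input)
print(round(resid_abs, 2))

# positions whose residual reaches a threshold of 2
print(unname(which(resid_abs >= 2)))
```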
EDIT: With your data:
input <- c(20.2, 20.2, 20.2, 20.2,
20.1, 20.2, 20.2, 20.1,
20.2, 20.2, 20.2, 20.2,
17.7, 18.9, 19.3, 19.4,
19.4, 19.4, 19.5, 19.5,
19.5)
x <- seq_len(length(input))
threshold <- 0.7
select <- abs(predict(lm(input ~ x), newdata = data.frame(x = x)) - input) >= threshold
inputnew <- input
inputnew[select] <- NA
input
[1] 20.2 20.2 20.2 20.2 20.1 20.2 20.2 20.1 20.2 20.2 20.2 20.2 17.7 18.9 19.3
[16] 19.4 19.4 19.4 19.5 19.5 19.5
inputnew
[1] 20.2 20.2 20.2 20.2 20.1 20.2 20.2 20.1 20.2 20.2 20.2 20.2 NA 18.9 19.3
[16] 19.4 19.4 19.4 19.5 19.5 19.5
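Both runs can be wrapped in one function that returns a cleaned copy and leaves the input untouched. A sketch only: flag_lm_outliers() is a hypothetical name, and the thresholds below are simply the ones used above for each data set.

```r
# Hypothetical helper: fit a straight line against the position index
# and set to NA every value whose absolute residual reaches `threshold`.
flag_lm_outliers <- function(y, threshold) {
  x <- seq_along(y)
  fit <- lm(y ~ x)  # NA values in y are dropped by the default na.action
  resid_abs <- abs(predict(fit, newdata = data.frame(x = x)) - y)
  y[which(resid_abs >= threshold)] <- NA  # which() skips NA residuals
  y
}

spiky <- c(1:5, NA, 6:7, 0, 9:12)
sensor <- c(20.2, 20.2, 20.2, 20.2, 20.1, 20.2, 20.2, 20.1, 20.2,
            20.2, 20.2, 20.2, 17.7, 18.9, 19.3, 19.4, 19.4, 19.4,
            19.5, 19.5, 19.5)

print(flag_lm_outliers(spiky, threshold = 2))
print(flag_lm_outliers(sensor, threshold = 0.7))
```

Because a single straight line is fitted to the whole series, the threshold has to be tuned per data set (2 for the first example, 0.7 for the sensor data).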