
getting error in [R] - missing value where TRUE/FALSE needed

I am trying to step through a vector to find the outliers, using the IQR to calculate a range. When I run this script looking for values to the right of the range I get results, but when I run it to the left I get the error: missing value where TRUE/FALSE needed. How can I scrub out the true and false in my dataset? Here is my script:

data = c(100, 120, 121, 123, 125, 124, 123, 123, 123, 124, 125, 167, 180, 123, 156)
Q3 <- quantile(data, 0.75) ##gets the third quantile from the list of vectors
Q1 <- quantile(data, 0.25) ## gets the first quantile from the list of vectors
outliers_left <-(Q1-1.5*IQR(data)) 
outliers_right <-(Q3+1.5*IQR(data))
IQR <- IQR(data)
paste("the interquartile range is", IQR)
Q1 # quantile at 0.25
Q3 # quantile at 0.75
# show the range of numbers we have
paste("your range is", outliers_left, "through", outliers_right, "to determine outliers")
# count how many values there are and then we will pass this value into a loop to look for 
# anything above and below the Q1-Q3 values
vectorCount <- sum(!is.na(data))
i <- 1
while( i <= vectorCount ){
i <- i + 1
x <- data[i]
# if(x < outliers_left) {print(x)} # uncomment this to run and test for the left
if(x > outliers_right) {print(x)}
}

and the error I get is

[1] 167
[1] 180
[1] 156
Error in if (x > outliers_right) { : 
missing value where TRUE/FALSE needed

As you can see if you run this script, it finds my 3 outliers on the right and also throws the error. But when I run it again on the left of my IQR, where I do have an outlier of 100 in the vector, I just get the error without any other results being displayed. How can I fix this script? Any help is greatly appreciated. I've been scouring the web and my books for days on how to fix this.

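As noted in the comments, the error is due to the way you've constructed your while loop. Because i is incremented before it is used as an index, the last iteration has i == 16 though there are only 15 elements to process, so data[16] is NA and the if condition has no TRUE/FALSE value. For the same reason the loop starts at data[2] and never tests data[1], which is why the left-hand test never prints 100. Changing from i <= vectorCount to i < vectorCount removes the error: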

i <- 1
while( i < vectorCount ){
  i <- i + 1
  x <- data[i]
  # if(x < outliers_left) {print(x)} # uncomment this to run and test for the left
  if(x > outliers_right) {print(x)}
}
#-----
[1] 167
[1] 180
[1] 156
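If you do want to keep an explicit loop, here is a minimal sketch that visits every element, including data[1]. It assumes the same data and 1.5 * IQR fences as the question; the name found is only for illustration, and unname() is used just to drop the "25%"/"75%" labels that quantile() attaches.

```r
data <- c(100, 120, 121, 123, 125, 124, 123, 123, 123, 124, 125, 167, 180, 123, 156)

# Fences as in the question; unname() drops the quantile labels
outliers_left  <- unname(quantile(data, 0.25) - 1.5 * IQR(data))
outliers_right <- unname(quantile(data, 0.75) + 1.5 * IQR(data))

found <- c()  # hypothetical accumulator for the values that fall outside the fences
for (i in seq_along(data)) {   # i runs over 1..length(data), never past the end
  x <- data[i]
  if (x < outliers_left || x > outliers_right) {
    found <- c(found, x)
  }
}
print(found)
# [1] 100 167 180 156
```

seq_along(data) generates exactly the valid indices, so there is no counter to increment by hand and no way to step to position 16.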

However, this is really not how R works, and you'll soon be frustrated at how long that code takes to run on data of any appreciable size. R is "vectorized", meaning that you can operate on all 15 elements of data at once. To print your outliers, I'd do this:

data[data > outliers_right]
#-----
[1] 167 180 156

Or to get all of them at once using the OR operator:

data[data < outliers_left | data > outliers_right]
#-----
[1] 100 167 180 156

For a little context, the above logical comparisons create a boolean value for each element of data, and subsetting with that logical vector returns only the elements where it is TRUE. You can check this for yourself by typing:

data > outliers_right
#----
 [1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE  TRUE  TRUE FALSE  TRUE
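The same logical vector can also be counted or turned into positions. A small sketch, assuming the data and fences from the question (is_outlier is a name made up here):

```r
data <- c(100, 120, 121, 123, 125, 124, 123, 123, 123, 124, 125, 167, 180, 123, 156)
outliers_left  <- unname(quantile(data, 0.25) - 1.5 * IQR(data))
outliers_right <- unname(quantile(data, 0.75) + 1.5 * IQR(data))

# One TRUE/FALSE per element of data
is_outlier <- data < outliers_left | data > outliers_right

sum(is_outlier)    # TRUE counts as 1, so this is the number of outliers
# [1] 4
which(is_outlier)  # positions of the TRUE values
# [1]  1 12 13 15
```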

The [ bit is actually an extraction operator, used to retrieve a subset of a data object. See the help page ?"[" for some good background.
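As a quick illustration of the common index types that [ accepts on a plain vector (the vector v below is made up for the example):

```r
v <- c(10, 20, 30, 40)

by_logical  <- v[c(TRUE, FALSE, TRUE, FALSE)]  # keep elements where the index is TRUE
by_position <- v[c(1, 3)]                      # keep elements at these positions
by_negative <- v[-2]                           # drop the element at position 2

by_logical
# [1] 10 30
by_position
# [1] 10 30
by_negative
# [1] 10 30 40
```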

The error message arises because you let i <= vectorCount, so i can equal vectorCount at the top of the loop. It is then incremented by i = i+1 to vectorCount + 1, indexing data at that position gives NA, and the if statement fails.
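You can reproduce the failure outside the loop. This sketch assumes the question's data; the threshold 128 is the right-hand fence for this particular vector, and tryCatch is used only so the error can be inspected instead of stopping the script:

```r
data <- c(100, 120, 121, 123, 125, 124, 123, 123, 123, 124, 125, 167, 180, 123, 156)

x <- data[16]   # one past the end of a 15-element vector
is.na(x)
# [1] TRUE

# x > 128 is NA, and if() cannot decide on NA, so it signals an error
result <- tryCatch(
  {
    if (x > 128) print(x)
    "no error"
  },
  error = function(e) conditionMessage(e)
)
result   # holds the error text rather than "no error"

# isTRUE() treats NA as "not TRUE", which is one way to guard a scalar test
isTRUE(x > 128)
# [1] FALSE
```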

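If you want to find the outliers based on the IQR, you can use findInterval with the two fences as break points. (Using c(Q1, Q3) as the breaks would instead return everything outside the interquartile range itself, not just the outliers.)

outliers <- data[findInterval(data, c(outliers_left, outliers_right)) != 1]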
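To see what findInterval returns, here is a sketch that assumes the 1.5 * IQR fences from the question are used as the break points (fences and codes are illustrative names). Each value gets 0 if it is below the first break, 1 if it lies between the breaks, and 2 if it is at or above the second:

```r
data <- c(100, 120, 121, 123, 125, 124, 123, 123, 123, 124, 125, 167, 180, 123, 156)

# Lower and upper fence, sorted ascending as findInterval requires
fences <- unname(c(quantile(data, 0.25) - 1.5 * IQR(data),
                   quantile(data, 0.75) + 1.5 * IQR(data)))

codes <- findInterval(data, fences)
codes
#  [1] 0 1 1 1 1 1 1 1 1 1 1 2 2 1 2

data[codes != 1]   # everything not between the fences
# [1] 100 167 180 156
```

One difference from the explicit comparison: the intervals are closed on the left, so a value exactly equal to the upper fence gets code 2 and is reported, whereas data > outliers_right would leave it out.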

I would also stop using paste to create character messages to be printed; use message instead.
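A short sketch of the difference, assuming the question's data (iqr_value is a name chosen here to avoid reusing IQR as a variable): paste only builds a string, which the console then auto-prints with a [1] prefix and quotes, while message writes plain text to the standard error stream.

```r
data <- c(100, 120, 121, 123, 125, 124, 123, 123, 123, 124, 125, 167, 180, 123, 156)
iqr_value <- IQR(data)

# paste() returns a character vector; nothing is shown unless it is printed
label <- paste("the interquartile range is", iqr_value)

# message() concatenates its arguments and emits them as a diagnostic message
message("the interquartile range is ", iqr_value)
```

Because message output goes to stderr, it stays separate from values the script returns or prints, and it can be silenced with suppressMessages().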
