I want to extract the outliers from my data frame, for example the 10 out of 1000 data points that are possible outliers or that don't fall within a 95% confidence interval. There are ways to find the value with the largest difference from the sample mean:
> a <- c(1,3,2,4,5,2,3,90,78,56,78,23,345)
> require("outliers")
> outlier(a)
[1] 345
I don't want to remove the outliers from my dataframe or from my boxplot. I want to print or subset them.
Any ideas?
Given the data:
a <- c(1,3,2,4,5,2,3,90,78,56,78,23,345)
If you want the values that are not within the 95% confidence interval, keep in mind that a confidence interval is a statement about where the "true mean" is likely to lie.
In this case:
> mean(a)
[1] 53.07692
First question to answer: is 53 the "normal" value you would most likely expect? Why do I ask? Because if you want to print the values that are not within the 95% interval:
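> # the margin is the t quantile times the standard error, sd(a) / sqrt(n)
> a[a > mean(a) + qt(0.975, df = length(a) - 1) * sd(a) / sqrt(length(a)) |
+   a < mean(a) - qt(0.975, df = length(a) - 1) * sd(a) / sqrt(length(a))]
[1] 345
Here only 345 falls outside the interval (roughly -3.8 to 110). But the interval narrows as the sample grows, because the standard error is sd(a) / sqrt(length(a)), so with 1000 data points you might get many more values than you expect.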
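If what you are after is the points a boxplot would mark as outliers, rather than values outside a confidence interval for the mean, one option is base R's boxplot.stats(), which returns those points in its `out` element without plotting or removing anything. The following is only a minimal sketch: the data frame `df` and its columns `id` and `value` are made up for illustration, so substitute your own column.

```r
a <- c(1, 3, 2, 4, 5, 2, 3, 90, 78, 56, 78, 23, 345)

# Points lying more than 1.5 times the hinge spread beyond the hinges,
# i.e. the ones boxplot() would draw individually.
out_vals <- boxplot.stats(a)$out
print(out_vals)

# Hypothetical data frame holding the same values in a column named `value`.
df <- data.frame(id = seq_along(a), value = a)

# Subset the flagged rows; `df` itself is left unchanged.
outlier_rows <- df[df$value %in% out_vals, ]
print(outlier_rows)
```

The `coef` argument of boxplot.stats() (1.5 by default) controls how far beyond the hinges a point must lie to be reported, so you can tighten or loosen the rule if it flags too few or too many rows.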