
R - print outliers from a data frame

I want to extract the outliers from my data frame: for example, the 10 out of 1000 data points that are possible outliers or that do not fall within a 95% confidence interval. There are ways to find the value with the largest difference from the sample mean:

> a <- c(1,3,2,4,5,2,3,90,78,56,78,23,345)
> require("outliers")
> outlier(a)
[1] 345

I don't want to remove the outliers from my data frame or from my boxplot. I want to print or subset them.

Any ideas?

Given the data:

a <- c(1,3,2,4,5,2,3,90,78,56,78,23,345)

Suppose you want the values that are not within the 95% confidence interval. Keep in mind that a confidence interval is a statement about where the "true mean" is likely to lie, not about where individual observations fall.

In this case:

> mean(a)
[1] 53.07692

The first question to answer: is 53 the "normal" value you would most likely expect? I ask because if you print the values that are not within 95%:

a[a > mean(a) + qt(0.975, df = length(a) - 1) * mean(a) / sqrt(length(a)) |
    a < mean(a) - qt(0.975, df = length(a) - 1) * mean(a) / sqrt(length(a))]

[1]   1   3   2   4   5   2   3  90 345
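To see where that result comes from, the same expression can be unpacked into explicit bounds. This is only a sketch that reproduces the calculation as written, including its use of mean(a) / sqrt(n) as the spread term; the names n, t_crit, half_width, lower, upper and outside are introduced here for illustration.

```r
a <- c(1, 3, 2, 4, 5, 2, 3, 90, 78, 56, 78, 23, 345)

n <- length(a)
t_crit <- qt(0.975, df = n - 1)   # two-sided 95% t quantile

# Spread term exactly as in the expression above: mean(a) / sqrt(n)
half_width <- t_crit * mean(a) / sqrt(n)

lower <- mean(a) - half_width
upper <- mean(a) + half_width

# Values falling outside [lower, upper]
outside <- a[a < lower | a > upper]

print(c(lower = lower, upper = upper))
print(outside)
```

On this data the bounds come out near 21 and 85, so 9 of the 13 values are flagged.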

In your case, you might get many more values than you expect.
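Note that the expression above scales the t quantile by mean(a) / sqrt(length(a)). A conventional t-interval for the mean uses the standard error sd(a) / sqrt(length(a)) instead. Here is a minimal sketch of that variant, applied to a data frame so the flagged rows can be printed or subset rather than removed. The data frame dat and its columns id and value are hypothetical, as are the other variable names.

```r
a <- c(1, 3, 2, 4, 5, 2, 3, 90, 78, 56, 78, 23, 345)

# Hypothetical data frame holding the values in a column named `value`
dat <- data.frame(id = seq_along(a), value = a)

n <- nrow(dat)
se <- sd(dat$value) / sqrt(n)               # standard error of the mean
half_width <- qt(0.975, df = n - 1) * se    # half-width of the 95% t-interval

lower <- mean(dat$value) - half_width
upper <- mean(dat$value) + half_width

# Keep the data frame intact; just pull out the rows outside the interval
flagged <- dat[dat$value < lower | dat$value > upper, ]
print(flagged)
```

On this data only the row containing 345 falls outside the interval. The interval still describes the mean rather than individual observations, so it is a rough screen at best. Base R's boxplot.stats(dat$value)$out applies the boxplot whisker rule instead and may be closer to what the boxplot itself shows.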
