
How do I create a new column in R that is 1 if a certain value in another column is an outlier?

I want to create a new column that is 1 if the value of a particular column is an outlier. Otherwise, the value should be 0.

An example would be the following:

outlier <- c(rnorm(10,0,5),40,-60,rnorm(10,0,5))

        V1
1   -6.273411
2   -6.576979
3   9.256693
4   -2.448468
5   -7.386433
6   -8.922403
7   -1.339524
8   -2.136594
9   -2.271990
10  -6.066499
11  40.000000
12  -60.000000
13  6.697281
14  -3.212984
15  6.950176
16  -7.054237
17  11.820208
18  -1.836457
19  -1.341675
20  -3.271044
21  -10.260103
22  8.239565

So observations 11 and 12 should clearly be outliers:

boxplot.stats(outlier)$out

[1]  40 -60

What I want to achieve is the following:

        V1      V2
1   -6.273411   0
2   -6.576979   0
3   9.256693    0
4   -2.448468   0
5   -7.386433   0
6   -8.922403   0
7   -1.339524   0
8   -2.136594   0
9   -2.271990   0
10  -6.066499   0
11  40.000000   1
12  -60.000000  1
13  6.697281    0
14  -3.212984   0
15  6.950176    0
16  -7.054237   0
17  11.820208   0
18  -1.836457   0
19  -1.341675   0
20  -3.271044   0
21  -10.260103  0
22  8.239565    0

Is there any elegant way to do this?

Thanks!

We can use %in% to get a logical vector that marks the outliers, then coerce it to binary 0/1 with as.integer or the unary +

+(outlier %in% boxplot.stats(outlier)$out)
#[1] 0 0 0 0 0 0 0 0 0 0 1 1 0 0 0 0 0 0 0 0 0 0
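
To get the two-column layout shown in the question, the same expression can be assigned to a new column. This is only a minimal sketch: it assumes the values sit in a data frame column called V1 (the name df is made up here), and it copies the numbers from the question so the result is reproducible.

```r
# Values copied from the question so the result is reproducible
df <- data.frame(V1 = c(
  -6.273411, -6.576979, 9.256693, -2.448468, -7.386433, -8.922403,
  -1.339524, -2.136594, -2.271990, -6.066499, 40, -60,
  6.697281, -3.212984, 6.950176, -7.054237, 11.820208, -1.836457,
  -1.341675, -3.271044, -10.260103, 8.239565
))

# TRUE where the value is one of the points boxplot.stats() reports as an
# outlier, then coerced to integer 0/1
df$V2 <- as.integer(df$V1 %in% boxplot.stats(df$V1)$out)

# Row numbers flagged as outliers
which(df$V2 == 1)
#[1] 11 12
```

Note that %in% matches by value, so any row that repeats an outlying value is flagged as well.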

Keep in mind there is no universally agreed definition of an "outlier". By default, boxplot treats a value as an outlier if it lies more than 1.5 times the inter-quartile range below the .25 quartile or above the .75 quartile. You can write your own function, which gives you complete control over the definition. For example:

is_outlier <- function(x) {
  iqr <- IQR(x)
  q <- quantile(x, c(.25, .75))
  x < q[1]-1.5*iqr | x > q[2]+1.5*iqr
}

You can use it with your data like this:

is_outlier(outlier)

This returns TRUE/FALSE, which you can convert to 1/0 with as.numeric(is_outlier(outlier)) or is_outlier(outlier)+0 if that's really needed.
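
As an illustration of the "complete control" point, here is a hedged sketch of a variant that exposes the multiplier as an argument. The coef parameter and its default are assumptions added for this example, not part of the function above; the data are again the values from the question.

```r
# Hypothetical variant of is_outlier(): `coef` sets how many IQRs beyond the
# quartiles a value must lie to be flagged (1.5 reproduces the rule above)
is_outlier <- function(x, coef = 1.5) {
  q <- quantile(x, c(.25, .75), names = FALSE)
  iqr <- q[2] - q[1]
  x < q[1] - coef * iqr | x > q[2] + coef * iqr
}

# Values copied from the question
x <- c(-6.273411, -6.576979, 9.256693, -2.448468, -7.386433, -8.922403,
       -1.339524, -2.136594, -2.271990, -6.066499, 40, -60,
       6.697281, -3.212984, 6.950176, -7.054237, 11.820208, -1.836457,
       -1.341675, -3.271044, -10.260103, 8.239565)

which(is_outlier(x))              # default 1.5 * IQR rule
#[1] 11 12
which(is_outlier(x, coef = 0.5))  # a stricter rule flags one more point
#[1] 11 12 17

# 0/1 column, as asked for in the question
V2 <- as.integer(is_outlier(x))
```

Which multiplier is appropriate depends on the data and the purpose, so treat 0.5 here purely as a demonstration value.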
