
How do I remove outlier rows?

    ap_lo          cholesterol gluc      smoke     alco      active   
 Min.   :  -70.00   1:52385     1:59479   0:63831   0:66236   0:13739  
 1st Qu.:   80.00   2: 9549     2: 5190   1: 6169   1: 3764   1:56261  
 Median :   80.00   3: 8066     3: 5331                                
 Mean   :   96.63                                                      
 3rd Qu.:   90.00                                                      
 Max.   :11000.00

Notice that ap_lo (diastolic blood pressure) has outliers on both ends: a minimum of -70 and a maximum of 11000. These are data entry errors, since diastolic blood pressure can be neither negative nor that high.

I want to remove them (and possibly find more). How would I go about removing their index in R?

The following code is not the answer:

CV$ap_lo <- CV[-c(which.min(CV$ap_lo))]

One of many valid approaches is to use the non-parametric outliers returned by boxplot.

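Let v be your values, for example v <- c(-100, rnorm(50), 100), and let stats hold the statistics computed by boxplot (plot = FALSE skips the actual plotting): stats <- boxplot(v, plot = FALSE). stats$out then contains every value lying beyond the whiskers, which by default reach at most 1.5 times the interquartile range from the box. You can remove those outliers (by boxplot standards) with v_without_outliers <- setdiff(v, stats$out). Note that setdiff also collapses duplicated values, so v[!v %in% stats$out] is the safer form when repeats must be kept. In short:

# the native pipe |> needs R 4.1 or later
v_without_outliers <-
    setdiff(v, (v |> boxplot(plot = FALSE))$out)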
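
To connect this back to the question, here is a minimal sketch of the same boxplot rule applied to whole rows of a data frame. The small CV data frame below is a made-up stand-in for the asker's data (only the column names ap_lo and smoke come from the question), and out_vals, out_rows and CV_clean are illustrative names, not anything from the original post.

```r
# Hypothetical stand-in for the CV data frame from the question
CV <- data.frame(
  ap_lo = c(-70, 80, 80, 90, 85, 75, 80, 95, 11000),
  smoke = c(0, 0, 1, 0, 0, 1, 0, 0, 1)
)

# Values that boxplot() flags as lying beyond the whiskers
out_vals <- boxplot(CV$ap_lo, plot = FALSE)$out

# Row indices of the flagged values, useful for inspecting them first
out_rows <- which(CV$ap_lo %in% out_vals)

# Keep every row whose ap_lo is not flagged; repeated values such as 80 survive
CV_clean <- CV[!CV$ap_lo %in% out_vals, , drop = FALSE]
```

Logical indexing is used here instead of CV[-out_rows, ] because negative indexing with an empty index vector returns zero rows, which would silently empty the data frame whenever no outliers are found.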
