     ap_lo           cholesterol   gluc        smoke       alco        active
 Min.   :  -70.00    1:52385       1:59479     0:63831     0:66236     0:13739
 1st Qu.:   80.00    2: 9549       2: 5190     1: 6169     1: 3764     1:56261
 Median :   80.00    3: 8066       3: 5331
 Mean   :   96.63
 3rd Qu.:   90.00
 Max.   :11000.00
As you can see, ap_lo has outliers on both ends. They are data entry errors: ap_lo is diastolic blood pressure, so it should be neither negative nor that high.
I want to remove them (and possibly find more). How would I go about removing them by index in R?
The following attempt does not do what I want:
CV$ap_lo <- CV[-c(which.min(CV$ap_lo))]
One of many valid approaches is to use the non-parametric outliers returned by boxplot.
Let v be your values: v <- c(-100, rnorm(50), 100)
and let stats be the statistics returned by boxplot (no actual plotting needed): stats <- boxplot(v, plot = FALSE)
Then you can remove the outliers (according to boxplot standards) like this: v_without_outliers <- setdiff(v, stats$out)
Note that setdiff also drops duplicated values, so for data with repeated values use v[!v %in% stats$out] instead.
Or, in short (the |> pipe needs R 4.1 or later):
v_without_outliers <-
  setdiff(v, (v |> boxplot(plot = FALSE))$out)
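Applied to a data frame like the one in the question, the same idea might look like the sketch below. The small CV data frame here is a made-up stand-in (its values are invented for illustration), and the 30 to 200 plausibility bounds in the second variant are an assumption, not a clinical rule.

```r
# Hypothetical stand-in for the asker's data frame CV (values are invented)
CV <- data.frame(
  ap_lo  = c(-70, 80, 80, 90, 80, 70, 90, 11000),
  active = c(1, 1, 0, 1, 1, 0, 1, 1)
)

# Values boxplot would flag as outliers (more than 1.5 * IQR beyond the hinges)
out <- boxplot.stats(CV$ap_lo)$out

# Drop whole rows whose ap_lo is flagged; %in% keeps repeated values intact
CV_clean <- CV[!CV$ap_lo %in% out, ]

# Alternative: keep only rows inside an explicit plausibility range
# (the bounds 30 and 200 are an assumption chosen for illustration)
CV_range <- CV[CV$ap_lo >= 30 & CV$ap_lo <= 200, ]
```

Filtering rows this way keeps the other columns aligned with ap_lo, which assigning a shortened vector back to CV$ap_lo would not. If ap_lo contains NA values, the range comparison needs extra handling (for example wrapping the condition in which()).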