I'd like to find the mean of a vector of numbers that fall within the bounds of two quantile cutoffs (a naive way to calculate a mean that controls for outliers).
Example: three arguments: x, the vector of numbers; lower, the lower-bound cutoff; and upper, the upper-bound cutoff.
meanSub <- function(x, lower = 0, upper = 1){
  Cutoffs <- quantile(x, probs = c(lower,upper))
  x <- subset(x, x >= Cutoffs[1] & x <= Cutoffs[2])
  return(mean(x))
}
There are obviously numerous straightforward ways of doing this. However, I am applying this function over many observations, so I'm curious whether you can offer tips on function design, or point to a pre-existing package, that will do this very fast.
You can use the same method mean uses for non-zero values of the trim argument.
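# Quantile-based version (subset() replaced by direct logical indexing)
meanSub_g <- function(x, lower = 0, upper = 1){
  Cutoffs <- quantile(x, probs = c(lower,upper))
  return(mean(x[x >= Cutoffs[1] & x <= Cutoffs[2]]))
}
# Index-based version using a partial sort
meanSub_j <- function(x, lower=0, upper=1){
  if(isTRUE(all.equal(lower, 1-upper))) {
    # Symmetric cutoffs: delegate to mean()'s own trim argument
    return(mean(x, trim=lower))
  } else {
    n <- length(x)
    lo <- floor(n * lower) + 1   # first index to keep
    hi <- floor(n * upper)       # last index to keep
    # Partial sort: only positions lo and hi are put in their sorted place
    y <- sort.int(x, partial = unique(c(lo, hi)))[lo:hi]
    return(mean(y))
  }
}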
require(microbenchmark)
set.seed(21)
x <- rnorm(1e6)
microbenchmark(meanSub_g(x), meanSub_j(x), times=10)
# Unit: milliseconds
# expr min lq median uq max neval
# meanSub_g(x) 233.037178 236.089867 244.807039 278.221064 312.243826 10
# meanSub_j(x) 3.966353 4.585641 4.734748 5.288245 6.071373 10
microbenchmark(meanSub_g(x, .1, .7), meanSub_j(x, .1, .7), times=10)
# Unit: milliseconds
# expr min lq median uq max neval
# meanSub_g(x, 0.1, 0.7) 233.54520 234.7938 241.6667 272.3872 277.6248 10
# meanSub_j(x, 0.1, 0.7) 94.73928 95.1042 126.7539 128.6937 130.8479 10
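As a rough illustration of why the partial sort in meanSub_j is enough, here is a minimal sketch on toy data (the vector and the cutoffs 0.1 and 0.7 are only illustrative). After sort.int(x, partial = ...), the elements between positions lo and hi are the same set a full sort would put there, just not necessarily in order, so their mean is unchanged. Note that this index-based slice is not guaranteed to match the quantile-based meanSub_g exactly, since quantile() interpolates between order statistics by default.

```r
# Toy data, not the benchmark vector above
set.seed(1)
x <- rnorm(20)
n <- length(x)
lower <- 0.1   # illustrative cutoffs
upper <- 0.7

lo <- floor(n * lower) + 1   # first index kept
hi <- floor(n * upper)       # last index kept

# Partial sort: positions lo and hi hold their sorted values; everything
# between them lies between those two values, in no particular order.
y_partial <- sort.int(x, partial = unique(c(lo, hi)))[lo:hi]

# Full sort, for comparison only
y_full <- sort(x)[lo:hi]

# Same elements in both slices, hence the same mean
all.equal(mean(y_partial), mean(y_full))
```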
I wouldn't call subset; it may be slow:
meanSub <- function(x, lower = 0, upper = 1){
  Cutoffs <- quantile(x, probs = c(lower,upper))
  return(mean(x[x >= Cutoffs[1] & x <= Cutoffs[2]]))
}
Otherwise, your code is OK and should already be very fast, at least as far as single-threaded computations on in-memory data are concerned.
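Since the question mentions applying the function over many observations, here is one possible sketch of doing that with vapply. The matrix m, its dimensions, and the cutoffs 0.1 and 0.9 are hypothetical; the question does not say how the data are stored, so treat this as an assumption rather than the intended setup.

```r
# meanSub as defined above, repeated so the sketch is self-contained
meanSub <- function(x, lower = 0, upper = 1) {
  Cutoffs <- quantile(x, probs = c(lower, upper))
  mean(x[x >= Cutoffs[1] & x <= Cutoffs[2]])
}

# Hypothetical data: 100 observations (columns) of 1000 values each
set.seed(21)
m <- matrix(rnorm(1e5), nrow = 1000, ncol = 100)

# vapply over column indices; FUN.VALUE = numeric(1) makes vapply check
# that every call returns exactly one number
res <- vapply(seq_len(ncol(m)),
              function(j) meanSub(m[, j], lower = 0.1, upper = 0.9),
              FUN.VALUE = numeric(1))
length(res)
```

vapply is used here instead of sapply only because it fixes the return type up front; it is still single-threaded.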