Finding the mean of a vector, excluding values outside quantile cutoffs, in R

Question

I'd like to find the mean of a vector of numbers, keeping only the values that fall between two quantile cutoffs (a naive way to compute a mean that controls for outliers).

Example: the function takes three arguments: x, the vector of numbers; lower, the lower-bound cutoff; and upper, the upper-bound cutoff.

meanSub <- function(x, lower = 0, upper = 1){
  Cutoffs <- quantile(x, probs = c(lower,upper))
  x <- subset(x, x >= Cutoffs[1] & x <= Cutoffs[2])
  return(mean(x))
}

There are obviously many straightforward ways to do this. However, I am applying this function over many observations, so I'm curious whether you can offer tips on function design, or point to a pre-existing package, that will do this very fast.
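A minimal sketch of one base-R way to apply the function over many observations, assuming the observations arrive as one numeric vector plus a grouping variable. The names values and groups are hypothetical example data, not from the question. split() produces one vector per group, and vapply() applies the function to each while checking that every call returns a single number.

```r
# Question's function, unchanged except for layout
meanSub <- function(x, lower = 0, upper = 1){
  Cutoffs <- quantile(x, probs = c(lower, upper))
  x <- subset(x, x >= Cutoffs[1] & x <= Cutoffs[2])
  return(mean(x))
}

# Hypothetical example data: one numeric vector plus a grouping factor
values <- c(1:10, 1, 2, 3, 100)
groups <- factor(rep(c("a", "b"), times = c(10, 4)))

# split() gives one vector per group; vapply() applies meanSub to each,
# passing lower/upper through ... and requiring a length-1 numeric result
res <- vapply(split(values, groups), meanSub, numeric(1),
              lower = 0, upper = 0.75)
res  # c(a = 4, b = 2)
```

In group b the 75% cutoff is 27.25, so the value 100 is excluded and the mean of 1, 2, 3 is returned.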

Answer 1

You can use the same method that mean uses for non-zero values of its trim argument.
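For reference, a trimmed mean in R drops floor(n * trim) observations from each end of the sorted data and averages the rest. The sketch below checks that description against mean(x, trim = ...) on simulated data; the variable names (trim, manual, builtin) are illustrative. Because trim is symmetric, this covers only the case lower = 1 - upper.

```r
set.seed(1)
x <- rnorm(1001)
trim <- 0.1
n <- length(x)

# Indices kept by a symmetric trim: drop floor(n * trim) values at each end
lo <- floor(n * trim) + 1
hi <- n + 1 - lo

manual  <- mean(sort(x)[lo:hi])  # full sort, then average the middle block
builtin <- mean(x, trim = trim)  # mean()'s own trimming

all.equal(manual, builtin)       # TRUE
```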

meanSub_g <- function(x, lower = 0, upper = 1){
  # Quantile cutoffs, then direct logical indexing (no subset())
  Cutoffs <- quantile(x, probs = c(lower,upper))
  return(mean(x[x >= Cutoffs[1] & x <= Cutoffs[2]]))
}

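meanSub_j <- function(x, lower=0, upper=1){
  if(isTRUE(all.equal(lower, 1-upper))) {
    # Symmetric cutoffs: mean() already trims this way
    return(mean(x, trim=lower))
  } else {
    # Asymmetric cutoffs: compute the ranks to keep, then partially sort
    # so that only positions lo and hi need to be exact
    n <- length(x)
    lo <- floor(n * lower) + 1
    hi <- floor(n * upper)
    y <- sort.int(x, partial = unique(c(lo, hi)))[lo:hi]
    return(mean(y))
  }
}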
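The else branch relies on a documented property of sort.int(..., partial = ): each index listed in partial holds the value it would have after a full sort, with no larger values before it and no smaller values after it. So the slice lo:hi holds the same values as the corresponding slice of a full sort, just not in order, which is all mean() needs. A quick check, with arbitrary illustrative bounds:

```r
set.seed(21)
x <- rnorm(1e5)
n <- length(x)
lower <- 0.1; upper <- 0.7   # arbitrary asymmetric cutoffs

# Same index rule as meanSub_j's else branch
lo <- floor(n * lower) + 1
hi <- floor(n * upper)

part <- sort.int(x, partial = unique(c(lo, hi)))[lo:hi]  # middle block, unordered
full <- sort(x)[lo:hi]                                   # middle block, fully sorted

identical(sort(part), full)        # TRUE: same values
all.equal(mean(part), mean(full))  # TRUE
```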

require(microbenchmark)
set.seed(21)
x <- rnorm(1e6)
microbenchmark(meanSub_g(x), meanSub_j(x), times=10)
# Unit: milliseconds
#          expr        min         lq     median         uq        max neval
#  meanSub_g(x) 233.037178 236.089867 244.807039 278.221064 312.243826    10
#  meanSub_j(x)   3.966353   4.585641   4.734748   5.288245   6.071373    10
microbenchmark(meanSub_g(x, .1, .7), meanSub_j(x, .1, .7), times=10)
# Unit: milliseconds
#                    expr       min       lq   median       uq      max neval
#  meanSub_g(x, 0.1, 0.7) 233.54520 234.7938 241.6667 272.3872 277.6248    10
#  meanSub_j(x, 0.1, 0.7)  94.73928  95.1042 126.7539 128.6937 130.8479    10
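One caveat, offered as an observation rather than as part of the original answer: meanSub_g uses quantile()'s default (type 7) interpolated cutoffs, while meanSub_j selects elements by rank, so on small vectors the two can keep a different number of boundary values. Both functions are copied from above so the sketch runs on its own; the input 1:10 is purely illustrative.

```r
meanSub_g <- function(x, lower = 0, upper = 1){
  Cutoffs <- quantile(x, probs = c(lower, upper))
  mean(x[x >= Cutoffs[1] & x <= Cutoffs[2]])
}
meanSub_j <- function(x, lower = 0, upper = 1){
  if (isTRUE(all.equal(lower, 1 - upper))) return(mean(x, trim = lower))
  n <- length(x)
  lo <- floor(n * lower) + 1
  hi <- floor(n * upper)
  mean(sort.int(x, partial = unique(c(lo, hi)))[lo:hi])
}

x <- 1:10
# quantile(x, 0.25) is 3.25 (type 7), so meanSub_g keeps 4:10
meanSub_g(x, 0.25, 1)  # 7
# meanSub_j keeps ranks floor(10 * 0.25) + 1 = 3 through 10
meanSub_j(x, 0.25, 1)  # 6.5
```

By our reading, with distinct values the kept ranks differ by at most one at each end, so the effect fades on large vectors such as the 1e6 draws benchmarked above.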

Answer 2

I wouldn't call subset; it may be slow. Index directly instead:

meanSub <- function(x, lower = 0, upper = 1){
  Cutoffs <- quantile(x, probs = c(lower,upper))
  return(mean(x[x >= Cutoffs[1] & x <= Cutoffs[2]]))
}

Otherwise, your code is fine and should already be very fast, at least as far as single-threaded computation on in-memory data goes.
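A minimal check that dropping subset() changes only call overhead, not the result. On a plain vector, subset() essentially evaluates x[cond & !is.na(cond)], the same logical indexing plus an NA guard, so both versions select identical elements. The function names meanSub_subset and meanSub_index are hypothetical labels for the question's and this answer's versions.

```r
meanSub_subset <- function(x, lower = 0, upper = 1){
  Cutoffs <- quantile(x, probs = c(lower, upper))
  mean(subset(x, x >= Cutoffs[1] & x <= Cutoffs[2]))  # question's approach
}
meanSub_index <- function(x, lower = 0, upper = 1){
  Cutoffs <- quantile(x, probs = c(lower, upper))
  mean(x[x >= Cutoffs[1] & x <= Cutoffs[2]])          # direct indexing
}

set.seed(21)
x <- rnorm(1e5)
identical(meanSub_subset(x, 0.1, 0.7), meanSub_index(x, 0.1, 0.7))  # TRUE
```

Timing both with microbenchmark(), as in the first answer, shows how much that overhead matters on your own data.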

2 answers

Answer 1: score 4, accepted, 2014-06-15 17:22:00

Answer 2: score 3, 2014-06-15 16:55:49