R: pmax() function to ignore NA's?

Question

I built this custom "winsorize" function that does what it should, unless there are NA's in the data.

How it works:

winsor1 <- function(x, probability){

  numWin <- ceiling(length(x)*probability)

  # Replace first lower, then upper
  x <- pmax(x, sort(x)[numWin+1])
  x <- pmin(x, sort(x)[length(x)-numWin])

  return(x)
}

x <- 0:10

winsor1(x, probability=0.01)
[1] 1  1  2  3  4  5  6  7  8  9  9

So it replaces the top (and bottom) 1% of the data (rounded up to the next value, since there are only 11 values in the example). If there are, e.g., 250 values, then the bottom 3 and top 3 values would be replaced by the 4th-lowest and 4th-highest values respectively.

The whole thing breaks down when there are NA's in the data, causing an error. However, if I set na.rm = TRUE in pmax() and pmin(), then the NA's themselves are replaced by the bottom value.
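As a quick check of that arithmetic, here is a minimal sketch for the 250-value case. The vector 1:250 is an assumed stand-in for real data, chosen so that sorted positions and values coincide.

```r
# Assumed example data: 250 values, so position k in sort(x) holds the value k
x <- 1:250
probability <- 0.01

# Number of values replaced at each end: ceiling(2.5) = 3
numWin <- ceiling(length(x) * probability)

# The 4th-lowest and 4th-highest values become the new bounds
lower <- sort(x)[numWin + 1]
upper <- sort(x)[length(x) - numWin]

print(c(numWin = numWin, lower = lower, upper = upper))
```

Here the bounds come out as 4 and 247, so the values 1 to 3 are raised to 4 and the values 248 to 250 are lowered to 247.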

x[5] <- NA

winsor1(x, probability=0.01)
[1] 1  1  2  3  1  5  6  7  8  9  9
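To see why na.rm = TRUE has that effect, the small sketch below compares the two settings of pmax() on a made-up three-element vector (v is hypothetical, not part of the data above); pmin() treats NA the same way.

```r
# Hypothetical toy vector with one missing value
v <- c(0, NA, 5)

# Default (na.rm = FALSE): a missing element stays missing
keep_na <- pmax(v, 1)
print(keep_na)
# [1]  1 NA  5

# na.rm = TRUE: the NA is dropped from the comparison, so the bound 1 is returned
fill_na <- pmax(v, 1, na.rm = TRUE)
print(fill_na)
# [1] 1 1 5
```

So leaving na.rm at its default is what keeps the NA's in place; the remaining problem is then only how the bounds are computed.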

What can I do so that the NA's are preserved but do not cause an error? This is the output I want for the last line:

winsor1(x, probability=0.01)
[1] 1  1  2  3  NA  5  6  7  8  9  9

Answer 1

The issue is with sort(): it removes the NA's by default, so the sorted vector is shorter than x and an index based on length(x) no longer lines up with its upper end. Specifying na.last = TRUE keeps the NA's at the end of the sorted vector instead. One option is order(), which puts NA last by default:

winsor1 <- function(x, probability){

  numWin <- ceiling(length(x)*probability)
  # Number of non-missing values; order() places NA after these
  n <- sum(!is.na(x))

  # Replace first lower, then upper
  x1 <- x[order(x)]
  # Without na.rm, pmax() and pmin() leave NA elements as NA
  x <- pmax(x, x1[numWin+1])
  x <- pmin(x, x1[n-numWin])

  return(x)
}

Testing:

x <- 0:10
winsor1(x, probability=0.01)
#[1] 1 1 2 3 4 5 6 7 8 9 9

x[5] <- NA
winsor1(x, probability=0.01)
#[1]  1  1  2  3 NA  5  6  7  8  9  9

or with na.last = TRUE in sort():

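winsor1 <- function(x, probability){

  numWin <- ceiling(length(x)*probability)
  # Number of non-missing values; with na.last = TRUE the NA's sit after these
  n <- sum(!is.na(x))

  # Replace first lower, then upper
  # Without na.rm, pmax() and pmin() leave NA elements as NA
  x <- pmax(x, sort(x, na.last = TRUE)[numWin+1])
  x <- pmin(x, sort(x, na.last = TRUE)[n-numWin])

  return(x)
}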
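The NA handling of sort() and order() that these variants rely on can be checked directly. The following is a small sketch using the same example vector; the names s_default, s_last, s_order and n are only illustrative.

```r
# Same data as in the question: 0..10 with the fifth element missing
x <- c(0:3, NA, 5:10)

# sort() drops NA by default, so the result is shorter than x
s_default <- sort(x)
print(length(s_default))   # 10, while length(x) is 11

# na.last = TRUE keeps the NA and places it at the end
s_last <- sort(x, na.last = TRUE)

# order() also puts NA last by default, so x[order(x)] gives the same vector
s_order <- x[order(x)]
print(identical(s_last, s_order))

# Count of non-missing values: the last non-NA position in s_last
n <- sum(!is.na(x))
print(s_last[n])           # largest non-missing value
```

Because the NA occupies the final position, any upper index derived from length(x) is shifted by the number of missing values, which is why counting the non-missing values first is a reasonable choice here.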

Answer 1: accepted (2), 2020-05-10 19:08:37
