R: pmax() function to ignore NA's?
I built this custom "winsorize" function that does what it should, unless there are NA's in the data.
How it works:
winsor1 <- function(x, probability){
  numWin <- ceiling(length(x)*probability)
  # Replace first lower, then upper
  x <- pmax(x, sort(x)[numWin+1])
  x <- pmin(x, sort(x)[length(x)-numWin])
  return(x)
}
x <- 0:10
winsor1(x, probability=0.01)
[1] 1 1 2 3 4 5 6 7 8 9 9
So it replaces the top (and bottom) 1% of the data (rounded up to the next value, since there are only 11 values in the example). If there are, e.g., 250 values, then the bottom 3 and top 3 values would be replaced by the 4th-lowest and 4th-highest values respectively.
The whole thing breaks down when there are NA's in the data, causing an error. However, if I set na.rm = TRUE in the pmax() and pmin() then the NA's themselves are replaced by the bottom value.
x[5] <- NA
winsor1(x, probability=0.01)
[1] 1 1 2 3 1 5 6 7 8 9 9
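The effect of na.rm can be reproduced in isolation. A minimal sketch on a made-up vector v (not part of the data above):

```r
v <- c(0, NA, 5)

# Default: an NA in any argument gives NA at that position
pmax(v, 1)                # 1 NA 5

# na.rm = TRUE: the NA is dropped from the comparison, so the bound wins
pmax(v, 1, na.rm = TRUE)  # 1 1 5
```

So na.rm = TRUE does not skip the NA positions; it fills them with the other argument.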
What can I do so that the NA's are preserved but do not cause an error? This is the output I want for the last line:
winsor1(x, probability=0.01)
[1] 1 1 2 3 NA 5 6 7 8 9 9
The issue is with sort, as it removes the NA's by default, or else we have to specify na.last = TRUE, which may also not be what we need. One option is order:
winsor1 <- function(x, probability){
  numWin <- ceiling(length(x)*probability)
  # Number of non-missing values; order() puts the NA's last
  n <- sum(!is.na(x))
  x1 <- x[order(x)]
  # Replace first lower, then upper; without na.rm the NA's stay NA
  x <- pmax(x, x1[numWin+1])
  x <- pmin(x, x1[n-numWin])
  return(x)
}
-testing
x <- 0:10
winsor1(x, probability=0.01)
#[1] 1 1 2 3 4 5 6 7 8 9 9
x[5] <- NA
winsor1(x, probability=0.01)
#[1] 1 1 2 3 NA 5 6 7 8 9 9
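To see how sort and order treat missing values, it may help to compare them on a small vector with one NA. The vector y below is made up for illustration:

```r
y <- c(3, NA, 1)

sort(y)                  # NA dropped by default: 1 3
sort(y, na.last = TRUE)  # NA kept at the end: 1 3 NA
y[order(y)]              # order() also puts the NA last: 1 3 NA

# The default sort() result is shorter than y, so an index
# computed from length(y) no longer points at the intended element.
length(sort(y))          # 2
```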
or with na.last in sort:
winsor1 <- function(x, probability){
  numWin <- ceiling(length(x)*probability)
  # Number of non-missing values; na.last = TRUE keeps the NA's at the end
  n <- sum(!is.na(x))
  # Replace first lower, then upper; without na.rm the NA's stay NA
  x <- pmax(x, sort(x, na.last = TRUE)[numWin+1])
  x <- pmin(x, sort(x, na.last = TRUE)[n-numWin])
  return(x)
}
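Another way to sidestep the NA handling of pmax() and pmin() altogether is to winsorize only the non-missing values and write them back into their original positions. This is a sketch, not the method from the answer above: winsor_na is a hypothetical name, and it counts numWin from the non-missing values only, which is an assumption about the desired behaviour.

```r
# Hypothetical helper: winsorize the non-missing values, leave NA's in place.
# Assumes probability < 0.5 and at least 2 * numWin + 1 non-missing values.
winsor_na <- function(x, probability) {
  ok <- !is.na(x)                             # positions of non-missing values
  v <- sort(x[ok])                            # sorted non-missing values
  numWin <- ceiling(length(v) * probability)  # how many to replace per tail
  lower <- v[numWin + 1]
  upper <- v[length(v) - numWin]
  x[ok] <- pmin(pmax(x[ok], lower), upper)    # clamp only the non-missing values
  x
}

x <- 0:10
x[5] <- NA
winsor_na(x, probability = 0.01)
# 1 1 2 3 NA 5 6 7 8 9 9
```

Because the NA's never reach pmax() or pmin(), no na.rm argument is needed and the missing positions come back unchanged.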