Why does na.rm=TRUE not work for weighted SD in R?
I have a data frame with 10 columns of house prices that, in some cases, includes NAs. I want to create a new column containing the weighted SD, but for the rows that have a few NAs I get the following error:
Error in e2[[j]] : subscript out of bounds
What I use per row (this works for rows without NAs):
weighted.sd(my.df[40,2:10], c(9,9,9,9,9,9,9,9,9), na.rm = TRUE)
Example
library(radiant.data)
data("mtcars")
mtcars[mtcars == 0] <- NA
weighted.sd(mtcars[18,1:11], c(11,11,11,11,11,11,11,11,11,11,11), na.rm = TRUE)#works
weighted.sd(mtcars[5,1:11], c(11,11,11,11,11,11,11,11,11,11,11), na.rm = TRUE)#issue here
What is the problem here, and how can I create a new column with the weighted SD per row?
The problem appears to be that weighted.sd() does not operate across the rows of a data frame the way you expect. Printing weighted.sd at the console shows its code:
weighted.sd <- function (x, wt, na.rm = TRUE)
{
    if (na.rm) {
        x <- na.omit(x)
        wt <- na.omit(wt)
    }
    wt <- wt/sum(wt)
    wm <- weighted.mean(x, wt)
    sqrt(sum(wt * (x - wm)^2))
}
In your example, you are not passing a vector for x, but a single row of a data frame. Because that row contains NA values, na.omit(x) removes the entire row instead of removing individual elements, as it would for a vector.
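A minimal base-R sketch of that difference, using the same mtcars setup as the question. The names row5, rows_left and elements_left are only for illustration:

```r
# Same setup as in the question: zeros in mtcars become NA
data("mtcars")
mtcars[mtcars == 0] <- NA

row5 <- mtcars[5, 1:11]   # a one-row data frame, not a vector

# On a data frame, na.omit() drops every row that contains an NA
rows_left <- nrow(na.omit(row5))

# On a numeric vector, na.omit() drops only the NA elements
elements_left <- length(na.omit(as.numeric(row5)))

print(c(rows_left = rows_left, elements_left = elements_left))
```

The first count is zero because the only row is discarded, which leaves the rest of the function working on an empty data frame.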
You can try to convert the row to a vector with as.numeric(), but the function will still fail because of how NA is removed from wt: na.omit(wt) only drops weights that are themselves NA, so wt keeps its full length while x gets shorter.
Something like this is probably what you want. Of course, you have to be careful to pass valid (numeric) columns for x.
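weighted.sd2 <- function (x, wt, na.rm = TRUE) {
    # Coerce a one-row data frame to a plain numeric vector
    x <- as.numeric(x)
    if (na.rm) {
        # Drop the NA values and the weights at the same positions
        is_na <- is.na(x)
        x <- x[!is_na]
        wt <- wt[!is_na]
    }
    wt <- wt/sum(wt)
    wm <- weighted.mean(x, wt)
    sqrt(sum(wt * (x - wm)^2))
}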
weighted.sd2(mtcars[18,1:11], c(11,11,11,11,11,11,11,11,11,11,11), na.rm = TRUE) # works
# [1] 26.76086
weighted.sd2(mtcars[5,1:11], c(11,11,11,11,11,11,11,11,11,11,11), na.rm = TRUE) # failed with weighted.sd, works now
# [1] 116.545
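As a sanity check, here is a hedged sketch: because all weights in this example are equal, the result should match the population SD (dividing by n rather than n - 1) of the non-missing values. The function is repeated so the snippet runs on its own; by_function and by_hand are illustrative names.

```r
# weighted.sd2 as defined above, repeated so this snippet is self-contained
weighted.sd2 <- function(x, wt, na.rm = TRUE) {
  x <- as.numeric(x)
  if (na.rm) {
    is_na <- is.na(x)
    x <- x[!is_na]
    wt <- wt[!is_na]
  }
  wt <- wt / sum(wt)
  wm <- weighted.mean(x, wt)
  sqrt(sum(wt * (x - wm)^2))
}

data("mtcars")
mtcars[mtcars == 0] <- NA

# Non-missing values of row 5, extracted by hand
x <- as.numeric(mtcars[5, 1:11])
x_obs <- x[!is.na(x)]

by_function <- weighted.sd2(mtcars[5, 1:11], rep(11, 11))
by_hand <- sqrt(mean((x_obs - mean(x_obs))^2))   # population SD

print(all.equal(by_function, by_hand))
```

Note that this differs from base R's sd(), which divides by n - 1, so the two should not be expected to agree exactly.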
To apply this to every row, you can use apply() with MARGIN = 1.
mtcars$weighted.sd <- apply(mtcars[,1:11], 1, weighted.sd2, wt = rep(11, 11))
                   mpg cyl  disp  hp drat    wt  qsec vs am gear carb weighted.sd
Mazda RX4         21.0   6 160.0 110 3.90 2.620 16.46 NA  1    4    4    52.61200
Mazda RX4 Wag     21.0   6 160.0 110 3.90 2.875 17.02 NA  1    4    4    52.58011
Datsun 710        22.8   4 108.0  93 3.85 2.320 18.61  1  1    4    1    37.06108
Hornet 4 Drive    21.4   6 258.0 110 3.08 3.215 19.44  1 NA    3    1    78.36300
Hornet Sportabout 18.7   8 360.0 175 3.15 3.440 17.02 NA NA    3    2   116.54503
...
If you Ctrl+click on the weighted.sd function, you can see the source code:
function (x, wt, na.rm = TRUE)
{
    if (na.rm) {
        x <- na.omit(x)
        wt <- na.omit(wt)
    }
    wt <- wt/sum(wt)
    wm <- weighted.mean(x, wt)
    sqrt(sum(wt * (x - wm)^2))
}
When you run it, the NAs are dropped from the value vector, so it gets shorter. But the weight vector keeps the same length as before, resulting in an error.
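The length mismatch can be reproduced with plain vectors. This is a sketch only: weighted_sd_copy is a local copy of the function body shown above, not the packaged function, and the values in x are made up.

```r
# Local copy of the function body shown above, for illustration only
weighted_sd_copy <- function(x, wt, na.rm = TRUE) {
  if (na.rm) {
    x <- na.omit(x)
    wt <- na.omit(wt)
  }
  wt <- wt / sum(wt)
  wm <- weighted.mean(x, wt)
  sqrt(sum(wt * (x - wm)^2))
}

x  <- c(1, 2, NA, 4)   # made-up values with one NA
wt <- rep(1, 4)        # the weights contain no NA

# After na.omit(), x has lost an element but wt has not
lengths_after <- c(x = length(na.omit(x)), wt = length(na.omit(wt)))
print(lengths_after)

# weighted.mean() rejects x and w of different lengths, so the call errors
result <- tryCatch(weighted_sd_copy(x, wt), error = function(e) e)
print(inherits(result, "error"))
```

Dropping the weights at the same positions as the missing values, rather than calling na.omit() on each argument separately, keeps the two vectors aligned.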
A solution would be:
weighted.sd(mtcars[5,!is.na(mtcars[5,1:11])],
c(11,11,11,11,11,11,11,11,11,11,11)[!is.na(mtcars[5,1:11])], na.rm = TRUE)
It's not elegant... But it does the job!