Why does na.rm=TRUE not work for weighted SD in R?

I have a data frame of 10 columns with house prices that, in some cases, includes NAs. I want to create a new column with the weighted SD, but for the rows that have a few NAs I get the following error:

Error in e2[[j]] : subscript out of bounds

What I use per row (it works for rows without NAs):

weighted.sd(my.df[40,2:10], c(9,9,9,9,9,9,9,9,9), na.rm = TRUE)

Example

library(radiant.data)
data("mtcars")
mtcars[mtcars == 0] <- NA
weighted.sd(mtcars[18,1:11], c(11,11,11,11,11,11,11,11,11,11,11), na.rm = TRUE)#works
weighted.sd(mtcars[5,1:11], c(11,11,11,11,11,11,11,11,11,11,11), na.rm = TRUE)#issue here

What is the problem here, and how can I create a new column with the weighted SD per row?

The problem appears to be that weighted.sd() does not operate across the rows of a data frame the way you are expecting.

Running weighted.sd (without parentheses) prints its code:

weighted.sd <- function (x, wt, na.rm = TRUE) 
{
  if (na.rm) {
    x <- na.omit(x)
    wt <- na.omit(wt)
  }
  wt <- wt/sum(wt)
  wm <- weighted.mean(x, wt)
  sqrt(sum(wt * (x - wm)^2))
}

In your example, you are not feeding in a vector for x, but rather a single row of a data frame. On a data frame, na.omit(x) drops every row that contains an NA, so it removes the entire row instead of only the NA elements.
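A small illustration of that difference, using a made-up one-row data frame (the data and the names row_df and row_vec are assumptions for this sketch, not from the question):

```r
# Toy one-row data frame with one missing value (illustrative data only)
row_df <- data.frame(a = 1, b = NA_real_, c = 3)

# On a data frame, na.omit() drops whole rows that contain any NA
n_rows_left <- nrow(na.omit(row_df))

# On a plain numeric vector, na.omit() drops only the NA elements
row_vec <- as.numeric(row_df)
n_values_left <- length(na.omit(row_vec))

n_rows_left    # 0: the only row was removed
n_values_left  # 2: only the NA element was removed
```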

You can try to convert the row to a vector with as.numeric(), but that fails for this function as well: na.omit(wt) removes nothing from wt (the weights contain no NA), so x and wt end up with different lengths.
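The length mismatch can be reproduced with plain vectors. This is only a sketch with made-up values; x_clean, wt_clean and res are names chosen for the illustration:

```r
# Made-up values and weights; one value is missing
x  <- c(1, NA, 3)
wt <- c(9, 9, 9)

# This is what weighted.sd() does internally when na.rm = TRUE
x_clean  <- na.omit(x)   # the NA is dropped, two values remain
wt_clean <- na.omit(wt)  # nothing to drop, three weights remain

length(x_clean)   # 2
length(wt_clean)  # 3

# weighted.mean() refuses inputs of different lengths;
# capture the error message instead of stopping the script
res <- tryCatch(
  weighted.mean(x_clean, wt_clean),
  error = function(e) conditionMessage(e)
)
res
```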

Something like this is probably what you want. Of course, you have to be careful to feed in valid (numeric) columns for x.

weighted.sd2 <- function (x, wt, na.rm = TRUE) {

  x <- as.numeric(x)

  if (na.rm) {
    is_na <- is.na(x)

    x <- x[!is_na]
    wt <- wt[!is_na]
  }

  wt <- wt/sum(wt)
  wm <- weighted.mean(x, wt)
  sqrt(sum(wt * (x - wm)^2))
}
weighted.sd2(mtcars[18, 1:11], rep(11, 11), na.rm = TRUE)  # works
# [1] 26.76086
weighted.sd2(mtcars[5, 1:11], rep(11, 11), na.rm = TRUE)   # now works for the row with NAs too
# [1] 116.545
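As a sanity check: with equal weights, the normalized weights are all 1/n, so the formula reduces to the population standard deviation of the non-missing values. The sketch below recomputes the row 5 value by hand on a fresh copy of mtcars (x and pop_sd are names chosen for this illustration):

```r
# Take row 5 ("Hornet Sportabout") from a fresh copy of mtcars as a numeric vector
x <- as.numeric(datasets::mtcars["Hornet Sportabout", 1:11])

# Mirror the question's setup: zeros are treated as missing, then dropped
x[x == 0] <- NA
x <- x[!is.na(x)]

# Population SD (divide by n, not n - 1), which is what equal weights give
pop_sd <- sqrt(mean((x - mean(x))^2))
pop_sd  # should agree with the 116.545 reported above
```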

To apply this to every row, you can use apply() with MARGIN = 1.

mtcars$weighted.sd <- apply(mtcars[,1:11], 1, weighted.sd2, wt = rep(11, 11))
                     mpg cyl  disp  hp drat    wt  qsec vs am gear carb weighted.sd
Mazda RX4           21.0   6 160.0 110 3.90 2.620 16.46 NA  1    4    4    52.61200
Mazda RX4 Wag       21.0   6 160.0 110 3.90 2.875 17.02 NA  1    4    4    52.58011
Datsun 710          22.8   4 108.0  93 3.85 2.320 18.61  1  1    4    1    37.06108
Hornet 4 Drive      21.4   6 258.0 110 3.08 3.215 19.44  1 NA    3    1    78.36300
Hornet Sportabout   18.7   8 360.0 175 3.15 3.440 17.02 NA NA    3    2   116.54503
...
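This works because apply() converts the data frame to a matrix and hands each row to the function as a plain numeric vector, so NAs can be dropped element by element. A minimal sketch on a made-up data frame (toy, is_plain_vector and n_non_missing are illustrative names only):

```r
# Toy data frame with one missing value (illustrative data only)
toy <- data.frame(a = c(1, 4), b = c(NA, 5), c = c(3, 6))

# Each row arrives as a numeric vector without dimensions, not as a data frame
is_plain_vector <- apply(toy, 1, function(r) is.numeric(r) && is.null(dim(r)))

# So per-row NA handling works element-wise
n_non_missing <- apply(toy, 1, function(r) sum(!is.na(r)))

is_plain_vector  # TRUE for both rows
n_non_missing    # 2 for the first row, 3 for the second
```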

If you Ctrl+click on the weighted.sd function, you can see the source code:

function (x, wt, na.rm = TRUE) 
{
  if (na.rm) {
    x <- na.omit(x)
    wt <- na.omit(wt)
  }
  wt <- wt/sum(wt)
  wm <- weighted.mean(x, wt)
  sqrt(sum(wt * (x - wm)^2))
}

When you run it, the NAs are removed from the value vector, so it gets shorter. But the weight vector keeps the same length as before, which results in an error.

A solution would be:

weighted.sd(mtcars[5,!is.na(mtcars[5,1:11])], 
c(11,11,11,11,11,11,11,11,11,11,11)[!is.na(mtcars[5,1:11])], na.rm = TRUE)

It's not elegant... But it does the job!
