简体   繁体   中英

Why does na.rm=TRUE not work for weighted SD in R?

I have a dataframe of 10 columns with house prices, that in some cases, includes NAs. I want to create a new column of weighted sd but for the rows that have a few NAs, I get the following error:

Error in e2[[j]] : subscript out of bounds

What I use per row (and works for rows without NAs):

weighted.sd(my.df[40,2:10], c(9,9,9,9,9,9,9,9,9), na.rm = TRUE)

Example

library(radiant.data)
data("mtcars")
mtcars[mtcars == 0] <- NA
weighted.sd(mtcars[18,1:11], c(11,11,11,11,11,11,11,11,11,11,11), na.rm = TRUE)#works
weighted.sd(mtcars[5,1:11], c(11,11,11,11,11,11,11,11,11,11,11), na.rm = TRUE)#issue here

What is the problem here and how can I create a new column with the weighted SD per row?

The problem appears to be that weighted.sd() will not operate as you are expecting across rows of a data frame.

Running weighted.sd we can see the code:

weighted.sd <- function (x, wt, na.rm = TRUE) 
{
  if (na.rm) {
    x <- na.omit(x)
    wt <- na.omit(wt)
  }
  wt <- wt/sum(wt)
  wm <- weighted.mean(x, wt)
  sqrt(sum(wt * (x - wm)^2))
}

In your example, you are not feeding in a vector for x , but rather a single row of a data frame. Function na.omit(x) will remove that entire row, due to the NA values - not elements of the vector.

You can try to convert the row to a vector with as.numeric() , but that will fail for this function as well due to how NA is removed from wt .

It seems like something like this is probably what you want. Of course, you have to be careful that you are feeding in valid columns for x .

weighted.sd2 <- function (x, wt, na.rm = TRUE) {

  x <- as.numeric(x)

  if (na.rm) {
    is_na <- is.na(x)

    x <- x[!is_na]
    wt <- wt[!is_na]
  }

  wt <- wt/sum(wt)
  wm <- weighted.mean(x, wt)
  sqrt(sum(wt * (x - wm)^2))
}
weighted.sd2(mtcars[18,1:11], c(11,11,11,11,11,11,11,11,11,11,11), na.rm = TRUE)#works
# [1] 26.76086
weighted.sd2(mtcars[5,1:11], c(11,11,11,11,11,11,11,11,11,11,11), na.rm = TRUE)#issue here
# [1] 116.545

To apply this to all columns, you can use apply() .

mtcars$weighted.sd <- apply(mtcars[,1:11], 1, weighted.sd2, wt = rep(11, 11))
                     mpg cyl  disp  hp drat    wt  qsec vs am gear carb weighted.sd
Mazda RX4           21.0   6 160.0 110 3.90 2.620 16.46 NA  1    4    4    52.61200
Mazda RX4 Wag       21.0   6 160.0 110 3.90 2.875 17.02 NA  1    4    4    52.58011
Datsun 710          22.8   4 108.0  93 3.85 2.320 18.61  1  1    4    1    37.06108
Hornet 4 Drive      21.4   6 258.0 110 3.08 3.215 19.44  1 NA    3    1    78.36300
Hornet Sportabout   18.7   8 360.0 175 3.15 3.440 17.02 NA NA    3    2   116.54503
...

If you do a CTRL+click on weigted.sd function you can see the source code:

function (x, wt, na.rm = TRUE) 
{
  if (na.rm) {
    x <- na.omit(x)
    wt <- na.omit(wt)
  }
  wt <- wt/sum(wt)
  wm <- weighted.mean(x, wt)
  sqrt(sum(wt * (x - wm)^2))
}

When you run it, value vector contain values without NA's and it is reduced. But the weigth vector has the same length as before, resulting in an error.

A solution would be:

weighted.sd(mtcars[5,!is.na(mtcars[5,1:11])], 
c(11,11,11,11,11,11,11,11,11,11,11)[!is.na(mtcars[5,1:11])], na.rm = TRUE)

It's not elegant... But it does the job!

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM