简体   繁体   中英

Weighted average value in the presence of NA values

Here's a very simple example of what I'm dealing with:

data_stack <- data.table(CompA_value = c(10,20,30,40), CompB_value = c(60,70,80,80), CompC_value = c(NA, NA, NA, 100), CompA_weight = c(0.2, 0.3,0.4,0.4), CompB_weight = c(0.8,0.7,0.6,0.4), CompC_weight = c(NA, NA, NA,0.2))

   CompA_value CompB_value CompC_value CompA_weight CompB_weight CompC_weight
1:          10          60          NA          0.2          0.8           NA
2:          20          70          NA          0.3          0.7           NA
3:          30          80          NA          0.4          0.6           NA
4:          40          80         100          0.4          0.4          0.2

What I want to do is calculate the weighted average of CompA through C, for each row. However, notice that CompC has NAs for rows 1-3. What I would like is for rows 1-3 to have the weighted average of CompA and CompB, but once CompC becomes active, I'd like to have it automatically included in the calculation.

As it stands, I've done something like this:

> data_stack[, Weighted_average := CompA_value*CompA_weight + CompB_value*CompB_weight + CompC_value * CompC_weight]
> data_stack
   CompA_value CompB_value CompC_value CompA_weight CompB_weight CompC_weight Weighted_average
1:          10          60          NA          0.2          0.8           NA               NA
2:          20          70          NA          0.3          0.7           NA               NA
3:          30          80          NA          0.4          0.6           NA               NA
4:          40          80         100          0.4          0.4          0.2               68

But my "Weighted_average" column obviously won't give me weights for the first 1-3 rows.

What I want is:

 data_stack[, Weighted_average := c((10*0.2 + 60*0.8),(20*0.3 + 70*0.7),(30*0.4 + 80*0.6),(40*0.4 + 80*0.4 + 100*0.2))]
 data_stack
   CompA_value CompB_value CompC_value CompA_weight   CompB_weight CompC_weight   Weighted_average
1:          10          60          NA          0.2          0.8           NA               50
2:          20          70          NA          0.3          0.7           NA               55
3:          30          80          NA          0.4          0.6           NA               60
4:          40          80         100          0.4          0.4          0.2               68

So, note how the first three rows are just the weighted average of A and B, but once C becomes available, it is also included into the calculation.

So I'd like to find out how to write some code which picks up whether there is an NA value, if so, skips it, but if not includes it in the calculation.

I've got a considerably bigger data table so doing it manually is out of the question!

Regards.

Here you go:

data_stack$Weighted_average = apply(data_stack,1,function(x){
  y = c(x["CompA_value"]*x["CompA_weight"],
        x["CompB_value"]*x["CompB_weight"],
        x["CompC_value"]*x["CompC_weight"])
  return(sum(y,na.rm = T))
})

Result:

> data_stack
  CompA_value CompB_value CompC_value CompA_weight CompB_weight CompC_weight Weighted_average
1          10          60          NA          0.2          0.8           NA               50
2          20          70          NA          0.3          0.7           NA               55
3          30          80          NA          0.4          0.6           NA               60
4          40          80         100          0.4          0.4          0.2               68

The function creates a vector with value*weight for each column. Then returns the sum ignoring the NA values. This means that this will ignore NA values in any column.

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM