简体   繁体   中英

Rowwise weighted.mean in dplyr using weights from another data.frame

I have a data.frame with columns from different groups (here, a and b ), and another data.frame containing the weights to perform a weighted mean:

test = data.frame(a.1=rep(2,5), b.1=rep(3,5), a.2=6:10, b.2=11:15)
tweights = data.frame(name=c('a.1', 'b.1', 'a.2', 'b.2'), 
                     w=c(0.2, 0.33, 0.8, 0.67))

For each line in test , I wand to perform a weighted mean for the columns containing a with the weights given by their corresponding value in tweights , and the same for the columns with b .

What I tried to do:

test %>% rowwise() %>% 
  mutate(awmean = weighted.mean(c(a.1, a.2), 
                                tweights$w[grepl('a', tweights$name)]),
         bwmean = weighted.mean(c(b.1, b.2), 
                                tweights$w[grepl('b', tweights$name)]))

This is working fine but this is not efficient nor elegant, I woud like to avoid explicitely mentioning the column names ( a.1 , a.2 etc), and the second part invoking grepl doesn't look very clean to me neither...

I tried something like this but it is wrong:

test %>% rowwise() %>%
  mutate(awmean = weighted.mean(contains('a'),
                                tweights$w[grepl('a', tweights$name)]))

Error: error in evaluating the argument 'x' in selecting a method 
for function 'weighted.mean': Error: could not find function "contains"

Note that I assume here that the order of the columns a.1 : an and the order of the corresponding lines in tweights is the same, which can be OK. A solution really taking care of the matching between the values and weights in weighted.mean would be even better...

Maybe a custom function?

# get weighted means, for names beginning with a certain letter
getWM <- function(letter1) {
  rgx <- paste0('^', letter1)
  apply(test[, grep(rgx, names(test))], 1, weighted.mean,
        w = subset(tweights, grepl(rgx, name))$w )
}

Now you can just make a call like:

getWM('a')
[1] 5.2 6.0 6.8 7.6 8.4

Or, for all letters:

first_letters <- unique(gsub('[^a-z]', '', names(test)))
sapply(first_letters, getWM)

       a     b
[1,] 5.2  8.36
[2,] 6.0  9.03
[3,] 6.8  9.70
[4,] 7.6 10.37
[5,] 8.4 11.04

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM