简体   繁体   中英

Calculating a weighted mean in data.table in R varying weights

My question relates to this previously asked question:
Calculating a weighted mean using data.table in R with weights in one of the table columns
In my case, I have different weights-columns across the columns I want to aggregate. Let's say I have four columns col_a, col_b, col_c and col_d and let's assume I want to aggregate col_a and col_b with weiths w_1 and col_c , col_d with w_2 . Example:

require(data.table)
id <- c(1,1,1,2,2,2)
col_a <- c(123,56,87,987,1003,10)
col_b <- c(17,234,20,88,765,69)
col_c <- c(45,90,543,30,1,543)
col_d <- c(60,43,700,3,88,46)
w_1   <- c(1,1,1,1,1,1)
w_2   <- c(1.5,1,1.2,0.8,1,1)
dt <- data.table(id, col_a, col_b, col_c, col_d, w_1, w_2);dt

Now the desired result would look like this:

data.table(id=c(1,2),col_a=c(weighted.mean(col_a[1:3],w_1[1:3]),weighted.mean(col_a[4:6],w_1[4:6])),col_b=c(weighted.mean(col_b[1:3],w_1[1:3]),weighted.mean(col_b[4:6],w_1[4:6])),
       col_c=c(weighted.mean(col_c[1:3],w_2[1:3]),weighted.mean(col_c[4:6],w_1[4:6])),col_d=c(weighted.mean(col_d[1:3],w_2[1:3]),weighted.mean(col_d[4:6],w_2[4:6])))

This, I thought could be accomplished similar to @akrun answer to this post:
R collapse multiple rows into 1 row using specific function to each column
where I would have the two functions weighted.mean(x, w_1) and weighted.mean(x, w_2) instead of min or median . Here is how far I got:

colsToKeep <- c("col_a","col_b","col_c","col_d")
dt[, Map(function(x,y) get(x)(y, na.rm = TRUE), 
         setNames(rep(c('weighted.mean', 'weighted.mean'),2),names(.SD)), .SD),.SDcols=colsToKeep, by = id]

My question: how can get the arguments w=w_1 and w=w_2 into the setNames -function? Is that even possible?

Could be something like this too:

colsToKeep <- c("col_a", "col_b", "col_c", "col_d")
colsToW <- c("w_1", "w_1", "w_2", "w_2")

eval(parse(text = paste0("dt[, .(", paste0("w_", colsToKeep, " = weighted.mean(", colsToKeep, ",", colsToW, ")", collapse = ", "), "), by = id]")))

or this one:

dt[, Map(function(x,y,w) get(x)(y, w, na.rm = TRUE), 
         setNames(rep('weighted.mean',length(colsToKeep)), paste0("W_", colsToKeep)),
         .SD[, ..colsToKeep], .SD[, ..colsToW]),
   by = id]  

As mentioned by Roland, you can cast into a long format. The benefit is that in the long run, you do not have to change the codes every time when there is a new column. Explanation in line. You can print mdt to take a look.

#cast into a long format with col values in 1 column and rows in another columns
mdt <- melt(dt, id.var=c("id",grep("^w", names(dt), value=TRUE)), 
    variable.name="col", value.name="colVal")
mdt <- melt(mdt, id.var=c("id","col","colVal"), 
    variable.name="w", value.name="wVal")

#prob need to think of a programmatic way rather than typing columns
myPairs <- data.table(rbind(
    c(col="col_a", w="w_1"), 
    c(col="col_b", w="w_1"), 
    c(col="col_c", w="w_2"), 
    c(col="col_d", w="w_2")))

#calculate weighted mean according to myPairs and then pivot the table
dcast(mdt[myPairs, on=.(col, w),
        weighted.mean(colVal, wVal), 
        by=.(id, col)], 
    id ~ col, 
    value.var="V1")

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM