R Data.Table with weights

Question

library(data.table)
data = data.table("STUDENT" = 1:100,
                  "SAMPLEWEIGHT" = sample(12:99, 100, replace = TRUE),
                  "LABEL1" = sample(1:2, 100, replace = TRUE),
                  "LABEL3" = sample(1:3, 100, replace = TRUE),
                  # 0/1 indicator columns to summarize
                  "CAT" = sample(0:1, 100, replace = TRUE),
                  "FOX" = sample(0:1, 100, replace = TRUE),
                  "DOG" = sample(0:1, 100, replace = TRUE),
                  "MOUSE" = sample(0:1, 100, replace = TRUE),
                  "BIRD" = sample(0:1, 100, replace = TRUE))

# Desired layout: one row per LABEL1 x LABEL3 combination (values still to be computed)
dataWANT = data.frame("LABEL1" = c(1, 1, 1, 2, 2, 2),
                      "LABEL3" = c(1, 2, 3, 1, 2, 3),
                      "CAT_N" = NA,
                      "CAT_PER" = NA,
                      "FOX_N" = NA,
                      "FOX_PER" = NA,
                      "DOG_N" = NA,
                      "DOG_PER" = NA,
                      "MOUSE_N" = NA,
                      "MOUSE_PER" = NA,
                      "BIRD_N" = NA,
                      "BIRD_PER" = NA)

I have a data.table, data, and I am trying to summarize the student data in the layout shown in dataWANT.

In dataWANT, each column ending in _N is the count of rows in which that column equals 1, for each combination of LABEL1 and LABEL3 (6 groups in total).

Each column ending in _PER is the weighted proportion of rows in that group that have a 1 in the corresponding column, using SAMPLEWEIGHT as the weight.
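Written out, for a group g (one LABEL1/LABEL3 combination), an indicator column c (CAT, FOX, DOG, MOUSE or BIRD) with values x_{i,c} in {0, 1}, and weights w_i taken from SAMPLEWEIGHT, the two summaries would be as below. This assumes "weighted proportion" means the weighted mean of the 0/1 indicator within the group, which is what the accepted answer computes with weighted.mean.

```latex
N_{g,c} = \sum_{i \in g} x_{i,c},
\qquad
\mathrm{PER}_{g,c} = \frac{\sum_{i \in g} w_i \, x_{i,c}}{\sum_{i \in g} w_i}
```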

Answer 1

An option using data.table would be to group by 'LABEL1' and 'LABEL3', specify the columns of interest in .SDcols, get the sum (as they are binary columns) by looping over .SD, and concatenate that with the weighted.mean based on the 'SAMPLEWEIGHT' column:

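library(data.table)
data[, c(
  # _N: number of 1s per group (the columns are 0/1, so sum() counts them)
  setNames(lapply(.SD, sum), paste0(names(.SD), "_N")),
  # _PER: SAMPLEWEIGHT-weighted proportion of 1s per group
  setNames(lapply(.SD, function(x) weighted.mean(x == 1, SAMPLEWEIGHT)),
           paste0(names(.SD), "_PER"))),
  by = .(LABEL1, LABEL3), .SDcols = CAT:BIRD]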
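The call above returns all _N columns before all _PER columns, with groups in order of first appearance, whereas dataWANT interleaves them (CAT_N, CAT_PER, FOX_N, ...) and lists the groups sorted. A minimal sketch of getting that exact layout, assuming the same computation as the answer: it swaps by for keyby to sort the groups and uses setcolorder to interleave the columns. The seeded input data and the names animals and res are hypothetical, introduced only so the example runs on its own.

```r
library(data.table)

# Hypothetical reproducible input with the same columns as `data` in the question
set.seed(1)
data <- data.table(STUDENT = 1:100,
                   SAMPLEWEIGHT = sample(12:99, 100, replace = TRUE),
                   LABEL1 = sample(1:2, 100, replace = TRUE),
                   LABEL3 = sample(1:3, 100, replace = TRUE),
                   CAT = sample(0:1, 100, replace = TRUE),
                   FOX = sample(0:1, 100, replace = TRUE),
                   DOG = sample(0:1, 100, replace = TRUE),
                   MOUSE = sample(0:1, 100, replace = TRUE),
                   BIRD = sample(0:1, 100, replace = TRUE))

# Hypothetical helper vector: the 0/1 indicator columns to summarize
animals <- c("CAT", "FOX", "DOG", "MOUSE", "BIRD")

# Same computation as the answer, but keyby (not by) sorts groups by LABEL1, LABEL3
res <- data[, c(setNames(lapply(.SD, sum), paste0(names(.SD), "_N")),
                setNames(lapply(.SD, function(x) weighted.mean(x == 1, SAMPLEWEIGHT)),
                         paste0(names(.SD), "_PER"))),
            keyby = .(LABEL1, LABEL3), .SDcols = animals]

# Interleave columns as CAT_N, CAT_PER, FOX_N, FOX_PER, ... to match dataWANT;
# rbind() builds a 2-row matrix and as.vector() reads it column by column
setcolorder(res, c("LABEL1", "LABEL3",
                   as.vector(rbind(paste0(animals, "_N"), paste0(animals, "_PER")))))
print(res)
```

keyby also sets the key of res to LABEL1 and LABEL3. If that is not wanted, keeping by and calling setorder(res, LABEL1, LABEL3) afterwards gives the same row order.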

Answer 1 (accepted), 2020-05-05 19:32:04