
R Data.Table with weights

library(data.table)
data = data.table("STUDENT" = 1:100,
                  "SAMPLEWEIGHT" = sample(12:99, 100, replace = TRUE),
                  "LABEL1" = sample(1:2, 100, replace = TRUE),
                  "LABEL3" = sample(1:3, 100, replace = TRUE),
                  "CAT" = sample(0:1, 100, replace = TRUE),
                  "FOX" = sample(0:1, 100, replace = TRUE),
                  "DOG" = sample(0:1, 100, replace = TRUE),
                  "MOUSE" = sample(0:1, 100, replace = TRUE),
                  "BIRD" = sample(0:1, 100, replace = TRUE))

dataWANT = data.frame("LABEL1" = c(1,1,1,2,2,2),
                      "LABEL3" = c(1,2,3,1,2,3),
                      "CAT_N" = NA,
                      "CAT_PER" = NA,
                      "FOX_N" = NA,
                      "FOX_PER" = NA,
                      "DOG_N" = NA,
                      "DOG_PER" = NA,
                      "MOUSE_N" = NA,
                      "MOUSE_PER" = NA,
                      "BIRD_N" = NA,
                      "BIRD_PER" = NA)

I have a data.table, call it data, and I am trying to summarize the student data in the form shown in dataWANT.

In dataWANT, each column ending in _N holds the count of values equal to 1 in the corresponding column, computed for each LABEL1 and LABEL3 combination, so there are 6 groups in total.

In dataWANT, each column ending in _PER holds the weighted proportion (weighted by SAMPLEWEIGHT) of students in the group that have a 1 in the corresponding column.
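
To make the two definitions concrete, here is a minimal sketch on a hypothetical group of four students. The vectors `has_cat` and `w` are made up for illustration and are not part of the question's data. The `_N` value is the plain count of ones, while the `_PER` value is the share of the group's total sample weight carried by the students with a 1.

```r
# Hypothetical toy group: 4 students with a binary indicator and sample weights
has_cat <- c(1, 0, 1, 0)
w       <- c(10, 20, 30, 40)

cat_n   <- sum(has_cat == 1)                 # unweighted count of ones: 2
cat_per <- sum(w[has_cat == 1]) / sum(w)     # weighted proportion: (10 + 30) / 100 = 0.4

# weighted.mean() from the stats package gives the same weighted proportion
stopifnot(isTRUE(all.equal(cat_per, weighted.mean(has_cat == 1, w))))
```

Note that the unweighted proportion here would be 0.5, so the weights do change the result.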

One option with data.table is to group by 'LABEL1' and 'LABEL3', specify the columns of interest in .SDcols, get the sum of each column by looping over .SD (the sum equals the count of ones because the columns are binary), and concatenate that with the weighted.mean of each column based on the 'SAMPLEWEIGHT' column.

library(data.table)
# _N: count of ones (sum of a 0/1 column); _PER: share of ones weighted by SAMPLEWEIGHT
data[, c(setNames(lapply(.SD, sum), paste0(names(.SD), "_N")),
         setNames(lapply(.SD, function(x) weighted.mean(x == 1, SAMPLEWEIGHT)),
                  paste0(names(.SD), "_PER"))),
     by = .(LABEL1, LABEL3), .SDcols = CAT:BIRD]
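
As a sanity check, the same grouped expression can be run on a small deterministic table and compared against values worked out by hand. This is only a sketch: the table `dt`, its six rows, and the use of `keyby` (which additionally sorts the result by the grouping columns) are assumptions made for illustration, not part of the original answer.

```r
library(data.table)

# Hypothetical six-row table with hand-checkable values (not the question's data)
dt <- data.table(
  SAMPLEWEIGHT = c(10, 20, 30, 40, 50, 60),
  LABEL1       = c(1, 1, 1, 2, 2, 2),
  LABEL3       = c(1, 1, 2, 1, 1, 1),
  CAT          = c(1, 0, 1, 0, 1, 1),
  FOX          = c(0, 0, 1, 1, 1, 0)
)

# Same pattern as above; keyby sorts the output by LABEL1, LABEL3
res <- dt[, c(setNames(lapply(.SD, sum), paste0(names(.SD), "_N")),
              setNames(lapply(.SD, function(x) weighted.mean(x == 1, SAMPLEWEIGHT)),
                       paste0(names(.SD), "_PER"))),
          keyby = .(LABEL1, LABEL3), .SDcols = c("CAT", "FOX")]

# Group LABEL1 == 2, LABEL3 == 1 has weights 40, 50, 60:
#   CAT = 0, 1, 1  ->  CAT_N = 2, CAT_PER = (50 + 60) / 150
#   FOX = 1, 1, 0  ->  FOX_N = 2, FOX_PER = (40 + 50) / 150 = 0.6
g <- res[LABEL1 == 2 & LABEL3 == 1]
stopifnot(g$CAT_N == 2, isTRUE(all.equal(g$CAT_PER, 110 / 150)))
stopifnot(g$FOX_N == 2, isTRUE(all.equal(g$FOX_PER, 0.6)))
```

Only group combinations that actually occur in the data appear in the result, so a LABEL1/LABEL3 pair with no students would be missing rather than shown as NA.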
