R Data.Table With Weights
library(data.table)
# Example data: 100 students with a sampling weight, two grouping labels,
# and five binary (0/1) indicator columns
data = data.table("STUDENT" = 1:100,
                  "SAMPLEWEIGHT" = sample(12:99, 100, replace = TRUE),
                  "LABEL1" = sample(1:2, 100, replace = TRUE),
                  "LABEL3" = sample(1:3, 100, replace = TRUE),
                  "CAT" = sample(0:1, 100, replace = TRUE),
                  "FOX" = sample(0:1, 100, replace = TRUE),
                  "DOG" = sample(0:1, 100, replace = TRUE),
                  "MOUSE" = sample(0:1, 100, replace = TRUE),
                  "BIRD" = sample(0:1, 100, replace = TRUE))
# Desired output layout: one row per LABEL1 x LABEL3 combination
dataWANT = data.frame("LABEL1" = c(1,1,1,2,2,2),
                      "LABEL3" = c(1,2,3,1,2,3),
                      "CAT_N" = NA,
                      "CAT_PER" = NA,
                      "FOX_N" = NA,
                      "FOX_PER" = NA,
                      "DOG_N" = NA,
                      "DOG_PER" = NA,
                      "MOUSE_N" = NA,
                      "MOUSE_PER" = NA,
                      "BIRD_N" = NA,
                      "BIRD_PER" = NA)
I have a data.table, call it data, and I am trying to summarize the student data in the form shown in dataWANT.
In dataWANT, each column ending in _N is the count of values equal to 1 in the corresponding column for each LABEL1 and LABEL3 combination, so there are 6 groups in total.
Each column ending in _PER is the weighted proportion (weighted by SAMPLEWEIGHT) of students in the group who have a 1 in that column.
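To make the two definitions concrete, here is a minimal sketch on a made-up group of four students. The values, the weights, and the variable names (cat_flag, w) are illustrative assumptions, not taken from the data above.

```r
# Hypothetical group of four students: a binary indicator and sampling weights
cat_flag <- c(1, 0, 1, 0)
w        <- c(10, 20, 30, 40)

# _N: number of students in the group with a 1 in the column
cat_n <- sum(cat_flag == 1)

# _PER: weighted proportion of ones in the group
cat_per <- weighted.mean(cat_flag == 1, w)

# The same quantity by hand: weight of the ones divided by total weight
cat_per_manual <- sum(w[cat_flag == 1]) / sum(w)

print(c(N = cat_n, PER = cat_per, PER_manual = cat_per_manual))
```

Here the two students with a 1 carry weights 10 and 30 out of a total of 100, so the count is 2 and the weighted proportion is 0.4, whereas the unweighted proportion would be 0.5.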
One option with data.table is to group by 'LABEL1' and 'LABEL3', specify the columns of interest in .SDcols, loop over .SD to get the sum of each column (which is the count of ones, since the columns are binary), and concatenate that with the weighted.mean of each column, weighted by the 'SAMPLEWEIGHT' column.
library(data.table)
# For each LABEL1 x LABEL3 group: count of ones (_N) and weighted proportion of ones (_PER)
data[, c(setNames(lapply(.SD, sum), paste0(names(.SD), "_N")),
         setNames(lapply(.SD, function(x) weighted.mean(x == 1, SAMPLEWEIGHT)),
                  paste0(names(.SD), "_PER"))),
     by = .(LABEL1, LABEL3), .SDcols = CAT:BIRD]
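As a sanity check on this approach, the sketch below rebuilds the example data and compares the first group's CAT_N and CAT_PER with values computed by hand. The fixed seed and the names dt, animals, res, first and g are assumptions added for illustration. It also uses keyby instead of by so that the rows come out sorted by LABEL1 and LABEL3, matching the layout of dataWANT.

```r
library(data.table)

set.seed(42)  # assumption: fixed seed, only so the check is reproducible
n <- 100
dt <- data.table(
  STUDENT      = 1:n,
  SAMPLEWEIGHT = sample(12:99, n, replace = TRUE),
  LABEL1       = sample(1:2, n, replace = TRUE),
  LABEL3       = sample(1:3, n, replace = TRUE),
  CAT          = sample(0:1, n, replace = TRUE),
  FOX          = sample(0:1, n, replace = TRUE),
  DOG          = sample(0:1, n, replace = TRUE),
  MOUSE        = sample(0:1, n, replace = TRUE),
  BIRD         = sample(0:1, n, replace = TRUE)
)
animals <- c("CAT", "FOX", "DOG", "MOUSE", "BIRD")

# Same grouped summary as above; keyby sorts the result by the grouping columns
res <- dt[, c(setNames(lapply(.SD, sum), paste0(names(.SD), "_N")),
              setNames(lapply(.SD, function(x) weighted.mean(x == 1, SAMPLEWEIGHT)),
                       paste0(names(.SD), "_PER"))),
          keyby = .(LABEL1, LABEL3), .SDcols = animals]

# Recompute CAT_N and CAT_PER by hand for the first group in the result
first <- res[1]
g <- dt[LABEL1 == first$LABEL1 & LABEL3 == first$LABEL3]
manual_n   <- sum(g$CAT == 1)
manual_per <- sum(g$SAMPLEWEIGHT[g$CAT == 1]) / sum(g$SAMPLEWEIGHT)

print(res)
print(c(CAT_N = manual_n, CAT_PER = manual_per))
```

If the hand-computed values match the first row of res, the grouped expression is doing what the question asks: a plain count for _N and a SAMPLEWEIGHT-weighted share for _PER.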