R data.table: subgroup counts and weighted percent of group summary
I have the following data.table:
library(data.table)
n = 100000
DT = data.table(customer_ID = 1:n,
                married = rbinom(n, 1, 0.4),
                coupon = rbinom(n, 1, 0.15))
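I need to create a table that summarizes the total number of married and unmarried customers, the number of customers using coupons (by marital-status subgroup), and a final column giving the percentage of customers in each marital-status subgroup who use coupons.
The output should look like this: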
   married Customers using Coupons Total Customers percent_usecoupon
1:       0                    9036           59790          15.11290
2:       1                    5943           40210          14.77991
My current code is inefficient, and I'm sure there is better data.table syntax for this, but I can't seem to find it. My current code is below:
coupon_marital = DT[coupon == TRUE, .N, by = married][order(-N)] # Count of coupon users by marital status
total_marital = DT[, .N, by = married] # Total count by marital status
setnames(total_marital, "N", "Count") # Rename N to Count
coupon_marital = merge(coupon_marital, total_marital) # Merge the two data.tables on married
coupon_marital[, percent_usecoupon := N/Count*100, by = married] # Compute percentage coupon use
setnames(coupon_marital, c("N", "Count"), c("Customers using Coupons", "Total Customers")) # Rename N and Count for display
rm(total_marital)
print(coupon_marital)
I can't use dplyr; I have to use data.table only. I'm very new to data.table syntax, so any help is much appreciated!
Build the data
library(data.table)
set.seed(10)
n = 100000
DT = data.table(customer_ID = 1:n,
                married = rbinom(n, 1, 0.4),
                coupon = rbinom(n, 1, 0.15))
Summarize the data
DT[, .(N.UseCoupon = sum(coupon)
       ,N.Total = .N
       ,Pct.UseCoupon = 100*mean(coupon)),
   by = married]
#    married N.UseCoupon N.Total Pct.UseCoupon
# 1:       0        8975   60223      14.90294
# 2:       1        5904   39777      14.84275
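This works because coupon is a 0/1 indicator: within each group, sum(coupon) counts the 1s and mean(coupon) is the share of 1s. As a rough sketch, here is the same pattern on a tiny hand-made table (toy and res are hypothetical names, not from the question), followed by an optional setnames() call to get the column names shown in the question's desired output:

```r
library(data.table)

# Tiny hand-made example (hypothetical data, not the question's DT)
toy <- data.table(married = c(0, 0, 0, 1, 1),
                  coupon  = c(1, 0, 0, 1, 1))

res <- toy[, .(N.UseCoupon   = sum(coupon),         # the 1s add up to the number of coupon users
               N.Total       = .N,                  # number of rows in the group
               Pct.UseCoupon = 100 * mean(coupon)), # share of 1s, as a percentage
           by = married]

# Optional: rename to the column names used in the question's output
setnames(res,
         c("N.UseCoupon", "N.Total", "Pct.UseCoupon"),
         c("Customers using Coupons", "Total Customers", "percent_usecoupon"))
print(res)
```

Because everything is computed in a single grouped call, no merge or intermediate table is needed. Note that this relies on coupon being coded strictly as 0/1 (or TRUE/FALSE); with any other coding, the sum and mean would no longer be a count and a share.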