Aggregate by group and get count, mean and sd of non-NA values for different data.frame columns
I'm having some trouble counting non-missing values by group with the function below (which also returns the sd and mean):
test <- do.call(data.frame, aggregate(. ~ treatment, have, function(x) c(n = sum(!is.na(x)), mean = mean(x), sd = sd(x))))
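It ends up giving me the non-missing count across all columns in the data frame rather than for each individual column.
I've been looking through SO for suggestions and found this, this, and this helpful, but I can't figure out why aggregate with function(x) merges some columns for sum(!is.na(x)) but not for mean or sd.
Edit: added table
You'll notice that in the "have" data frame, counting the non-missing rows in the var1 column by treatment group gives the following: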
veh-9 gr.4-8 gr.3-10 gr.2-5
But when I use sum(!is.na(x)), I get the following:
veh-6 gr.4-5 gr.3-10 gr.2-5
I believe this is because the function uses both var1 and var2 to count the non-missing values. I don't know how to correct this.
Best,
Jack
Here is a data.table approach:
Data
The data you posted is hard to read into R. Please use dput() or similar to make your data easier for others to use:
> dput(dt)
structure(list(someting = c("503", "553", "599", "647", "695",
"728", "760", "793", "826", "859", "907", "955", "1003", "1036",
"1084", "1131", "1179", "1226", "1274", "1322", "1355", "1402",
"1450", "1497", "1545"), treatment = c("gr.2", "gr.2", "gr.2",
"gr.2", "gr.2", "gr.2", "gr.2", "gr.2", "gr.2", "gr.2", "gr.2",
"gr.3", "gr.3", "gr.3", "gr.3", "gr.3", "gr.3", "gr.3", "gr.3",
"gr.3", "gr.3", "gr.3", "gr.3", "gr.4", "gr.4"), var1 = c(8,
NA, 3, 3, NA, NA, NA, NA, NA, 8, 8, 8, NA, 8, 8, 8, 8, 8, 8,
NA, 8, 8, 8, 8, NA), var2 = c(8L, 8L, 8L, 8L, NA, NA, NA, NA,
NA, 8L, 8L, 8L, NA, 8L, 8L, 8L, 8L, 8L, 8L, NA, 8L, 8L, 8L, 8L,
NA)), .Names = c("someting", "treatment", "var1", "var2"), row.names = c(NA,
-25L), class = c("data.table", "data.frame"))
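As a side note on the dput() advice, here is a minimal sketch of the round trip it enables. The `toy` data frame is a made-up example, not the asker's data, and the variable names are only illustrative.

```r
# Hypothetical toy data frame (an assumption for illustration only)
toy <- data.frame(
  treatment = c("a", "a", "b"),
  var1 = c(1, NA, 3),
  stringsAsFactors = FALSE
)

# dput() prints R code that rebuilds the object, NAs included
dput(toy)

# Round trip: capture that printed code as text, then evaluate it,
# which is what a reader does when pasting dput() output into R
code <- paste(capture.output(dput(toy)), collapse = "\n")
rebuilt <- eval(parse(text = code))

# The rebuilt object should match the original
all.equal(toy, rebuilt)
```

Pasting the structure(...) call from a post and assigning it to a name works the same way as the `eval(parse(...))` step here.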
Code
# Count non-NA values and take NA-safe means, one column at a time, per group
dt[, .(var1.n = sum(!is.na(var1)),
       var2.n = sum(!is.na(var2)),
       var1.mean = mean(var1, na.rm = TRUE),
       var2.mean = mean(var2, na.rm = TRUE)),
   by = .(treatment)]
Output
   treatment var1.n var2.n var1.mean var2.mean
1:      gr.2      5      6         6         8
2:      gr.3     10     10         8         8
3:      gr.4      1      1         8         8
For some reason the "veh" entries were not read in, so the output differs slightly, but the principle should be clear.
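For readers who prefer to stay in base R, the following sketch (not part of the original answer) shows what is likely behind the question's symptom: the formula method of aggregate() defaults to na.action = na.omit, so any row with an NA in any column is dropped before grouping. The small `have` data frame and the `stats` helper below are hypothetical stand-ins for the asker's objects.

```r
# Hypothetical toy data standing in for the asker's `have`
have <- data.frame(
  treatment = c("gr.2", "gr.2", "gr.2", "gr.3", "gr.3"),
  var1 = c(8, NA, 3, 8, 8),
  var2 = c(8, 8, NA, NA, 8)
)

# Per-column summary; na.rm = TRUE so mean and sd skip missing values
stats <- function(x) c(n = sum(!is.na(x)),
                       mean = mean(x, na.rm = TRUE),
                       sd = sd(x, na.rm = TRUE))

# Default na.action = na.omit: rows 2, 3 and 4 are removed entirely,
# so every column is counted on the same reduced set of rows
dropped <- do.call(data.frame, aggregate(. ~ treatment, have, stats))

# na.pass keeps all rows, so each column's NAs are handled inside stats()
kept <- do.call(data.frame,
                aggregate(. ~ treatment, have, stats, na.action = na.pass))

dropped
kept
```

With na.pass, the `n` columns reflect each variable's own non-missing count per group, which matches what the data.table code computes column by column.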