Summary statistics of multiple data frames within a list
If I have this list:
set.seed(123)
thelist <- list(a=data.frame(x1=rnorm(10), x2=rnorm(10)),
b=data.frame(x1=rnorm(10), x2=rnorm(10)),
c=data.frame(x1=rnorm(10), x2=rnorm(10)))
And if I wanted to calculate the mean of each column within each data frame in the list, I could do so with the following code:
sapply(do.call("rbind",thelist),mean)
How could I calculate the standard deviation, again for each column within each list element (a:c), given that there is no equivalent function for sd (at least to my knowledge)?
Any suggestions would be appreciated.
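A side note on the snippet in the question: rbind stacks the rows of all three data frames, so sapply(..., mean) returns one mean per column over the pooled 30 rows, not one per data frame. A minimal sketch of the difference, where pooled_means and per_frame_means are illustrative names that do not appear in the question:

```r
set.seed(123)
thelist <- list(a = data.frame(x1 = rnorm(10), x2 = rnorm(10)),
                b = data.frame(x1 = rnorm(10), x2 = rnorm(10)),
                c = data.frame(x1 = rnorm(10), x2 = rnorm(10)))

# rbind first, then mean: one value per column across all 30 rows
pooled_means <- sapply(do.call("rbind", thelist), mean)

# mean inside each data frame: one row per list element, one column per variable
per_frame_means <- t(sapply(thelist, function(df) sapply(df, mean)))

pooled_means
per_frame_means
```

Because the three data frames have the same number of rows here, each pooled mean equals the average of the three per-frame means; that would not hold for unequal sizes.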
A base R solution would be to use sapply twice.
For the mean only, it is:
t(sapply(thelist, sapply, mean))
Resulting in:
            x1        x2
a  0.074625644 0.2086220
b -0.424558873 0.3220446
c -0.008715537 0.2216860
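Since the question asks about the standard deviation, the same double-sapply pattern should work with sd swapped in for mean. A minimal sketch, with sd_by_frame as an illustrative name:

```r
set.seed(123)
thelist <- list(a = data.frame(x1 = rnorm(10), x2 = rnorm(10)),
                b = data.frame(x1 = rnorm(10), x2 = rnorm(10)),
                c = data.frame(x1 = rnorm(10), x2 = rnorm(10)))

# Inner sapply: sd of each column of one data frame.
# Outer sapply: repeat that for every data frame in the list.
# t() puts the data frames in rows and the variables in columns.
sd_by_frame <- t(sapply(thelist, sapply, sd))
sd_by_frame
```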
If you want both:
my_summary <- function(x){
  c(mean = mean(x), sd = sd(x))
}
as.data.frame(lapply(thelist, sapply, my_summary))
Resulting in:
           a.x1     a.x2       b.x1      b.x2         c.x1      c.x2
mean 0.07462564 0.208622 -0.4245589 0.3220446 -0.008715537 0.2216860
sd   0.95378405 1.038073  0.9308092 0.5273024  1.082518163 0.8564451
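If one row per data frame is preferred over one column per data frame/variable pair, the same summary function can be flattened per data frame and then transposed. This is only a sketch; summarise_frame and wide are illustrative names, not part of the answer above:

```r
set.seed(123)
thelist <- list(a = data.frame(x1 = rnorm(10), x2 = rnorm(10)),
                b = data.frame(x1 = rnorm(10), x2 = rnorm(10)),
                c = data.frame(x1 = rnorm(10), x2 = rnorm(10)))

my_summary <- function(x) {
  c(mean = mean(x), sd = sd(x))
}

# Summarise every column of one data frame and flatten to a named vector;
# unlist() builds the names x1.mean, x1.sd, x2.mean, x2.sd.
summarise_frame <- function(df) unlist(lapply(df, my_summary))

# One row per data frame, one column per variable/statistic pair
wide <- t(sapply(thelist, summarise_frame))
wide
```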
First, I'd make it stackable by turning each data frame's name into a column:
for (i in seq_along(thelist)) thelist[[i]]$dfname <- names(thelist)[i]
Then, stack and take means with data.table:
library(data.table)
DT <- rbindlist(thelist)
DT[,lapply(.SD,mean),by=dfname]
which gives:
   dfname           x1        x2
1:      a  0.074625644 0.2086220
2:      b -0.424558873 0.3220446
3:      c -0.008715537 0.2216860
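As a possible shortcut, and assuming a data.table version whose rbindlist supports the idcol argument, the name column can be created while stacking instead of in a separate loop. The sketch below also swaps sd in for mean, since that is what the question asks for; sd_dt is an illustrative name:

```r
library(data.table)

set.seed(123)
thelist <- list(a = data.frame(x1 = rnorm(10), x2 = rnorm(10)),
                b = data.frame(x1 = rnorm(10), x2 = rnorm(10)),
                c = data.frame(x1 = rnorm(10), x2 = rnorm(10)))

# idcol adds a column holding the list names ("a", "b", "c") for each row
DT <- rbindlist(thelist, idcol = "dfname")

# .SD holds every column except the grouping column, so this is the
# standard deviation of x1 and x2 within each original data frame
sd_dt <- DT[, lapply(.SD, sd), by = dfname]
sd_dt
```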
You might also consider the summary function, though it's clunky here:
DT[,as.list(unlist(lapply(.SD,summary))),by=dfname]
#    dfname x1.Min. x1.1st Qu. x1.Median   x1.Mean x1.3rd Qu. x1.Max. x2.Min. x2.1st Qu. x2.Median x2.Mean x2.3rd Qu. x2.Max.
# 1:      a  -1.265    -0.5318  -0.07983  0.074630    0.37800   1.715 -1.9670   -0.32690    0.3803  0.2086     0.6505  1.7870
# 2:      b  -1.687    -1.0570  -0.67700 -0.424600    0.06054   1.254 -0.3805   -0.23680    0.4902  0.3220     0.7883  0.8951
# 3:      c  -1.265    -0.6377  -0.30540 -0.008716    0.56410   2.169 -1.5490   -0.03929    0.1699  0.2217     0.5018  1.5160
Finally, borrowing from an old answer of mine, you could write your own summary-stats function:
summaryfun <- function(x) list(mean=mean(x),sd=sd(x))
DT[,as.list(unlist(lapply(.SD,summaryfun))),by=dfname]
#    dfname      x1.mean     x1.sd   x2.mean     x2.sd
# 1:      a  0.074625644 0.9537841 0.2086220 1.0380734
# 2:      b -0.424558873 0.9308092 0.3220446 0.5273024
# 3:      c -0.008715537 1.0825182 0.2216860 0.8564451
You can combine your data as you proposed yourself and then aggregate as follows:
thelist_named <- Map(cbind, thelist, nam = names(thelist))
thelist_binded <- do.call(rbind, thelist_named)
Aggregation part:
my_summary <- function(x){
  c(mean = mean(x), sd = sd(x))
}
aggregate(.~nam, thelist_binded, my_summary)
Result:
  nam      x1.mean        x1.sd   x2.mean     x2.sd
1   a  0.074625644  0.953784051 0.2086220 1.0380734
2   b -0.424558873  0.930809213 0.3220446 0.5273024
3   c -0.008715537  1.082518163 0.2216860 0.8564451
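One thing to be aware of with this approach: when the summary function returns more than one value, aggregate stores each variable's results as a matrix column, so the result has three columns (nam, x1, x2) even though it prints as five. If ordinary columns are needed, do.call(data.frame, ...) is a common way to flatten it. A minimal sketch, with res and flat as illustrative names:

```r
set.seed(123)
thelist <- list(a = data.frame(x1 = rnorm(10), x2 = rnorm(10)),
                b = data.frame(x1 = rnorm(10), x2 = rnorm(10)),
                c = data.frame(x1 = rnorm(10), x2 = rnorm(10)))

# Tag each data frame with its name, then stack them
thelist_named  <- Map(cbind, thelist, nam = names(thelist))
thelist_binded <- do.call(rbind, thelist_named)

my_summary <- function(x) {
  c(mean = mean(x), sd = sd(x))
}

res <- aggregate(. ~ nam, thelist_binded, my_summary)
ncol(res)        # nam plus two matrix columns (x1 and x2)

# Expand the matrix columns into separate ordinary columns
flat <- do.call(data.frame, res)
names(flat)
```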