Calculate the mean of multiple columns in a data.frame

Question

I'm wondering whether it is possible to calculate the means of multiple columns using only the mean function.

For example,

mean(iris[,1])

works, but

mean(iris[,1:4])

does not. I also tried:

mean(iris[,c(1:4)])

and got this warning message:

Warning message:
In mean.default(iris[, 1:4]) : argument is not numeric or logical: returning NA

I know I can just use lapply(iris[,1:4], mean) or sapply(iris[,1:4], mean).
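The warning arises because a data frame is stored as a list of columns rather than as a numeric vector, so mean.default() rejects it. A minimal base-R sketch of this, using the built-in iris data (the names num_part, overall and per_column are illustrative):

```r
# A data frame is stored as a list of columns, not as a numeric vector
num_part <- iris[, 1:4]
print(is.numeric(num_part))  # FALSE, so mean.default() warns and returns NA
print(is.list(num_part))     # TRUE

# Converting to a numeric matrix makes mean() accept the input, but it then
# averages all 600 values into one number instead of one mean per column
overall <- mean(as.matrix(num_part))

# A per-column result needs a column-wise function such as colMeans()
per_column <- colMeans(num_part)
print(overall)
print(per_column)
```

Because every column has the same number of rows, the single overall value equals the mean of the four column means.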

Answer 1

Try colMeans:

However, the columns must be numeric. For larger datasets with mixed column types, you can add a test that keeps only the numeric columns:

colMeans(iris[sapply(iris, is.numeric)])
Sepal.Length  Sepal.Width Petal.Length  Petal.Width 
    5.843333     3.057333     3.758000     1.199333
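A quick base-R check that colMeans() agrees with the sapply() approach mentioned in the question. colMeans() also accepts na.rm = TRUE; the NA injected below is purely illustrative, and the names num_cols and with_na are hypothetical:

```r
num_cols <- iris[sapply(iris, is.numeric)]

# colMeans() and sapply(..., mean) should agree on complete data
by_colmeans <- colMeans(num_cols)
by_sapply <- sapply(num_cols, mean)
print(all.equal(by_colmeans, by_sapply))  # TRUE

# With a missing value, na.rm = TRUE skips it instead of returning NA
with_na <- num_cols
with_na[1, "Sepal.Length"] <- NA
print(colMeans(with_na))                # Sepal.Length is NA
print(colMeans(with_na, na.rm = TRUE))  # Sepal.Length ignores the NA
```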

Benchmark

The dplyr and data.table versions seem slow here. Perhaps someone can replicate these results to confirm them.

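library(microbenchmark)
library(data.table)
library(dplyr)
# Cdtable and Cdplyr also call mean() on the character column V1 (NA plus a warning);
# summarise_each() and funs() are deprecated in current dplyr
microbenchmark(
  plafort = colMeans(big.df[sapply(big.df, is.numeric)]),
  Carlos  = colMeans(Filter(is.numeric, big.df)),
  Cdtable = big.dt[, lapply(.SD, mean)],
  Cdplyr  = big.df %>% summarise_each(funs(mean))
  )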
#Unit: milliseconds
#    expr       min        lq     mean    median       uq       max
# plafort  9.862934 10.506778 12.07027 10.699616 11.16404  31.23927
#  Carlos  9.215143  9.557987 11.30063  9.843197 10.21821  65.21379
# Cdtable 57.157250 64.866996 78.72452 67.633433 87.52451 264.60453
#  Cdplyr 62.933293 67.853312 81.77382 71.296555 91.44994 182.36578

Data

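library(data.table)
# 1000 x 1000 numeric matrix plus one character column (V1)
m <- matrix(1:1e6, 1000)
m2 <- matrix(rep('a', 1000), ncol=1)
big.df <- as.data.frame(cbind(m2, m), stringsAsFactors=FALSE)
# cbind() coerced everything to character; convert V2..V1001 back to numeric
big.df[,-1] <- lapply(big.df[,-1], as.numeric)
big.dt <- as.data.table(big.df)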
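When reading the benchmark, note that Cdtable and Cdplyr reach the character column V1, whereas the two colMeans versions drop it first, so the comparison is not quite like-for-like. The base-R sketch below rebuilds the same layout on a small scale (the 4-row size and the names m_small, chr_col, small.df and n_warn are illustrative) to show what mean() does with V1:

```r
# Same construction as big.df, but 4 rows x (1 character + 5 numeric) columns
m_small <- matrix(1:20, 4)
chr_col <- matrix(rep("a", 4), ncol = 1)
small.df <- as.data.frame(cbind(chr_col, m_small), stringsAsFactors = FALSE)
small.df[, -1] <- lapply(small.df[, -1], as.numeric)

# Apply mean() to every column, counting (and muffling) the warnings raised
n_warn <- 0
all_cols <- withCallingHandlers(
  sapply(small.df, mean),
  warning = function(w) {
    n_warn <<- n_warn + 1
    invokeRestart("muffleWarning")
  }
)
print(all_cols)  # V1 is NA; V2..V6 are 2.5, 6.5, 10.5, 14.5, 18.5
print(n_warn)    # 1 warning, from the character column

# Filtering first, as the colMeans versions do, avoids the issue
print(colMeans(Filter(is.numeric, small.df)))
```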

Answer 2

With sapply + Filter:

sapply(Filter(is.numeric, iris), mean)
Sepal.Length  Sepal.Width Petal.Length  Petal.Width 
    5.843333     3.057333     3.758000     1.199333
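A variant of the same idea: vapply() works like sapply() but checks that each result is a single number, and extra arguments such as na.rm are passed through to mean(). A short sketch (the names num and means are illustrative):

```r
num <- Filter(is.numeric, iris)

# FUN.VALUE = numeric(1) makes vapply() fail loudly if mean() ever
# returned something other than one number; na.rm is forwarded to mean()
means <- vapply(num, mean, numeric(1), na.rm = TRUE)
print(means)
```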

With dplyr:

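library(dplyr)
# summarise_each() and funs() are deprecated in current dplyr; Species (a factor) gives NA with a warning
iris %>% summarise_each(funs(mean))
  Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1     5.843333    3.057333        3.758    1.199333      NA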

PS: in dplyr you can now use summarise_if():

iris %>% summarise_if(is.numeric, mean)
#>   Sepal.Length Sepal.Width Petal.Length Petal.Width
#> 1     5.843333    3.057333        3.758    1.199333
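Since dplyr 1.0.0, the scoped variants such as summarise_if() are superseded by across(). A sketch of the equivalent call, assuming dplyr 1.0.0 or later is installed (the name res is illustrative):

```r
library(dplyr)

# across() applies mean() to every column selected by where(is.numeric)
res <- iris %>% summarise(across(where(is.numeric), mean))
print(res)
```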

With data.table:

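library(data.table)
iris_dt <- as.data.table(iris)  # a copy, so the built-in iris data frame is not masked
iris_dt[, lapply(.SD, mean)]    # Species is a factor: NA with a warning
   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1:     5.843333    3.057333        3.758    1.199333      NA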
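To leave the factor column out instead of getting NA, .SDcols can restrict .SD to the numeric columns. A sketch, where num_cols and res are illustrative names:

```r
library(data.table)

iris_dt <- as.data.table(iris)
# Names of the numeric columns only; Species (a factor) is excluded
num_cols <- names(iris_dt)[sapply(iris_dt, is.numeric)]
res <- iris_dt[, lapply(.SD, mean), .SDcols = num_cols]
print(res)
```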

Answer 3

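Your approach does work if the data are numeric, for example a matrix built from numeric vectors, so that is.numeric() returns TRUE. Note, however, that mean() then returns one overall mean of all the values, not one mean per column. See the example below: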

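a <- c(1,2,3)
mean(a)   # 2

b <- c(2,4,6)
mean(b)   # 4

d <- c(3,6,9)

# cbind() of numeric vectors gives a numeric matrix, which mean() accepts
mydata <- cbind(b,a,d)

mean(mydata[,1:3])   # 4: a single mean over all nine values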
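To make the distinction concrete, here is a short base-R sketch with the same vectors: on a numeric matrix, mean() gives one overall value while colMeans() gives one value per column (overall and per_col are illustrative names).

```r
a <- c(1, 2, 3)
b <- c(2, 4, 6)
d <- c(3, 6, 9)
mydata <- cbind(b, a, d)  # 3 x 3 numeric matrix with columns b, a, d

overall <- mean(mydata)      # 4: mean of all nine values
per_col <- colMeans(mydata)  # b = 4, a = 2, d = 6
print(overall)
print(per_col)
```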


3 answers

Answer 1: score 13, accepted, 2015-06-19 15:12:24

Answer 2: score 8, 2015-06-19 15:15:33

Answer 3: score 0, 2015-06-19 15:17:22
