Calculate mean for multiple columns in data.frame

Just wondering whether it is possible to calculate means for multiple columns by using only the mean function.



Using mean on one column is possible, but mean(iris[, 1:4]) is not. I got this warning message:

Warning message:
In mean.default(iris[, 1:4]) :
  argument is not numeric or logical: returning NA

I know I can just use lapply(iris[,1:4],mean) or sapply(iris[,1:4],mean).
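
For reference, a small sketch of how the two calls mentioned above differ in what they return. vapply is added here as a stricter alternative; it is not part of the original question.

```r
# Numeric columns of the built-in iris data set
num_cols <- datasets::iris[, 1:4]

# lapply returns a list with one element per column
means_list <- lapply(num_cols, mean)

# sapply simplifies the same result to a named numeric vector
means_vec <- sapply(num_cols, mean)

# vapply (not in the original post) also checks that each result
# is a single numeric value
means_checked <- vapply(num_cols, mean, numeric(1))

print(means_vec)
```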

Try colMeans. The columns must be numeric, though, so for larger datasets you can add a test that keeps only the numeric columns:

colMeans(iris[sapply(iris, is.numeric)])
Sepal.Length  Sepal.Width Petal.Length  Petal.Width 
    5.843333     3.057333     3.758000     1.199333 
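
If the numeric columns contain missing values, colMeans returns NA for the affected columns unless na.rm = TRUE is passed. A minimal sketch, assuming a copy of iris with one NA introduced purely for illustration (the names df and num are hypothetical):

```r
# Copy of iris with one missing value introduced for illustration
df <- datasets::iris
df[1, "Sepal.Length"] <- NA

# Keep only the numeric columns, as in the answer above
num <- df[sapply(df, is.numeric)]

with_na    <- colMeans(num)                # Sepal.Length becomes NA
without_na <- colMeans(num, na.rm = TRUE)  # NA dropped before averaging

print(with_na)
print(without_na)
```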


dplyr and data.table seem slow for this. Perhaps someone can replicate the findings to verify them.

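library(microbenchmark)  # assumed: the output below is in microbenchmark's format
microbenchmark(
  plafort = colMeans(big.df[sapply(big.df, is.numeric)]),
  Carlos  = colMeans(Filter(is.numeric, big.df)),
  Cdtable = big.dt[, lapply(.SD, mean)],
  Cdplyr  = big.df %>% summarise_each(funs(mean))
)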
#Unit: milliseconds
#    expr       min        lq     mean    median       uq       max
# plafort  9.862934 10.506778 12.07027 10.699616 11.16404  31.23927
#  Carlos  9.215143  9.557987 11.30063  9.843197 10.21821  65.21379
# Cdtable 57.157250 64.866996 78.72452 67.633433 87.52451 264.60453
#  Cdplyr 62.933293 67.853312 81.77382 71.296555 91.44994 182.36578


Data used for the benchmark:

library(data.table)  # needed for as.data.table

m <- matrix(1:1e6, 1000)
m2 <- matrix(rep('a', 1000), ncol = 1)
big.df <- as.data.frame(cbind(m2, m), stringsAsFactors = FALSE)
big.df[, -1] <- lapply(big.df[, -1], as.numeric)
big.dt <- as.data.table(big.df)

With sapply + Filter:

sapply(Filter(is.numeric, iris), mean)
Sepal.Length  Sepal.Width Petal.Length  Petal.Width 
    5.843333     3.057333     3.758000     1.199333 

With dplyr:

iris %>% summarise_each(funs(mean))
   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1:     5.843333    3.057333        3.758    1.199333      NA

PS: in dplyr you can now use summarise_if:

iris %>% summarise_if(is.numeric, mean)
#>   Sepal.Length Sepal.Width Petal.Length Petal.Width
#> 1     5.843333    3.057333        3.758    1.199333
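
In recent dplyr releases summarise_each() and funs() are deprecated. A hedged sketch of the same computation with across() and where(), assuming dplyr 1.0.0 or later is installed (res is a hypothetical name):

```r
library(dplyr)

# across() applies mean to every column selected by where(is.numeric),
# so the Species factor is left out
res <- datasets::iris %>%
  summarise(across(where(is.numeric), mean))

print(res)
```

This should give the same four means as the summarise_if call.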

With data.table:

iris <- data.table(iris)
iris[,lapply(.SD, mean)]
   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1:     5.843333    3.057333        3.758    1.199333      NA
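
The NA for Species appears because mean is also applied to the factor column. One way to avoid that is to restrict .SD to the numeric columns with .SDcols. A minimal sketch (dt, num_cols and res are hypothetical names, and a copy is used so the built-in iris is not overwritten):

```r
library(data.table)

# Work on a copy so the built-in iris data frame is left untouched
dt <- as.data.table(datasets::iris)

# Names of the numeric columns only
num_cols <- names(dt)[sapply(dt, is.numeric)]

# .SDcols restricts .SD to those columns, so Species is skipped
res <- dt[, lapply(.SD, mean), .SDcols = num_cols]

print(res)
```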

Your solution above does work, provided the columns are numeric (that is, is.numeric returns TRUE for them). See the example below:

a <- c(1, 2, 3)
b <- c(2, 4, 6)
d <- c(3, 6, 9)
mydata <- cbind(b, a, d)
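
The example stops after building mydata. As a hedged continuation (an assumption about the intent, not the original author's code), the sketch below shows what the different calls return for it. Note that cbind on plain vectors produces a numeric matrix rather than a data.frame.

```r
a <- c(1, 2, 3)
b <- c(2, 4, 6)
d <- c(3, 6, 9)
mydata <- cbind(b, a, d)   # cbind on vectors gives a numeric matrix

# mean() on a matrix averages all cells together
overall <- mean(mydata)

# colMeans() gives one mean per column
per_col <- colMeans(mydata)

# sapply() works column by column after conversion to a data frame
per_col_df <- sapply(as.data.frame(mydata), mean)

print(overall)
print(per_col)
```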


