
How to get mean, median, and other statistics over entire matrix, array or dataframe?

I know this is a basic question, but for some strange reason I am unable to find an answer.

How should I apply basic statistical functions like mean, median, etc. over an entire array, matrix, or data frame to get a single value rather than a vector over rows or columns?

Since this comes up a fair bit, I'm going to treat this a little more comprehensively and cover the 'etc.' piece in addition to mean and median.

  1. For a matrix or array, as the others have stated, mean and median return a single value. However, var computes the covariances between the columns of a two-dimensional matrix. Interestingly, for a multi-dimensional array, var goes back to returning a single value. sd on a 2-d matrix works but is deprecated, and returns the standard deviation of each column. Even better, mad returns a single value on both a 2-d matrix and a multi-dimensional array. If you want a single value returned, the safest route is to coerce with as.vector() first. Having fun yet?

  2. For a data.frame, mean is deprecated, but again acts on the columns separately. median requires that you coerce to a vector first, or unlist. As before, var returns the covariances, and sd is again deprecated but returns the standard deviation of each column. mad also requires that you coerce to a vector or unlist. In general, if you want something to act on all values of a data.frame, just unlist it first.
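The two points above can be illustrated with a minimal sketch. The objects m, df, v and u are made-up examples, and the sketch assumes every column is numeric:

```r
set.seed(1)
m  <- matrix(rnorm(20), nrow = 5)   # hypothetical 5 x 4 numeric matrix
df <- as.data.frame(m)              # same values as a data frame

# On a 2-d matrix, var() gives a 4 x 4 covariance matrix, not one number
cov_dim <- dim(var(m))

# Flattening the matrix first yields a single value for each statistic
v <- as.vector(m)
stats_matrix <- c(mean = mean(v), median = median(v),
                  var = var(v), sd = sd(v), mad = mad(v))

# For a data frame, unlist() plays the same role as as.vector()
u <- unlist(df, use.names = FALSE)
stats_df <- c(mean = mean(u), median = median(u),
              var = var(u), sd = sd(u), mad = mad(u))

print(stats_matrix)
print(stats_df)
```

Both vectors of statistics should agree, since the matrix and the data frame hold the same values.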

Edit: Late breaking news(): In R 3.0.0 mean.data.frame is defunctified:

o   mean() for data frames and sd() for data frames and matrices are defunct.
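As a rough illustration of what this change means in practice, here is a small sketch assuming R 3.0.0 or later. The objects m and df are made-up examples. With the data-frame method gone, mean() falls through to the default method, which warns and returns NA for a data frame, and sd() on a matrix treats the whole matrix as one vector:

```r
m  <- matrix(c(1, 2, 3, 4, 5, 6), nrow = 2)  # hypothetical 2 x 3 matrix
df <- as.data.frame(m)

# mean() on a data frame: warning plus NA (assumes R >= 3.0.0)
mean_df <- suppressWarnings(mean(df))

# sd() on a matrix: a single value over all elements (assumes R >= 3.0.0)
sd_m <- sd(m)

# The explicit route that does not depend on the R version
mean_all <- mean(unlist(df))
sd_all   <- sd(as.vector(m))
```

Flattening explicitly with unlist() or as.vector() keeps the intent clear regardless of version.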

By default, mean, median, etc. work over an entire array or matrix.

E.g.:

# array:
m <- array(runif(100),dim=c(10,10))
mean(m) # returns *one* value.

# matrix:
mean(as.matrix(m)) # same as before

For data frames, you can coerce them to a matrix first. (The reason this works over columns by default is that a data frame can have columns containing strings, which you can't take the mean of.)

# data frame
mdf <- as.data.frame(m)
# mean(mdf) returned column means before R 3.0.0; it now returns NA with a warning
mean( as.matrix(mdf) ) # one value.

Just be careful that your data frame has only numeric columns before coercing to a matrix, or exclude the non-numeric ones.
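One possible way to exclude the non-numeric columns is sketched below. The data frame df and its column names are invented for illustration:

```r
# Hypothetical data frame with two numeric columns and one character column
df <- data.frame(a = c(1, 2, 3),
                 b = c(4, 5, 6),
                 label = c("x", "y", "z"),
                 stringsAsFactors = FALSE)

# Coercing the whole frame gives a character matrix, so mean() would not work
is_char_matrix <- is.character(as.matrix(df))

# Keep only the numeric columns, then flatten them into one vector
is_num <- vapply(df, is.numeric, logical(1))
values <- unlist(df[is_num], use.names = FALSE)

overall_mean   <- mean(values)    # mean of 1..6
overall_median <- median(values)  # median of 1..6
```

Using vapply() with is.numeric keeps the selection explicit, and unlist() avoids the intermediate matrix altogether.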

You can use the dplyr package (install it with install.packages('dplyr')) and then:

library(dplyr)

dataframe.mean <- dataframe %>%
  summarise_all(mean) # one mean per column; swap in median as needed

