I can calculate the min or max of a row or column, and the mean of a column, but not the mean of a row. Why not?

Question

Given a simple 2x2 data frame, I can calculate the min or max of a row or column, and I can calculate the mean of a column, but I can't calculate the mean of a row. Why not?

> dat <- data.frame( A=c(1,2),B=c(3,4))
> dat
  A B
1 1 3
2 2 4
> min(dat[1,])
[1] 1
> max(dat[1,])
[1] 3
> mean(dat[,1])
[1] 1.5
> mean(dat[1,])
[1] NA
Warning message:
In mean.default(dat[1, ]) :
  argument is not numeric or logical: returning NA

Answer 1

max and min accept any number of vectors as arguments and return the maximum/minimum across all of them.

mean is more limited: it takes a single argument of a supported type, for example a numeric or logical vector.
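A minimal sketch of that difference, using two made-up vectors `x` and `y` (hypothetical, not from the question): `max()` and `min()` search every argument they are given, while `mean()` needs the values combined into one vector first.

```r
# Two hypothetical vectors, for illustration only
x <- c(1, 5)
y <- c(3, 2)

# max() and min() take any number of arguments and search all of them
max(x, y)      # 5
min(x, y)      # 1

# mean() averages a single vector, so combine the values with c() first.
# Note: mean(x, y) would NOT average both vectors; the second argument is
# matched to mean.default()'s trim parameter, which must be a single number.
mean(c(x, y))  # 2.75
```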

For more details see ?max and ?mean, especially the Usage, Arguments, and Details sections.

The class of dat is data.frame, and so is the class of dat[1,], because a row of a data frame is itself a data frame, with a single value in each of its columns.
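The class difference can be checked directly. This sketch recreates `dat` from the question and uses only base R; it also shows why `mean(dat[,1])` works, since extracting a single column drops it to a plain vector.

```r
dat <- data.frame(A = c(1, 2), B = c(3, 4))

# A row keeps the data.frame class (one value per column) ...
class(dat[1, ])   # "data.frame"
typeof(dat[1, ])  # "list": a data frame is stored as a list of columns

# ... whereas a single column is simplified (drop = TRUE) to a plain vector
class(dat[, 1])   # "numeric"
```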

When you pass a data frame to max, it operates on the columns (vectors) of the data frame and returns the maximum value across all of them.
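A quick check of this behavior with the question's `dat`, as a base R sketch:

```r
dat <- data.frame(A = c(1, 2), B = c(3, 4))

# max() on a one-row data frame looks across the columns' values ...
max(dat[1, ])             # 3
# ... which is the same as passing each column's value separately
max(dat$A[1], dat$B[1])   # 3

# On the whole data frame it scans every column
max(dat)                  # 4
```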

When you pass a data frame to mean, it returns NA with the warning "argument is not numeric or logical", because a data frame is not one of the supported types.
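A sketch reproducing the failure while capturing the warning instead of printing it; `msg` is a hypothetical variable name, and the capture uses base R's `tryCatch()`.

```r
dat <- data.frame(A = c(1, 2), B = c(3, 4))

# A one-row data frame is a list, not a numeric vector ...
is.numeric(dat[1, ])   # FALSE

# ... so mean() warns and returns NA rather than stopping with an error
suppressWarnings(mean(dat[1, ]))   # NA

# Capture the warning text to see why
msg <- tryCatch(mean(dat[1, ]), warning = function(w) conditionMessage(w))
msg
```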

You can use unlist to get a vector from a data frame; it does this by concatenating all the column vectors of the data frame. For example, unlist(dat) returns the vector 1 2 3 4. dat[1,] is the first row of dat, whose columns hold the values 1 and 3, so unlist(dat[1,]) returns the vector 1 3. You can call mean on that: mean(unlist(dat[1,])) gives 2.
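The unlist approach, sketched with the question's `dat`:

```r
dat <- data.frame(A = c(1, 2), B = c(3, 4))

# unlist() concatenates the columns into one named numeric vector
unlist(dat)              # values 1 2 3 4, named A1 A2 B1 B2
unlist(dat[1, ])         # values 1 3, named A B

# That vector is something mean() accepts
mean(unlist(dat[1, ]))   # 2
```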

Answer 2

If all of your columns are numeric, you can just use rowMeans(dat). If only some of them are, you can select the numeric ones compactly by position, for example rowMeans(iris[, 1:4]).
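Applied to the question's own data frame, this looks like the following sketch; `iris` is the example dataset that ships with R's datasets package (four numeric columns followed by the Species factor).

```r
dat <- data.frame(A = c(1, 2), B = c(3, 4))

# Every column is numeric, so rowMeans() works on the whole data frame
rowMeans(dat)   # row 1: (1 + 3) / 2 = 2, row 2: (2 + 4) / 2 = 3

# iris has numeric columns 1 to 4 and a factor in column 5,
# so select the numeric ones by position
head(rowMeans(iris[, 1:4]))
```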

If you don't want to worry about identifying which columns are numeric, you can also use sapply() to generate a logical column index for subsetting:

rowMeans(iris[, sapply(iris, is.numeric)])
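A sketch of what the `sapply()` step produces on `iris`; `num_cols` is a hypothetical name for the intermediate logical index.

```r
# sapply() applies is.numeric() to each column and returns a named logical vector
num_cols <- sapply(iris, is.numeric)
num_cols   # TRUE for the four measurements, FALSE for Species

# Using it as the column index keeps only the numeric columns
head(rowMeans(iris[, num_cols]))
```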

Note also that rowMeans() has an na.rm parameter, which you can set to TRUE if your data might contain missing values.
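A sketch of the difference `na.rm` makes, on a hypothetical `dat_na` with one missing value:

```r
# Hypothetical data frame: the second row has a missing value in column A
dat_na <- data.frame(A = c(1, NA), B = c(3, 4))

rowMeans(dat_na)                 # 2 NA: an NA in a row makes that row's mean NA
rowMeans(dat_na, na.rm = TRUE)   # 2 4: NAs are dropped before averaging
```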

Answer 3

Adding to lefft's answer: you don't need to know which columns are numeric, because you can use Filter to find them.

rowMeans(Filter(is.numeric, dat), na.rm = TRUE)

will do the trick. That said, if you do know the columns, combining is.numeric and Filter is slower than simply listing the columns.
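A sketch of Filter on a hypothetical mixed data frame `mixed` (two numeric columns, one of them with a missing value, plus a character column); `num_df` is likewise a hypothetical name.

```r
# Hypothetical mixed data frame
mixed <- data.frame(x = c(1, NA), y = c(3, 4), label = c("a", "b"))

# Filter() keeps the list elements (here: columns) for which is.numeric() is TRUE
num_df <- Filter(is.numeric, mixed)
names(num_df)                    # "x" "y"

rowMeans(num_df, na.rm = TRUE)   # 2 4
```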

EDIT

Sorry, I wish I could have left this as a comment on the previous answer, since I thought it was a useful clarification, but I had no other way of posting it. To give a little more information about the overhead, I ran a microbenchmark on the different ways of selecting the numeric columns:

library(microbenchmark)

# 10,000 rows: three numeric columns (a, b, c) and six character columns (d to i)
n <- 10000
df.mb <- data.frame(
  a = runif(n), b = runif(n), c = runif(n),
  d = rep("A", n), e = rep("A", n), f = rep("A", n),
  g = rep("A", n), h = rep("A", n), i = rep("A", n))

# Four ways of selecting the numeric columns before calling rowMeans()
function1 <- function(x) rowMeans(Filter(is.numeric, x))       # Filter + is.numeric
function2 <- function(x) rowMeans(x[, 1:3])                     # column positions
function3 <- function(x) rowMeans(x[, c("a", "b", "c")])        # column names
function4 <- function(x) rowMeans(x[, sapply(x, is.numeric)])   # sapply logical index

microbenchmark(
  function1(df.mb),
  function2(df.mb),
  function3(df.mb),
  function4(df.mb)
)

Unit: microseconds
         expr     min       lq     mean   median       uq       max neval cld
 function1(df.mb) 351.148 372.4810 768.2310 464.0005 492.5875 16216.321   100   a
 function2(df.mb) 317.441 338.5605 667.6871 429.6545 442.0270 15281.921   100   a
 function3(df.mb) 317.867 340.4810 581.0908 421.1205 439.0410  8965.121   100   a
 function4(df.mb) 363.521 385.2810 735.4673 461.6535 519.2545 15701.334   100   a

As long as you know the columns by name or position, selecting them directly is somewhat faster (function2 and function3 have the lowest median times above), but failing that, either Filter or sapply will help.
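The benchmark compares only speed, so it may also be worth confirming that the four approaches return the same row means. This sketch uses a hypothetical five-row data frame `x` shaped like `df.mb` (numeric columns a, b, c plus a character column d); the `by_*` names are hypothetical too.

```r
# Small hypothetical stand-in for df.mb
n <- 5
x <- data.frame(a = runif(n), b = runif(n), c = runif(n), d = rep("A", n))

by_filter   <- rowMeans(Filter(is.numeric, x))
by_position <- rowMeans(x[, 1:3])
by_name     <- rowMeans(x[, c("a", "b", "c")])
by_sapply   <- rowMeans(x[, sapply(x, is.numeric)])

# All four selections pick the same three columns, so the results agree
all.equal(by_filter, by_position)
all.equal(by_name, by_sapply)
```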


3 solutions

Solution 1
5 2017-11-07 20:34:50

Solution 2
3 2017-11-07 20:33:33

Solution 3
1 2017-11-07 21:10:08
