I can calculate the min or max of a row or column and I can calculate the mean of a column, but I can't calculate the mean of a row. Why not?
Given a simple 2x2 data frame, I can calculate the min or max of a row or column, and I can calculate the mean of a column, but I can't calculate the mean of a row. Why not?
> dat <- data.frame( A=c(1,2),B=c(3,4))
> dat
  A B
1 1 3
2 2 4
> min(dat[1,])
[1] 1
> max(dat[1,])
[1] 3
> mean(dat[,1])
[1] 1.5
> mean(dat[1,])
[1] NA
Warning message:
In mean.default(dat[1, ]) :
argument is not numeric or logical: returning NA
max and min accept multiple vectors as arguments and calculate the maximum/minimum over all of them. mean is more limited: it takes a single argument of a supported type. A vector, for example, is a supported type.
For more details see ?max and ?mean, especially the Usage, Arguments, and Details sections.
The type of dat is data.frame. So is the type of dat[1,], because a row of a data frame is also a data frame, with a single value in each of its columns.
When you pass a data frame to max, it operates on the columns (vectors) of the data frame, returning the maximum value of all of them.
When you pass a data frame to mean, it returns NA with a warning, because a data frame is not one of the supported types.
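To make the difference concrete, here is a minimal sketch using the same dat as in the question. It only inspects what dat[1, ] is and how max and mean react to it. The variable names (row1, row_class and so on) are illustrative, not part of the original post.

```r
dat <- data.frame(A = c(1, 2), B = c(3, 4))
row1 <- dat[1, ]

# A single row is still a data frame, not a numeric vector
row_class <- class(row1)            # "data.frame"
row_is_numeric <- is.numeric(row1)  # FALSE

# max() copes with the data frame by looking across its columns
row_max <- max(row1)                # 3

# mean() falls through to mean.default(), which warns and returns NA
row_mean <- suppressWarnings(mean(row1))
```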
You can use unlist to get a vector from a data frame. In practice it does that by concatenating all the column vectors of the data frame. For example, unlist(dat) will return the vector 1 2 3 4. dat[1,] is the first row of dat, whose columns hold the values 1 and 3, so unlist(dat[1,]) will return the vector 1 3. You can call mean on that.
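As a short sketch of the unlist approach, again assuming the dat from the question (the variable names are only for illustration):

```r
dat <- data.frame(A = c(1, 2), B = c(3, 4))

# Flatten the whole data frame, column by column
all_values <- unlist(dat)        # named vector: A1 = 1, A2 = 2, B1 = 3, B2 = 4

# Flatten just the first row, then take its mean
row1_values <- unlist(dat[1, ])  # named vector: A = 1, B = 3
row1_mean <- mean(row1_values)   # 2
```

Note that unlist coerces everything to a common type, so this is only safe when every column in the row is numeric.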
If all of your columns are numeric, you can just use rowMeans(dat). To compactly select only the numeric columns, you could do (for example) rowMeans(iris[, 1:4]).
If you don't want to have to worry about identifying which columns are numeric, you could also use sapply() to generate logical column indices for subsetting:
rowMeans(iris[, sapply(iris, is.numeric)])
Note also that rowMeans() has an na.rm parameter, which you can set to TRUE if you think your data might have missing values.
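The two points above can be combined in a small sketch. The data frame mixed is a made-up example, not from the original post. The drop = FALSE argument is an addition of mine: without it, a data frame with only one numeric column would be reduced to a plain vector, which rowMeans() rejects.

```r
# Hypothetical data frame with one character column and one missing value
mixed <- data.frame(id = c("x", "y"), A = c(1, NA), B = c(3, 4))

# Logical index of the numeric columns: id = FALSE, A = TRUE, B = TRUE
num_cols <- sapply(mixed, is.numeric)

# drop = FALSE keeps a data frame even if only one column is numeric
numeric_part <- mixed[, num_cols, drop = FALSE]

means_with_na <- rowMeans(numeric_part)              # 2, NA
means_na_rm <- rowMeans(numeric_part, na.rm = TRUE)  # 2, 4
```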
Adding to lefft's answer: you don't need to know which columns are numeric, and can use Filter to find them.
rowMeans(Filter(is.numeric, dat), na.rm = TRUE)
will do the trick. That being said, if you already know the columns, is.numeric and Filter in conjunction are somewhat slower than simply listing out the columns.
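A minimal sketch of the Filter approach on a data frame that actually contains a non-numeric column (mixed is a hypothetical example, not from the post). Because Filter subsets the data frame like a list, the result stays a data frame even when only one column survives.

```r
# Hypothetical data frame with a character column alongside numeric ones
mixed <- data.frame(id = c("x", "y"), A = c(1, 2), B = c(3, 4))

# Keep only the columns for which is.numeric() returns TRUE
numeric_only <- Filter(is.numeric, mixed)             # columns A and B
filter_means <- rowMeans(numeric_only, na.rm = TRUE)  # 2, 3

# With a single numeric column the result is still a data frame
one_col <- Filter(is.numeric, mixed[, c("id", "A")])
one_col_means <- rowMeans(one_col)                    # 1, 2
```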
EDIT
Sorry, I wish I could have left that as a comment on the previous answer, as I thought it was a useful clarification, but I had no other way of posting. To give a little more information about the overhead, I ran a microbenchmark on the ways of grabbing the numeric columns:
library(microbenchmark)
df.mb<-data.frame(
c(runif(10000)),c(runif(10000)),c(runif(10000)),
c(rep("A",10000)),c(rep("A",10000)),c(rep("A",10000)),
c(rep("A",10000)),c(rep("A",10000)),c(rep("A",10000)))
names(df.mb)<-c("a","b","c","d","e","f","g","h","i")
function1<-function(x) {rowMeans(Filter(is.numeric,x))}
function2<-function(x) {rowMeans(x[,1:3])}
function3<-function(x) {rowMeans(x[,c("a","b","c")])}
function4<-function(x) {rowMeans(x[ ,sapply(x,is.numeric)])}
microbenchmark(
function1(df.mb),
function2(df.mb),
function3(df.mb),
function4(df.mb)
)
Unit: microseconds
             expr     min       lq     mean   median       uq       max neval cld
 function1(df.mb) 351.148 372.4810 768.2310 464.0005 492.5875 16216.321   100   a
 function2(df.mb) 317.441 338.5605 667.6871 429.6545 442.0270 15281.921   100   a
 function3(df.mb) 317.867 340.4810 581.0908 421.1205 439.0410  8965.121   100   a
 function4(df.mb) 363.521 385.2810 735.4673 461.6535 519.2545 15701.334   100   a
As long as you know the columns by name or number, selecting them directly is slightly faster, but failing that, either Filter or sapply will help.
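Timings aside, it can be worth confirming that the four selection strategies return the same row means. The sketch below does that on a small made-up data frame (df_small and the by_* names are illustrative) using base R only, so it does not need the microbenchmark package.

```r
set.seed(1)
# Hypothetical small data frame: three numeric columns, two character columns
df_small <- data.frame(a = runif(5), b = runif(5), c = runif(5),
                       d = rep("A", 5), e = rep("A", 5))

by_filter <- rowMeans(Filter(is.numeric, df_small))
by_index  <- rowMeans(df_small[, 1:3])
by_name   <- rowMeans(df_small[, c("a", "b", "c")])
by_sapply <- rowMeans(df_small[, sapply(df_small, is.numeric)])

# TRUE when all four approaches give the same result
all_agree <- isTRUE(all.equal(by_filter, by_index)) &&
  isTRUE(all.equal(by_filter, by_name)) &&
  isTRUE(all.equal(by_filter, by_sapply))
```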