How to assign factors for each column and calculate row means based on the factor level
I have a data frame for which I want to calculate row means according to certain conditions. Example:
df=
rownames A1 A2 A3 B1 B2 B3
r1 1 2 3 4 5 6
r2 1 2 3 4 5 6
r3 1 2 3 4 5 6
treatment= factor(rep(c("A","B"),each=3))
I want to get the mean of each row for each level of the treatment factor, with the result as below:
rownames A B
r1 2 5
r2 2 5
r3 2 5
Any thoughts on this?
We can do this with melt from data.table. The measure argument can take multiple patterns to convert the data to 'long' format. Then, grouped by 'rownames' and specifying the columns in .SDcols, we loop through the Subset of Data.table (.SD) and get the mean.
library(data.table)
melt(setDT(df), measure = patterns("^A", "^B"), value.name = c('A', 'B'))[,
lapply(.SD, mean), rownames, .SDcols = A:B]
# rownames A B
#1: r1 2 5
#2: r2 2 5
#3: r3 2 5
NOTE: The output is a data.table, which can be converted to a data.frame (with setDF) to match the output shown in the OP.
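Another option is split from base R. Here split.default splits the columns into groups by name prefix (the column name with its digits removed), and rowMeans is applied to each group.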
sapply(split.default(df[-1], sub("\\d+", "", names(df)[-1])), rowMeans)
# A B
#[1,] 2 5
#[2,] 2 5
#[3,] 2 5
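Since the question already defines treatment, the same split can be driven by that factor directly instead of parsing the column names. A minimal sketch, assuming the levels of treatment are in the same order as the data columns (A1..A3, then B1..B3); the names vals and res are only for illustration:

```r
# Example data from the question
df <- data.frame(rownames = c("r1", "r2", "r3"),
                 A1 = c(1L, 1L, 1L), A2 = c(2L, 2L, 2L), A3 = c(3L, 3L, 3L),
                 B1 = c(4L, 4L, 4L), B2 = c(5L, 5L, 5L), B3 = c(6L, 6L, 6L))
treatment <- factor(rep(c("A", "B"), each = 3))

# Drop the label column, then split the remaining columns by the factor.
# split.default treats the data frame as a list of columns.
vals <- df[-1]
res <- sapply(split.default(vals, treatment), rowMeans)

# Carry the row labels over to the result matrix
rownames(res) <- df$rownames
res
```

This relies only on column position, so it still works if the column names do not encode the group.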
df <- structure(list(rownames = c("r1", "r2", "r3"), A1 = c(1L, 1L,
1L), A2 = c(2L, 2L, 2L), A3 = c(3L, 3L, 3L), B1 = c(4L, 4L, 4L
), B2 = c(5L, 5L, 5L), B3 = c(6L, 6L, 6L)), .Names = c("rownames",
"A1", "A2", "A3", "B1", "B2", "B3"), class = "data.frame", row.names = c(NA,
-3L))
In base R using grep:
sapply(levels(treatment), function(a) rowMeans(df[,grep(a, names(df))]))
# A B
#[1,] 2 5
#[2,] 2 5
#[3,] 2 5
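The unanchored pattern above works here because no other column name contains an uppercase "A" or "B". A slightly more defensive sketch, assuming the data columns are always named as a level followed by digits, anchors the pattern and keeps the data frame structure with drop = FALSE (the name res2 is only for illustration):

```r
# Example data from the question
df <- data.frame(rownames = c("r1", "r2", "r3"),
                 A1 = c(1L, 1L, 1L), A2 = c(2L, 2L, 2L), A3 = c(3L, 3L, 3L),
                 B1 = c(4L, 4L, 4L), B2 = c(5L, 5L, 5L), B3 = c(6L, 6L, 6L))
treatment <- factor(rep(c("A", "B"), each = 3))

res2 <- sapply(levels(treatment), function(a) {
  # Match only names of the form <level><digits>, e.g. "A1" but not "BA1"
  cols <- grep(paste0("^", a, "[0-9]+$"), names(df))
  # drop = FALSE keeps a data frame even if a level has a single column
  rowMeans(df[, cols, drop = FALSE])
})
res2
```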