Row-wise average grouping by columns in R
I'm new to R, so my question may be basic. I have the following data frame:
                FM_1      SBM_1       FM_2        BP_1       BP_2       SBM_2
K00121 -0.1839897960 -0.8656314 -0.8411707 -0.69968109 -0.8031558 -0.70689896
K08660 -0.5250720652 -0.1513665 -0.2865290 -0.01167864 -0.4330590 -0.52919490
K07408 -0.3784026846 -0.1521273  0.1021097 -0.40613804 -0.4201983 -0.27915511
K13524 -0.4049012076 -0.8533916 -0.4431474 -0.15884372 -0.5256129 -0.54496893
K00600 -0.0009098706  0.2313674 -0.1080085 -0.07682120 -0.1740538  0.09553883
K00286 -0.2710184537 -0.2543416  0.1453829 -0.11907861  0.3392705 -0.19903857
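I want to create a new data frame with the same rows, but with columns that hold the average of the columns sharing the same prefix (i.e. "FM", "SBM", "BP"). I am trying the aggregate() function, but I am having trouble with the "by" argument. What am I doing wrong? Can someone give me a hint? Thanks a lot.
Here is one option: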
> prefix <- unique(unlist(strsplit(names(df), "\\_[0-9]")))
> sapply(prefix, function(i) rowMeans(df[, grepl(i, names(df))]))
                FM        SBM         BP
K00121 -0.51258025 -0.7862652 -0.7514184
K08660 -0.40580053 -0.3402807 -0.2223688
K07408 -0.13814649 -0.2156412 -0.4131682
K13524 -0.42402430 -0.6991803 -0.3422283
K00600 -0.05445919  0.1634531 -0.1254375
K00286 -0.06281778 -0.2266901  0.1100959
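One caveat with the option above: grepl(i, names(df)) matches the prefix anywhere in a column name, so a hypothetical prefix such as "BM" would also pick up "SBM_1". A minimal sketch of a stricter variant follows. It assumes every column is named <prefix>_<number>, and the small data frame is just the first two rows from the question, rebuilt here so the example runs on its own.

```r
# First two rows of the question's data (rebuilt for a self-contained example)
df <- data.frame(
  FM_1  = c(-0.1839897960, -0.5250720652),
  SBM_1 = c(-0.8656314, -0.1513665),
  FM_2  = c(-0.8411707, -0.2865290),
  BP_1  = c(-0.69968109, -0.01167864),
  BP_2  = c(-0.8031558, -0.4330590),
  SBM_2 = c(-0.70689896, -0.52919490),
  row.names = c("K00121", "K08660")
)

# Strip the trailing "_<digits>" so each column gets exactly one prefix
prefix <- sub("_[0-9]+$", "", names(df))

# Average the columns whose prefix is exactly equal to p;
# drop = FALSE keeps a data frame even if a group has only one column
res <- sapply(unique(prefix), function(p) {
  rowMeans(df[, prefix == p, drop = FALSE])
})
res
```

Comparing with prefix == p instead of a pattern match means a column can never land in two groups.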
You can use melt and dcast from "reshape2". Assuming your data.frame is called "mydf", try the following:
library(reshape2)
## melt your data. Since you have rownames, wrap in `as.matrix`
## to get the rownames as a variable in the long data.frame
dfL <- melt(as.matrix(mydf))
## Your "Var2" column should be split to give us access to the "variable"
## and "time" values. (Only the "variable" part is required here.)
dfL <- cbind(dfL, colsplit(as.character(dfL$Var2), "_", c("var", "time")))
## The new data now look like this:
head(dfL)
#     Var1 Var2         value var time
# 1 K00121 FM_1 -0.1839897960  FM    1
# 2 K08660 FM_1 -0.5250720652  FM    1
# 3 K07408 FM_1 -0.3784026846  FM    1
# 4 K13524 FM_1 -0.4049012076  FM    1
# 5 K00600 FM_1 -0.0009098706  FM    1
# 6 K00286 FM_1 -0.2710184537  FM    1
## From here, it's easy to aggregate with `dcast`
dcast(dfL, Var1 ~ var, value.var="value", fun.aggregate=mean)
#     Var1         BP          FM        SBM
# 1 K00121 -0.7514184 -0.51258025 -0.7862652
# 2 K00286  0.1100959 -0.06281778 -0.2266901
# 3 K00600 -0.1254375 -0.05445919  0.1634531
# 4 K07408 -0.4131682 -0.13814649 -0.2156412
# 5 K08660 -0.2223688 -0.40580053 -0.3402807
# 6 K13524 -0.3422283 -0.42402430 -0.6991803
Starting from the "long" format, you can also use aggregate (try aggregate(value ~ Var1 + var, dfL, mean)), but the result will itself be in long format.
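On the "by" argument from the question: aggregate() groups rows, not columns, which is probably why it is awkward to apply to the wide data directly. One possible workaround is to transpose first, so that each original column becomes a row that "by" can group. The sketch below assumes column names of the form <prefix>_<number> and uses only the first two rows of the question's data; the names grp, agg and res are just illustrative.

```r
# First two rows of the question's data (rebuilt for a self-contained example)
mydf <- data.frame(
  FM_1  = c(-0.1839897960, -0.5250720652),
  SBM_1 = c(-0.8656314, -0.1513665),
  FM_2  = c(-0.8411707, -0.2865290),
  BP_1  = c(-0.69968109, -0.01167864),
  BP_2  = c(-0.8031558, -0.4330590),
  SBM_2 = c(-0.70689896, -0.52919490),
  row.names = c("K00121", "K08660")
)

# One group label per original column: "FM", "SBM", "FM", "BP", "BP", "SBM"
grp <- sub("_[0-9]+$", "", names(mydf))

# t(mydf) has one row per original column, so `by` can group those rows
agg <- aggregate(t(mydf), by = list(prefix = grp), FUN = mean)

# Transpose back: rows are the K identifiers again, columns are the prefixes
res <- t(agg[, -1, drop = FALSE])
colnames(res) <- agg$prefix
res
```

The result is a matrix whose columns follow the sorted group labels (BP, FM, SBM); wrap it in as.data.frame() if a data frame is needed.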
Here df1 is your data frame:
vars <- c("FM", "SBM", "BP")
sapply(vars, function(x) apply(df1[, grep(x, names(df1))], 1, mean))
                FM        SBM         BP
K00121 -0.51258025 -0.7862652 -0.7514184
K08660 -0.40580053 -0.3402807 -0.2223688
K07408 -0.13814649 -0.2156412 -0.4131682
K13524 -0.42402430 -0.6991803 -0.3422283
K00600 -0.05445919  0.1634531 -0.1254375
K00286 -0.06281778 -0.2266901  0.1100959
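Note that the sapply() calls above return a matrix, while the question asked for a data frame, and that vars is typed in by hand. A rough sketch that addresses both, assuming the <prefix>_<number> naming holds for every column: split.default() splits a data frame by columns, rowMeans() replaces apply(..., 1, mean), and as.data.frame() converts the result. Only the first two rows of the data are rebuilt here to keep the example short.

```r
# First two rows of the question's data (rebuilt for a self-contained example)
df1 <- data.frame(
  FM_1  = c(-0.1839897960, -0.5250720652),
  SBM_1 = c(-0.8656314, -0.1513665),
  FM_2  = c(-0.8411707, -0.2865290),
  BP_1  = c(-0.69968109, -0.01167864),
  BP_2  = c(-0.8031558, -0.4330590),
  SBM_2 = c(-0.70689896, -0.52919490),
  row.names = c("K00121", "K08660")
)

# Derive the group of each column from its name instead of hard-coding it
grp <- sub("_[0-9]+$", "", names(df1))

# split.default() treats the data frame as a list of columns and returns
# one sub-data-frame per group; rowMeans() then averages within each group
res <- as.data.frame(sapply(split.default(df1, grp), rowMeans))
res
```

The columns come out in sorted group order (BP, FM, SBM); reorder with res[, c("FM", "SBM", "BP")] if the original order matters.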