Mean for sub-matrices within a matrix in base R or dplyr
Consider the following matrix:
tt <- structure(c(NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 223.26217771938,
NA, NA, NA, NA, NA, NA, NA, NA, NA, 233.317380407033, 228.230147000785,
NA, NA, NA, NA, NA, NA, NA, NA, 213.976634238414, 202.420354707722,
235.306183514161, NA, NA, NA, NA, NA, NA, NA, 234.959570990415,
209.098063118719, 218.561204242656, 222.512920973143, NA, NA,
NA, NA, NA, NA, 208.300264042079, 215.937490955137, 237.957979483774,
192.688868386319, 235.076583265965, NA, NA, NA, NA, NA, 206.523606398881,
223.937491278258, 223.926327170344, 214.32218737219, 226.512692801088,
201.218786399282, NA, NA, NA, NA, 224.281073655358, 213.943917885038,
238.593797069413, 203.435493461687, 229.752040252094, 219.155196151038,
218.091723822799, NA, NA, NA, 220.671701855947, 201.380237362061,
232.187424293393, 191.10206696946, 234.448288541418, 178.759615126012,
214.037379912949, 204.514058196497, NA, NA, 232.924880594581,
229.573517636508, 197.886331008486, 231.900840878165, 221.634834807167,
227.927620090238, 232.886238322491, 239.428486191598, 231.987068605127,
NA), .Dim = c(10L, 10L), .Dimnames = list(c("SA1", "SA1", "SA1",
"SA1", "SA2", "SA2", "SA2", "SA2", "SA2", "SA2"), c("SA1", "SA1",
"SA1", "SA1", "SA2", "SA2", "SA2", "SA2", "SA2", "SA2")))
It looks like this:
SA1 SA1 SA1 SA1 SA2 SA2 SA2 SA2 SA2 SA2
SA1 NA 223.2622 233.3174 213.9766 234.9596 208.3003 206.5236 224.2811 220.6717 232.9249
SA1 NA NA 228.2301 202.4204 209.0981 215.9375 223.9375 213.9439 201.3802 229.5735
SA1 NA NA NA 235.3062 218.5612 237.9580 223.9263 238.5938 232.1874 197.8863
SA1 NA NA NA NA 222.5129 192.6889 214.3222 203.4355 191.1021 231.9008
SA2 NA NA NA NA NA 235.0766 226.5127 229.7520 234.4483 221.6348
SA2 NA NA NA NA NA NA 201.2188 219.1552 178.7596 227.9276
SA2 NA NA NA NA NA NA NA 218.0917 214.0374 232.8862
SA2 NA NA NA NA NA NA NA NA 204.5141 239.4285
SA2 NA NA NA NA NA NA NA NA NA 231.9871
SA2 NA NA NA NA NA NA NA NA NA NA
I would like to calculate the mean for the SA1 and SA2 sub-matrices. By sub-matrices I mean only the cells where both the row name and the column name are SA1, and likewise only the cells where both are SA2. For SA1 this would be
mean(tt[1:4, 1:4], na.rm = TRUE)
However, my real matrix is much bigger than this example, so basic subsetting by position is not a solution; I need some sort of grouping by the distinct row names and column names.
If someone could show me a solution in both base R and dplyr, that would be awesome.
We could loop over all the unique column names of the matrix using sapply, subset the rows and columns that match each name, and take the mean of each sub-matrix.
sapply(unique(colnames(tt)), function(x)
mean(tt[rownames(tt) == x, colnames(tt) == x], na.rm = TRUE))
# SA1 SA2
#222.8 221.0
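A loop-free base R variant, offered as a sketch: outer() builds a logical mask of the cells whose row name equals their column name, and tapply() then groups those cells by row name. It is shown on a small hypothetical matrix m shaped like tt (upper triangle filled, duplicated dimnames); replacing m with tt should reproduce the sapply result above.

```r
# Hypothetical 5 x 5 matrix shaped like `tt`: upper triangle filled,
# duplicated row/column names marking two groups, "A" and "B".
m <- matrix(c(NA, NA, NA, NA, NA,
               4, NA, NA, NA, NA,
               1,  2, NA, NA, NA,
               3,  5,  6, NA, NA,
               7,  8, 10, 14, NA),
            nrow = 5,
            dimnames = list(c("A", "A", "B", "B", "B"),
                            c("A", "A", "B", "B", "B")))

# TRUE for every cell whose row name equals its column name
same_group <- outer(rownames(m), colnames(m), "==")
# Values of those cells and the group (row name) each one belongs to,
# both taken in the same column-major order
values <- m[same_group]
groups <- rownames(m)[row(m)[same_group]]
# Mean per group, dropping NAs
group_means <- tapply(values, groups, mean, na.rm = TRUE)
group_means
#  A  B
#  4 10
```

Because the grouping comes from the names rather than from positions, the rows and columns of a group do not need to be contiguous, and tapply() returns the groups sorted by label.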
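This builds a numeric vector called sub_means, named by the unique column names in sub_list, then iterates through the groups and stores the mean of each sub-matrix. Both the rows and the columns are subset by name; subsetting only the columns would also pull in rows from other groups (for SA2, the SA1 rows). The means go into a separate numeric vector because writing them back into sub_list would coerce them to character strings.
sub_list <- unique(colnames(tt))
sub_means <- setNames(numeric(length(sub_list)), sub_list)
for (j in seq_along(sub_list)) {
  # keep rows AND columns whose name matches the current group
  keep_rows <- rownames(tt) == sub_list[j]
  keep_cols <- colnames(tt) == sub_list[j]
  sub_means[j] <- mean(tt[keep_rows, keep_cols], na.rm = TRUE)
}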
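If the same computation is needed in several places, the loop can be wrapped in a small function. This is a sketch: block_means() is a hypothetical helper name and m is a hypothetical toy matrix shaped like tt. vapply() is used instead of sapply() so the result is always a named numeric vector with one element per unique column name.

```r
# Hypothetical helper: mean of each diagonal block, where a block is every
# cell whose row name and column name are the same group label.
block_means <- function(mat) {
  groups <- unique(colnames(mat))
  vapply(groups, function(g) {
    mean(mat[rownames(mat) == g, colnames(mat) == g], na.rm = TRUE)
  }, numeric(1))
}

# Hypothetical 5 x 5 matrix with duplicated dimnames, shaped like `tt`
m <- matrix(c(NA, NA, NA, NA, NA,
               4, NA, NA, NA, NA,
               1,  2, NA, NA, NA,
               3,  5,  6, NA, NA,
               7,  8, 10, 14, NA),
            nrow = 5,
            dimnames = list(c("A", "A", "B", "B", "B"),
                            c("A", "A", "B", "B", "B")))

block_means(m)
#  A  B
#  4 10
```

Called as block_means(tt), it should give the same values as the sapply answer above.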
An option with dplyr (using melt from reshape2): we can melt 'tt' into 'long' format, filter the rows where the row name and column name are the same, then, grouped by 'Var1', get the mean of the 'value' column.
library(dplyr)
library(reshape2)
melt(tt) %>%
filter(Var1 == Var2) %>%
group_by(Var1) %>%
summarise(value = mean(value, na.rm = TRUE))
# A tibble: 2 x 2
# Var1 value
# <fct> <dbl>
#1 SA1 223.
#2 SA2 221.
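reshape2 is retired, so if you would rather not depend on it, base R's as.table() followed by as.data.frame() produces a similar long format: one row per cell, with Var1 (row name), Var2 (column name) and Freq (the cell value). A sketch on a hypothetical toy matrix m shaped like tt; note the value column is called Freq here rather than value.

```r
library(dplyr)

# Hypothetical 5 x 5 matrix with duplicated dimnames, shaped like `tt`
m <- matrix(c(NA, NA, NA, NA, NA,
               4, NA, NA, NA, NA,
               1,  2, NA, NA, NA,
               3,  5,  6, NA, NA,
               7,  8, 10, 14, NA),
            nrow = 5,
            dimnames = list(c("A", "A", "B", "B", "B"),
                            c("A", "A", "B", "B", "B")))

# Long format: Var1 and Var2 are factors built from the dimnames,
# Freq holds the cell values in column-major order
res <- as.data.frame(as.table(m)) %>%
  # compare as character so differing level sets cannot cause an error
  filter(as.character(Var1) == as.character(Var2)) %>%
  group_by(Var1) %>%
  summarise(value = mean(Freq, na.rm = TRUE))
res
# one row per group: A -> 4, B -> 10
```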