
Mean for matrices within a matrix in base R or dplyr

Consider the following matrix:

  tt <-  structure(c(NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 223.26217771938, 
NA, NA, NA, NA, NA, NA, NA, NA, NA, 233.317380407033, 228.230147000785, 
NA, NA, NA, NA, NA, NA, NA, NA, 213.976634238414, 202.420354707722, 
235.306183514161, NA, NA, NA, NA, NA, NA, NA, 234.959570990415, 
209.098063118719, 218.561204242656, 222.512920973143, NA, NA, 
NA, NA, NA, NA, 208.300264042079, 215.937490955137, 237.957979483774, 
192.688868386319, 235.076583265965, NA, NA, NA, NA, NA, 206.523606398881, 
223.937491278258, 223.926327170344, 214.32218737219, 226.512692801088, 
201.218786399282, NA, NA, NA, NA, 224.281073655358, 213.943917885038, 
238.593797069413, 203.435493461687, 229.752040252094, 219.155196151038, 
218.091723822799, NA, NA, NA, 220.671701855947, 201.380237362061, 
232.187424293393, 191.10206696946, 234.448288541418, 178.759615126012, 
214.037379912949, 204.514058196497, NA, NA, 232.924880594581, 
229.573517636508, 197.886331008486, 231.900840878165, 221.634834807167, 
227.927620090238, 232.886238322491, 239.428486191598, 231.987068605127, 
NA), .Dim = c(10L, 10L), .Dimnames = list(c("SA1", "SA1", "SA1", 
"SA1", "SA2", "SA2", "SA2", "SA2", "SA2", "SA2"), c("SA1", "SA1", 
"SA1", "SA1", "SA2", "SA2", "SA2", "SA2", "SA2", "SA2")))

It looks like this:

   SA1      SA1      SA1      SA1      SA2      SA2      SA2      SA2      SA2      SA2
SA1  NA 223.2622 233.3174 213.9766 234.9596 208.3003 206.5236 224.2811 220.6717 232.9249
SA1  NA       NA 228.2301 202.4204 209.0981 215.9375 223.9375 213.9439 201.3802 229.5735
SA1  NA       NA       NA 235.3062 218.5612 237.9580 223.9263 238.5938 232.1874 197.8863
SA1  NA       NA       NA       NA 222.5129 192.6889 214.3222 203.4355 191.1021 231.9008
SA2  NA       NA       NA       NA       NA 235.0766 226.5127 229.7520 234.4483 221.6348
SA2  NA       NA       NA       NA       NA       NA 201.2188 219.1552 178.7596 227.9276
SA2  NA       NA       NA       NA       NA       NA       NA 218.0917 214.0374 232.8862
SA2  NA       NA       NA       NA       NA       NA       NA       NA 204.5141 239.4285
SA2  NA       NA       NA       NA       NA       NA       NA       NA       NA 231.9871
SA2  NA       NA       NA       NA       NA       NA       NA       NA       NA       NA

I would like to calculate the mean for the SA1 and SA2 sub-matrices. By sub-matrices I mean only the cells whose row name and column name are both SA1, and likewise only the cells whose row name and column name are both SA2. For SA1 this would be mean(tt[1:4,1:4],na.rm=T). However, my real matrix is much bigger than this example, so subsetting by hand-written indices is not a solution; I need some sort of grouping by distinct rownames and colnames. It would be great if someone could show me a solution in both base R and dplyr.

We can loop over the unique column names of the matrix with sapply, subset the rows and columns that match each name, and take the mean of each sub-matrix.

sapply(unique(colnames(tt)), function(x) 
     mean(tt[rownames(tt) == x, colnames(tt) == x], na.rm = TRUE))

#  SA1   SA2 
#222.8 221.0 
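The same idea can be sanity-checked on a small toy matrix where the block means are easy to verify by hand. The matrix `m` and the helper name `block_means` below are invented for illustration and are not part of the original question; treat this as a sketch of the approach, using `vapply` so the return type is checked.

```r
# Toy 4 x 4 matrix with repeated dimnames (hypothetical data)
grp <- c("A", "A", "B", "B")
m <- matrix(1:16, nrow = 4, dimnames = list(grp, grp))
m[lower.tri(m, diag = TRUE)] <- NA   # keep only the upper triangle, as in tt

# Mean of each diagonal block: rows and columns that share the same name
block_means <- function(mat) {
  groups <- unique(colnames(mat))
  vapply(groups, function(g) {
    mean(mat[rownames(mat) == g, colnames(mat) == g], na.rm = TRUE)
  }, numeric(1))
}

block_means(m)
# A = 5 (only m[1, 2] is non-NA), B = 15 (only m[3, 4] is non-NA)
```

Wrapping the logic in a function makes it straightforward to reuse on the real, larger matrix.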

This starts from sub_list, a vector of the unique column names, and loops over it, storing the mean of each sub-matrix in a separate named numeric vector, sub_means. Both the rows and the columns are restricted to the current name; filtering on columns alone would average over every row of those columns. Writing the means back into sub_list itself would coerce them to character strings, so a second vector is used.

sub_list <- unique(colnames(tt))
sub_means <- setNames(numeric(length(sub_list)), sub_list)

for(j in seq_along(sub_list)){
  # restrict rows and columns to the current name
  sub_means[j] <- mean(tt[rownames(tt) == sub_list[j], colnames(tt) == sub_list[j]], na.rm = TRUE)
}
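Two details of a loop like this are worth checking on a toy example: subsetting by column name alone gives a different mean than subsetting by both row and column name, and assigning a number into a character vector stores it as a string. The matrix `m` and the variable names below are invented for illustration.

```r
# Toy matrix with repeated dimnames (hypothetical data)
grp <- c("A", "A", "B", "B")
m <- matrix(1:16, nrow = 4, dimnames = list(grp, grp))
m[lower.tri(m, diag = TRUE)] <- NA

# Columns named "B", all rows: averages 9, 10, 13, 14, 15
cols_only <- mean(m[, colnames(m) == "B"], na.rm = TRUE)

# Rows and columns named "B": only m[3, 4] = 15 is non-NA
block <- mean(m[rownames(m) == "B", colnames(m) == "B"], na.rm = TRUE)

# Assigning a number into a character vector coerces it to a string
x <- c("A", "B")
x[1] <- block
class(x)   # "character"
```

Here `cols_only` is 12.2 while `block` is 15, so the two subsetting styles are not interchangeable.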

An option with the tidyverse: melt 'tt' into long format, keep the rows where the row name and column name are the same, then group by 'Var1' and take the mean of the 'value' column.

library(dplyr)
library(reshape2)
melt(tt) %>% 
   filter(Var1 == Var2) %>%
   group_by(Var1) %>%
   summarise(value = mean(value, na.rm = TRUE))
# A tibble: 2 x 2
#  Var1  value
#  <fct> <dbl>
#1 SA1    223.
#2 SA2    221.
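If reshape2 is not available, the long format can also be built by hand with base R's `row()` and `col()` before handing over to dplyr. This is a sketch on a toy matrix; `m`, `long`, `result` and the column names `row_name` and `col_name` are invented for illustration.

```r
library(dplyr)

# Toy matrix with repeated dimnames (hypothetical data)
grp <- c("A", "A", "B", "B")
m <- matrix(1:16, nrow = 4, dimnames = list(grp, grp))
m[lower.tri(m, diag = TRUE)] <- NA

# One row per matrix cell, in column-major order
long <- data.frame(
  row_name = rownames(m)[row(m)],
  col_name = colnames(m)[col(m)],
  value    = as.vector(m),
  stringsAsFactors = FALSE
)

# Keep cells inside a diagonal block, then average per group
result <- long %>%
  filter(row_name == col_name) %>%
  group_by(row_name) %>%
  summarise(value = mean(value, na.rm = TRUE))

result
```

On this toy input the two groups have means 5 and 15, matching what direct block subsetting gives.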

