
Extract correlated elements of a correlation matrix

I have a correlation matrix in R, and I want to find the groups of elements whose pairwise correlation is above 0.95, count those groups, and store each group as a vector.

X <- matrix(0,3,5) 
X[,1] <- c(1,2,3)
X[,2] <- c(1,2.2,3)*2
X[,3] <- c(1,2,3.3)*3
X[,4] <- c(6,5,1)
X[,5] <- c(6.1,5,1.2)*4

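cor.matrix <- cor(X)
# keep only the strict lower triangle, so each pair appears once and the diagonal is dropped
cor.matrix <- cor.matrix * lower.tri(cor.matrix)
# (row, col) indices of the pairs whose correlation is above 0.95
cor.vector <- which(cor.matrix > 0.95, arr.ind = TRUE)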

cor.vector then contains:

     row col 
[1,]   2   1 
[2,]   3   1 
[3,]   3   2 
[4,]   5   4 

As expected, this means that vectors 1, 2 and 3 are correlated with one another, and so are vectors 4 and 5.

What I need as the final result is the two vectors c(1,2,3) and c(4,5).

This is a simple example; the matrices I actually process are much larger.
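To check the pairs listed above against their actual coefficients, one option is to index the correlation matrix with the two-column index matrix itself. This is only a sketch that rebuilds the example data; the name pairs.with.r is made up for illustration.

```r
# Rebuild the example data (same values as above)
X <- cbind(c(1, 2, 3), c(1, 2.2, 3) * 2, c(1, 2, 3.3) * 3,
           c(6, 5, 1), c(6.1, 5, 1.2) * 4)

cor.matrix <- cor(X)
cor.matrix <- cor.matrix * lower.tri(cor.matrix)
cor.vector <- which(cor.matrix > 0.95, arr.ind = TRUE)

# A two-column (row, col) matrix used as an index picks one element per row,
# so this appends the coefficient of each selected pair as a third column.
pairs.with.r <- cbind(cor.vector, r = cor.matrix[cor.vector])
print(pairs.with.r)
```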

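Here's an approach using the igraph package:

library(igraph)
# graph_from_data_frame() and components() are the current names of
# graph.data.frame() and clusters() in recent igraph versions
g <- graph_from_data_frame(cor.vector, directed = FALSE)
# split the vertex labels by the connected component each one belongs to
split(unique(as.vector(cor.vector)), components(g)$membership)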
# $`1`
# [1] 2 3 1

# $`2`
# [1] 5 4

What this essentially does is find the clusters of the graph g (its connected components, i.e. sets of vertices with no edges between them), as illustrated in the figure below. The vertices are created in the order in which they first appear in your cor.vector, which is the same order that unique(as.vector(cor.vector)) returns, so the cluster membership comes back in that order too. That is, for vertices c(2,3,5,1,4) the membership is c(1,1,2,1,2), for a total of two clusters (cluster 1 and cluster 2). So we simply split the vertices by their cluster membership.

[Figure: plot of the graph g; image not included]
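For larger matrices it may be convenient to wrap the same idea in a function that starts from the data rather than from cor.vector. The sketch below is one possible variant, not part of the answer above: the helper name correlated_groups and its threshold argument are hypothetical, and it builds the graph from a 0/1 adjacency matrix so that vertex i is always column i and no reliance on vertex ordering is needed.

```r
library(igraph)

# Hypothetical helper: group the columns of X whose pairwise correlation
# exceeds `threshold`, using connected components of the resulting graph.
correlated_groups <- function(X, threshold = 0.95) {
  adj <- cor(X) > threshold        # TRUE where a pair is strongly correlated
  diag(adj) <- FALSE               # ignore self-correlation
  g <- graph_from_adjacency_matrix(adj * 1, mode = "undirected")
  membership <- components(g)$membership
  groups <- split(seq_len(ncol(X)), membership)
  # columns that correlate with nothing end up alone; keep only real groups
  unname(groups[lengths(groups) > 1])
}

X <- cbind(c(1, 2, 3), c(1, 2.2, 3) * 2, c(1, 2, 3.3) * 3,
           c(6, 5, 1), c(6.1, 5, 1.2) * 4)
groups <- correlated_groups(X)
print(groups)
```

Note that connected components group columns transitively: if A correlates with B and B with C, all three land in one group even when A and C themselves fall below the threshold. If every pair inside a group must exceed the threshold, a clique-based approach would be needed instead.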
