
Plot gigantic correlation matrix as colours

I have a correlation matrix $P_{i,j}$ which is $1000 \times 1000$. Given the data, the matrix will have rectangular patches of very high correlations. That is, if you draw a $20 \times 20$ square anywhere in this matrix, you will be looking at either a patch of highly correlated variables ($\rho_{i,j} > 0.8$) or a patch of medium to uncorrelated variables ($\rho_{i,j} \in [-0.1, 0.5]$). The reason for this is the structure of the data.

How do I represent this graphically? I know of one way to visualize a matrix like this, but it only works for small dimensions:

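install.packages("plotrix")  # only needed once
library(plotrix)
# Random placeholder values standing in for the real matrix (not true correlations)
rhoMat <- array(rnorm(1000 * 1000), dim = c(1000, 1000))
color2D.matplot(rhoMat[1:10, 1:10], cs1 = c(0, 0.01), cs2 = c(0, 0), cs3 = c(0, 0))  # nice!
color2D.matplot(rhoMat, cs1 = c(0, 0.01), cs2 = c(0, 0), cs3 = c(0, 0))  # broken!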

What is a function or algorithm that would plot a red area if the correlations in that vicinity of the matrix $P_{i,j}$ "tend to" be high rather than low? Even better if it switches from one colour to another as we move from positive to negative correlation patches. I want to see how many patches of high correlation there are, and whether one patch is correlated with another patch at a different place in the dataset.

I only want to do it in R.

I think you can use image() with the breaks argument to get exactly what you want:

dat <- matrix(runif(10000), ncol = 100)
image(dat, breaks = c(0.0, 0.8, 1.0), col = c("yellow", "red"))
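The same idea can be stretched to the full 1000 x 1000 case with more than two colour bins. What follows is only a sketch under stated assumptions: the data are simulated (five groups of 200 variables, each driven by one shared factor), and the break points and colours are my own choice, loosely following the thresholds mentioned in the question.

```r
set.seed(1)

# Simulated data (an assumption, not the asker's data): 1000 variables in
# 5 groups of 200. Each group shares one factor, so correlations are high
# within a group and close to zero between groups.
n_obs   <- 500
group   <- rep(1:5, each = 200)
factors <- matrix(rnorm(n_obs * 5), nrow = n_obs)
x       <- factors[, group] + 0.4 * matrix(rnorm(n_obs * 1000), nrow = n_obs)
rho     <- cor(x)  # 1000 x 1000 correlation matrix

# Five colour bins over [-1, 1]; breaks needs one more element than col.
brks <- c(-1, -0.8, -0.1, 0.5, 0.8, 1)
cols <- c("blue", "lightblue", "white", "orange", "red")

# t(rho[1000:1, ]) puts row 1 at the top, as in a printed matrix.
# useRaster = TRUE draws one bitmap instead of a million rectangles.
image(x = 1:1000, y = 1:1000, z = t(rho[1000:1, ]),
      breaks = brks, col = cols, useRaster = TRUE,
      axes = FALSE, xlab = "", ylab = "")
```

With this layout the highly correlated groups should show up as red squares along the diagonal, and any off-diagonal red or blue rectangle would point to two patches that are related to each other.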

I always fail to think of image() for this kind of problem; the name is sort of non-obvious. I started with heatmap() and then it led me to image().
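One way to make "correlations tend to be high in that vicinity" concrete, which is not something the answers here propose, is to average the matrix over non-overlapping tiles before handing it to image(). The sketch below assumes 20 x 20 tiles (the square size from the question). tile_means is a hypothetical helper written for this example, and the matrix is a stand-in with planted patches rather than a real correlation matrix.

```r
# Hypothetical helper: mean of a square matrix over non-overlapping k x k tiles.
tile_means <- function(m, k) {
  stopifnot(nrow(m) == ncol(m), nrow(m) %% k == 0)
  tile <- (seq_len(nrow(m)) - 1) %/% k + 1      # tile index of each row/column
  sums <- rowsum(t(rowsum(m, tile)), tile)      # sum over rows, then over columns
  t(sums) / k^2
}

set.seed(1)
# Stand-in values (an assumption): background in [-0.1, 0.5] plus planted patches.
rho <- matrix(runif(1000 * 1000, -0.1, 0.5), nrow = 1000)
rho[101:300, 101:300] <- 0.9                    # one high-correlation patch
rho[101:300, 701:800] <- 0.9                    # the same patch related to
rho[701:800, 101:300] <- 0.9                    # variables elsewhere

coarse <- tile_means(rho, 20)                   # 50 x 50 matrix of tile means

# Diverging palette so negative and positive patches get different colours.
pal <- colorRampPalette(c("blue", "white", "red"))(11)
image(x = 1:50, y = 1:50, z = t(coarse[50:1, ]), zlim = c(-1, 1), col = pal,
      axes = FALSE, xlab = "", ylab = "")
```

Averaging trades resolution for readability: each cell of the plot summarises 400 correlations, so isolated extreme values are smoothed away while patches stay visible.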

Look at the corrplot package. It has various tools for visualizing correlations; one option is to use hierarchical clustering to draw rectangles around groups of high or low correlation.
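As a rough illustration of that option, here is a small sketch. It assumes simulated data with 60 variables in three shuffled groups (kept small so the plot stays legible), and the choice of three rectangles simply matches how the data were simulated.

```r
# install.packages("corrplot")  # only needed once
library(corrplot)

set.seed(1)
# Simulated data (an assumption): 60 variables in 3 groups, in shuffled order,
# so the block structure is not visible until the variables are reordered.
n_obs   <- 300
group   <- sample(rep(1:3, each = 20))
factors <- matrix(rnorm(n_obs * 3), nrow = n_obs)
x       <- factors[, group] + 0.4 * matrix(rnorm(n_obs * 60), nrow = n_obs)
colnames(x) <- paste0("v", 1:60)
rho <- cor(x)

# order = "hclust" reorders variables by hierarchical clustering;
# addrect = 3 draws rectangles around the three resulting clusters.
corrplot(rho, method = "color", order = "hclust", addrect = 3, tl.cex = 0.4)
```

For something as large as 1000 variables the text labels would presumably need to be switched off (tl.pos = "n"), and I have not checked how well the drawing speed holds up at that size.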

I've done this in Excel fairly easily. You can change the colour of boxes based on the range of values within the boxes. You can even create a gradient from, let's say, 0 to 1. 1000 x 1000 would be big for Excel, but I think it would work. You would just have to zoom out.
