How do I interpret the output of corrplot?

The corrplot package provides some neat plots and is documented with examples.

But I don't understand the output. I can see that if you have a matrix A_ij, you can plot it as an n-by-n arrangement of square tiles, where the color of tile ij corresponds to the value of A_ij. But some examples appear to show more dimensions:

[image]

Here we can guess that the color shows the correlation coefficient, and that the orientation of the ellipse shows whether the correlation is negative or positive. What does the eccentricity show?

The documentation for method says:

the visualization method of correlation matrix to be used. Currently, it supports seven methods, named "circle" (default), "square", "ellipse", "number", "pie", "shade" and "color". See examples for details.

The areas of circles or squares show the absolute value of corresponding correlation coefficients. Method "pie" and "shade" came from Michael Friendly's job (with some adjustment about the shade added on), and "ellipse" came from DJ Murdoch and ED Chow's job, see in section References.

So we know that the area, for circles and squares, should show the coefficient. What about the other dimensions, and other methods?

There is only one dimension shown by the plot.

Michael Friendly, in Corrgrams: Exploratory displays for correlation matrices (the corrplot documentation confusingly refers to this as his "job"), says:

In the shaded row, each cell is shaded blue or red depending on the sign of the correlation, and with the intensity of color scaled 0–100% in proportion to the magnitude of the correlation. (Such scaled colors are easily computed using RGB coding from red, (1, 0, 0), through white (1, 1, 1), to blue (0, 0, 1). For simplicity, we ignore the non-linearities of color reproduction and perception, but note that these are easily accommodated in the color mapping function.) White diagonal lines are added so that the direction of the correlation may still be discerned in black and white. This bipolar scale of color was chosen to leave correlations near 0 empty (white), and to make positive and negative values of equal magnitude approximately equally intensely shaded. Gray scale and other color schemes are implemented in our software (Section 6), but not illustrated here.

The bar and circular symbols also use the same scaled colors, but fill an area proportional to the absolute value of the correlation. For the bars, negative values are filled from the bottom, positive values from the top. The circles are filled clockwise for positive values, anti-clockwise for negative values. The ellipses have their eccentricity parametrically scaled to the correlation value (Murdoch and Chow, 1996). Perceptually, they have the property of becoming visually less prominent as the magnitude of the correlation increases, in contrast to the other glyphs.

(emphasis mine)

[image]
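The red-through-white-to-blue scaling that Friendly describes can be sketched in a few lines of base R. This is only an illustration of the linear RGB interpolation in the quote, not the code used by corrplot or by Friendly's software; the function name `shade_col` is made up for this example.

```r
# Hypothetical helper: map a correlation r in [-1, 1] to a color that runs
# from red (r = -1) through white (r = 0) to blue (r = 1).
shade_col <- function(r) {
  stopifnot(all(abs(r) <= 1))
  fade <- 1 - abs(r)                 # 1 at r = 0 (white), 0 at |r| = 1 (saturated)
  red  <- ifelse(r < 0, 1, fade)     # negative values keep the red channel full
  blue <- ifelse(r > 0, 1, fade)     # positive values keep the blue channel full
  rgb(red, fade, blue)               # green always fades with |r|
}

shade_col(c(-1, 0, 1))
# "#FF0000" "#FFFFFF" "#0000FF"
```

The sign picks the hue and the magnitude picks the intensity, so the color by itself already encodes the whole coefficient.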

"Murdoch and Chow, 1996" is a publication describing the equation for drawing the ellipses (A Graphical Display of Large Correlation Matrices). The ellipses are apparently meant to be caricatures of bivariate normal distributions:

[image]

So in conclusion, the only dimension shown is always the correlation coefficient (or the value of A_ij, to use the question's terminology) itself. The multiple apparent dimensions are redundant.
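One way to convince yourself of this is to draw the same matrix with several methods and check that each picture carries the same numbers. The sketch below assumes the corrplot package is installed and uses the built-in mtcars data; the plotting step is skipped if the package is missing.

```r
# Correlation matrix of a built-in data set.
M <- cor(mtcars)

# Draw the same matrix three ways; every glyph encodes the same single value.
if (requireNamespace("corrplot", quietly = TRUE)) {
  for (m in c("circle", "ellipse", "number")) {
    corrplot::corrplot(M, method = m)
  }
}
```

The "number" plot prints the coefficients directly, which makes it a handy key for reading the circle and ellipse versions.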

I think the plot is quite self-explanatory. On the right-hand side you have the scale, which is colored from red (negative correlation) to blue (positive correlation). The color follows a gradient according to the strength of the correlation.

If the ellipse leans to the right, the correlation is positive; if it leans to the left, the correlation is negative.

The diffusion around a line (which denotes perfect correlation, for example mpg ~ mpg) creates an ellipse. The weaker the correlation, the more diffuse the ellipse, which is typically how a weakly correlated relationship looks in a scatterplot. I think these are caricatures, however.
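To see the scatterplot analogy, here is a small simulation in base R. It is only a sketch: the helper `simulate_pair` is made up for this example, and it draws standard normal pairs with a chosen population correlation rather than using any real data.

```r
set.seed(1)

# Hypothetical helper: n draws of (x, y), standard normal with correlation rho.
simulate_pair <- function(rho, n = 2000) {
  x <- rnorm(n)
  noise <- rnorm(n)
  y <- rho * x + sqrt(1 - rho^2) * noise  # mixes x with independent noise
  cbind(x = x, y = y)
}

strong <- simulate_pair(0.95)
weak   <- simulate_pair(0.30)

# Side by side: a tight cloud around the diagonal versus a diffuse, rounder one.
op <- par(mfrow = c(1, 2))
plot(strong, asp = 1, main = "rho = 0.95")
plot(weak,   asp = 1, main = "rho = 0.30")
par(op)

# Sample correlations should land close to the requested values.
cor(strong[, "x"], strong[, "y"])
cor(weak[, "x"], weak[, "y"])
```

The outline of each point cloud is roughly the ellipse that corrplot draws for that coefficient.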

Here is some code from the corrplot function responsible for drawing ellipses. I am not going to attempt to explain this (because it is a part of a larger system). I wanted to show that the logic is all there if you'd like to deep dive into it:

if (method == "ellipse" & plotCI == "n") {
    ell.dat <- function(rho, length = 99) {
        k <- seq(0, 2 * pi, length = length)
        x <- cos(k + acos(rho)/2)/2
        y <- cos(k - acos(rho)/2)/2
        return(cbind(rbind(x, y), c(NA, NA)))
    }
    ELL.dat <- lapply(DAT, ell.dat)
    ELL.dat2 <- 0.85 * matrix(unlist(ELL.dat), ncol = 2, 
        byrow = TRUE)
    ELL.dat2 <- ELL.dat2 + Pos[rep(1:length(DAT), each = 100), 
        ]
    polygon(ELL.dat2, border = col.border, col = col.fill)
}
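The ell.dat formula in the excerpt above can be tried on its own. The sketch below copies only that formula into a standalone function (the names `ell_points` and `axis_ratio` are made up here) and measures how flat the ellipse gets. From the formula, the ratio of the short axis to the long axis works out to sqrt((1 - |rho|) / (1 + |rho|)), so the shape depends on nothing but the coefficient.

```r
# Standalone copy of the ellipse formula from the corrplot excerpt.
ell_points <- function(rho, n = 99) {
  k <- seq(0, 2 * pi, length.out = n)
  cbind(x = cos(k + acos(rho) / 2) / 2,
        y = cos(k - acos(rho) / 2) / 2)
}

# Short-axis / long-axis ratio, measured along the two diagonals of the tile.
axis_ratio <- function(rho, n = 99) {
  p <- ell_points(rho, n)
  along  <- diff(range(p[, "x"] + p[, "y"]))  # extent along the y = x diagonal
  across <- diff(range(p[, "x"] - p[, "y"]))  # extent along the y = -x diagonal
  min(along, across) / max(along, across)
}

# Draw a row of ellipses for a few coefficients.
rhos <- c(-0.9, -0.5, 0, 0.5, 0.9)
plot(NA, xlim = c(0.5, length(rhos) + 0.5), ylim = c(-0.8, 0.6),
     asp = 1, axes = FALSE, xlab = "", ylab = "")
for (i in seq_along(rhos)) {
  p <- ell_points(rhos[i])
  polygon(p[, "x"] + i, p[, "y"])
  text(i, -0.7, rhos[i])
}

# Measured ratio next to the closed form, for comparison.
cbind(rho = rhos,
      measured = sapply(rhos, axis_ratio),
      closed_form = sqrt((1 - abs(rhos)) / (1 + abs(rhos))))
```

At rho = 0 the glyph is a circle, and as |rho| approaches 1 it collapses onto a diagonal line, leaning right for positive values and left for negative ones.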
