How to plot a distance biplot and a correlation biplot from SVD/PCA results in R?
I searched for a long time for a straightforward explanation of distance vs. correlation biplots, and of how to transform the standard outputs of PCA to obtain each of the two biplots. All the Stack Overflow explanations I saw (1, 2, 3, 4) went way over my head with math terms.
How can I create both a distance biplot and a correlation biplot using the outputs of R's prcomp?
The best explanation I found is a set of lecture slides by Pierre Legendre, Département de sciences biologiques, Université de Montréal ( http://biol09.biol.umontreal.ca/PLcourses/Ordination_section_1.1_PCA_Eng.pdf ). However, while these slides show how to plot a distance biplot and a correlation biplot manually, they don't show how to plot them from the results of prcomp.
So I worked through an example showing how the outputs of prcomp can be used to reproduce the example walked through in the PDF above. I am leaving it here for future people like me who are wondering how to plot a distance vs. a correlation biplot, and when to use each (according to Pierre Legendre).
set.seed(1)
#Run standard PCA
pca_res <- prcomp(mtcars[, 1:7], center = TRUE, scale. = TRUE, retx = TRUE)
#To plot a distance biplot, simply plot pca_res$x as points and $rotation
#as vectors
library(ggplot2)
arrow_len <- 3 #arbitrary stretch of the arrows so they're on a similar scale to the PC scores
ggplot(data = as.data.frame(pca_res$x), aes(x = PC1, y = PC2)) +
geom_point() +
geom_segment(data = as.data.frame(pca_res$rotation),
aes(x = 0, y = 0, xend = arrow_len*PC1, yend = arrow_len*PC2),
arrow = arrow(length = unit(0.02, "npc"))) +
geom_text(data = as.data.frame(pca_res$rotation),
mapping = aes(x = arrow_len*PC1, y = arrow_len*PC2,
label = row.names(pca_res$rotation)))
#This is equivalent to the following steps:
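Y_centered <- scale(mtcars[, 1:7], center = TRUE, scale = TRUE) #centered AND scaled to unit variance
Y_eig <- eigen(cov(Y_centered)) #covariance of standardized data = correlation matrix
#Note that Y_eig$vectors == pca_res$rotation ("rotations" or "loadings"), up to the sign of each column,
# and Y_eig$values (eigenvalues) == pca_res$sdev^2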
#For a distance biplot
U_frame <- Y_eig$vectors
#F is your PC scores, obtained by multiplying your standardized data by the eigenvectors
F_frame <- Y_centered %*% U_frame
#Flipping constants (1 or -1), if needed, because the sign of each PC axis is arbitrary
# and eigen() may return the opposite sign to prcomp()
x_flip <- -1
y_flip <- -1
ggplot(data = as.data.frame(F_frame), aes(x = x_flip*V1, y = y_flip*V2)) +
geom_point() +
geom_segment(data = as.data.frame(U_frame),
aes(x = 0, y = 0, xend = x_flip*arrow_len*V1, yend = y_flip*arrow_len*V2),
arrow = arrow(length = unit(0.02, "npc"))) +
geom_text(data = as.data.frame(U_frame),
mapping = aes(x = x_flip*arrow_len*V1, y = y_flip*arrow_len*V2,
label = colnames(Y_centered)))
#To plot a correlation biplot, post-multiply your rotations/loadings
# by a diagonal matrix of your PCA standard deviations
# (the square roots of your eigenvalues)
U_frame_scaling2 <- U_frame %*% diag(Y_eig$values^(0.5))
#And divide your PC scores by your PCA standard deviations
# (i.e., multiply by 1/sqrt(eigenvalues))
F_frame_scaling2 <- F_frame %*% diag(Y_eig$values^(-0.5))
#Plot, using the prcomp outputs directly
arrow_len <- 1.5 #arbitrary stretch of the arrows so they're on a similar scale to the scaled PC scores
ggplot(data = as.data.frame(pca_res$x %*% diag(1/pca_res$sdev)),
aes(x = V1, y = V2)) +
geom_point() +
geom_segment(data = as.data.frame(pca_res$rotation %*% diag(pca_res$sdev)),
aes(x = 0, y = 0, xend = arrow_len*V1, yend = arrow_len*V2),
arrow = arrow(length = unit(0.02, "npc"))) +
geom_text(data = as.data.frame(pca_res$rotation %*% diag(pca_res$sdev)),
mapping = aes(x = arrow_len*V1, y = arrow_len*V2,
label = row.names(pca_res$rotation)))
#The same correlation biplot, from the manual eigen() route
ggplot(data = as.data.frame(F_frame_scaling2), aes(x = x_flip*V1, y = y_flip*V2)) +
geom_point() +
geom_segment(data = as.data.frame(U_frame_scaling2),
aes(x = 0, y = 0, xend = x_flip*arrow_len*V1, yend = y_flip*arrow_len*V2),
arrow = arrow(length = unit(0.02, "npc"))) +
geom_text(data = as.data.frame(U_frame_scaling2),
mapping = aes(x = x_flip*arrow_len*V1, y = y_flip*arrow_len*V2,
label = colnames(Y_centered)))
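The two recipes above differ only in where the PC standard deviations go, so they can be wrapped in one small function. This is a sketch, not part of any package: `biplot_coords()` and `plot_biplot()` are hypothetical helper names, and it assumes a prcomp() fit on standardized data, like pca_res above. Scaling 1 returns pca$x and pca$rotation unchanged; scaling 2 divides the scores by pca$sdev and multiplies the loadings by it.

```r
library(ggplot2)

# Hypothetical helper (not from any package): biplot coordinates for the
# first two PCs of a prcomp() fit, in scaling 1 (distance) or 2 (correlation).
biplot_coords <- function(pca, scaling = 1) {
  stopifnot(scaling %in% c(1, 2))
  scores <- pca$x[, 1:2, drop = FALSE]           # F: PC scores
  loadings <- pca$rotation[, 1:2, drop = FALSE]  # U: unit-length eigenvectors
  if (scaling == 2) {
    sdev <- pca$sdev[1:2]                        # square roots of the eigenvalues
    scores <- sweep(scores, 2, sdev, "/")        # F Lambda^(-1/2)
    loadings <- sweep(loadings, 2, sdev, "*")    # U Lambda^(1/2)
  }
  list(scores = as.data.frame(scores), loadings = as.data.frame(loadings))
}

# Hypothetical plotting helper; arrow_len is the same arbitrary arrow
# stretch used in the code above.
plot_biplot <- function(coords, arrow_len = 1) {
  ld <- coords$loadings
  ld$label <- rownames(ld)                       # variable names for the arrow labels
  ggplot(coords$scores, aes(x = PC1, y = PC2)) +
    geom_point() +
    geom_segment(data = ld,
                 aes(x = 0, y = 0, xend = arrow_len * PC1, yend = arrow_len * PC2),
                 arrow = arrow(length = unit(0.02, "npc"))) +
    geom_text(data = ld,
              aes(x = arrow_len * PC1, y = arrow_len * PC2, label = label))
}

pca_res <- prcomp(mtcars[, 1:7], center = TRUE, scale. = TRUE)
print(plot_biplot(biplot_coords(pca_res, scaling = 1), arrow_len = 3))    # distance biplot
print(plot_biplot(biplot_coords(pca_res, scaling = 2), arrow_len = 1.5))  # correlation biplot
```

Because the helper reads PC1 and PC2 from prcomp's column names, no sign-flipping constants are needed here; the signs are whatever prcomp returned.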
As for the differences between the two (in case the PDF above becomes unavailable at some point):
Scaling type 1: distance biplot, used when the interest is in the positions of the objects with respect to one another.
Scaling type 2: correlation biplot, used when the angular relationships among the variables are of primary interest.
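Both scalings are factorizations of the same standardized data matrix and differ only in whether the eigenvalues are attached to the objects or to the variables. In the notation of the code above (Y = Y_centered with n rows, U = U_frame, F = F_frame, Λ = diag(Y_eig$values), R = the correlation matrix), and keeping all p axes:

```latex
\begin{aligned}
\tfrac{1}{n-1}\, Y^\top Y &= R = U \Lambda U^\top, \qquad U^\top U = U U^\top = I, \qquad F = Y U \\
\text{Scaling 1:}\quad F\,U^\top &= Y U U^\top = Y, \qquad F F^\top = Y U U^\top Y^\top = Y Y^\top \\
\text{Scaling 2:}\quad \bigl(F \Lambda^{-1/2}\bigr)\bigl(U \Lambda^{1/2}\bigr)^\top &= F U^\top = Y, \qquad \bigl(U \Lambda^{1/2}\bigr)\bigl(U \Lambda^{1/2}\bigr)^\top = U \Lambda U^\top = R
\end{aligned}
```

Since F times its transpose equals Y times its transpose, Euclidean distances among the rows of F equal those among the observations, which is what the distance biplot approximates. Since the scaling-2 arrows multiplied by their transpose give R, inner products between those arrows reproduce the correlations, which is what the correlation biplot approximates. A 2-D plot keeps only the first two columns, so both identities hold only approximately there.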
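In scaling 1 (distance biplot), the eigenvectors (variable arrows) are scaled to unit length and the object scores are F = YU. Distances among objects in the biplot approximate their Euclidean distances in the full multidimensional space, but the angles among the variable arrows do not reflect their correlations.
In scaling 2 (correlation biplot), each eigenvector is scaled to length sqrt(eigenvalue) (U Λ^(1/2)) and the object scores are divided by sqrt(eigenvalue) (F Λ^(-1/2)). The angles between variable arrows reflect their correlations, but distances among objects are no longer approximations of their Euclidean distances.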
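The properties of the two scalings can be checked numerically. This sketch uses the same standardized mtcars[, 1:7] data and keeps all seven axes so the identities are exact; `cos_angle()` is a hypothetical helper introduced only for illustration. With all axes kept, distances between scaling-1 scores equal the Euclidean distances between standardized observations, while the scaling-2 arrow coordinates equal the correlations between each variable and each PC, so their inner products rebuild the correlation matrix.

```r
# Standardized data, as in Y_centered above
Y <- scale(mtcars[, 1:7], center = TRUE, scale = TRUE)
pca <- prcomp(Y)                       # Y is already centered and scaled

F1 <- pca$x                            # scaling 1 scores, all 7 axes
U1 <- pca$rotation                     # scaling 1 arrows (unit length)
F2 <- sweep(F1, 2, pca$sdev, "/")      # scaling 2 scores: F Lambda^(-1/2)
U2 <- sweep(U1, 2, pca$sdev, "*")      # scaling 2 arrows: U Lambda^(1/2)

# Scaling 1: object distances are preserved exactly when all axes are kept
stopifnot(isTRUE(all.equal(c(dist(F1)), c(dist(Y)))))

# Scaling 2: arrow coordinates are variable-PC correlations ...
stopifnot(isTRUE(all.equal(unname(U2), unname(cor(Y, F1)))))
# ... and their inner products rebuild the correlation matrix
stopifnot(isTRUE(all.equal(unname(U2 %*% t(U2)), unname(cor(Y)))))

# Scaling 2 scores have unit variance on every axis
stopifnot(isTRUE(all.equal(unname(apply(F2, 2, sd)), rep(1, ncol(F2)))))

# In the 2-D plot only PC1 and PC2 are kept, so the arrow geometry only
# approximates the correlation; compare for one pair of variables:
cos_angle <- function(a, b) sum(a * b) / sqrt(sum(a^2) * sum(b^2))
c(inner_product_2d = sum(U2["mpg", 1:2] * U2["cyl", 1:2]),
  cos_angle_2d     = cos_angle(U2["mpg", 1:2], U2["cyl", 1:2]),
  correlation      = cor(mtcars$mpg, mtcars$cyl))
```

The last comparison is only as close as the two plotted axes allow; variables with short arrows in the correlation biplot are poorly represented by PC1 and PC2, and their angles should be read with caution.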