
PCA and Hotelling's T^2 for confidence interval in R

I ran a principal component analysis and kept the first 2 principal components. I plotted my points using their scores on these 2 PCs. I would like to add to this plot a 95% confidence region corresponding to Hotelling's T^2 test, in order to detect the points that fall outside the ellipse (outliers). How can this be done in R? Do you have an example?

I would like to produce something like this and detect the points outside the ellipse:

[image]

We can plot the confidence ellipse for PCA with vegan or ggbiplot as below:

set.seed(1)
data <- matrix(rnorm(500), ncol=5) # some random data
data <- setNames(as.data.frame(rbind(data, matrix(runif(25, 5, 10), ncol=5))), LETTERS[1:5]) # add some outliers
class <- sample(c(0,3,6,8), 105, replace=TRUE) # 4 groups

library(vegan)
PC <- rda(data, scale=TRUE)
pca_scores <- scores(PC, choices=c(1,2))
plot(pca_scores$sites[,1], pca_scores$sites[,2],
     pch=class, col=as.integer(factor(class)), # colours 1-4; col=0 is the background colour and would hide group 0
     xlim=c(-2,2), ylim=c(-2,2))
arrows(0,0,pca_scores$species[,1],pca_scores$species[,2],lwd=1,length=0.2)
ordiellipse(PC,class,conf=0.95)


library(ggbiplot)
PC <- prcomp(data, scale = TRUE)
ggbiplot(PC, obs.scale = 1, var.scale = 1, groups = as.factor(class),
         ellipse = TRUE, ellipse.prob = 0.95)


The pcaMethods package has a function simpleEllipse(x, y, alpha, len) that will do this. Given two uncorrelated data vectors (for example the scores on two principal components), it returns an ellipse whose axes are scaled by the variance of each score and by the F statistic.
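For anyone who wants to see the calculation itself, here is a minimal base-R sketch of the same idea: compute Hotelling's T^2 for each observation from the first two prcomp scores, compare it with an F-based limit, and draw the matching ellipse. The limit a(n^2 - 1) / (n(n - a)) * F(0.95; a, n - a) is one commonly used form and is an assumption here; other references use a(n - 1) / (n - a) instead. The names pc, sc, t2, t2_lim, ell and outlier are just illustrative, and the data are the simulated data from the answer above.

```r
set.seed(1)
# Same simulated data as above: 100 standard-normal rows plus 5 shifted rows
data <- matrix(rnorm(500), ncol = 5)
data <- setNames(as.data.frame(rbind(data, matrix(runif(25, 5, 10), ncol = 5))),
                 LETTERS[1:5])

pc <- prcomp(data, scale. = TRUE)
sc <- pc$x[, 1:2]            # scores on PC1 and PC2
n  <- nrow(sc)               # number of observations
a  <- ncol(sc)               # number of components kept (2)
conf_level <- 0.95

# PC scores are centred and uncorrelated, so T^2 is the sum of squared
# scores, each divided by the variance of its component
score_var <- pc$sdev[1:a]^2
t2 <- rowSums(sweep(sc^2, 2, score_var, "/"))

# F-based limit (assumed form, see the note above)
t2_lim <- a * (n^2 - 1) / (n * (n - a)) * qf(conf_level, a, n - a)

# Points whose T^2 exceeds the limit lie outside the ellipse
outlier <- t2 > t2_lim

# Ellipse centred at the origin with semi-axes sqrt(score_var * t2_lim)
theta <- seq(0, 2 * pi, length.out = 200)
ell <- cbind(sqrt(score_var[1] * t2_lim) * cos(theta),
             sqrt(score_var[2] * t2_lim) * sin(theta))

plot(sc, asp = 1, pch = 19, col = ifelse(outlier, "red", "grey40"),
     xlim = range(sc[, 1], ell[, 1]), ylim = range(sc[, 2], ell[, 2]),
     xlab = "PC1", ylab = "PC2")
lines(ell)

which(outlier)               # row indices of the flagged points
```

Comparing t2 with t2_lim is equivalent to checking whether a point lies outside the drawn ellipse, so the flagged rows can be used directly without any geometric test.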
