Python PCA plot using Hotelling's T2 for a confidence interval

Question

I am trying to apply PCA for multivariate analysis and plot the score plot for the first two components with a Hotelling's T2 confidence ellipse in Python. I was able to get the scatter plot, and I want to add a 95% confidence ellipse to it. It would be great if anyone knows how this can be done in Python.

[Image: sample picture of expected output]

Answer 1

This was bugging me, so I adapted an answer from "PCA and Hotelling's T^2 for confidence interval in R" to Python (also using some source code from the ggbiplot R package).

from sklearn import decomposition
from sklearn.preprocessing import StandardScaler
import numpy as np
import matplotlib.pyplot as plt
import scipy.stats

# Generate data and fit PCA
np.random.seed(1)  # seed NumPy's generator, which is the one used below
data = np.array(np.random.normal(0, 1, 500)).reshape(100, 5)
outliers = np.array(np.random.uniform(5, 10, 25)).reshape(5, 5)
data = np.vstack((data, outliers))
pca = decomposition.PCA(n_components = 2)
scaler = StandardScaler()
scaler.fit(data)
data = scaler.transform(data)
pcaFit = pca.fit(data)
dataProject = pcaFit.transform(data)

# Calculate ellipse bounds and plot with scores
theta = np.concatenate((np.linspace(-np.pi, np.pi, 50), np.linspace(np.pi, -np.pi, 50)))
circle = np.array((np.cos(theta), np.sin(theta)))  # points on the unit circle, shape (2, 100)
sigma = np.cov(np.array((dataProject[:, 0], dataProject[:, 1])))  # 2x2 covariance of the scores
ed = np.sqrt(scipy.stats.chi2.ppf(0.95, 2))  # radius from the 95% chi-squared quantile, 2 degrees of freedom
ell = np.transpose(circle).dot(np.linalg.cholesky(sigma) * ed)  # scale the circle into the ellipse
a, b = np.max(ell[:, 0]), np.max(ell[:, 1])  # 95% ellipse bounds: max extent along PC1 and PC2
t = np.linspace(0, 2 * np.pi, 100)

plt.scatter(dataProject[:, 0], dataProject[:, 1])
plt.plot(a * np.cos(t), b * np.sin(t), color = 'red')
plt.grid(color = 'lightgray', linestyle = '--')
plt.show()

[Image: resulting plot]
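The code above sizes the ellipse with a chi-squared quantile. A Hotelling's T2 limit is more commonly written with an F quantile, which gives a slightly larger ellipse for finite sample sizes. The following is a minimal sketch of that variant, not part of the original answer. It assumes the scores are centered and uncorrelated (true for PCA scores), uses one common convention for the limit, `p * (n - 1) / (n - p) * F(1 - alpha; p, n - p)`, and the function name `hotelling_t2_ellipse` is made up for illustration.

```python
import numpy as np
from scipy import stats


def hotelling_t2_ellipse(scores, alpha=0.05, n_points=200):
    """Return semi-axes and outline of an axis-aligned T2 ellipse.

    Hypothetical helper: assumes `scores` has exactly two columns
    (PC1, PC2) that are centered and uncorrelated, as PCA scores are.
    """
    scores = np.asarray(scores, dtype=float)
    n, p = scores.shape
    if p != 2:
        raise ValueError("this sketch only handles two components")

    # F-based T2 limit (one common convention; others exist).
    t2_limit = p * (n - 1) / (n - p) * stats.f.ppf(1 - alpha, p, n - p)

    # Sample variance of each component's scores.
    variances = scores.var(axis=0, ddof=1)

    # Semi-axis along each component: sqrt(variance * T2 limit).
    semi_axes = np.sqrt(variances * t2_limit)

    # Parametric outline of the ellipse, centered at the origin.
    t = np.linspace(0, 2 * np.pi, n_points)
    x = semi_axes[0] * np.cos(t)
    y = semi_axes[1] * np.sin(t)
    return semi_axes, x, y
```

With the variables from the answer, this could be used as `axes, x, y = hotelling_t2_ellipse(dataProject)` followed by `plt.plot(x, y, color='red')` after the scatter call.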

Answer 2

The pca library provides Hotelling T2 and SPE/DmodX outlier detection.

pip install pca

from pca import pca
import pandas as pd
import numpy as np

# Create dataset with 100 samples
X = np.array(np.random.normal(0, 1, 500)).reshape(100, 5)
# Create 5 outliers
outliers = np.array(np.random.uniform(5, 10, 25)).reshape(5, 5)
# Combine data
X = np.vstack((X, outliers))

# Initialize model. Alpha is the threshold for the Hotelling's T2 test used to flag outliers in the data.
model = pca(alpha=0.05)

# Fit transform
out = model.fit_transform(X)

Print the outliers with:

print(out['outliers'])

#            y_proba      y_score  y_bool  y_bool_spe  y_score_spe
# 1.0   9.799576e-01     3.060765   False       False     0.993407
# 1.0   8.198524e-01     5.945125   False       False     2.331705
# 1.0   9.793117e-01     3.086609   False       False     0.128518
# 1.0   9.743937e-01     3.268052   False       False     0.794845
# 1.0   8.333778e-01     5.780220   False       False     1.523642
# ..             ...          ...     ...         ...          ...
# 1.0   6.793085e-11    69.039523    True        True    14.672828
# 1.0  2.610920e-291  1384.158189    True        True    16.566568
# 1.0   6.866703e-11    69.015237    True        True    14.936442
# 1.0  1.765139e-292  1389.577522    True        True    17.183093
# 1.0  1.351102e-291  1385.483398    True        True    17.319038

Make the plot:

model.biplot(legend=True, SPE=True, hotellingt2=True)
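For readers who want to see what a Hotelling's T2 outlier flag involves without installing the package, here is a rough sketch using only NumPy and SciPy. It is not a description of how the pca library computes its `y_score` or `y_bool` columns. It assumes standardized data, PCA done through an SVD, two retained components, and the F-based limit `a * (n - 1) / (n - a) * F(0.95; a, n - a)`; all variable names are illustrative.

```python
import numpy as np
from scipy import stats

# Same kind of data as above: 100 regular samples plus 5 outliers.
rng = np.random.default_rng(1)
X = rng.normal(0, 1, size=(100, 5))
outliers = rng.uniform(5, 10, size=(5, 5))
X = np.vstack((X, outliers))

# Standardize the columns, then do PCA through an SVD.
Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
U, S, Vt = np.linalg.svd(Z, full_matrices=False)

n_components = 2
n = Z.shape[0]
scores = U[:, :n_components] * S[:n_components]  # PCA scores
eigvals = S[:n_components] ** 2 / (n - 1)         # variance of each score column

# Hotelling's T2 per sample: sum of squared scores scaled by their variances.
t2 = np.sum(scores ** 2 / eigvals, axis=1)

# F-based 95% limit (one common convention, assumed here).
a = n_components
t2_limit = a * (n - 1) / (n - a) * stats.f.ppf(0.95, a, n - a)

# Samples whose T2 exceeds the limit fall outside the 95% ellipse.
is_outlier = t2 > t2_limit
print(np.flatnonzero(is_outlier))
```

The samples flagged here are exactly those that would fall outside an F-based 95% ellipse drawn on the first two components.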
