
Difference in result for sci-kit learn PCA and manual PCA

I'm really puzzled, hopefully someone can show me what I am missing. I'm trying to get principal components via two different methods:

import numpy as np
data = np.array([[ 2.1250045 , -0.17169867, -0.47799957],
                 [ 0.7400025 , -0.07970344, -0.99600106],
                 [ 0.15800177,  1.2993019 , -0.8030003 ],
                 [ 0.3159989 ,  1.919297  ,  0.24300112],
                 [-0.14800562, -1.0827019 , -0.2890004 ],
                 [ 0.26900184, -1.3816979 ,  1.1239979 ],
                 [-0.5040008 , -2.9066994 ,  1.6400006 ],
                 [-1.2230027 , -2.415702  ,  3.1940014 ],
                 [-0.54700005,  1.757302  , -1.825999  ],
                 [-1.1860001 ,  3.0623024 , -1.8090007 ]])  # this should already be mean centered



# Method 1. Scikit-Learn
from sklearn.decomposition import PCA

pca = PCA(n_components=3).fit(data)
print(pca.components_)
# [[-0.04209988 -0.79261507  0.60826717]
#  [ 0.88594009 -0.31106375 -0.34401963]
#  [ 0.46188501  0.52440508  0.71530521]]


# Method 2. Manually with numpy
cov = np.cov(data.T)

evals , evecs = np.linalg.eig(cov)

# The next three lines sort the eigenvalues and eigenvectors in descending order of eigenvalue
idx = np.argsort(evals)[::-1]
evecs = evecs[:,idx]
evals = evals[idx]

print(evecs.T)
# [[ 0.04209988  0.79261507 -0.60826717]
#  [ 0.88594009 -0.31106375 -0.34401963]
#  [-0.46188501 -0.52440508 -0.71530521]]

The values of the eigenvectors are the same, but some of the signs are flipped. What I want is to reproduce the output of sklearn's PCA, but using only numpy. Thanks in advance for any suggestions.

That is expected, because the eigenspace of a matrix (the covariance matrix in your question) is unique, but the specific set of eigenvectors is not: any eigenvector can be scaled by any nonzero scalar, in particular by -1, and it remains an eigenvector for the same eigenvalue. It is too much to explain here, so I would recommend the answer on math.se.
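To make the sign ambiguity concrete, here is a minimal numpy check. The 2x2 symmetric matrix is just an illustrative stand-in for a covariance matrix: if v is an eigenvector of A with eigenvalue lam, then -v satisfies the same equation, so an eigensolver is free to return either sign.

import numpy as np

# A small symmetric matrix standing in for a covariance matrix.
A = np.array([[2.0, 1.0],
              [1.0, 2.0]])

evals, evecs = np.linalg.eigh(A)  # eigh is the symmetric-matrix solver
v = evecs[:, 0]

# Both v and -v satisfy the eigenvector equation A @ v = lam * v,
# so the sign returned by the solver is arbitrary.
print(np.allclose(A @ v, evals[0] * v))    # True
print(np.allclose(A @ -v, evals[0] * -v))  # True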

PS: Notice that you're dealing with a 3x3 covariance matrix, so you can picture the eigenvectors as vectors in 3D with x-, y-, and z-axes. Then you should notice that, compared with the sklearn answer, two of your numpy eigenvectors point in exactly the opposite direction and one points in the same direction.
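If the goal is to reproduce sklearn's exact output with numpy alone, one approach is to compute the SVD of the centered data and then apply a deterministic sign convention. The sketch below mimics the convention historically used by sklearn's internal svd_flip helper (flip each component so that the largest-magnitude entry of the corresponding column of U is positive). This is an implementation detail that may change between sklearn versions, so treat it as an assumption and verify it against your installed version.

import numpy as np

data = np.array([[ 2.1250045 , -0.17169867, -0.47799957],
                 [ 0.7400025 , -0.07970344, -0.99600106],
                 [ 0.15800177,  1.2993019 , -0.8030003 ],
                 [ 0.3159989 ,  1.919297  ,  0.24300112],
                 [-0.14800562, -1.0827019 , -0.2890004 ],
                 [ 0.26900184, -1.3816979 ,  1.1239979 ],
                 [-0.5040008 , -2.9066994 ,  1.6400006 ],
                 [-1.2230027 , -2.415702  ,  3.1940014 ],
                 [-0.54700005,  1.757302  , -1.825999  ],
                 [-1.1860001 ,  3.0623024 , -1.8090007 ]])  # already mean-centered

# PCA via SVD of the (already centered) data matrix.
U, S, Vt = np.linalg.svd(data, full_matrices=False)

# Sign convention modeled on sklearn's svd_flip: make the
# largest-magnitude entry of each column of U positive, and flip the
# matching row of Vt so that U @ diag(S) @ Vt is unchanged.
max_abs_rows = np.argmax(np.abs(U), axis=0)
signs = np.sign(U[max_abs_rows, range(U.shape[1])])
U *= signs
Vt *= signs[:, np.newaxis]

print(Vt)  # rows should now match pca.components_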
