
Python: Implement a PCA using SVD

I am trying to figure out the differences between PCA using Singular Value Decomposition (SVD) as opposed to PCA using eigenvector decomposition.

Picture the following matrix:

B = np.array([[1, 2],
              [3, 4],
              [5, 6]])

When computing the PCA of this matrix B using eigenvector decomposition, we follow these steps (see the sketch after this list):

  1. Center the data (the entries of B) by subtracting the column mean from each column
  2. Compute the covariance matrix C = Cov(B) = B^T * B / (m - 1), where m = number of rows of B
  3. Find the eigenvectors of C
  4. PCs = X * eigen_vecs, where X is the centered B
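
A minimal NumPy sketch of those four steps (variable names are my own; the eigenvector columns are only defined up to sign, so the scores can differ in sign between implementations):

import numpy as np

B = np.array([[1.0, 2], [3, 4], [5, 6]])

# 1. Center the data by subtracting the column means
X = B - B.mean(axis=0)

# 2. Covariance matrix, m = number of rows
m = X.shape[0]
C = X.T @ X / (m - 1)

# 3. Eigendecomposition of the symmetric covariance matrix
eigen_vals, eigen_vecs = np.linalg.eigh(C)

# eigh returns eigenvalues in ascending order; reorder them descending
order = np.argsort(eigen_vals)[::-1]
eigen_vecs = eigen_vecs[:, order]

# 4. Project the centered data onto the eigenvectors
PCs = X @ eigen_vecs
print(PCs)
# up to sign: [[-2.828, 0], [0, 0], [2.828, 0]]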

When computing the PCA of matrix B using SVD, we follow these steps (again, a sketch follows the list):

  1. Compute SVD of B: B = U * Sigma * VT
  2. PCs = U * Sigma
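
Those two steps, as written (i.e. applied to B directly, without centering), might look like this in NumPy:

import numpy as np

B = np.array([[1.0, 2], [3, 4], [5, 6]])

# 1. SVD of B: B = U @ Sigma @ VT
U, S, VT = np.linalg.svd(B, full_matrices=False)

# 2. PCs = U @ Sigma, where Sigma is the diagonal matrix built from the singular values S
PCs = U @ np.diag(S)
print(PCs)

On this B it reproduces, up to sign, the SVD result shown further down.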

I have done both for the given matrix.

With eigenvector decomposition I obtain this result:

[[-2.82842712  0.        ]
 [ 0.          0.        ]
 [ 2.82842712  0.        ]]

With SVD I obtain this result:

[[-2.18941839  0.45436451]
 [-4.99846626  0.12383458]
 [-7.80751414 -0.20669536]]

The result obtained with eigenvector decomposition is the one given as the solution. So why is the result obtained with SVD different?

I know that C = Cov(B) = V * (Sigma^2 / (m - 1)) * VT, and I have a feeling this might be related to why the two results are different. Still, can anyone help me understand better?
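
For reference, that identity is easy to check numerically on the centered matrix (a quick sketch; X here is the centered B):

import numpy as np

B = np.array([[1.0, 2], [3, 4], [5, 6]])
X = B - B.mean(axis=0)   # centered B
m = X.shape[0]

U, S, VT = np.linalg.svd(X, full_matrices=False)

C_direct   = np.cov(X, rowvar=False)              # Cov(B) computed directly
C_from_svd = VT.T @ np.diag(S**2 / (m - 1)) @ VT  # V * (Sigma^2 / (m-1)) * VT

print(np.allclose(C_direct, C_from_svd))          # True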

Please see below a comparison for your matrix with sklearn.decomposition.PCA and numpy.linalg.svd. Can you compare with this, or post how you derived your SVD results?

Code for sklearn.decomposition.PCA:

from sklearn.decomposition import PCA
import numpy as np 
np.set_printoptions(precision=3)

B = np.array([[1.0,2], [3,4], [5,6]])

B1 = B.copy() 
B1 -= np.mean(B1, axis=0) 
n_samples = B1.shape[0]
print("B1 is B after centering:")
print(B1)

cov_mat = np.cov(B1.T)
pca = PCA(n_components=2) 
X = pca.fit_transform(B1)
print("X")
print(X)

eigenvecmat = []
print("Eigenvectors:")
for eigenvector in pca.components_:
    if len(eigenvecmat) == 0:
        eigenvecmat = eigenvector
    else:
        eigenvecmat = np.vstack((eigenvecmat, eigenvector))
    print(eigenvector)
print("eigenvector-matrix")
print(eigenvecmat)

print("CHECK FOR PCA:")
print("X * eigenvector-matrix (=B1)")
print(np.dot(X, eigenvecmat))

Output for PCA:

B1 is B after centering:
[[-2. -2.]
 [ 0.  0.]
 [ 2.  2.]]
X
[[-2.828  0.   ]
 [ 0.     0.   ]
 [ 2.828  0.   ]]
Eigenvectors:
[0.707 0.707]
[-0.707  0.707]
eigenvector-matrix
[[ 0.707  0.707]
 [-0.707  0.707]]
CHECK FOR PCA:
X * eigenvector-matrix (=B1)
[[-2. -2.]
 [ 0.  0.]
 [ 2.  2.]]

numpy.linalg.svd:

print("B1 is B after centering:")
print(B1)

from numpy.linalg import svd 
U, S, Vt = svd(B1, full_matrices=True)   # B1 is the centered matrix from above

print("U:")
print(U)
print("S used for building Sigma:")
print(S)
Sigma = np.zeros((3, 2), dtype=float)
Sigma[:2, :2] = np.diag(S)
print("Sigma:")
print(Sigma)
print("V already transposed:")
print(Vt)
print("CHECK FOR SVD:")
print("U * Sigma * Vt (=B1)")
print(np.dot(U, np.dot(Sigma, Vt)))

Output for SVD:

B1 is B after centering:
[[-2. -2.]
 [ 0.  0.]
 [ 2.  2.]]
U:
[[-0.707  0.     0.707]
 [ 0.     1.     0.   ]
 [ 0.707  0.     0.707]]
S used for building Sigma:
[4. 0.]
Sigma:
[[4. 0.]
 [0. 0.]
 [0. 0.]]
V already transposed:
[[ 0.707  0.707]
 [-0.707  0.707]]
CHECK FOR SVD:
U * Sigma * Vt (=B1)
[[-2. -2.]
 [ 0.  0.]
 [ 2.  2.]]
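
To connect this back to the PCA output above: with the centered B1, the product U * Sigma reproduces the scores X from sklearn, up to sign. A short check, reusing U and Sigma from the code above:

print("U * Sigma (should match X from sklearn, up to sign):")
print(np.dot(U, Sigma))
# [[-2.828  0.   ]
#  [ 0.     0.   ]
#  [ 2.828  0.   ]]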
