
Why do my PCA and the PCA from sklearn get different results?

I tried to use the PCA implementation provided in "Machine Learning in Action", but I found that the results it produces are not the same as those from the PCA in sklearn. I don't quite understand what is going on.

Below is my code:

import numpy as np
from sklearn.decomposition import PCA

x = np.array([
    [1,2,3,4,5, 0],
    [0.6,0.7,0.8,0.9,0.10, 0],
    [110,120,130,140,150, 0]
])

def my_pca(data, dim):
    # Center the data by subtracting the column means
    remove_mean = data - data.mean(axis=0)
    # Covariance matrix of the features (columns)
    cov_data = np.cov(remove_mean, rowvar=0)
    # Eigendecomposition of the covariance matrix
    eig_val, eig_vec = np.linalg.eig(np.mat(cov_data))
    # Indices of the `dim` largest eigenvalues, largest first
    sorted_eig_val = np.argsort(eig_val)
    eig_index = sorted_eig_val[:-(dim+1):-1]
    transfer = eig_vec[:, eig_index]
    # Project the centered data onto the selected eigenvectors
    low_dim = remove_mean * transfer
    return np.array(low_dim, dtype=float)

pca = PCA(n_components = 3)
pca.fit(x)
new_x = pca.transform(x)
print("sklearn")
print(new_x)

new_x = my_pca(x, 3)
print("my")
print(new_x)

Output:

sklearn
[[-9.32494230e+01  1.46120285e+00  2.37676120e-15]
 [-9.89004904e+01 -1.43283197e+00  2.98143675e-14]
 [ 1.92149913e+02 -2.83708789e-02  2.81307176e-15]]

my
[[ 9.32494230e+01 -1.46120285e+00  7.39333927e-14]
 [ 9.89004904e+01  1.43283197e+00 -7.01760428e-14]
 [-1.92149913e+02  2.83708789e-02  1.84375626e-14]]

The issue relates to your function, in particular the part where you calculate the eigenvalues and eigenvectors:

eig_val, eig_vec = np.linalg.eig(np.mat(cov_data))

Note that the two outputs differ only in sign: the directions found by an eigendecomposition are only defined up to a factor of ±1, so different routines may flip individual components. It appears that scikit-learn effectively uses "eigh" rather than "eig" here, so if you change the snippet from np.linalg.eig to np.linalg.eigh, you should get the same results.
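For reference, below is a minimal sketch of the suggested change applied to the original function (the name my_pca_eigh is just for illustration, and the np.mat/* pair is replaced by the @ operator, which does not change the result). Since a covariance matrix is symmetric, np.linalg.eigh is the appropriate routine and returns real eigenvalues in ascending order. Keep in mind that eigenvector signs are still only fixed up to ±1, so a component could in principle still come out flipped.

import numpy as np

def my_pca_eigh(data, dim):
    # Center the data by subtracting the per-column mean
    remove_mean = data - data.mean(axis=0)
    # Covariance matrix of the features (columns)
    cov_data = np.cov(remove_mean, rowvar=0)
    # eigh is intended for symmetric matrices such as a covariance matrix;
    # it returns real eigenvalues sorted in ascending order
    eig_val, eig_vec = np.linalg.eigh(cov_data)
    # Indices of the `dim` largest eigenvalues, largest first
    eig_index = np.argsort(eig_val)[:-(dim + 1):-1]
    transfer = eig_vec[:, eig_index]
    # Project the centered data onto the selected eigenvectors
    return remove_mean @ transfer

Calling my_pca_eigh(x, 3) on the example data should then match pca.transform(x), apart from numerical noise in the essentially zero third component.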
