
Why do my PCA and the PCA from sklearn get different results?

I tried to use the PCA implementation provided in "Machine Learning in Action", but I found that the results it produces are not the same as those from the PCA in sklearn. I don't quite understand what is going on.

Below is my code:

import numpy as np
from sklearn.decomposition import PCA

x = np.array([
    [1,2,3,4,5, 0],
    [0.6,0.7,0.8,0.9,0.10, 0],
    [110,120,130,140,150, 0]
])

def my_pca(data, dim):
    # Center the data by subtracting the column means
    remove_mean = data - data.mean(axis=0)
    # Covariance matrix of the features (columns)
    cov_data = np.cov(remove_mean, rowvar=0)
    # Eigendecomposition of the covariance matrix
    eig_val, eig_vec = np.linalg.eig(np.mat(cov_data))
    # Indices of the `dim` largest eigenvalues, largest first
    sorted_eig_val = np.argsort(eig_val)
    eig_index = sorted_eig_val[:-(dim+1):-1]
    transfer = eig_vec[:, eig_index]
    # Project the centered data onto the selected eigenvectors
    low_dim = remove_mean * transfer
    return np.array(low_dim, dtype=float)

pca = PCA(n_components = 3)
pca.fit(x)
new_x = pca.transform(x)
print("sklearn")
print(new_x)

new_x = my_pca(x, 3)
print("my")
print(new_x)

Output:

sklearn
[[-9.32494230e+01  1.46120285e+00  2.37676120e-15]
 [-9.89004904e+01 -1.43283197e+00  2.98143675e-14]
 [ 1.92149913e+02 -2.83708789e-02  2.81307176e-15]]

my
[[ 9.32494230e+01 -1.46120285e+00  7.39333927e-14]
 [ 9.89004904e+01  1.43283197e+00 -7.01760428e-14]
 [-1.92149913e+02  2.83708789e-02  1.84375626e-14]]

The issue relates to your function, in particular the part where you calculate the eigenvalues and eigenvectors:

eig_val, eig_vec = np.linalg.eig(np.mat(cov_data))

Note that the two outputs differ only in sign: the directions found by an eigendecomposition are only defined up to a factor of ±1, so different routines may flip individual components. It appears that scikit-learn effectively uses "eigh" rather than "eig" here, so if you change the snippet from np.linalg.eig to np.linalg.eigh, you should get the same results.
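For reference, below is a minimal sketch of the suggested change applied to the original function (the name my_pca_eigh is just for illustration, and the np.mat/* pair is replaced by the @ operator, which does not change the result). Since a covariance matrix is symmetric, np.linalg.eigh is the appropriate routine and returns real eigenvalues in ascending order. Keep in mind that eigenvector signs are still only fixed up to ±1, so a component could in principle still come out flipped.

import numpy as np

def my_pca_eigh(data, dim):
    # Center the data by subtracting the per-column mean
    remove_mean = data - data.mean(axis=0)
    # Covariance matrix of the features (columns)
    cov_data = np.cov(remove_mean, rowvar=0)
    # eigh is intended for symmetric matrices such as a covariance matrix;
    # it returns real eigenvalues sorted in ascending order
    eig_val, eig_vec = np.linalg.eigh(cov_data)
    # Indices of the `dim` largest eigenvalues, largest first
    eig_index = np.argsort(eig_val)[:-(dim + 1):-1]
    transfer = eig_vec[:, eig_index]
    # Project the centered data onto the selected eigenvectors
    return remove_mean @ transfer

Calling my_pca_eigh(x, 3) on the example data should then match pca.transform(x), apart from numerical noise in the essentially zero third component.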
