PCA in sklearn differs from PCA in numpy

Question

Am I misunderstanding something? Here is my code.

Using sklearn

import numpy as np
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D
from sklearn import decomposition
from sklearn import datasets
from sklearn.preprocessing import StandardScaler

pca = decomposition.PCA(n_components=3)

x = np.array([
        [0.387,4878, 5.42],
        [0.723,12104,5.25],
        [1,12756,5.52],
        [1.524,6787,3.94],
    ])
pca.fit_transform(x)

Output:

array([[ -4.25324997e+03,  -8.41288672e-01,  -8.37858943e-03],
   [  2.97275001e+03,  -1.25977271e-01,   1.82476780e-01],
   [  3.62475003e+03,  -1.56843494e-01,  -1.65224286e-01],
   [ -2.34425007e+03,   1.12410944e+00,  -8.87390454e-03]])

Using the numpy approach

x_std = StandardScaler().fit_transform(x)
cov = np.cov(x.T)
ev , eig = np.linalg.eig(cov)
a = eig.dot(x_std.T)

Output:

array([[ 1.38252552, -1.25240764,  0.2133338 ],
       [-0.53279935, -0.44541231, -0.77988021],
       [-0.45230635,  0.21983192, -1.23796328],
       [-0.39741982,  1.47798804,  1.80450969]])

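I kept all 3 components, but the numpy result does not seem to retain my original information the way the sklearn result does.

Why is that?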

Answer 1

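Don't use StandardScaler. sklearn's PCA only centers the data and does not rescale each feature to unit variance, so instead just subtract each column's mean from x: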

In [92]: xm = x - x.mean(axis=0)

In [93]: cov = np.cov(xm.T)

In [94]: evals, evecs = np.linalg.eig(cov)

In [95]: xm.dot(evecs)
Out[95]: 
array([[ -4.2532e+03,  -8.3786e-03,  -8.4129e-01],
       [  2.9728e+03,   1.8248e-01,  -1.2598e-01],
       [  3.6248e+03,  -1.6522e-01,  -1.5684e-01],
       [ -2.3443e+03,  -8.8739e-03,   1.1241e+00]])

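This last result contains the same information as the sklearn result, but the columns are in a different order: np.linalg.eig does not return the eigenvalues sorted, whereas sklearn orders its components by decreasing explained variance.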
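If you want the numpy result to line up with sklearn column for column, one option is to sort the eigenvectors by eigenvalue before projecting. The following is only a minimal sketch under a few assumptions of mine: it swaps in np.linalg.eigh for np.linalg.eig (reasonable here because a covariance matrix is symmetric), the names order, scores_np and scores_sk are illustrative, and the comparison ignores sign because each eigenvector is only defined up to a factor of -1.

```python
import numpy as np
from sklearn.decomposition import PCA

x = np.array([
    [0.387, 4878, 5.42],
    [0.723, 12104, 5.25],
    [1, 12756, 5.52],
    [1.524, 6787, 3.94],
])

# Center only (no scaling to unit variance), as sklearn's PCA does.
xm = x - x.mean(axis=0)

# eigh is meant for symmetric matrices; it returns real eigenvalues
# in ascending order, with eigenvectors as the columns of evecs.
evals, evecs = np.linalg.eigh(np.cov(xm.T))

# Reorder so that the direction with the largest variance comes first.
order = np.argsort(evals)[::-1]
evals, evecs = evals[order], evecs[:, order]

# Project the centered data onto the sorted eigenvectors.
scores_np = xm.dot(evecs)
scores_sk = PCA(n_components=3).fit_transform(x)

# Eigenvector signs are arbitrary, so compare absolute values only.
same_up_to_sign = np.allclose(np.abs(scores_np), np.abs(scores_sk), atol=1e-5)
print(same_up_to_sign)
```

If you really do want standardized features, the consistent way is to pass the StandardScaler output to both methods, rather than scaling the data for one and not the other.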

2 2016-09-20 05:05:38
