PCA in sklearn vs numpy is different
Am I misunderstanding something? Here is my code.
Using sklearn:
import numpy as np
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D
from sklearn import decomposition
from sklearn import datasets
from sklearn.preprocessing import StandardScaler
pca = decomposition.PCA(n_components=3)
x = np.array([
[0.387,4878, 5.42],
[0.723,12104,5.25],
[1,12756,5.52],
[1.524,6787,3.94],
])
pca.fit_transform(x)
Output:
array([[ -4.25324997e+03, -8.41288672e-01, -8.37858943e-03],
[ 2.97275001e+03, -1.25977271e-01, 1.82476780e-01],
[ 3.62475003e+03, -1.56843494e-01, -1.65224286e-01],
[ -2.34425007e+03, 1.12410944e+00, -8.87390454e-03]])
Using numpy:
x_std = StandardScaler().fit_transform(x)
cov = np.cov(x_std.T)
ev , eig = np.linalg.eig(cov)
a = eig.dot(x_std.T)
Output:
array([[ 1.38252552, -1.25240764, 0.2133338 ],
[-0.53279935, -0.44541231, -0.77988021],
[-0.45230635, 0.21983192, -1.23796328],
[-0.39741982, 1.47798804, 1.80450969]])
I kept all 3 components, but it doesn't seem to let me retain my original data.
Why is this happening?
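Don't use StandardScaler. sklearn's PCA centers the data but does not rescale it, so standardizing first gives a different decomposition.
Instead, just subtract the mean of each column from x: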
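Here is why the two approaches disagree. Centering leaves the covariance matrix unchanged. Dividing each column by its standard deviation does not: it turns the covariance matrix into the correlation matrix, times n/(n-1), because StandardScaler uses the population standard deviation while np.cov defaults to the sample estimate. PCA on a different matrix gives different axes.

A minimal check with numpy only follows. It re-implements the scaling by hand rather than calling StandardScaler, assuming StandardScaler's default settings.

```python
import numpy as np

# Data from the question
x = np.array([
    [0.387, 4878, 5.42],
    [0.723, 12104, 5.25],
    [1.0, 12756, 5.52],
    [1.524, 6787, 3.94],
])
n = x.shape[0]

# Centering only (what sklearn's PCA does, without rescaling)
xm = x - x.mean(axis=0)
cov_centered = np.cov(xm.T)

# Standardizing by hand; matches StandardScaler's defaults,
# which divide by the population standard deviation (ddof=0)
x_std = (x - x.mean(axis=0)) / x.std(axis=0)
cov_std = np.cov(x_std.T)

print(np.allclose(cov_centered, np.cov(x.T)))                # True
print(np.allclose(cov_std, np.corrcoef(x.T) * n / (n - 1)))  # True
print(np.allclose(cov_std, cov_centered))                    # False
```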
In [92]: xm = x - x.mean(axis=0)
In [93]: cov = np.cov(xm.T)
In [94]: evals, evecs = np.linalg.eig(cov)
In [95]: xm.dot(evecs)
Out[95]:
array([[ -4.2532e+03, -8.3786e-03, -8.4129e-01],
[ 2.9728e+03, 1.8248e-01, -1.2598e-01],
[ 3.6248e+03, -1.6522e-01, -1.5684e-01],
[ -2.3443e+03, -8.8739e-03, 1.1241e+00]])
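Keeping all three components loses nothing. The projection is xm.dot(evecs): the samples go on the left and the eigenvectors are the columns on the right. The eigenvectors of a symmetric covariance matrix can be chosen orthonormal, so multiplying the scores by evecs.T and adding back the column means recovers x.

A sketch of that round trip follows. It swaps in np.linalg.eigh, NumPy's routine for symmetric matrices that returns orthonormal eigenvectors, in place of np.linalg.eig.

```python
import numpy as np

# Data from the question
x = np.array([
    [0.387, 4878, 5.42],
    [0.723, 12104, 5.25],
    [1.0, 12756, 5.52],
    [1.524, 6787, 3.94],
])

mean = x.mean(axis=0)
xm = x - mean

# eigh is meant for symmetric matrices and returns orthonormal eigenvectors
evals, evecs = np.linalg.eigh(np.cov(xm.T))

scores = xm @ evecs               # project onto all three principal axes
x_back = scores @ evecs.T + mean  # undo the rotation, then the centering

print(np.allclose(x_back, x))     # True
```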
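The last result contains the same information as the sklearn result, but the columns are in a different order: np.linalg.eig does not sort the eigenvalues, while sklearn orders the components by decreasing explained variance.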
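To get sklearn's column order, sort the eigenpairs by decreasing eigenvalue. Even after sorting, a column may come out with the opposite sign, because an eigenvector is only defined up to sign, so compare absolute values.

The sketch below uses np.linalg.eigh, which returns eigenvalues in ascending order. To stay numpy-only, it checks the result against an SVD of the centered data instead of calling sklearn.

```python
import numpy as np

# Data from the question
x = np.array([
    [0.387, 4878, 5.42],
    [0.723, 12104, 5.25],
    [1.0, 12756, 5.52],
    [1.524, 6787, 3.94],
])
n = x.shape[0]
xm = x - x.mean(axis=0)

evals, evecs = np.linalg.eigh(np.cov(xm.T))

# eigh returns ascending eigenvalues; reorder to decreasing variance
order = np.argsort(evals)[::-1]
evals, evecs = evals[order], evecs[:, order]
scores = xm @ evecs

# Reference: principal-component scores from the SVD of the centered data
u, s, vt = np.linalg.svd(xm, full_matrices=False)
scores_svd = u * s

# Same variances, and the same columns up to sign
print(np.allclose(evals, s**2 / (n - 1)))
print(np.allclose(np.abs(scores), np.abs(scores_svd), atol=1e-6))
```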