Difference between a numpy-only PCA implementation and sklearn's PCA

Question

import numpy as np
import scipy.linalg
import matplotlib.pyplot as plt
from tensorflow.examples.tutorials.mnist import input_data
mnist=input_data.read_data_sets('data/MNIST/', one_hot=True)

numpy implementation

# Entire Data set
Data=np.array(mnist.train.images)
#centering the data
mu_D=np.mean(Data, axis=0)
Data-=mu_D


COV_MA = np.cov(Data, rowvar=False)
eigenvalues, eigenvec=scipy.linalg.eigh(COV_MA, eigvals_only=False)
together = zip(eigenvalues, eigenvec)
together = sorted(together, key=lambda t: t[0], reverse=True)
eigenvalues[:], eigenvec[:] = zip(*together)


n=3
pca_components=eigenvec[:,:n]
print(pca_components.shape)
data_reduced = Data.dot(pca_components)
print(data_reduced.shape)
data_original = np.dot(data_reduced, pca_components.T) # inverse_transform
print(data_original.shape)


plt.imshow(data_original[10].reshape(28,28),cmap='Greys',interpolation='nearest')

sklearn implementation

from sklearn.decomposition import PCA

pca = PCA(n_components=3)
pca.fit(Data)

data_reduced = np.dot(Data, pca.components_.T) # transform
data_original = np.dot(data_reduced, pca.components_) # inverse_transform
plt.imshow(data_original[10].reshape(28,28),cmap='Greys',interpolation='nearest')

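I want to implement the PCA algorithm using numpy only. However, I don't know how to reconstruct the images from the reduced data, or even whether this code is correct.

In fact, when I use sklearn.decomposition.PCA, the result differs from my numpy implementation.

Can you explain the differences?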

Answer 1

I can already spot a few differences.

For one:

n=300
projections = only_2.dot(eigenvec[:,:n])
Xhat = np.dot(projections, eigenvec[:,:n].T)
Xhat += mu_D
plt.imshow(Xhat[5].reshape(28,28),cmap='Greys',interpolation='nearest')

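The point I want to make is that, if my understanding is correct, with n = 300 you are keeping the 300 eigenvectors with the largest eigenvalues, ordered from high to low.

But in sklearn: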

from sklearn.decomposition import PCA

pca = PCA(n_components=1)
pca.fit(only_2)

data_reduced = np.dot(only_2, pca.components_.T) # transform
data_original = np.dot(data_reduced, pca.components_) # inverse_transform

As far as I can see, you are fitting only the FIRST component (the one that maximizes the variance) rather than using all 300.
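To illustrate why the number of components matters, here is a minimal sketch on synthetic data (the random matrix X is a stand-in I made up, not MNIST or only_2). It reconstructs the data from 1, 5, and all 20 components and records the mean squared reconstruction error for each, so the effect of keeping a single component can be seen directly.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
# Synthetic stand-in data: 300 samples, 20 correlated features (an assumption).
X = rng.normal(size=(300, 20)) @ rng.normal(size=(20, 20))

errors = {}
for k in (1, 5, 20):
    pca = PCA(n_components=k, svd_solver="full")
    reduced = pca.fit_transform(X)             # shape (300, k)
    restored = pca.inverse_transform(reduced)  # back to shape (300, 20)
    errors[k] = np.mean((X - restored) ** 2)   # mean squared reconstruction error

print(errors)
```

The error shrinks as k grows and is essentially zero when k equals the number of features, so a reconstruction from one component should be expected to look much blurrier than one from 300.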

More:

One thing I can say for certain is that you seem to understand what happens in PCA but are having trouble implementing it. Correct me if I'm wrong, but:

data_reduced = np.dot(only_2, pca.components_.T) # transform
data_original = np.dot(data_reduced, pca.components_) # inverse_transform

In this part you are multiplying the data by the eigenvectors yourself, which is indeed what PCA does, but with sklearn what you should do is:

import numpy as np
from sklearn.decomposition import PCA

pca = PCA(n_components=300)
# fit_transform centers the data, fits the model, and returns the projection
data_reduced = pca.fit_transform(only_2)
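As a hedged sketch of how the built-in methods relate to the manual dot products, the example below uses synthetic data with a non-zero mean (the array X is my own stand-in, since only_2 is not shown). It checks that fit_transform matches a manual projection only once pca.mean_ has been subtracted, and that inverse_transform matches the manual product only once pca.mean_ has been added back. This assumes the default whiten=False.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
# Synthetic stand-in data with a non-zero mean (an assumption, not only_2).
X = rng.normal(loc=5.0, size=(100, 8))

pca = PCA(n_components=3, svd_solver="full")
reduced = pca.fit_transform(X)

# Manual equivalents: subtract the fitted mean before projecting,
# and add it back after mapping to the original space.
manual_reduced = (X - pca.mean_) @ pca.components_.T
manual_restored = manual_reduced @ pca.components_ + pca.mean_

transform_matches = np.allclose(reduced, manual_reduced)
inverse_matches = np.allclose(pca.inverse_transform(reduced), manual_restored)
print(transform_matches, inverse_matches)
```

If the input has already been centered by hand, pca.mean_ is close to zero and the plain dot products give the same result; otherwise leaving out the mean shifts both the projection and the reconstruction.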

If you can tell me how you created only_2, I can give you a more specific answer tomorrow.

The sklearn documentation describes fit_transform for PCA as follows:

fit_transform(X, y=None)
Fit the model with X and apply the dimensionality reduction on X.

Parameters: 
X : array-like, shape (n_samples, n_features)
Training data, where n_samples is the number of samples and n_features is the number of features.

y : Ignored
Returns:    
X_new : array-like, shape (n_samples, n_components)
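For comparison with the numpy code in the question, here is a minimal sketch of a numpy/scipy PCA checked against sklearn on synthetic data (X and the feature scales are assumptions for illustration). Two details matter: scipy.linalg.eigh returns eigenvalues in ascending order, and the eigenvectors are the columns of the returned matrix. Zipping eigenvalues with eigenvec therefore pairs each eigenvalue with a row rather than with its eigenvector, so the sketch reorders columns with argsort instead.

```python
import numpy as np
from scipy import linalg
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
# Synthetic stand-in data with clearly separated variances per feature.
X = rng.normal(size=(500, 6)) * np.array([6.0, 5.0, 4.0, 3.0, 2.0, 1.0])
Xc = X - X.mean(axis=0)

# eigh: eigenvalues ascending, eigenvectors stored as columns.
eigenvalues, eigenvec = linalg.eigh(np.cov(Xc, rowvar=False))
order = np.argsort(eigenvalues)[::-1]      # indices from largest to smallest
eigenvalues = eigenvalues[order]
eigenvec = eigenvec[:, order]              # reorder columns, not rows

n = 3
pca = PCA(n_components=n, svd_solver="full").fit(X)

# sklearn stores components as rows; each axis is defined only up to sign.
same_variances = np.allclose(eigenvalues[:n], pca.explained_variance_)
same_axes = np.allclose(np.abs(eigenvec[:, :n].T), np.abs(pca.components_), atol=1e-6)
print(same_variances, same_axes)
```

Because each principal axis is defined only up to sign, the projected coordinates from the two approaches can differ by a factor of -1 per component while the reconstructions remain the same.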

Solution 1
1 2018-09-30 05:25:42
