
Extracting PCA components with sklearn

I am using sklearn's PCA for dimensionality reduction on a large set of images. Once the PCA is fitted, I would like to see what the components look like.

One can do so by looking at the components_ attribute. Not realizing that was available, I did something else instead:

import numpy as np

# One "image" per PCA feature: row i has feature i set to 1 and all others to 0.
each_component = np.eye(total_components)
component_im_array = pca.inverse_transform(each_component)

for i in range(num_components):
    component_im = component_im_array[i, :].reshape(height, width)
    # do something with component_im

In other words, I create an image in the PCA space that has every feature set to 0 except one, which is set to 1. By inverse transforming them, I should then get the images in the original space which, once transformed, can each be expressed solely with that one PCA component.
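For comparison, a minimal sketch of the direct approach via components_ (the matplotlib display is my own addition, reusing the pca, num_components, height, and width names from above):

import matplotlib.pyplot as plt

for i in range(num_components):
    # Each row of components_ is one principal axis in the original pixel space.
    component_im = pca.components_[i].reshape(height, width)
    plt.imshow(component_im, cmap='gray')
    plt.title('component %d' % i)
    plt.show()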

The following image shows the results. On the left is the component calculated using my method. On the right is pca.components_[i] directly. Additionally, with my method, most images are very similar (though they do differ), while the images obtained by accessing components_ are very different, as I would have expected.

Is there a conceptual problem in my method? Clearly the components from pca.components_[i] are correct (or at least more correct) than the ones I'm getting. Thanks!

[Image: left: calculated components; right: true components]

Components and inverse transform are two different things. The inverse transform maps the components back to the original image space:

import numpy as np
from sklearn.decomposition import PCA

# Create a PCA model with two principal components
pca = PCA(2)
pca.fit(data)

# Get the scores from transforming the original data.
scores = pca.transform(data)

# Reconstruct from the 2-dimensional scores
reconstruct = pca.inverse_transform(scores)

# The residual is the amount not explained by the first two components
residual = data - reconstruct

Thus you are inverse transforming the original data and not the components, which is why the results are completely different. You almost never inverse_transform the original data. pca.components_ are the actual vectors representing the underlying axes used to project the data into the PCA space.
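A quick sketch of that projection, assuming a fitted model with whiten=False (the digits dataset here is only illustrative): transform() is centering followed by projection onto the component axes.

import numpy as np
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA

data = load_digits().data  # any (n_samples, n_features) array works
pca = PCA(2).fit(data)

# Projection by hand: center the data, then dot with the component axes.
manual_scores = (data - pca.mean_) @ pca.components_.T
assert np.allclose(manual_scores, pca.transform(data))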

The difference between grabbing components_ and doing an inverse_transform on the identity matrix is that the latter adds in the empirical mean of each feature. That is:

def inverse_transform(self, X):
    return np.dot(X, self.components_) + self.mean_

where self.mean_ was estimated from the training set.
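A minimal sketch verifying that relationship (the dataset and variable names are illustrative, not from the original post):

import numpy as np
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA

data = load_digits().data
n_components = 10
pca = PCA(n_components).fit(data)

# Inverse-transforming the identity matrix gives each component plus the mean...
rows = pca.inverse_transform(np.eye(n_components))
assert np.allclose(rows, pca.components_ + pca.mean_)

# ...so subtracting the mean recovers the components exactly.
assert np.allclose(rows - pca.mean_, pca.components_)

This is also why the images produced by the method in the question all look so similar: each one is dominated by the shared mean image, with only a single component's contribution added on top.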
