

Extracting PCA components with sklearn

I am using sklearn's PCA for dimensionality reduction on a large set of images. Once the PCA is fitted, I would like to see what the components look like.

One can do so by looking at the components_ attribute. Not realizing that was available, I did something else instead:

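import numpy as np

# One row per component: row i has a 1 in position i and 0 elsewhere
each_component = np.eye(total_components)
# Map each of those rows from PCA space back to the original image space
component_im_array = pca.inverse_transform(each_component)

for i in range(num_components):
    component_im = component_im_array[i, :].reshape(height, width)
    # do something with component_im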

In other words, I create an image in the PCA space that has all features except one set to 0. By inverse transforming these, I should then get the image in the original space which, once transformed, can be expressed solely with that PCA component.

The following image shows the results. On the left is the component calculated using my method. On the right is pca.components_[i] directly. Additionally, with my method, most images are very similar (though not identical), while the images obtained from components_ are very different from one another, as I would have expected.

Is there a conceptual problem in my method? Clearly the components from pca.components_[i] are correct (or at least more correct than the ones I'm getting). Thanks!


Components and inverse transform are two different things. The inverse transform maps data from the PCA space (the scores) back to the original image space:

from sklearn.decomposition import PCA

# Create a PCA model with two principal components and fit it to the data
pca = PCA(n_components=2).fit(data)
# Project the original data onto the components to get the scores
scores = pca.transform(data)
# Reconstruct from the 2-dimensional scores
reconstruct = pca.inverse_transform(scores)
# The residual is the amount not explained by the first two components
residual = data - reconstruct
Thus you are inverse transforming the original data and not the components, and thus they are completely different. You almost never inverse_transform the original data. pca.components_ holds the actual vectors representing the underlying axes used to project the data into the PCA space.
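To make the "axes used to project the data" point concrete, here is a minimal sketch. It assumes the default whiten=False and uses a small synthetic array in place of real image data; the names data, scores and manual_scores are only illustrative.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
# Synthetic stand-in for the image set: 100 samples with 6 features each
data = rng.normal(size=(100, 6))

pca = PCA(n_components=2).fit(data)

# Each row of components_ is one principal axis in the original feature space
print(pca.components_.shape)

# With whiten=False, transform centers the data and projects it onto those axes
scores = pca.transform(data)
manual_scores = (data - pca.mean_) @ pca.components_.T
projection_matches = np.allclose(scores, manual_scores)
print(projection_matches)
```

For image data, each row of pca.components_ can be reshaped to (height, width) and displayed directly, which is what the right-hand image in the question shows.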

The difference between grabbing components_ and doing an inverse_transform on the identity matrix is that the latter adds in the empirical mean of each feature. That is:

def inverse_transform(self, X):
    return np.dot(X, self.components_) + self.mean_

where self.mean_ was estimated from the training set.
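A quick way to check this is to subtract pca.mean_ from the inverse-transformed identity matrix and compare the result with components_. The sketch below assumes the default whiten=False and uses synthetic data with a large common offset as a stand-in for images; the variable names are illustrative only.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
# Synthetic stand-in for flattened images: a large shared offset plus small variation
data = 10.0 + rng.normal(size=(50, 8))

n_components = 3
pca = PCA(n_components=n_components).fit(data)

# The approach from the question: inverse transform one-hot rows in PCA space
via_inverse = pca.inverse_transform(np.eye(n_components))

# Removing the per-feature mean should leave the components themselves
recovered = via_inverse - pca.mean_
components_match = np.allclose(recovered, pca.components_)
print(components_match)
```

This is also consistent with the images from the question's method looking alike: each one is the mean image plus a single unit-length component, so the shared mean can easily dominate what is displayed.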

