简体   繁体   中英

Extracting PCA components with sklearn

I am using sklearn's PCA for dimensionality reduction on a large set of images. Once the PCA is fitted, I would like to see what the components look like.

One can do so by looking at the components_ attribute. Not realizing that was available, I did something else instead:

each_component = np.eye(total_components)
component_im_array = pca.inverse_transform(each_component)

for i in range(num_components):
   component_im = component_im_array[i, :].reshape(height, width)
   # do something with component_im

In other words, I create an image in the PCA space that has all features but 1 set to 0. By inversely transforming them, I should then get the image in the original space which, once transformed, can be expressed solely with that PCA component.

The following image shows the results. On the left is the component calculated using my method. On the right is pca.components_[i] directly. Additionally, with my method, most images are very similar (but they are different) while by accessing the components_ the images are very different as I would have expected

Is there a conceptual problem in my method? Clearly the components from pca.components_[i] are correct (or at least more correct) than the ones I'm getting. Thanks!

left:计算组件,右:真实组件

Components and inverse transform are two different things. The inverse transform maps the components back to the original image space

#Create a PCA model with two principal components
pca = PCA(2)
pca.fit(data)
#Get the components from transforming the original data.
scores = pca.transform(data)
# Reconstruct from the 2 dimensional scores 
reconstruct = pca.inverse_transform(scores )
#The residual is the amount not explained by the first two components
residual=data-reconstruct

Thus you are inverse transforming the original data and not the components, and thus they are completely different. You almost never inverse_transform the orginal data. pca.components_ are the actual vectors representing the underlying axis used to project the data to the pca space.

The difference between grabbing the components_ and doing an inverse_transform on the identity matrix is that the latter adds in the empirical mean of each feature. Ie:

def inverse_transform(self, X):
    return np.dot(X, self.components_) + self.mean_

where self.mean_ was estimated from the training set.

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM