
PCA analysis using the N least relevant components

I am trying to learn the basics of PCA in Python using scikit-learn (in particular sklearn.decomposition and sklearn.preprocessing). The goal is to import data from images into a matrix X (each row is a sample, each column is a feature), standardize X, use PCA to extract principal components (the 2 most important, the 6 most important, ..., the 6 least important), project X onto these principal components, reverse the transformation, and plot the result to see how it differs from the original image or images.

Now let's say that I do not want to keep the 2, 3, 4, ... most important principal components, but instead the N least relevant ones, say N=6.

How should the analysis be done? I can't simply standardize, call PCA().fit_transform(), and then revert with inverse_transform() to plot the results, because that uses the leading components rather than the trailing ones.

At the moment I am doing something like this:

from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

X_std = StandardScaler().fit_transform(X)  # standardize original data
pca = PCA()
model = pca.fit(X_std)  # fit a model with all components
Xprime = model.components_[-6:]  # last 6 PCs (rows of components_)

And then I stop, because I know I should call transform() but I do not understand how to do it. I tried several times without success.

Can someone tell me whether the previous steps are correct and point me in the right direction?

Thank you very much


EDIT: I have now adopted this solution, as suggested by the first answer to my question:

import copy

model = PCA().fit(X_std)
model2pc = copy.deepcopy(model)  # copy, so zeroing does not alter the fitted model
model2pc.components_[2:] = 0  # keep only the first 2 components
Xp_2pc = model2pc.transform(X_std)
Xr_2pc = model2pc.inverse_transform(Xp_2pc)

I then do the same for the first 6 PCs, the first 60 PCs, and the last 6 PCs. I have noticed that this is very time consuming. I would like to get a model that directly contains only the principal components I need (without zeroing out the others), and then call transform() and inverse_transform() on that model.

If you want to ignore all but the last 6 principal components, you can just zero out the ones you don't want to keep.

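from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

N = 6
X_std = StandardScaler().fit_transform(X)
pca = PCA()
model = pca.fit(X_std)  # create model with all components
model.components_[:-N] = 0  # zero every component except the last N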

Then, to remove all but the last N components from the data, just do a forward and inverse transform of the data:

Xprime = model.inverse_transform(model.transform(X_std))
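
If mutating the fitted model is undesirable, the same projection can be written by hand with the rows of components_ you want to keep. The following is only a sketch under some assumptions: the PCA is fitted with the default whiten=False, the random X stands in for the real image matrix, and the names W, scores, X_rec and X_back are made up for illustration. It also keeps the scaler object around so the result can be mapped back from standardized space to the original feature scale.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Hypothetical stand-in for the image matrix: 20 samples, 8 features.
rng = np.random.default_rng(0)
X = rng.random((20, 8))

N = 6
scaler = StandardScaler().fit(X)
X_std = scaler.transform(X)
pca = PCA().fit(X_std)  # fit once, with all components

# Rows of components_ are the principal axes, ordered by decreasing variance.
W = pca.components_[-N:]             # last N axes, shape (N, n_features)

scores = (X_std - pca.mean_) @ W.T   # project onto the last N axes
X_rec = scores @ W + pca.mean_       # back to standardized feature space
X_back = scaler.inverse_transform(X_rec)  # undo the standardization
```

Because the model is fitted only once and never modified, different subsets (first 2, first 6, last 6, ...) can be obtained by changing the slice on components_ alone.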

Here is an example:

>>> import numpy as np
>>> from sklearn.decomposition import PCA
>>> X = np.random.rand(18).reshape(6, 3)
>>> model = PCA().fit(X)

A round-trip transform should give back the original data:

>>> X
array([[0.16594796, 0.02366958, 0.8403745 ],
       [0.25219425, 0.22879029, 0.07950927],
       [0.69636084, 0.4410933 , 0.97431828],
       [0.50121079, 0.44835563, 0.95236146],
       [0.6793044 , 0.53847562, 0.27882302],
       [0.32886931, 0.0643043 , 0.10597973]])
>>> model.inverse_transform(model.transform(X))
array([[0.16594796, 0.02366958, 0.8403745 ],
       [0.25219425, 0.22879029, 0.07950927],
       [0.69636084, 0.4410933 , 0.97431828],
       [0.50121079, 0.44835563, 0.95236146],
       [0.6793044 , 0.53847562, 0.27882302],
       [0.32886931, 0.0643043 , 0.10597973]])

Now zero out the first principal component, i.e. everything except the last two:

>>> model.components_
array([[ 0.22969899,  0.21209762,  0.94986998],
       [-0.67830467, -0.66500728,  0.31251894],
       [ 0.69795497, -0.71608653, -0.0088847 ]])
>>> model.components_[:-2] = 0
>>> model.components_
array([[ 0.        ,  0.        ,  0.        ],
       [-0.67830467, -0.66500728,  0.31251894],
       [ 0.69795497, -0.71608653, -0.0088847 ]])

The round-trip transform now gives a different result since we've removed the first principal component (which contains the greatest amount of variance):

>>> model.inverse_transform(model.transform(X))
array([[ 0.12742811, -0.01189858,  0.68108405],
       [ 0.36513945,  0.33308073,  0.54656949],
       [ 0.58029482,  0.33392119,  0.49435263],
       [ 0.39987803,  0.35478779,  0.53332196],
       [ 0.71114004,  0.56787176,  0.41047233],
       [ 0.44000711,  0.16692583,  0.56556581]])
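
To get a rough idea of how much variance the retained trailing components carry compared with the discarded leading ones, one option is to look at explained_variance_ratio_ on the fitted model. This is a small sketch with freshly generated random data (so the numbers will not match the arrays above); the names kept and dropped are illustrative only.

```python
import numpy as np
from sklearn.decomposition import PCA

# Hypothetical data with the same shape as the example above.
rng = np.random.default_rng(0)
X = rng.random((6, 3))

model = PCA().fit(X)
ratios = model.explained_variance_ratio_  # sorted from largest to smallest

N = 2
kept = ratios[-N:].sum()      # variance share of the last N components
dropped = ratios[:-N].sum()   # variance share of the components zeroed out

print("per-component ratios:", ratios)
print("kept:", kept, "dropped:", dropped)
```

The smaller the kept share, the further the reconstruction from the last N components will be from the original data.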

