Determine the value of n_components variable in pca analysis

Question

I have a normalized data file consisting of 21 numeric columns.

I apply PCA to this file as follows:

from sklearn import decomposition

# pca_matrix holds the normalized data (21 numeric columns)
pca = decomposition.PCA(n_components=21)
pca_output = pca.fit_transform(pca_matrix)
pca_inverse = pca.inverse_transform(pca_output)

As far as I understand, the value I assign to n_components here is equal to the number of columns. What I do not understand is how to choose the value of n_components.

Answer 1

It is a hyperparameter and finding its optimal value depends on what you want to do with your data. Let me describe 3 possible uses:

Visualization: 2 or 3 are probably the most sensible options :)
Compression: Here the goal is simply to decrease the number of features without losing too much information. You can fit all components (n_components=None), then inspect the attribute explained_variance_ratio_ and decide how many components you are willing to drop. Or you can set n_components='mle' and let the data decide for you.
Preprocessing: Here the dimensionality reduction is the first step of a pipeline (preceding regression/classification). As opposed to compression, you want to use the transformed features as input to a supervised learning algorithm. I would recommend finding the optimal n_components through a GridSearchCV over both the PCA's n_components and the predictive model's hyperparameters.
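
A minimal sketch of the compression and preprocessing options, assuming scikit-learn is installed. The data here is synthetic (21 columns built from 5 latent factors) and stands in for your file. The labels y, the 95% variance threshold, the LogisticRegression model and the parameter grid are all illustrative assumptions, not recommendations.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline

# Synthetic stand-in for the 21-column normalized file (an assumption):
# 5 latent factors mixed into 21 columns, plus a little noise.
rng = np.random.default_rng(0)
latent = rng.normal(size=(300, 5))
X = latent @ rng.normal(size=(5, 21)) + 0.05 * rng.normal(size=(300, 21))
y = (latent[:, 0] > 0).astype(int)  # hypothetical labels for the supervised case

# Compression: fit all components, then keep the smallest number whose
# cumulative explained variance reaches the chosen threshold (95% here).
pca_full = PCA(n_components=None, svd_solver="full").fit(X)
cumulative = np.cumsum(pca_full.explained_variance_ratio_)
n_keep = min(int(np.searchsorted(cumulative, 0.95)) + 1, X.shape[1])
print("components needed for 95% variance:", n_keep)

# scikit-learn can apply the same rule itself when n_components is a
# float between 0 and 1 (this requires the full SVD solver).
pca_95 = PCA(n_components=0.95, svd_solver="full").fit(X)
print("chosen by PCA(n_components=0.95):", pca_95.n_components_)

# Preprocessing: tune n_components together with the model's own
# hyperparameters by cross-validated grid search over a pipeline.
pipe = Pipeline([
    ("pca", PCA()),
    ("clf", LogisticRegression(max_iter=1000)),
])
param_grid = {
    "pca__n_components": [2, 5, 10, 21],
    "clf__C": [0.1, 1.0, 10.0],
}
search = GridSearchCV(pipe, param_grid, cv=5)
search.fit(X, y)
best_n = search.best_params_["pca__n_components"]
print("best n_components by cross-validation:", best_n)
```

With real data, replace X (and y, if you have a supervised task) with your own arrays. The threshold and the grid values are the knobs you would adjust.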

1 answers

solution1
1 2018-05-10 14:48:16
