
How to interpret Scikit-learn's PCA when n_components is None?

I'm confused by the problem mentioned in the title. Does n_components=None mean that no transformation is applied to the input, or that the input is transformed into the new dimensional space, but instead of the usual "reduction" (keeping the few components with the highest eigenvalues) all of the new synthetic features are kept? The documentation suggests the latter to me:

Hence, the None case results in: n_components == min(n_samples, n_features) - 1

But this is not entirely clear, and besides, if it does indeed mean keeping all the components, why on earth does their number equal n_components == min(n_samples, n_features) - 1 rather than n_features?

However, I find the other alternative (that None drops the whole PCA step entirely) hard to believe as well; then again, I have never heard of applying PCA without omitting any eigenvectors either...
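One quick empirical check of what None does is to fit on synthetic data and inspect the fitted attributes. Below is a minimal sketch; the random 100×5 matrix and the shapes are purely illustrative:

    import numpy as np
    from sklearn.decomposition import PCA

    rng = np.random.default_rng(0)
    X = rng.normal(size=(100, 5))           # 100 samples, 5 features

    pca = PCA(n_components=None)            # default svd_solver='auto'
    X_t = pca.fit_transform(X)

    print(X_t.shape)                        # (100, 5): the data IS projected
    print(pca.n_components_)                # 5 == min(n_samples, n_features)
    print(np.allclose(X_t, X - X.mean(0)))  # False: a real rotation happened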

As per the official documentation:

If svd_solver == 'arpack', the number of components must be strictly less than the minimum of n_features and n_samples. Hence, the None case results in: n_components == min(n_samples, n_features) - 1

So it depends on which solver (settable via the svd_solver parameter) is used to compute the eigenvectors. With the default solver, n_components=None keeps all min(n_samples, n_features) components; the -1 applies only to 'arpack'. (The min is there because the data matrix has rank at most min(n_samples, n_features), so there cannot be more meaningful components than that, even when n_features is larger.)

If arpack: run SVD truncated to n_components calling ARPACK solver via scipy.sparse.linalg.svds. It requires strictly 0 < n_components < min(X.shape)
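A minimal sketch contrasting the two solvers on the same arbitrary 100×5 random matrix (the printed values follow directly from the documented rules above):

    import numpy as np
    from sklearn.decomposition import PCA

    rng = np.random.default_rng(0)
    X = rng.normal(size=(100, 5))

    # Default solver: None keeps all min(n_samples, n_features) components.
    print(PCA(n_components=None).fit(X).n_components_)  # 5

    # ARPACK (scipy.sparse.linalg.svds) requires 0 < n_components < min(X.shape),
    # so None is capped at min(n_samples, n_features) - 1.
    print(PCA(n_components=None, svd_solver='arpack').fit(X).n_components_)  # 4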

As to your second query about dropping the whole PCA step, it depends entirely on what you are trying to solve. Since the PCA components explain the variance of the data in decreasing order (the first component explains the maximum variance, the last component the least), for many tasks it is useful to keep just the leading components that explain most of the variance; keeping all of them is still useful when you want to inspect the variance spectrum before choosing a cut-off.
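For instance, a sketch of inspecting the spectrum with n_components=None and then choosing a cut-off; the 95% threshold and the per-feature scaling are just illustrative assumptions:

    import numpy as np
    from sklearn.decomposition import PCA

    rng = np.random.default_rng(0)
    # Give the 5 features very different scales so the spectrum is uneven.
    X = rng.normal(size=(100, 5)) * np.array([5.0, 3.0, 1.0, 0.5, 0.1])

    pca = PCA(n_components=None).fit(X)

    # Sorted in decreasing order: the first component explains the most variance.
    print(pca.explained_variance_ratio_)

    # Keep just enough components to cover, say, 95% of the variance.
    cum = np.cumsum(pca.explained_variance_ratio_)
    n_keep = int(np.searchsorted(cum, 0.95)) + 1
    print(n_keep)

Note that scikit-learn can also do this selection for you: passing a float such as PCA(n_components=0.95) keeps exactly enough components to explain that fraction of the variance (with the 'full' solver).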
