
How to interpret Scikit-learn's PCA when n_components is None?

I'm confused by the problem mentioned in the title. Does n_components=None mean that no transformation is applied to the input, or that the input is transformed into the new dimensional space but, instead of the usual "reduction" (keeping the few components with the highest eigenvalues), all of the new synthetic features are kept? The documentation suggests the former to me:

Hence, the None case results in: n_components == min(n_samples, n_features) - 1

But this is not entirely clear. Additionally, if it does mean keeping all the components, why does their number equal n_components == min(n_samples, n_features) - 1 rather than n_features?

However, I find the other alternative (dropping the whole PCA step in the None case) odd as well; I have never heard of applying PCA without omitting some eigenvectors...

As per the official documentation:

If svd_solver == 'arpack', the number of components must be strictly less than the minimum of n_features and n_samples. Hence, the None case results in: n_components == min(n_samples, n_features) - 1

So it depends on the type of solver (which can be set via the svd_solver parameter) used to compute the eigenvectors.

If arpack: run SVD truncated to n_components, calling the ARPACK solver via scipy.sparse.linalg.svds. It requires strictly 0 < n_components < min(X.shape)
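A minimal sketch of how None resolves for each solver (the random data and variable names are just illustrative; this assumes a reasonably recent scikit-learn):

    import numpy as np
    from sklearn.decomposition import PCA

    rng = np.random.RandomState(0)
    X = rng.randn(100, 5)  # 100 samples, 5 features

    # Default solver: None keeps all min(n_samples, n_features) components,
    # i.e. the data is rotated into the new space but nothing is dropped.
    pca_full = PCA(n_components=None).fit(X)
    print(pca_full.n_components_)  # 5 == min(100, 5)

    # ARPACK solver: requires strictly fewer components than min(X.shape),
    # so None resolves to min(n_samples, n_features) - 1.
    pca_arpack = PCA(n_components=None, svd_solver='arpack').fit(X)
    print(pca_arpack.n_components_)  # 4 == min(100, 5) - 1

In other words, with the default solver, n_components=None is not "no PCA": the data is still projected onto the principal axes, it is just not reduced.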

As to your second query about dropping the whole PCA step, that depends entirely on the problem you are trying to solve. Since the PCA components explain the variance of the data in decreasing order (the first component explains the most variance, the last component the least), it can be useful for specific tasks to keep only the few features that explain most of the variance.
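For example, you can fit with n_components=None (a pure rotation that keeps everything) and only afterwards decide how many components to retain. A sketch using the explained_variance_ratio_ attribute (the 95% threshold is an arbitrary illustration):

    import numpy as np
    from sklearn.decomposition import PCA

    rng = np.random.RandomState(0)
    # Features with very different scales, so the variance is
    # concentrated in the first few principal components.
    X = rng.randn(200, 6) * np.array([5.0, 3.0, 2.0, 1.0, 0.5, 0.1])

    pca = PCA(n_components=None).fit(X)
    print(pca.explained_variance_ratio_)  # monotonically decreasing

    # Smallest number of components explaining >= 95% of the variance.
    k = int(np.searchsorted(np.cumsum(pca.explained_variance_ratio_), 0.95) + 1)
    print(k)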
