
Determine the value of the n_components variable in PCA analysis

Have a nice day. Please help me. I have a normalized file. This file consists of 21 numeric columns.

I will apply PCA analysis to this file as below:

from sklearn import decomposition

# pca_matrix is the normalized data (21 numeric columns)
pca = decomposition.PCA(n_components=21)          # keep all 21 components
pca_output = pca.fit_transform(pca_matrix)        # project onto the principal components
pca_inverse = pca.inverse_transform(pca_output)   # map back to the original feature space

As far as I understand, the value I assign to the n_components variable is equal to the number of columns. But what I do not understand is how to determine the value of n_components.

It is a hyperparameter, and finding its optimal value depends on what you want to do with your data. Let me describe three possible uses:

  • Visualization: 2 or 3 are probably the most sensible options :)
  • Compression: Here the goal is simply to decrease the number of features without losing too much information. You can fit all components ( n_components=None ), then inspect the attribute explained_variance_ratio_ and decide how many you are willing to drop. Or you can set n_components='mle' and let the data decide for you (see the first sketch after this list).
  • Preprocessing: Here the dimensionality reduction is the first step of some pipeline (preceding regression/classification). As opposed to compression, you want to use the transformed features as input to a supervised learning algorithm. I would recommend finding the optimal n_components through a GridSearchCV over both the PCA's n_components and the predictive model's hyperparameters (see the second sketch after this list).
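For the compression case, a minimal sketch might look like the following. The random X and the 95% variance threshold are illustrative assumptions; the question's pca_matrix would take the place of X:

from sklearn import decomposition
import numpy as np

# X stands in for the normalized 21-column data (pca_matrix in the question)
X = np.random.rand(100, 21)

# Fit with all components and look at how much variance each one explains
pca_full = decomposition.PCA(n_components=None).fit(X)
cumulative = np.cumsum(pca_full.explained_variance_ratio_)
n_keep = int(np.argmax(cumulative >= 0.95) + 1)   # smallest number of components explaining ~95% of the variance
print(cumulative, n_keep)

# Refit keeping only those components
X_reduced = decomposition.PCA(n_components=n_keep).fit_transform(X)

# Or let the data decide via Minka's MLE (requires more samples than features)
X_mle = decomposition.PCA(n_components='mle').fit_transform(X)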
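For the preprocessing case, a sketch of tuning the PCA's n_components together with a predictive model's hyperparameters via GridSearchCV could look like this. The logistic regression, the placeholder labels y, and the parameter grid values are illustrative assumptions, not part of the original answer:

from sklearn import decomposition
from sklearn.pipeline import Pipeline
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
import numpy as np

X = np.random.rand(100, 21)          # placeholder for the normalized data
y = np.random.randint(0, 2, 100)     # placeholder binary labels

# PCA followed by a supervised model; both are tuned together by cross-validation
pipe = Pipeline([
    ('pca', decomposition.PCA()),
    ('clf', LogisticRegression(max_iter=1000)),
])
param_grid = {
    'pca__n_components': [2, 5, 10, 15, 21],
    'clf__C': [0.1, 1.0, 10.0],
}
search = GridSearchCV(pipe, param_grid, cv=5)
search.fit(X, y)
print(search.best_params_)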
