Determine the value of the n_components variable in PCA analysis
Have a nice day. Please help me. I have a normalized file. This file consists of 21 numeric columns.
I will apply PCA analysis to this file as below:
from sklearn import decomposition

# pca_matrix holds the normalized data (21 numeric columns)
pca = decomposition.PCA(n_components=21)
pca_output = pca.fit_transform(pca_matrix)
pca_inverse = pca.inverse_transform(pca_output)
As far as I understand, the value I assign to the n_components variable is equal to the number of columns. But what I do not understand is how to determine the value of the n_components variable.
It is a hyperparameter, and finding its optimal value depends on what you want to do with your data. Let me describe 3 possible uses:
You can fit all components (n_components=None). Then inspect the attribute explained_variance_ratio_ and decide how many you are willing to drop. Or you can set n_components='mle' and let the data decide for you.
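A minimal sketch of the "fit everything, then inspect explained_variance_ratio_" route. The synthetic 21-column matrix X, the 95% threshold, and the variable names are all assumptions made for illustration; substitute your own normalized data and whatever cutoff suits your use.

```python
import numpy as np
from sklearn.decomposition import PCA

# Synthetic stand-in for the normalized 21-column file (an assumption):
# 5 latent factors mixed into 21 columns, plus a little noise.
rng = np.random.default_rng(0)
latent = rng.normal(size=(500, 5))
mixing = rng.normal(size=(5, 21))
X = latent @ mixing + 0.05 * rng.normal(size=(500, 21))

# Fit all components, then look at how much variance each one explains.
pca_full = PCA(n_components=None).fit(X)
cumulative = np.cumsum(pca_full.explained_variance_ratio_)

# Smallest number of components whose cumulative ratio reaches the threshold.
threshold = 0.95  # illustrative choice, not a rule
n_keep = int(np.searchsorted(cumulative, threshold) + 1)
print("components needed for 95% variance:", n_keep)

# Alternative: let the 'mle' option estimate the dimensionality.
pca_mle = PCA(n_components="mle", svd_solver="full").fit(X)
print("components chosen by 'mle':", pca_mle.n_components_)
```

Note that 'mle' requires at least as many samples as features, and the two approaches will not necessarily agree on the number of components.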
I suggest finding the optimal n_components with GridSearchCV over PCA's n_components and the predictive model's hyperparameters.
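A rough sketch of that grid search, assuming a classification task. The synthetic data, the choice of LogisticRegression as the predictive model, the scaling step, and the candidate grid values are all hypothetical; only the idea of searching over PCA's n_components together with the model's hyperparameters comes from the answer.

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic 21-column classification data (an assumption, for illustration).
X, y = make_classification(
    n_samples=400, n_features=21, n_informative=6, n_redundant=4, random_state=0
)

# Chain scaling, PCA and the model so each CV fold refits all three steps.
pipe = Pipeline([
    ("scale", StandardScaler()),
    ("pca", PCA()),
    ("clf", LogisticRegression(max_iter=1000)),
])

# Step names plus a double underscore address parameters inside the pipeline.
param_grid = {
    "pca__n_components": [2, 5, 10, 15, 21],
    "clf__C": [0.1, 1.0, 10.0],
}

search = GridSearchCV(pipe, param_grid, cv=5)
search.fit(X, y)
print(search.best_params_)
print(round(search.best_score_, 3))
```

Putting PCA inside the pipeline means it is fitted only on the training part of each fold, so the cross-validated score is not inflated by information from the held-out part.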