
How to interpret PCA results in supervised ML

So I have a data set of 700 texts annotated by difficulty levels. Each text has 150 features:

    feature_names = ['F1','F2','F3'...] shape (1, 150)
    features_x = ['0.1','0.765','0.543'...] shape (700, 150)
    correct_answers_y = ['1','2','4'...] shape (1, 700)

I want to use PCA to find out the most informative sets of features, something like:

    Component1 = 0.76*F1+0.11*F4-0.22*F7

How can I do so? The code from the sklearn user guide has some numbers as output, but I don't understand how to interpret them.

    fit_xy = pca.fit(features_x, correct_answers_y)
    array([ 4.01783322e-01, 1.98421989e-01, 3.08468655e-01,
            4.28813755e-02, ...])

Not sure where that array comes from, but it looks like the output of the explained_variance_ or explained_variance_ratio_ attributes. They are what they say: the explained variance and the explained variance ratio of your data. Usually when doing a PCA you define a minimum ratio of variance you want to keep from the data.
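
For reference, here is a minimal sketch of how those attributes are produced (it reuses the question's variable names and assumes features_x has already been converted to a numeric array, which the string values above suggest it may not be yet):

    from sklearn.decomposition import PCA

    # PCA is unsupervised, so only the (700, 150) feature matrix is passed
    pca = PCA()
    pca.fit(features_x)

    print(pca.explained_variance_)        # variance captured by each component
    print(pca.explained_variance_ratio_)  # same thing, as a fraction of the total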

Let's say you want to keep at least 90% of the variance in your data. Here's code to find how many principal components (the n_components parameter in PCA) you need:

    pca_cumsum = pca.explained_variance_ratio_.cumsum()
    pca_cumsum
    >> np.array([.54, .79, .89, .91, .97, .99, 1])
    # index of the first cumulative sum >= 0.9; you need index + 1 components
    np.argmax(pca_cumsum >= 0.9)
    >> 3
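
To recover linear combinations like the Component1 = 0.76*F1+0.11*F4-0.22*F7 expression from the question, you can inspect pca.components_, whose rows hold one weight per original feature. A hedged sketch, assuming feature_names is a flat list of the 150 names:

    # argmax returned 3 above, so 4 components are needed to reach 90% variance
    pca = PCA(n_components=4).fit(features_x)

    # components_ has shape (n_components, n_features): one weight per feature
    for i, weights in enumerate(pca.components_):
        terms = [f"{w:+.2f}*{name}" for w, name in zip(weights, feature_names)]
        print(f"Component{i + 1} = " + " ".join(terms[:5]) + " ...")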

And as desertnaut said: the labels will be ignored, as they are not used in PCA.
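
If you want to verify that, a quick check (again reusing the question's variables) is to fit with and without the labels and compare the results:

    import numpy as np

    # PCA.fit accepts y only for pipeline compatibility and ignores it
    with_y = PCA(n_components=4).fit(features_x, correct_answers_y)
    without_y = PCA(n_components=4).fit(features_x)
    print(np.allclose(with_y.components_, without_y.components_))  # True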
