Projecting PCA output back to the original scale, with an explained_variance_ratio_ condition

Question

I have two questions about PCA in scikit-learn.

Let's suppose I have the following data:

fullmatrix =[[2.5, 2.4],
             [0.5, 0.7],
             [2.2, 2.9],
             [1.9, 2.2],
             [3.1, 3.0],
             [2.3, 2.7],
             [2.0, 1.6],
             [1.0, 1.1],
             [1.5, 1.6],
             [1.1, 0.9]]

Now I run the PCA calculations:

from sklearn.decomposition import PCA

sklearn_pca = PCA()
# Data transformed onto both eigenvectors (all components are kept by default)
Y_sklearn = sklearn_pca.fit_transform(fullmatrix)
print(Y_sklearn)

# Fraction of the total variance explained by each eigenvector
print(sklearn_pca.explained_variance_ratio_)

# Eigenvectors (one per row), ordered by decreasing eigenvalue
print(sklearn_pca.components_)
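
For reference, here is a minimal sketch of where explained_variance_ratio_ comes from, assuming the usual definition of PCA as an eigendecomposition of the sample covariance matrix. The names pca, eigvals and ratio are only illustrative and are not part of the question's code.

```python
import numpy as np
from sklearn.decomposition import PCA

fullmatrix = np.array([[2.5, 2.4], [0.5, 0.7], [2.2, 2.9], [1.9, 2.2], [3.1, 3.0],
                       [2.3, 2.7], [2.0, 1.6], [1.0, 1.1], [1.5, 1.6], [1.1, 0.9]])

pca = PCA().fit(fullmatrix)

# Eigenvalues of the sample covariance matrix (np.cov divides by n - 1 by default)
eigvals = np.linalg.eigvalsh(np.cov(fullmatrix, rowvar=False))
# eigvalsh returns ascending order; PCA reports components in descending order
eigvals = eigvals[::-1]

# Each eigenvalue divided by the total variance gives the explained variance ratio
ratio = eigvals / eigvals.sum()

print(ratio)
print(pca.explained_variance_ratio_)
```

Both printed arrays should agree up to floating-point error, and the first entry is the 0.96318131 mentioned in the second question.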

First question: How can I project Y_sklearn back onto the original scale? (I know we should get back the same data as fullmatrix, since I'm using all the eigenvectors; this is just to check that it is done right.)

Second question: How can I set a threshold on the minimum acceptable total variance, based on sklearn_pca.explained_variance_ratio_? For example, let's say I want to keep adding eigenvectors until the total explained_variance_ratio_ goes above 95%. In this case it is easy: we just use the first eigenvector, as it explains 0.96318131 of the variance (about 96.3%). But how can we do this in a more automated way?

Answer 1

First: sklearn_pca.inverse_transform(Y_sklearn)

Second:
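
To check the round trip on the question's data, something like the sketch below should work. The manual formula assumes whiten=False (the default), and X_back and X_manual are illustrative names only.

```python
import numpy as np
from sklearn.decomposition import PCA

fullmatrix = np.array([[2.5, 2.4], [0.5, 0.7], [2.2, 2.9], [1.9, 2.2], [3.1, 3.0],
                       [2.3, 2.7], [2.0, 1.6], [1.0, 1.1], [1.5, 1.6], [1.1, 0.9]])

sklearn_pca = PCA()
Y_sklearn = sklearn_pca.fit_transform(fullmatrix)

# Map the projected data back to the original feature space
X_back = sklearn_pca.inverse_transform(Y_sklearn)

# Equivalent by hand when whiten=False: rotate back, then add the mean
# that PCA subtracted before projecting
X_manual = Y_sklearn @ sklearn_pca.components_ + sklearn_pca.mean_

# With all components kept, both should match the input up to rounding
print(np.allclose(X_back, fullmatrix))
print(np.allclose(X_manual, fullmatrix))
```

If fewer components are kept, inverse_transform still returns data in the original scale, but only as an approximation of the input.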

import numpy as np

thr = 0.95
# Boolean mask: does the cumulative explained variance reach the threshold?
is_exceeds = np.cumsum(sklearn_pca.explained_variance_ratio_) >= thr
# Index of the first component at which the threshold is reached;
# add 1 to turn that index into the minimum number of eigenvectors to keep
k = np.min(np.where(is_exceeds)) + 1
# Or pass thr as n_components when creating the model (it still has to be fitted)
sklearn_pca = PCA(n_components=thr)
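
As a rough end-to-end sketch of the second option on the question's data: a float n_components between 0 and 1 is read as a variance fraction. Passing svd_solver="full" explicitly is my assumption for safety, and pca_thr, Y_reduced, X_approx and k_manual are illustrative names.

```python
import numpy as np
from sklearn.decomposition import PCA

fullmatrix = np.array([[2.5, 2.4], [0.5, 0.7], [2.2, 2.9], [1.9, 2.2], [3.1, 3.0],
                       [2.3, 2.7], [2.0, 1.6], [1.0, 1.1], [1.5, 1.6], [1.1, 0.9]])
thr = 0.95

# Let scikit-learn choose how many components are needed to pass the threshold
pca_thr = PCA(n_components=thr, svd_solver="full")
Y_reduced = pca_thr.fit_transform(fullmatrix)
print(pca_thr.n_components_)                    # number of components kept
print(pca_thr.explained_variance_ratio_.sum())  # variance fraction they explain

# Cross-check against the manual cumulative-sum approach
full_ratio = PCA().fit(fullmatrix).explained_variance_ratio_
k_manual = int(np.argmax(np.cumsum(full_ratio) >= thr)) + 1
print(k_manual)

# Reconstruction from the reduced projection is only approximate
X_approx = pca_thr.inverse_transform(Y_reduced)
print(np.abs(X_approx - fullmatrix).max())
```

On this data one component is enough, so Y_reduced has a single column and X_approx differs slightly from fullmatrix.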

Answer 1: accepted, 2015-10-08 18:38:08
