
SKLearn PCA explained_variance_ratio_ cumsum gives array of 1

I have a problem with PCA. I read that PCA needs clean numeric values. I started my analysis with a dataset called trainDf with shape (1460, 79).

I did my data cleaning and processing by removing empty values, imputing, and dropping columns, and I got a dataframe transformedData with shape (1458, 69).

Data cleaning steps are (a rough sketch follows the list):

  1. LotFrontage imputed with the mean value
  2. MasVnrArea imputed with 0s (fewer than 10 missing values)
  3. Ordinal encoding for the categorical columns
  4. Electrical imputed with the most frequent value
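
A minimal sketch of those cleaning steps, assuming a pandas DataFrame trainDf with the column names from the question; the author's exact code is not shown, so the details below (fill order, encoder choice) are assumptions:

from sklearn.preprocessing import OrdinalEncoder

transformedData = trainDf.copy()

# Impute LotFrontage with the column mean
transformedData['LotFrontage'] = transformedData['LotFrontage'].fillna(
    transformedData['LotFrontage'].mean())

# Impute MasVnrArea with 0
transformedData['MasVnrArea'] = transformedData['MasVnrArea'].fillna(0)

# Impute Electrical with its most frequent value (done before encoding so the
# encoder never sees a missing category)
transformedData['Electrical'] = transformedData['Electrical'].fillna(
    transformedData['Electrical'].mode()[0])

# Ordinal-encode the categorical columns
cat_cols = transformedData.select_dtypes(include='object').columns
transformedData[cat_cols] = OrdinalEncoder().fit_transform(transformedData[cat_cols])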

I found outliers with IQR and got withoutOutliers with shape (1223, 69).
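
A minimal sketch of the IQR filter, assuming rows are dropped whenever any feature lies outside [Q1 - 1.5*IQR, Q3 + 1.5*IQR]; the exact rule used in the question is an assumption:

Q1 = transformedData.quantile(0.25)
Q3 = transformedData.quantile(0.75)
IQR = Q3 - Q1

# Keep only rows where every feature is inside the 1.5*IQR fences
inside = ~((transformedData < (Q1 - 1.5 * IQR)) |
           (transformedData > (Q3 + 1.5 * IQR))).any(axis=1)
withoutOutliers = transformedData[inside]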

After this, I looked at histograms and decided to apply PowerTransformer on some features and StandardScaler on others, and I got normalizedData.
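
A minimal sketch of that normalization step, assuming two hypothetical column lists skewed_cols and scaled_cols; the actual feature split used in the question is not shown:

from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import PowerTransformer, StandardScaler

skewed_cols = ['LotFrontage', 'MasVnrArea']   # placeholder: heavily skewed features
scaled_cols = [c for c in withoutOutliers.columns if c not in skewed_cols]

# PowerTransformer (Yeo-Johnson by default) for skewed features,
# StandardScaler for the rest
ct = ColumnTransformer([
    ('power', PowerTransformer(), skewed_cols),
    ('scale', StandardScaler(), scaled_cols),
])
normalizedData = ct.fit_transform(withoutOutliers)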

Now I tried doing PCA and I got this:

from sklearn.decomposition import PCA
import matplotlib.pyplot as plt

# Fit PCA on the cleaned (but unscaled) data
pca = PCA().fit(transformedData)

# Cumulative fraction of variance explained by the first k components
print(pca.explained_variance_ratio_.cumsum())

plt.plot(pca.explained_variance_ratio_.cumsum())
plt.xlabel('number of components')
plt.ylabel('cumulative explained variance')

The output of this PCA is the following:

[0.67454179 0.8541084  0.98180307 0.99979932 0.99986346 0.9999237
 0.99997091 0.99997985 0.99998547 0.99999044 0.99999463 0.99999719
 0.99999791 0.99999854 0.99999909 0.99999961 0.99999977 0.99999988
 0.99999994 0.99999998 0.99999999 1.         1.         1.
 1.         1.         1.         1.         1.         1.
 1.         1.         1.         1.         1.         1.
 1.         1.         1.         1.         1.         1.
 1.         1.         1.         1.         1.         1.
 1.         1.         1.         1.         1.         1.
 1.         1.         1.         1.         1.         1.
 1.         1.         1.         1.         1.         1.
 1.         1.         1.        ]

[plot PCA1: cumulative explained variance vs. number of components]

Then I tried:

pca = PCA().fit(withoutOutliers)

print(pca.explained_variance_ratio_.cumsum())

plt.plot(pca.explained_variance_ratio_.cumsum())
plt.xlabel('number of components')
plt.ylabel('cumulative explained variance')

Out:

[0.68447278 0.86982875 0.99806386 0.99983727 0.99989606 0.99994353
 0.99997769 0.99998454 0.99998928 0.99999299 0.9999958  0.99999775
 0.99999842 0.99999894 0.99999932 0.99999963 0.9999998  0.9999999
 0.99999994 0.99999998 0.99999999 1.         1.         1.
 1.         1.         1.         1.         1.         1.
 1.         1.         1.         1.         1.         1.
 1.         1.         1.         1.         1.         1.
 1.         1.         1.         1.         1.         1.
 1.         1.         1.         1.         1.         1.
 1.         1.         1.         1.         1.         1.
 1.         1.         1.         1.         1.         1.
 1.         1.         1.        ]

[plot PCA2: cumulative explained variance vs. number of components]

Finally:

pca = PCA().fit(normalizedData)

print(pca.explained_variance_ratio_.cumsum())

plt.plot(pca.explained_variance_ratio_.cumsum())
plt.xlabel('number of components')
plt.ylabel('cumulative explained variance')

Out:

[1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1.
 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1.
 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

[plot PCA3: cumulative explained variance vs. number of components]

How is it possible that the last execution gives such an output?

Here are the data distributions:

[histogram: transformedData]

[histogram: withoutOutliers]

[histogram: normalizedData]

I'll add any further data if necessary; thanks in advance to anyone who can help!

In short, all data should be scaled before applying PCA (for example using a StandardScaler).
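
A minimal sketch of that fix, assuming the transformedData dataframe from the question; the pipeline below is illustrative, not the author's exact code:

from sklearn.decomposition import PCA
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Standardize every feature before PCA; without this, features with large
# variances dominate the covariance matrix and a handful of components
# appear to explain ~100% of the variance.
pipe = make_pipeline(StandardScaler(), PCA())
pipe.fit(transformedData)

pca = pipe.named_steps['pca']
print(pca.explained_variance_ratio_.cumsum())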

I got the answer on Data science stackexchange.
