PCA with SKLearn and Python - computing PCA values with given components / basis vectors
I'm trying to understand what sklearn is doing when running a PCA. Unfortunately I don't have much knowledge of PCA, so my understanding might just be wrong.
Let's take a simple example with the iris dataset:
from sklearn import datasets
from sklearn.decomposition import PCA

iris = datasets.load_iris()
X = iris.data
pca = PCA()
pca.fit(X)
Xfit = pca.transform(X)
Xfit now looks like this:
[[-2.68412563e+00, 3.19397247e-01, -2.79148276e-02, -2.26243707e-03], ...
I thought that to get these projected values I basically just need to take the dot product of the original values and the transposed basis vectors / components. So I assumed that this should give the same result:
np.dot(X, np.transpose(pca.components_))
But unfortunately this is the result:
[[ 2.81823951e+00, 5.64634982e+00, -6.59767544e-01, 3.10892758e-02],..
So my question is: why is there a difference? I assume the one from pca.transform(X) is correct and I'm doing something wrong, but what would I need to do if I only have the components and want to calculate the principal component values myself?
Alright, I've found the issue: I have to mean-center the raw values before applying np.dot. Using pd.DataFrame, which makes mean-centering pretty easy, it looks like this:
import pandas as pd

np.dot(pd.DataFrame(X) - pd.DataFrame(X).mean(), np.transpose(pd.DataFrame(pca.components_)))
and the results are the same as when using the fit function:
[[-2.68412563e+00, 3.19397247e-01, -2.79148276e-02, -2.26243707e-03], ...
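As a sanity check, the same projection can be reproduced without pandas at all: the fitted estimator stores the per-feature means it subtracted in `pca.mean_`, so centering and projecting needs only NumPy. A minimal sketch (assuming the default `PCA()` settings, i.e. no whitening):

```python
import numpy as np
from sklearn import datasets
from sklearn.decomposition import PCA

iris = datasets.load_iris()
X = iris.data

pca = PCA()
pca.fit(X)
Xfit = pca.transform(X)

# pca.mean_ holds the column means subtracted during fit, so
# transform(X) is just (X - mean) projected onto the components:
X_proj = (X - pca.mean_) @ pca.components_.T

print(np.allclose(Xfit, X_proj))  # True
```

Using `pca.mean_` instead of recomputing `X.mean(axis=0)` also gives the correct result when transforming new data that was not part of the fit.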