How should I write the scikit-learn PCA `.transform()` method using its `.components_`?

Question

How should I reimplement scikit-learn's PCA .transform() method using its .components_ attribute?

I thought the PCA .transform() method transforms a 3D point to a 2D point simply by applying a matrix M to the 3D point P, like this:

np.dot(M, P)

To check this, I wrote the following code. However, I couldn't reproduce the result of the PCA .transform() method. How should I modify the code? Am I missing something?

from sklearn.decomposition import PCA
import numpy as np

data3d = np.arange(10*3).reshape(10, 3) ** 2
pca = PCA(n_components=2)
pca.fit(data3d)
pca_transformed2d = pca.transform(data3d)

sample_index = 0
sample3d = data3d[sample_index]

# Manually transform `sample3d` to 2 dimensions.
w11, w12, w13 = pca.components_[0]
w21, w22, w23 = pca.components_[1]
my_transformed2d = np.zeros(2)
my_transformed2d[0] = w11 * sample3d[0] + w12 * sample3d[1] + w13 * sample3d[2]
my_transformed2d[1] = w21 * sample3d[0] + w22 * sample3d[1] + w23 * sample3d[2]

print("================ Validation ================")
print("pca_transformed2d:", pca_transformed2d[sample_index])
print("my_transformed2d:", my_transformed2d)
if np.all(my_transformed2d == pca_transformed2d[sample_index]):
    print("My transformation is correct!")
else:
    print("My transformation is not correct...")

Output:

================ Validation ================
pca_transformed2d: [-492.36557212   12.28386702]
my_transformed2d: [ 3.03163093 -2.67255444]
My transformation is not correct...

Answer 1

PCA begins by centering the data: it subtracts the mean of all observations from each observation. Your code skips this step, which is why the results differ. In this case, centering is done with

centered_data = data3d - data3d.mean(axis=0)

Averaging along axis=0 (down the rows) leaves a single row holding the mean of each of the three columns, and subtracting it from data3d broadcasts across all ten rows. After centering, multiply the data by the PCA components; rather than writing out the matrix multiplication by hand, I'd use .dot:

my_transformed2d = pca.components_.dot(centered_data[sample_index])
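The same idea extends to the whole dataset at once. The following is a minimal sketch, assuming the default PCA settings (in particular whiten=False). It uses the fitted pca.mean_ attribute instead of recomputing the mean, and the name manual_all is introduced here only for illustration:

```python
import numpy as np
from sklearn.decomposition import PCA

data3d = np.arange(10 * 3).reshape(10, 3) ** 2
pca = PCA(n_components=2).fit(data3d)

# pca.mean_ is the per-feature mean computed during fit();
# for this data it matches data3d.mean(axis=0).
centered = data3d - pca.mean_

# Each row of components_ is one principal axis, so projecting
# all samples at once means multiplying by the transpose.
manual_all = centered @ pca.components_.T  # shape (10, 2)

print(np.allclose(manual_all, pca.transform(data3d)))
```

Using pca.mean_ matters when transforming new points that were not part of the fit: they should be centered with the training mean, not with their own.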

Finally, verification. Don't use == between floating-point numbers; exact equality is rare. Tiny discrepancies appear whenever the order of operations differs somewhere. For example,

0.1 + 0.2 - 0.3 == 0.1 - 0.3 + 0.2

is False. This is why we have np.allclose, which checks that the values are "close enough" within a tolerance.

if np.allclose(my_transformed2d, pca_transformed2d[sample_index]):
    print("My transformation is correct!")
else:
    print("My transformation is not correct...")
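To make the floating-point point concrete, here is a small sketch of the two orderings from the example above. The variable names a and b are illustrative only; np.isclose is the element-wise counterpart of np.allclose, and both default to rtol=1e-05 and atol=1e-08:

```python
import numpy as np

# Mathematically both expressions equal 0, but rounding differs
# depending on the order in which the operations are applied.
a = 0.1 + 0.2 - 0.3
b = 0.1 - 0.3 + 0.2

print(a == b)            # exact comparison
print(abs(a - b))        # size of the discrepancy
print(np.isclose(a, b))  # tolerance-based comparison
```

The discrepancy is far below the default absolute tolerance, so the tolerance-based check succeeds where == does not.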
