
Sklearn PCA, how to restore the mean in a lower dimension?


This question concerns how to de-center and "restore" the data in a lower dimension after performing PCA.

I'm doing a simple principal component analysis with sklearn. As I understand it, the implementation should take care of (1) centering the data when creating the components and (2) de-centering the data after the transformation. However, after transforming, the data is still centered. How can I project the data to a lower-dimensional space while preserving the characteristics of the original data? Since I would be doing dimensionality reduction on high-dimensional data, I wouldn't know the appropriate mean for each principal component; how can that be derived?
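For reference, a minimal sketch of what PCA.transform does with the mean, assuming scikit-learn's default whiten=False. It subtracts the fitted mean_ and projects onto components_, and does not add the mean back, so centered output is expected behavior rather than a bug:

```python
import numpy as np
from sklearn.decomposition import PCA

# Same toy data as in the question: 6 samples, 3 features, shifted by +3
X = np.array([[-1, -1, -1], [-2, -1, -1], [-3, -2, -3],
              [1, 1, 1], [2, 1, 2], [3, 2, 3]]) + 3

pca = PCA(n_components=2).fit(X)

# transform() subtracts the fitted mean, then projects onto the components
X_trans = pca.transform(X)
X_manual = (X - pca.mean_) @ pca.components_.T

print(np.allclose(X_trans, X_manual))  # True (with the default whiten=False)
```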

Reducing 3 dimensions to 2 dimensions:

import numpy as np
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA

X = np.array([[-1, -1, -1], [-2, -1, -1], [-3, -2, -3], [1, 1, 1], [2, 1, 2], [3, 2, 3]]) + 3
X.shape

(6, 3)

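fig = plt.figure(figsize=(10, 8), dpi=80, facecolor='w', edgecolor='k')
ax = fig.add_subplot(111, projection='3d')
# pass the marker by keyword: the 4th positional argument of Axes3D.scatter is zdir, not the marker
ax.scatter(X[:, 0], X[:, 1], X[:, 2], marker='*')
plt.title('original')
plt.show()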

[Figure: 3D scatter plot of the original data, titled "original"]

PCA with 2 components:

pca = PCA(n_components=2)
pca.fit(X)
X_trans = pca.transform(X)
X_trans.shape

(6, 2)
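A quick sketch of why the 2-D scores sit around the origin: fit() stores the per-feature mean of the training data in mean_, and because transform() subtracts it, every column of the transformed data has (numerically) zero mean:

```python
import numpy as np
from sklearn.decomposition import PCA

X = np.array([[-1, -1, -1], [-2, -1, -1], [-3, -2, -3],
              [1, 1, 1], [2, 1, 2], [3, 2, 3]]) + 3

pca = PCA(n_components=2)
pca.fit(X)
X_trans = pca.transform(X)

# mean_ is the per-feature mean of the data PCA was fitted on
print(np.allclose(pca.mean_, X.mean(axis=0)))   # True

# so each principal-component score column averages to zero
print(np.allclose(X_trans.mean(axis=0), 0.0))   # True
```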

plt.plot(X_trans[:,0], X_trans[:,1], '*')
plt.show()

[Figure: 2D plot of X_trans, first principal component vs. second]

What I would like to do at this stage is to "restore" my data in this lower dimension, such that the values of the data points correspond to the original data. It should still have only 2 dimensions, but not be centered around the mean.

Performing the inverse transform, as suggested below, actually brings me back to 3 dimensions:

X_approx = pca.inverse_transform(X_trans) 
X_approx.shape

(6, 3)
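A small sketch of what this 3-D output actually contains: inverse_transform adds the mean back, so the reconstruction is centered on the original mean, yet after removing that mean the points span only a 2-D plane inside 3-D space. The explicit tol passed to matrix_rank is an assumption chosen to absorb floating-point noise:

```python
import numpy as np
from sklearn.decomposition import PCA

X = np.array([[-1, -1, -1], [-2, -1, -1], [-3, -2, -3],
              [1, 1, 1], [2, 1, 2], [3, 2, 3]]) + 3

pca = PCA(n_components=2).fit(X)
X_approx = pca.inverse_transform(pca.transform(X))

# The mean has been restored: the reconstruction averages to the original mean
print(np.allclose(X_approx.mean(axis=0), X.mean(axis=0)))  # True

# After subtracting that mean, the 3-D points lie in a 2-D subspace
print(np.linalg.matrix_rank(X_approx - pca.mean_, tol=1e-10))  # 2
```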

I want to remain in 2 dimensions but still have my data resemble its original form as closely as possible, and not be centered around the mean.
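One possible reading of this request, sketched under an assumption: if "not centered" means 2-D coordinates in the principal-component basis measured from the original origin, you can project the fitted mean onto the components and add it back to the scores. PCA does not expose this directly; mean_2d and X_2d_uncentered are hypothetical names used only for illustration:

```python
import numpy as np
from sklearn.decomposition import PCA

X = np.array([[-1, -1, -1], [-2, -1, -1], [-3, -2, -3],
              [1, 1, 1], [2, 1, 2], [3, 2, 3]]) + 3

pca = PCA(n_components=2).fit(X)
X_trans = pca.transform(X)                  # centered 2-D scores

# Hypothetical quantity: the data mean expressed in the 2-D component basis
mean_2d = pca.mean_ @ pca.components_.T     # shape (2,)

# Shift the scores so they are no longer centered at the origin
X_2d_uncentered = X_trans + mean_2d

# Same as projecting the raw (uncentered) data onto the components
print(np.allclose(X_2d_uncentered, X @ pca.components_.T))  # True
print(np.allclose(X_2d_uncentered.mean(axis=0), mean_2d))   # True
```

Note that these values are still expressed in component units, not in the original feature units, and the sign of each component is arbitrary, so the coordinates may flip between solvers or library versions.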

You are just fitting the data and plotting the transformed (centered) scores. To get back values that correspond to the original data, you need to use inverse_transform, which maps the reduced data back to the original 3-D space and adds the mean back; the plot below shows the first two coordinates of the result. From the docs:

inverse_transform(X)

Transform data back to its original space.
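As a sketch of what "back to its original space" means here, assuming the default whiten=False: inverse_transform multiplies the scores by components_ and adds mean_, which is why its output has the original 3 columns and the original mean:

```python
import numpy as np
from sklearn.decomposition import PCA

X = np.array([[-1, -1, -1], [-2, -1, -1], [-3, -2, -3],
              [1, 1, 1], [2, 1, 2], [3, 2, 3]]) + 3

pca = PCA(n_components=2).fit(X)
X_trans = pca.transform(X)

# With whiten=False, inverse_transform maps scores back and re-adds the mean
X_back = pca.inverse_transform(X_trans)
X_manual = X_trans @ pca.components_ + pca.mean_

print(X_back.shape)                   # (6, 3)
print(np.allclose(X_back, X_manual))  # True
```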

pca = PCA(n_components=2)
pca.fit(X)

X_trans = pca.transform(X)
X_original = pca.inverse_transform(X_trans)
# plot the first two of the three reconstructed coordinates
plt.plot(X_original[:, 0], X_original[:, 1], 'r*')
plt.show()

[Figure: first two coordinates of X_original plotted as red stars]

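Disclaimer: the technical posts on this site are licensed under CC BY-SA 4.0. If you republish them, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.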
