
scikit-learn kernel PCA explained variance

I have been using the normal PCA from scikit-learn and get the variance ratios for each principal component without any issues.

import sklearn.decomposition

# feature_vec: array of shape (n_samples, n_features)
pca = sklearn.decomposition.PCA(n_components=3)
pca_transform = pca.fit_transform(feature_vec)
var_values = pca.explained_variance_ratio_

I know this question is old, but I ran into the same 'problem' and found an easy solution when I realized that pca.explained_variance_ is simply the variance of the components (that is, of the transformed data along each component). You can compute the explained variance (and ratio) by doing:

import numpy

# kpca: a KernelPCA instance; feature_vec: array of shape (n_samples, n_features)
kpca_transform = kpca.fit_transform(feature_vec)
explained_variance = numpy.var(kpca_transform, axis=0)
explained_variance_ratio = explained_variance / numpy.sum(explained_variance)

As a bonus, to get the cumulative proportion of explained variance (often useful for selecting components and estimating the dimensionality of your space):

numpy.cumsum(explained_variance_ratio)
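
As a quick sanity check of the idea above, here is a minimal sketch on linear PCA, where scikit-learn does expose explained_variance_. The make_moons data and the variable names are my own assumptions for illustration, not part of the original answer. Two details to keep in mind: explained_variance_ uses the n-1 denominator, so ddof=1 is needed to match it exactly (the ratio is unaffected by this choice), and dividing by the sum of the retained variances only reproduces explained_variance_ratio_ when all components are kept.

```python
import numpy as np
from sklearn.datasets import make_moons
from sklearn.decomposition import PCA

# Illustrative data (an assumption): 100 samples, 2 features.
X, _ = make_moons(n_samples=100, random_state=123)

# Keep all components so that the retained variances sum to the total variance.
pca = PCA(n_components=2)
X_pca = pca.fit_transform(X)

# Variance of the projected data along each component (n-1 denominator).
var_from_scores = np.var(X_pca, axis=0, ddof=1)
ratio_from_scores = var_from_scores / var_from_scores.sum()

print(np.allclose(var_from_scores, pca.explained_variance_))
print(np.allclose(ratio_from_scores, pca.explained_variance_ratio_))
```

With fewer components than features, the ratio computed this way is relative to the retained components only, so it will be larger than what explained_variance_ratio_ reports.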

The main reason K-PCA does not have explained_variance_ratio_ is that after the kernel transformation your data/vectors live in a different feature space. Hence K-PCA is not supposed to be interpreted like PCA.

I was intrigued by this as well, so I did some testing. Below is my code.

The plots will show that the first component of the kernel PCA is a better discriminator of the dataset. However, when the explained variance ratios are calculated based on @EelkeSpaak's explanation, we see only a 50% explained variance ratio, which doesn't make sense. Hence I am inclined to agree with @Krishna Kalyan's explanation.

#get data
from sklearn.datasets import make_moons 
import numpy as np
import matplotlib.pyplot as plt

x, y = make_moons(n_samples=100, random_state=123)
plt.scatter(x[y==0, 0], x[y==0, 1], color='red', marker='^', alpha=0.5)
plt.scatter(x[y==1, 0], x[y==1, 1], color='blue', marker='o', alpha=0.5)
plt.show()

##seeing effect of linear-pca-------
from sklearn.decomposition import PCA
pca = PCA(n_components=2)
x_pca = pca.fit_transform(x)

x_tx = x_pca
fig, ax = plt.subplots(nrows=1, ncols=2, figsize=(7,3))
ax[0].scatter(x_tx[y==0, 0], x_tx[y==0, 1], color='red', marker='^', alpha=0.5)
ax[0].scatter(x_tx[y==1, 0], x_tx[y==1, 1], color='blue', marker='o', alpha=0.5)
ax[1].scatter(x_tx[y==0, 0], np.zeros((50,1))+0.02, color='red', marker='^', alpha=0.5)
ax[1].scatter(x_tx[y==1, 0], np.zeros((50,1))-0.02, color='blue', marker='o', alpha=0.5)
ax[0].set_xlabel('PC-1')
ax[0].set_ylabel('PC-2')
ax[0].set_ylim([-0.8,0.8])
ax[1].set_ylim([-0.8,0.8])
ax[1].set_yticks([])
ax[1].set_xlabel('PC-1')
plt.show()

##seeing effect of kernelized-pca------
from sklearn.decomposition import KernelPCA
kpca = KernelPCA(n_components=2, kernel='rbf', gamma=15)
x_kpca = kpca.fit_transform(x)


x_tx = x_kpca
fig, ax = plt.subplots(nrows=1, ncols=2, figsize=(7,3))
ax[0].scatter(x_tx[y==0, 0], x_tx[y==0, 1], color='red', marker='^', alpha=0.5)
ax[0].scatter(x_tx[y==1, 0], x_tx[y==1, 1], color='blue', marker='o', alpha=0.5)
ax[1].scatter(x_tx[y==0, 0], np.zeros((50,1))+0.02, color='red', marker='^', alpha=0.5)
ax[1].scatter(x_tx[y==1, 0], np.zeros((50,1))-0.02, color='blue', marker='o', alpha=0.5)
ax[0].set_xlabel('PC-1')
ax[0].set_ylabel('PC-2')
ax[0].set_ylim([-0.8,0.8])
ax[1].set_ylim([-0.8,0.8])
ax[1].set_yticks([])
ax[1].set_xlabel('PC-1')
plt.show()

##comparing the 2 pcas-------

#get the transformer
tx_pca = pca.fit(x)
tx_kpca = kpca.fit(x)

#transform the original data
x_pca = tx_pca.transform(x)
x_kpca = tx_kpca.transform(x)

#for the transformed data, get the explained variances
expl_var_pca = np.var(x_pca, axis=0)
expl_var_kpca = np.var(x_kpca, axis=0)
print('explained variance pca: ', expl_var_pca)
print('explained variance kpca: ', expl_var_kpca)

expl_var_ratio_pca = expl_var_pca / np.sum(expl_var_pca)
expl_var_ratio_kpca = expl_var_kpca / np.sum(expl_var_kpca)

print('explained variance ratio pca: ', expl_var_ratio_pca)
print('explained variance ratio kpca: ', expl_var_ratio_kpca)
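
One thing worth keeping in mind when reading these numbers: a ratio computed this way is normalized by the retained components only, so it changes with n_components. Below is a minimal sketch of that effect, reusing the same make_moons data and gamma as above. The helper name first_component_share is hypothetical, introduced only for this illustration.

```python
import numpy as np
from sklearn.datasets import make_moons
from sklearn.decomposition import KernelPCA

x, y = make_moons(n_samples=100, random_state=123)

def first_component_share(n_components):
    """Share of the first kernel PC, relative to the retained components only."""
    kpca = KernelPCA(n_components=n_components, kernel='rbf', gamma=15)
    scores = kpca.fit_transform(x)
    var = np.var(scores, axis=0)
    return var[0] / var.sum()

# Two retained components, as in the code above.
share_two = first_component_share(2)
# n_components=None keeps all components with non-zero eigenvalues.
share_all = first_component_share(None)

print('first component share, 2 components kept: ', share_two)
print('first component share, all components kept: ', share_all)
```

The second number is smaller because the denominator now includes every component of the kernel feature space, which for an RBF kernel can have as many dimensions as there are samples.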

Use the eigenvalues; they give similar information and the intuition you need:

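import numpy as np
import matplotlib.pyplot as plt
from sklearn.decomposition import KernelPCA

# x: array of shape (n_samples, n_features) with at least 30 samples
pca = KernelPCA(n_components=30, kernel="rbf")
pca.fit(x)
# eigenvalues_ was named lambdas_ before scikit-learn 1.0
var_values = pca.eigenvalues_ / sum(pca.eigenvalues_)
print(sum(var_values))  # normalized over the 30 retained components only
plt.plot(np.arange(1, pca.n_components + 1), var_values, "+", linewidth=2)
plt.ylabel("PCA explained variance ratio")
plt.show()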
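
To see how this relates to the variance-based answer above: for the data used in fitting, the variance of each kernel principal component should equal the corresponding eigenvalue divided by the number of samples, so both approaches give the same ratios. The following is a minimal sketch under my own assumptions (make_moons data, gamma=15, 5 components). The attribute is eigenvalues_ in scikit-learn 1.0 and later and lambdas_ in earlier versions, so the sketch falls back to the old name.

```python
import numpy as np
from sklearn.datasets import make_moons
from sklearn.decomposition import KernelPCA

# Illustrative data and parameters (assumptions, not from the answer above).
x, _ = make_moons(n_samples=100, random_state=123)
kpca = KernelPCA(n_components=5, kernel='rbf', gamma=15)
scores = kpca.fit_transform(x)

# eigenvalues_ in scikit-learn >= 1.0, lambdas_ in older releases.
eigenvalues = getattr(kpca, 'eigenvalues_', None)
if eigenvalues is None:
    eigenvalues = kpca.lambdas_

# Variance of the projected training data vs. eigenvalue / n_samples.
var_from_scores = np.var(scores, axis=0)
var_from_eigen = eigenvalues / x.shape[0]

ratio_from_scores = var_from_scores / var_from_scores.sum()
ratio_from_eigen = eigenvalues / eigenvalues.sum()

print(np.allclose(var_from_scores, var_from_eigen))
print(np.allclose(ratio_from_scores, ratio_from_eigen))
```

Either way, the ratio is relative to the retained components, not to the total variance in the kernel feature space.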


 