
How to get contributions and squared cosines in sklearn PCA?

Working primarily from this paper, I want to implement the various PCA interpretation metrics it mentions, for example the squared cosine and what the article calls the contribution.

However, the nomenclature here seems very confusing; in particular, it's not clear to me what exactly sklearn's pca.components_ is. I've seen some answers here and in various blogs stating that these are loadings, while others state that they are component scores (which I assume is the same thing as factor scores).

The paper defines the contribution (of an observation to a component) as:

$$\operatorname{ctr}_{i,l} = \frac{f_{i,l}^{2}}{\lambda_{l}}, \qquad \lambda_{l} = \sum_{i} f_{i,l}^{2}$$

(where $f_{i,l}$ is the factor score of observation $i$ on component $l$ and $\lambda_{l}$ is the eigenvalue of component $l$)

and states that all contributions for each component must add up to 1, which is not the case if one assumes that pca.explained_variance_ holds the eigenvalues and pca.components_ holds the factor scores:

import numpy as np
import pandas as pd
from sklearn.decomposition import PCA

df = pd.DataFrame(data=[
    [0.273688, 0.42720, 0.65267],
    [0.068685, 0.008483, 0.042226],
    [0.137368, 0.025278, 0.063490],
    [0.067731, 0.020691, 0.027731],
    [0.067731, 0.020691, 0.027731]
], columns=["MeS", "EtS", "PrS"])

pca = PCA(n_components=2)
X = pca.fit_transform(df)
ctr = (pd.DataFrame(pca.components_.T**2)).div(pca.explained_variance_)
np.sum(ctr, axis=0)
# Yields 0.498437 and 0.725048 rather than 1 and 1

How can I calculate these metrics? The paper defines the squared cosine similarly as:

$$\cos^{2}_{i,l} = \frac{f_{i,l}^{2}}{d_{i,g}^{2}}, \qquad d_{i,g}^{2} = \sum_{l} f_{i,l}^{2}$$

(where $d_{i,g}^{2}$ is the squared distance of observation $i$ from the center of gravity of the data)

This paper does not play well with sklearn as far as definitions are concerned.

The pca.components_ are the two principal components of your data after your data has been centered. And pca.fit_transform(df) gives you the coordinates of your centered data set w.r.t. those two principal components, i.e., the factor scores.

> pca.fit_transform(df)
array([[ 0.60781787, -0.00280834],
       [-0.1601333 , -0.01246807],
       [-0.11667497,  0.04584743],
       [-0.1655048 , -0.01528551],
       [-0.1655048 , -0.01528551]])
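
As a quick sanity check (a minimal sketch, assuming the default PCA settings, in particular whiten=False), the factor scores are exactly the centered data projected onto those two components:

> Xc = df - df.mean(axis=0)
> np.allclose(Xc @ pca.components_.T, pca.fit_transform(df))
True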

Next, the $\lambda_{l}$ of equation (10) in the paper is just the sum of the squares of the factor scores for the l-th component, i.e., the l-th column of pca.fit_transform(df). But pca.explained_variance_ gives you the two variances, and since sklearn uses len(df.index) - 1 as the degrees of freedom, we have lambda_l == (len(df.index) - 1) * pca.explained_variance_[l].

> X = pca.fit_transform(df)
> lmbda = np.sum(X**2, axis = 0)
> lmbda
array([0.46348196, 0.00273262])

> (5-1) * pca.explained_variance_
array([0.46348196, 0.00273262])

So, to summarize, I recommend computing the contributions as:

> ctr = X**2 / np.sum(X**2, axis = 0)
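
By construction, the contributions for each component now add up to 1, as the paper requires:

> np.sum(ctr, axis = 0)
array([1., 1.])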

For the squared cosine it is the same, except that we sum over the rows of pca.fit_transform(df):

> cos_sq = X**2 / np.sum(X**2, axis = 1)[:, np.newaxis]
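
Likewise, each observation's squared cosines sum to 1 across the retained components (note that with n_components=2 the denominator runs over the two kept components only, not over all possible components):

> np.sum(cos_sq, axis = 1)
array([1., 1., 1., 1., 1.])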
