Unable to extract factor loadings from sklearn PCA
I want the factor loadings so I can see which factors load on which variables. I am referring to the following link:
Factor Loadings using sklearn
Here is my code, where input_data is the master_data.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import scale

X = master_data_predictors.values
# Scale the values
X = scale(X)
# Use as many components as there are variables (initially 9)
pca = PCA(n_components=9)
pca.fit(X)
# The amount of variance that each PC explains
var = pca.explained_variance_ratio_
# Cumulative variance explained, in percent
var1 = np.cumsum(np.round(pca.explained_variance_ratio_, decimals=4) * 100)
print(var1)
[ 74.75 85.85 94.1 97.8 98.87 99.4 99.75 100. 100. ]
# Retain 4 components, as they explain about 98% of the variance
pca = PCA(n_components=4)
# fit_transform fits the model itself, so a separate fit call is not needed
X1 = pca.fit_transform(X)
print(pca.components_)
array([[ 0.38454129, 0.37344315, 0.2640267 , 0.36079567, 0.38070046,
0.37690887, 0.32949014, 0.34213449, 0.01310333],
[ 0.00308052, 0.00762985, -0.00556496, -0.00185015, 0.00300425,
0.00169865, 0.01380971, 0.0142307 , -0.99974635],
[ 0.0136128 , 0.04651786, 0.76405944, 0.10212738, 0.04236969,
0.05690046, -0.47599931, -0.41419841, -0.01629199],
[-0.09045103, -0.27641087, 0.53709146, -0.55429524, 0.058524 ,
-0.19038107, 0.4397584 , 0.29430344, 0.00576399]])
import math
loadings = pca.components_.T * math.sqrt(pca.explained_variance_)
It gives me the following error: 'only length-1 arrays can be converted to Python scalars'
I understand the problem. I have to traverse the pca.components_ and pca.explained_variance_ arrays, something like:
##just a thought
Loading=np.empty((8,4))
for i,j in (pca.components_, pca.explained_variance_):
loading=i*math.sqrt(j)
Loading=Loading.append(loading)
##unable to proceed further
##something wrong here
This is simply a problem of mixing modules. For numpy arrays, use np.sqrt instead of math.sqrt (which only works on single values, not arrays).
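A minimal sketch of the difference, using a made-up three-element array (`values` is illustrative and not from the question):

```python
import math
import numpy as np

# Illustrative array; stands in for pca.explained_variance_
values = np.array([4.0, 9.0, 16.0])

# math.sqrt expects a single number, so a multi-element array is rejected
try:
    math.sqrt(values)
    math_error = None
except TypeError as exc:
    math_error = str(exc)

# np.sqrt is applied element-wise and returns an array of the same shape
roots = np.sqrt(values)

print(math_error)
print(roots)
```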
The last line of your code should thus read:
loadings = pca.components_.T * np.sqrt(pca.explained_variance_)
This was a mistake in the original answers you linked to. I have edited them accordingly.
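For completeness, here is a rough end-to-end sketch on synthetic data. The random matrix is only an assumed stand-in for master_data_predictors.values (200 rows and 9 variables are made-up sizes), and the explicit loop is included just to show that it gives the same result as the broadcast one-liner.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import scale

# Synthetic stand-in for master_data_predictors.values (assumed: 200 rows, 9 variables)
rng = np.random.default_rng(0)
X = scale(rng.normal(size=(200, 9)))

pca = PCA(n_components=4)
pca.fit(X)

# components_ has shape (4, 9), so its transpose has shape (9, 4).
# explained_variance_ has shape (4,), so np.sqrt(...) broadcasts across the 4 columns.
loadings = pca.components_.T * np.sqrt(pca.explained_variance_)

# Equivalent explicit loop: one column of loadings per retained component
loadings_loop = np.empty((X.shape[1], pca.n_components_))
for k, (component, variance) in enumerate(zip(pca.components_, pca.explained_variance_)):
    loadings_loop[:, k] = component * np.sqrt(variance)

print(loadings.shape)
print(np.allclose(loadings, loadings_loop))
```

With 9 variables and 4 components the loadings array has one row per variable and one column per component, so a preallocated array for the loop would need shape (9, 4). In practice the broadcast version makes the loop unnecessary.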