
R - Clustering after factor analysis

Does anybody know how to recreate this output in R? Below is the cluster output that I want to obtain after doing factor analysis.

Cluster centers   Value 1   Value 2   Value 3   Value 4  
FACTOR1            -0.049   -1.481    0.505     0.651    
FACTOR2            0.691    -0.161    -0.633    -0.547      
FACTOR3            0.251    -0.265    0.611     -1.522    
-------------------------------------------------------
No. of cases        257       93       174       96

My data has 620 rows of observations and 20 columns of questions (620x20). I first ran a factor analysis in R, reducing the 20 questions to 3 factors, which produced the 20x3 matrix shown below (one row per question, one column per factor).

 Matrix   Factor 1   Factor 2   Factor 3   
 Q1       0.646      -0.095     0.041   
 Q2       0.630      0.047      0.124     
 Q3       ...        ...        ...    
 Q4       ...        ...        ...
 ...
 Q20      0.419      0.181      0.337
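
For readers who want to reproduce a step like this, here is a minimal sketch. It assumes base R's factanal() was used, which the post does not state, and it runs on simulated 620x20 data because the real responses are not available. The names x, true_loadings and fa are hypothetical.

```r
# Simulated stand-in for the 620 x 20 questionnaire data (assumption:
# three latent factors, each driving a block of questions).
set.seed(42)
n <- 620; p <- 20; k <- 3
latent <- matrix(rnorm(n * k), ncol = k)
true_loadings <- matrix(0, nrow = p, ncol = k)
true_loadings[1:7, 1]   <- 0.7
true_loadings[8:14, 2]  <- 0.7
true_loadings[15:20, 3] <- 0.7
x <- latent %*% t(true_loadings) + matrix(rnorm(n * p, sd = 0.7), ncol = p)
colnames(x) <- paste0("Q", 1:p)

# Maximum-likelihood factor analysis with 3 factors (varimax rotation is
# the default); scores = "regression" also returns per-row factor scores.
fa <- factanal(x, factors = 3, scores = "regression")

print(dim(fa$loadings))  # 20 x 3: one row per question
print(dim(fa$scores))    # 620 x 3: one row per observation
```

The 20x3 loadings describe the questions, while the 620x3 scores describe the observations, so the scores are the natural input for clustering the 620 rows.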

Next I want to perform cluster analysis on the 620 observations, with the clusters based on the factor scores, to get output like the table at the top. I am not sure how to do that in R.

Here is an example. I generated a 30x3 matrix and ran k-means clustering on it, requesting 4 clusters. Note that you can use any other clustering algorithm. Then I calculated the cluster centers (the mean of each column by cluster) using aggregate(). These centers can then be used to classify a new dataset by finding, for each sample, the center it is closest to (e.g., using Euclidean distance).

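set.seed(1)
d <- matrix(rnorm(90), ncol = 3)   # 30 x 3 example data
kd <- kmeans(d, centers = 4)       # k-means with 4 clusters
cluster <- kd$cluster              # cluster label for each row
dd <- as.data.frame(cbind(d, cluster))
# mean of each column by cluster; drop the Group.1 and cluster rows
t(aggregate(dd, by = list(dd$cluster), FUN = mean))[-c(1, 5), ]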

         [,1]        [,2]        [,3]       [,4]
V1  0.8321043 -0.01501747 -0.09144934 -1.8916013
V2  0.0121109 -0.51743551  0.85714652 -0.5389448
V3 -0.4478400  0.17132066  0.99685057 -0.9206161
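
The same idea can be sketched for a 620x3 matrix of factor scores, including the row of cluster sizes and the nearest-center step described above. This is a sketch under assumptions: the scores here are random stand-ins, and nearest_center() is a hypothetical helper, not part of any package. Real scores could come from a factor analysis fit, for instance the scores element that factanal() returns when called with scores = "regression".

```r
set.seed(1)
# Stand-in for real factor scores (assumption): 620 observations x 3 factors.
scores <- matrix(rnorm(620 * 3), ncol = 3,
                 dimnames = list(NULL, paste0("FACTOR", 1:3)))

# k-means with 4 clusters; nstart = 25 tries several random starts.
km <- kmeans(scores, centers = 4, nstart = 25)

# Factors in rows, clusters in columns, plus the number of cases per cluster.
centers_table <- rbind(t(km$centers), "No. of cases" = km$size)
print(round(centers_table, 3))

# Hypothetical helper: index of the closest center (Euclidean) for each row.
nearest_center <- function(newdata, centers) {
  newdata <- as.matrix(newdata)
  d2 <- sapply(seq_len(nrow(centers)), function(j) {
    rowSums(sweep(newdata, 2, centers[j, ])^2)  # squared distance to center j
  })
  d2 <- matrix(d2, nrow = nrow(newdata))        # keep a matrix for 1-row input
  max.col(-d2, ties.method = "first")           # smallest distance wins
}

new_scores <- matrix(rnorm(5 * 3), ncol = 3)    # 5 new observations
new_cluster <- nearest_center(new_scores, km$centers)
print(new_cluster)
```

Cluster numbering from kmeans() is arbitrary, so the columns may appear in a different order than in a table produced by other software.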
