Clustering variables

Question

What are some proven methods, easily implemented in R, for finding groupings of highly correlated variables within a large, high-dimensional binary dataset (think 200,000+ rows and 150+ fields)? I want groupings of variables that lend themselves to interpretation, so I don't think PCA would be the best method.

Answer 1

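    library(Hmisc)
    # Use a few numeric columns of mtcars (cyl through vs) as example data
    mtc <- mtcars[, 2:8]
    mtcn <- data.matrix(mtc)
    # Hierarchical clustering of the variables (columns), not the rows
    clust <- varclus(mtcn)
    clust
    plot(clust)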

?varclus: Does a hierarchical cluster analysis on variables, using the Hoeffding D statistic, squared Pearson or Spearman correlations, or the proportion of observations for which two variables are both positive as similarity measures. Variable clustering is used for assessing collinearity and redundancy, and for separating variables into clusters that can be scored as a single variable, thus resulting in data reduction.
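To make the idea concrete for binary data, here is a minimal base-R sketch of clustering variables on squared Pearson correlations. It is not `varclus` itself and may differ from it in details. The simulated data, the three latent factors, the 10% flip rate, the helper `noisy_copy`, and the choice of `k = 3` are all assumptions made for illustration.

```r
set.seed(1)
n <- 2000

# Assumption: three hypothetical latent binary factors,
# each driving four observed binary variables.
latent <- matrix(rbinom(n * 3, 1, 0.5), nrow = n)

# Hypothetical helper: copy a binary vector, flipping each entry with probability p_flip
noisy_copy <- function(z, p_flip = 0.1) {
  flip <- rbinom(length(z), 1, p_flip)
  abs(z - flip)  # XOR of z and flip
}

x <- do.call(cbind, lapply(1:3, function(j) {
  sapply(1:4, function(k) noisy_copy(latent[, j]))
}))
colnames(x) <- paste0("g", rep(1:3, each = 4), "_v", rep(1:4, times = 3))

# Similarity between variables: squared Pearson correlation
sim <- cor(x)^2

# Cluster the variables on the dissimilarity 1 - similarity
hc <- hclust(as.dist(1 - sim), method = "complete")

# Cut the tree into k groups of variables (k = 3 is an assumption here)
groups <- cutree(hc, k = 3)
split(names(groups), groups)
```

With 150+ fields the correlation matrix is only 150 x 150, so the number of rows mainly affects the cost of `cor()`, not of the clustering step. On real data the number of groups is not known in advance, so inspecting `plot(hc)` before choosing where to cut is probably more sensible than fixing `k`.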

For binary variables:

    library(cluster)
    data(animals)
    ma <- mona(animals)
    ma

    plot(ma)

?mona: Returns a list representing a divisive hierarchical clustering of a dataset with binary variables only.

Answer 1
1 accepted 2014-01-29 12:59:48
