How to cluster sequential categorical data in R

Question

Consider a data set in which each user can choose among 3 activities, and we have recorded each user's first 10 choices. Example data:

for (i in 1:10) 
{
  # sample from list of 3 strings using a set probability
  x <- sample( c("A", "B", "C"), 1000, replace=TRUE, prob=c(0.5, 0.3, 0.2) )
  # assign to variable created on the fly
  assign( paste("cat", i, sep=""), x )
}

first10 <- data.frame(cat1, cat2, cat3, cat4, cat5, cat6, cat7, cat8, cat9, cat10)

What's the best approach in R to cluster users according to their activity sequence?

I've looked around on Stack Overflow, and the most similar questions ask how to cluster categorical data in R (which is part of the analysis), but that on its own doesn't account for the sequential nature of the data. Are there R packages that are well suited to this analysis?

Answer 1

Look into frequent itemset mining instead of clustering.

Most clustering methods are designed for continuous numerical data and assume some vector space. They take every position into account.
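To illustrate what "taking every position into account" means, here is a minimal base R sketch that clusters users on a position-wise (Hamming) distance. The small `acts` matrix, the seed, the average-linkage method and `k = 3` are all assumptions made for illustration; they are not part of the question's data or a recommended configuration.

```r
set.seed(42)  # assumption: fixed seed so the sketch is reproducible

# Small stand-in for the question's data: 20 users, 10 activities each
acts <- matrix(sample(c("A", "B", "C"), 20 * 10, replace = TRUE,
                      prob = c(0.5, 0.3, 0.2)),
               nrow = 20)

# Hamming distance: number of positions at which two users differ
n <- nrow(acts)
hamming <- matrix(0, n, n)
for (i in seq_len(n)) {
  for (j in seq_len(n)) {
    hamming[i, j] <- sum(acts[i, ] != acts[j, ])
  }
}

# Hierarchical clustering on the position-wise distances
hc <- hclust(as.dist(hamming), method = "average")
clusters <- cutree(hc, k = 3)  # k = 3 is an arbitrary choice
table(clusters)
```

Two users who perform the same activities shifted by one step end up far apart under this distance, which is the limitation the answer is pointing at.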

A frequent pattern, however, may be only part of a sequence; a sequence may exhibit several of these patterns (or none); and patterns may have gaps in between. All of these properties are usually desirable.
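The idea of patterns that cover only part of a sequence and may contain gaps can be sketched in base R by brute-force support counting. This is not an optimized frequent pattern miner, only an illustration: the helper `contains_pattern`, the stand-in matrix `acts`, and the restriction to patterns of length 3 are all hypothetical choices.

```r
set.seed(42)  # assumption: fixed seed so the sketch is reproducible

# Small stand-in for the question's data: 20 users, 10 activities each
acts <- matrix(sample(c("A", "B", "C"), 20 * 10, replace = TRUE,
                      prob = c(0.5, 0.3, 0.2)),
               nrow = 20)

# TRUE if `pattern` occurs in `s` in order, allowing gaps between items
contains_pattern <- function(s, pattern) {
  pos <- 1
  for (item in s) {
    if (item == pattern[pos]) {
      pos <- pos + 1
      if (pos > length(pattern)) return(TRUE)
    }
  }
  FALSE
}

# Candidate patterns: every ordered triple of activities (27 in total)
items <- c("A", "B", "C")
candidates <- as.matrix(expand.grid(items, items, items,
                                    stringsAsFactors = FALSE))

# Support: fraction of users whose sequence contains the pattern
support <- apply(candidates, 1, function(p) {
  mean(apply(acts, 1, contains_pattern, pattern = p))
})
names(support) <- apply(candidates, 1, paste, collapse = "")
sort(support, decreasing = TRUE)
```

A user can match several patterns or none, and positions that are not part of a pattern are simply ignored. Enumerating all candidates is only feasible for tiny alphabets and short patterns; dedicated algorithms prune the candidate space instead.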

Solution 1
0 2015-08-31 21:47:19
