How to create a hierarchical cluster in R using categorical and numerical data?

Question

I want to create a hierarchical cluster that shows types of careers and the bank account balances of the people in those careers. I have a dataset with two variables, job and balance:

              job balance
1       unemployed    1787
2         services    4789
3       management    1350
4       management    1476
5      blue-collar       0
6       management     747
7    self-employed     307
8       technician     147
9     entrepreneur     221
10        services     -88

I want the result to look like this:

Where A, B, C, etc. are the job categories.

Can anyone help me get started with this?

I have no idea how to begin.

Thanks!

Answer 1

You can start by using the dist and hclust functions.

df <- read.table(text = "              job balance
1       unemployed    1787
2         services    4789
3       management    1350
4       management    1476
5      blue-collar       0
6       management     747
7    self-employed     307
8       technician     147
9     entrepreneur     221
10        services     -88")

dist computes the distance between each pair of elements (by default, the Euclidean distance):

distances <- dist(df$balance)
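
To see what dist returns, here is a small sketch on three of the balances from the question. The names balance_sample, d and m are made up for this illustration and are not part of the original answer.

```r
# Three balances taken from the question's data (rows 1, 3 and 5)
balance_sample <- c(1787, 1350, 0)

# With a single numeric column, the Euclidean distance between two
# elements is simply the absolute difference of their values
d <- dist(balance_sample)

# dist returns a compact lower-triangle object; convert it to a full
# symmetric matrix to inspect the pairwise distances
m <- as.matrix(d)
print(m)
```

Each off-diagonal cell of the matrix is the distance between the corresponding pair of rows, and the diagonal is zero.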

You can then cluster your values using the distance matrix generated above:

clusters <- hclust(distances)

By default, hclust applies complete-linkage clustering to your data. Finally, you can plot your results as a tree:

plot(clusters, labels = df$job)
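
If you need the group membership of each row and not only the picture, cutree (also in the stats package) cuts the tree into a chosen number of groups. The following is only a sketch: the choice k = 3 is arbitrary and not something the question asks for, and the data is retyped as plain vectors so the block runs on its own.

```r
# Data from the question, retyped as vectors so this block is self-contained
balance <- c(1787, 4789, 1350, 1476, 0, 747, 307, 147, 221, -88)
job <- c("unemployed", "services", "management", "management",
         "blue-collar", "management", "self-employed", "technician",
         "entrepreneur", "services")

# Same steps as above: distance matrix, then complete-linkage clustering
clusters <- hclust(dist(balance))

# Cut the tree into k groups (k = 3 is an assumption made for illustration)
groups <- cutree(clusters, k = 3)

# Show each row together with the group it was assigned to
print(data.frame(job, balance, group = groups))
```

Changing k, or passing a height h instead, gives a coarser or finer grouping of the same tree.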

Here, we clustered all the entries in your data frame, which is why some jobs appear more than once. If you want a single value per job, you can, for example, take the mean balance for each job using tapply:

means <- tapply(df$balance, df$job, mean)

And then cluster the jobs:

distances <- dist(means)
clusters <- hclust(distances)
plot(clusters)
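
Note that this second plot call needs no labels argument. A short sketch of why, with the data retyped as vectors so it runs on its own: tapply returns a vector named by job, and those names are carried through dist and hclust into the labels of the tree.

```r
# Data from the question, retyped as vectors so this block is self-contained
balance <- c(1787, 4789, 1350, 1476, 0, 747, 307, 147, 221, -88)
job <- c("unemployed", "services", "management", "management",
         "blue-collar", "management", "self-employed", "technician",
         "entrepreneur", "services")

# One mean balance per job; the result is named by job category
means <- tapply(balance, job, mean)
print(means)

# The names survive dist and hclust, so plot(clusters) can label the
# leaves with the job categories without any extra argument
clusters <- hclust(dist(means))
print(clusters$labels)
```

With the ten rows above this yields seven leaves, one per distinct job.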

You can then try other distance measures or other clustering algorithms (see help(dist) and help(hclust) for the available methods).
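
As a rough sketch of what that could look like, both functions take a method argument. The particular choices below ("manhattan" and "average") are picked only for illustration and are not a recommendation from the original answer.

```r
# Balances from the question, retyped so this block is self-contained
balance <- c(1787, 4789, 1350, 1476, 0, 747, 307, 147, 221, -88)

# Manhattan (city-block) distance instead of the default Euclidean one.
# With a single numeric column the two coincide, so the choice only
# matters once more than one numeric variable is involved.
d_manhattan <- dist(balance, method = "manhattan")

# Average linkage instead of the default complete linkage
clusters_avg <- hclust(d_manhattan, method = "average")

# Complete linkage on the same distances, for comparison
clusters_complete <- hclust(d_manhattan, method = "complete")

# The merge heights differ between the two linkage methods
print(clusters_avg$height)
print(clusters_complete$height)
```

Plotting both trees side by side is a quick way to judge how sensitive the grouping is to the linkage method.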
