
Horizontal Dendrogram with Labels in R

I am running into an issue: I can plot a vertical dendrogram with labels, but I can't get the labels to show when the dendrogram is horizontal.

My data looks like this:

Company Industry1 Industry2 Industry3
Google     3%        5%        6%
Apple      2%        6%        1%

When I import the data, the first column contains my labels, but the row names are just 1, 2, 3, etc.

So my code reads as follows (the data source is called Cluster_D):

labs = Cluster_D[, 1]
Industry <- Cluster_D
rownames(Industry) <- labs$`Company`


D.Industry <- dist(scale(round(Industry[, -1], 3)), method = "euclidean")
H.Industry <- hclust(D.Industry, method = "ward.D")
plot(H.Industry, labels = Cluster_D$`Company`)

So I assign my labels to the variable labs and then put my data into another variable, Industry. When I plot the data and pass in the labels, I get the chart with the clusters I need. The chart works vertically with labels... but

I have no idea how to flip this chart to horizontal while keeping the label names. I tried the as.dendrogram function, which lets me use horiz = TRUE, but I can't keep my labels: they revert to 1, 2, 3, etc.

Can anyone explain how I can correct this? I'm used to Statistica, where I had no issues doing hierarchical clustering, and I'm trying to pick up R. I feel like assigning labels should be super easy, but I just don't know how.

I tried the code below, but the chart is mislabeled (the labels come out in ABC order).

F.Industries <- as.dendrogram(H.Industry)
labels(F.Industries) <- paste(as.character(Cluster_D[,1]))
plot(F.Industries, horiz = TRUE) 
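The mislabeling is an ordering problem. Assuming the labels<- call above comes from the dendextend package (as far as I know, base R's stats package only provides the labels() getter for dendrograms), it replaces the labels in leaf (plotted) order, while Cluster_D[, 1] is in the original row order, so the names get shuffled. A minimal base-R sketch, using hypothetical toy data in place of Cluster_D, is to put the names on the hclust object, which stores labels in data order, and let as.dendrogram() rearrange them:

```r
# Hypothetical toy data standing in for Cluster_D
companies <- c("Google", "Apple", "IBM")
X <- data.frame(Industry1 = c(3, 2, 7),
                Industry2 = c(5, 6, 4),
                Industry3 = c(6, 1, 2))

hc <- hclust(dist(scale(X)), method = "ward.D")

# hclust keeps its labels in the original (data) order ...
hc$labels <- companies

# ... and as.dendrogram() moves them into leaf order for plotting
dend <- as.dendrogram(hc)
print(labels(dend))  # same as companies[hc$order]

plot(dend, horiz = TRUE)
```

The design point is simply to attach the names before the tree is reordered, so no manual reindexing is needed.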

As requested by PAR:

Data (I added one more row, IBM):

z <- read.table(text = "Company Industry1 Industry2 Industry3
Google     3%        5%        6%
Apple      2%        6%        1%
IBM        7%        4%        2%", header = T)

When I try:

scale(round(z[, -1], 3))
#output
Error in Math.data.frame(list(Industry1 = c(2L, 1L, 3L), Industry2 = c(2L,  : 
  non-numeric variable in data frame: Industry1Industry2Industry3

This means the sample data you provided is not representative of your actual data.

Convert the percentage columns to numeric:

z = data.frame("Company" = z[,1], apply(z[,-1], 2, function(x) as.numeric(gsub("%", "", x))))
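If the percentage columns arrive as text (character, or factor in older R versions), the same conversion can also be sketched with lapply(), which works column by column and leaves the data frame structure intact. The toy data frame below is hypothetical:

```r
# Hypothetical imported data: percentages read in as text
z <- data.frame(Company   = c("Google", "Apple", "IBM"),
                Industry1 = c("3%", "2%", "7%"),
                Industry2 = c("5%", "6%", "4%"),
                Industry3 = c("6%", "1%", "2%"),
                stringsAsFactors = FALSE)

# Strip the "%" sign and convert every column except the first to numeric
z[-1] <- lapply(z[-1], function(x) as.numeric(sub("%", "", x, fixed = TRUE)))

str(z)
```

Keeping 3 rather than 0.03 makes no difference to the clustering here, because scale() subtracts each column's mean and divides by its standard deviation, which cancels any constant multiplier.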

Row names become the labels of the leaves:

rownames(z) <- z[,1]

D.Industry <- dist(scale(z[, -1]), method = "euclidean")
H.Industry <- hclust(D.Industry, method = "ward.D")

plot(as.dendrogram(H.Industry), horiz = TRUE)
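The row-name route works because dist() takes its Labels attribute from the row names of its input, hclust() copies that attribute into $labels, and as.dendrogram() carries the labels onto the leaves. A small sketch with hypothetical numeric data to check each step:

```r
# Hypothetical numeric data with the company names as row names
z <- data.frame(Industry1 = c(3, 2, 7),
                Industry2 = c(5, 6, 4),
                Industry3 = c(6, 1, 2),
                row.names = c("Google", "Apple", "IBM"))

d  <- dist(scale(z), method = "euclidean")
hc <- hclust(d, method = "ward.D")

print(attr(d, "Labels"))          # row names, in data order
print(hc$labels)                  # copied from the dist object
print(labels(as.dendrogram(hc)))  # the same names, in leaf order
```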


The margins can be adjusted with the mar setting of par(); a larger right margin (the fourth value) leaves room for the leaf labels:

par(mar = c(2, 0, 0, 8))
plot(as.dendrogram(H.Industry), horiz = TRUE)
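Because par(mar = ...) changes the settings for every later plot on the device, it can be convenient to wrap it in a function that restores the previous settings on exit. The helper name plot_horiz_dendrogram and its default right margin of 8 lines (taken from the call above) are illustrative only, not part of any package:

```r
# Hypothetical helper: draw a horizontal dendrogram with room for labels,
# then restore the caller's graphics settings
plot_horiz_dendrogram <- function(hc, right_margin = 8) {
  old_par <- par(mar = c(2, 0, 0, right_margin))  # par() returns the old values
  on.exit(par(old_par))                            # restore them on exit
  plot(as.dendrogram(hc), horiz = TRUE)
}

# Hypothetical toy data with the companies as row names
m <- matrix(c(3, 2, 7, 5, 6, 4, 6, 1, 2), nrow = 3,
            dimnames = list(c("Google", "Apple", "IBM"), NULL))
hc <- hclust(dist(scale(m)), method = "ward.D")
plot_horiz_dendrogram(hc)
```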


Other approaches include the ape and ggdendro packages.
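A sketch of both alternatives, assuming the packages are installed (the code skips each one if it is not). ggdendro::ggdendrogram() accepts an hclust object and its rotate = TRUE argument turns the plot horizontal; ape::as.phylo() converts the tree to a "phylo" object, which plot() draws left to right with direction = "rightwards". The toy data is hypothetical:

```r
# Hypothetical toy data with the companies as row names
m <- matrix(c(3, 2, 7, 5, 6, 4, 6, 1, 2), nrow = 3,
            dimnames = list(c("Google", "Apple", "IBM"),
                            c("Industry1", "Industry2", "Industry3")))
hc <- hclust(dist(scale(m)), method = "ward.D")

# ggdendro: a ggplot2-based dendrogram; rotate = TRUE makes it horizontal
if (requireNamespace("ggdendro", quietly = TRUE)) {
  p <- ggdendro::ggdendrogram(hc, rotate = TRUE)
  print(p)
}

# ape: convert to a "phylo" tree and draw it left to right
if (requireNamespace("ape", quietly = TRUE)) {
  tree <- ape::as.phylo(hc)
  plot(tree, direction = "rightwards")
}
```

Both packages read the leaf names from hc$labels, so setting row names before dist() is enough for either one.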
