
Make dendrograms more readable in R

I am working with 1800 observations that I want to classify. I run a hierarchical clustering and plot the result as a dendrogram, in which I identify three groups. The problem is the visualization: the plot is not readable. At the bottom there is a lot of overlapping data. The labels are numbers, but I don't know how to make them more readable. I have tried two options, and neither worked.

Option 1:

m <- as.matrix(dtm)
distMatrix <- dist(m, method = "euclidean")
groups <- hclust(distMatrix, method = "ward.D")
clustering <- cutree(groups, 3)

plot(groups, hang = -100, cex = 1, labels = FALSE)
rect.hclust(groups, k = 3)


Option 2:

library(factoextra)  # provides fviz_dend(); also attaches ggplot2

m <- as.matrix(dtm)
distMatrix <- dist(m, method = "euclidean")
groups <- hclust(distMatrix, method = "ward.D")

fviz_dend(groups, cex = 0.8, lwd = 0.8, k = 3,
          rect = TRUE,
          k_colors = "jco",
          rect_border = "jco",
          rect_fill = TRUE,
          ggtheme = theme_gray(), show_labels = FALSE)


How can I draw the dendrogram without so much overlapping data at the bottom? With this much data packed together it looks very confusing.

Two things might help: put the y-axis on a log scale, and reduce the line thickness.

The former is easy, but changing the line thickness of an existing ggplot object is fiddly.

Below is an example from a recent analysis of mine. I didn't use the fviz_dend function; instead I used as.dendrogram followed by ggplot().

If you want to keep working with your existing fviz plot, you could change its line thickness using the same method.

Also, with a large number of leaves you might as well hide the labels (i.e. expand=c(0,0) in scale_y_continuous()).


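Calculate the hierarchical clustering:

library(RColorBrewer)
library(dendextend)  # color_branches() and the ggplot() method for dendrograms
library(ggplot2)
n <- 4  # number of clusters
# `data` is my own dataset (not shown); Minkowski distance with p = 2 is Euclidean
hdata <- hclust(dist(data, "minkowski", p = 2), method = "ward.D")
clusters <- cutree(hdata, k = n)
# vector of up to 16 different colours
col_vector <- c(brewer.pal(n = 10, "Paired"), brewer.pal(n = 6, "Set2"))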
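
The `data` object above is not shown here. As a minimal, self-contained sketch of the same step, assuming simulated data (three Gaussian blobs, 1800 rows in total; the names `sim_data`, `hc`, and `cl` are made up for illustration), the clustering and the cut into groups could look like this:

```r
# Simulated stand-in for the real data (an assumption, not the original dataset):
# 1800 observations in 2 dimensions, drawn around three well-separated centres.
set.seed(42)
centres <- rbind(c(0, 0), c(10, 10), c(-10, 10))
sim_data <- do.call(rbind, lapply(1:3, function(i) {
  cbind(rnorm(600, mean = centres[i, 1]), rnorm(600, mean = centres[i, 2]))
}))

# Same calls as above: Minkowski distance with p = 2 is the Euclidean distance.
hc <- hclust(dist(sim_data, method = "minkowski", p = 2), method = "ward.D")

# Cut the tree into k groups and count the observations in each group.
k <- 3
cl <- cutree(hc, k = k)
print(table(cl))
```

`cutree()` returns one group number per observation, so `cl` can be attached to the original rows directly.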

Plot before:

hdata %>%
  as.dendrogram %>%
  color_branches(k = n, col = col_vector[1:n]) %>%  # one colour per cluster
  # theme.text is my own theme object, defined elsewhere (not shown)
  ggplot() + theme_classic() + theme.text +
  theme(panel.grid.major.y = element_line(), axis.title = element_blank(),
        axis.text.x = element_blank(), axis.ticks.x = element_blank()) +
  scale_y_continuous(expand = c(0, 0)) +
  scale_x_continuous(expand = c(0.001, 0.001)) +
  labs(y = "")


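Plot after:

b <- hdata %>%
  as.dendrogram %>%
  color_branches(k = n, col = col_vector[1:n]) %>%  # one colour per cluster
  # theme.text is my own theme object, defined elsewhere (not shown)
  ggplot() + theme_classic() + theme.text +
  theme(panel.grid.major.y = element_line(), axis.title = element_blank(),
        axis.text.x = element_blank(), axis.ticks.x = element_blank()) +
  scale_y_log10() +
  scale_x_continuous(expand = c(0.001, 0.001)) +
  labs(y = "")
# Adjust the line thickness in the built plot data
b <- ggplot_build(b)
# The thickness column is called `size` in older ggplot2 and `linewidth` from 3.4.0 on
for (nm in intersect(c("size", "linewidth"), names(b$data[[1]]))) {
  b$data[[1]][[nm]] <- 0.2
}
b <- ggplot_gtable(b)
plot(b)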
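
The log-scale idea can also be tried with the base-graphics plot() from Option 1. This is only a rough sketch, assuming simulated data (`sim_data`, `hc`, `hc_log`, and `share_low` are illustrative names, not from the question): the merge heights are transformed with log1p() before plotting. Because log1p() is monotone, the order of the merges and the shape of the tree stay the same; only the vertical spacing changes, so the crowded low merges get more room. Note that the axis then shows log1p(height), not the original height.

```r
# Simulated stand-in data (an assumption): three groups of 600 points each.
set.seed(1)
sim_data <- rbind(
  matrix(rnorm(600 * 2, mean = 0), ncol = 2),
  matrix(rnorm(600 * 2, mean = 8), ncol = 2),
  matrix(rnorm(600 * 2, mean = 16), ncol = 2)
)
hc <- hclust(dist(sim_data), method = "ward.D")

# Share of merges that happen below 1% of the top merge height.
# A large share means the bottom of the plot is very crowded.
share_low <- mean(hc$height < 0.01 * max(hc$height))
print(share_low)

# Copy the tree and compress the height scale; the merge order is unchanged.
hc_log <- hc
hc_log$height <- log1p(hc$height)

plot(hc_log, labels = FALSE, hang = -1, ylab = "log1p(height)")
rect.hclust(hc_log, k = 3)
```

Cutting `hc_log` into k groups gives the same groups as cutting `hc`, since cutting by k depends only on the merge order.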

