Make dendrograms more readable in R
I am working with 1800 observations that I want to classify. I ran a hierarchical clustering, plotted the result as a dendrogram, and identified three groups. The problem is the visualization: the plot is not readable. At the bottom of the tree there is a lot of overlapping. The labels are numbers, and I don't know how to make them more readable. I have tried two options and neither helped.
Option 1:
m <- as.matrix(dtm)
distMatrix <- dist(m, method="euclidean")
groups <- hclust(distMatrix,method="ward.D")
clustering <- cutree(groups,3)
plot(groups, hang = -100, cex = 1, labels=FALSE)
rect.hclust(groups, k=3)
Option 2:
library(factoextra)  # provides fviz_dend()
library(ggplot2)     # provides theme_gray()
m <- as.matrix(dtm)
distMatrix <- dist(m, method="euclidean")
groups <- hclust(distMatrix,method="ward.D")
fviz_dend(groups, cex = 0.8, lwd = 0.8, k = 3,
          rect = TRUE,
          k_colors = "jco",
          rect_border = "jco",
          rect_fill = TRUE,
          ggtheme = theme_gray(),labels=F)
How can I draw the dendrogram without so much overlapping at the bottom? With this much data packed together it looks very confusing.
Two things might help: make the y-axis log-scale, and reduce the line thickness. The former is easy, but changing the line thickness of an existing ggplot object is fiddly.
Below is an example of what I did in a recent analysis. I didn't use the fviz_dend function; instead I used as.dendrogram followed by ggplot().
If you want to work with your existing fviz plot, you could change the line thickness using the same method.
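For an existing fviz plot, the same idea could be wrapped in a small helper. This is only a sketch: thin_segments is a hypothetical name, and it assumes the branches are drawn by geom_segment layers and that the width column is called either linewidth (newer ggplot2) or size (older versions). The toy plot below merely stands in for a real dendrogram object.

```r
library(ggplot2)

# Hypothetical helper: rebuild a ggplot and thin every geom_segment layer.
thin_segments <- function(p, width = 0.2) {
  built <- ggplot_build(p)
  for (i in seq_along(built$data)) {
    # Only touch segment layers, so text/label sizes are left alone
    if (inherits(p$layers[[i]]$geom, "GeomSegment")) {
      # Assumption: the width column is `linewidth` or `size`,
      # depending on the installed ggplot2 version
      col <- if ("linewidth" %in% names(built$data[[i]])) "linewidth" else "size"
      built$data[[i]][[col]] <- width
    }
  }
  ggplot_gtable(built)  # returns a gtable, not a ggplot
}

# Toy plot standing in for the output of fviz_dend()
p <- ggplot(data.frame(x = 1:3, y = 1:3)) +
  geom_segment(aes(x = x, y = 0, xend = x, yend = y))
g <- thin_segments(p)
grid::grid.draw(g)
```

Because the result is a gtable, further ggplot layers cannot be added to it afterwards, so this step should come last.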
Also, with this many leaves, you might as well hide the labels (i.e. set expand = c(0, 0) in scale_y_continuous()).
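The snippets that follow use an object called data that is never defined in this answer. To try them without the real data set, a simulated stand-in could be used. The values below (three groups of 600 rows, five columns, the seed) are assumptions for illustration only and are not the original data.

```r
set.seed(42)  # arbitrary seed, only so the example is repeatable

# Hypothetical stand-in: 1800 observations in three loose groups.
# Each row gets the same mean in all five columns (matrix fills column-wise,
# and the 1800-long mean vector is recycled once per column).
centres <- rep(c(0, 4, 8), each = 600)
data <- matrix(rnorm(1800 * 5, mean = centres), ncol = 5)

dim(data)
```

Any numeric matrix or data frame accepted by dist() would work in its place.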
Calculate the hierarchical clustering:
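library(RColorBrewer)
library(dendextend)  # color_branches(), ggplot() for dendrograms, and %>%
library(ggplot2)
n <- 4  # number of clusters to colour
# `data` holds the observations; Minkowski distance with p = 2 is Euclidean
hdata <- hclust(dist(data, "minkowski", p = 2), method = "ward.D")
clusters <- cutree(hdata, k = n)
# vector of up to 16 different colours
col_vector <- c(brewer.pal(n = 10, "Paired"), brewer.pal(n = 6, "Set2"))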
Plot before:
# (the original also added the author's own `theme.text` object, which is not defined here)
hdata %>%
  as.dendrogram %>%
  color_branches(k = n, col = col_vector[1:n]) %>%  # one colour per cluster
  ggplot() + theme_classic() +
  theme(panel.grid.major.y = element_line(), axis.title = element_blank(),
        axis.title.y = element_blank(), axis.text.x = element_blank(),
        axis.ticks.x = element_blank()) +
  scale_y_continuous(expand = c(0, 0)) +
  scale_x_continuous(expand = c(0.001, 0.001)) +
  labs(y = "")
Plot after:
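# (the original also added the author's own `theme.text` object, which is not defined here)
b <- hdata %>%
  as.dendrogram %>%
  color_branches(k = n, col = col_vector[1:n]) %>%  # one colour per cluster
  ggplot() + theme_classic() +
  theme(panel.grid.major.y = element_line(), axis.title = element_blank(),
        axis.title.y = element_blank(), axis.text.x = element_blank(),
        axis.ticks.x = element_blank()) +
  scale_y_log10() +  # log-scale the merge heights
  scale_x_continuous(expand = c(0.001, 0.001)) +
  labs(y = "")
# Adjust the line thickness in the built plot data
b <- ggplot_build(b)
# ggplot2 >= 3.4.0 names the line width column `linewidth`; older versions use `size`
width_col <- if ("linewidth" %in% names(b$data[[1]])) "linewidth" else "size"
b$data[[1]][[width_col]] <- 0.2
b <- ggplot_gtable(b)
plot(b)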