I am working with 1800 observations that I want to classify. I run a hierarchical clustering and plot the result as a dendrogram, in which I identify three groups. The problem is the visualization: the plot is not readable. At the bottom, the leaves overlap heavily. The labels are numbers, but I don't know how to make them more readable. I have tried two options and neither worked.
Option 1:
m <- as.matrix(dtm)
distMatrix <- dist(m, method="euclidean")
groups <- hclust(distMatrix,method="ward.D")
clustering <- cutree(groups,3)
plot(groups, hang = -100, cex = 1, labels=FALSE)
rect.hclust(groups, k=3)
Option 2:
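library(factoextra)  # provides fviz_dend
library(ggplot2)     # provides theme_gray
m <- as.matrix(dtm)
distMatrix <- dist(m, method="euclidean")
groups <- hclust(distMatrix,method="ward.D")
# fviz_dend hides leaf labels with show_labels, not labels
fviz_dend(groups, cex = 0.8, lwd = 0.8, k = 3,
          rect = TRUE,
          k_colors = "jco",
          rect_border = "jco",
          rect_fill = TRUE,
          ggtheme = theme_gray(), show_labels = FALSE)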
How can I represent the dendrogram without so much overlapping data appearing at the bottom? It looks very confusing with so much data together.
Two things might help: make the y-axis log-scale, and reduce line thickness. The former is easy, but changing the line thickness of an existing ggplot object is fiddly.
Below is an example of what I have done in my recent analysis. I didn't use the fviz_dend function; instead I used as.dendrogram followed by ggplot().
If you want to work with your existing fviz plot, you could change the line thickness using the same method.
Also, with a large number of leaves, you might as well hide the labels (i.e. expand=c(0,0) in scale_y).
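For the base-R plot from Option 1, the log-scale idea can be approximated without ggplot by compressing the merge heights of the hclust object before plotting. This is only a sketch under assumptions: the matrix m below is simulated stand-in data (not the asker's dtm), groups_log is a name made up for the example, and log1p is swapped in for a plain log so that a height of 0 stays at 0.

```r
# Hypothetical stand-in for the 1800-row matrix `m` from the question
set.seed(1)
m <- rbind(matrix(rnorm(600 * 5, mean = 0), ncol = 5),
           matrix(rnorm(600 * 5, mean = 4), ncol = 5),
           matrix(rnorm(600 * 5, mean = 8), ncol = 5))

groups <- hclust(dist(m, method = "euclidean"), method = "ward.D")
clustering <- cutree(groups, k = 3)

# Copy the tree and compress the merge heights. log1p is monotone, so the
# merge order (and therefore the cluster membership) is unchanged.
groups_log <- groups
groups_log$height <- log1p(groups$height)

# Thinner lines for this plot only, no leaf labels
op <- par(lwd = 0.5)
plot(groups_log, labels = FALSE, hang = -1,
     xlab = "", sub = "", ylab = "log(1 + height)")
rect.hclust(groups_log, k = 3)
par(op)
```

Heights read off this plot are on the log(1 + height) scale; expm1() converts them back to the original scale.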
Calculate the hierarchical clustering:
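library(RColorBrewer)  # brewer.pal
library(dendextend)    # color_branches and the %>% pipe
library(ggplot2)       # ggplot, themes and scales used in the plots
n = 4  # number of clusters
# `data` is your own numeric data; Minkowski distance with p=2 is the Euclidean distance
hdata <- hclust(dist(data, "minkowski", p=2), method="ward.D")
clusters = cutree(hdata, k = n)
# vector of up to 16 different colours
col_vector = c(brewer.pal(n=10,"Paired"), brewer.pal(n=6,"Set2"))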
Plot before:
hdata %>%
  as.dendrogram %>%
  # pass only as many colours as there are clusters
  color_branches(k = n, col = col_vector[1:n]) %>%
  # theme.text is not defined in this post (presumably a custom theme object
  # from my analysis); drop it if you don't have one
  ggplot() + theme_classic() + theme.text +
  theme(panel.grid.major.y = element_line(),axis.title=element_blank(),
        axis.title.y=element_blank(),axis.text.x=element_blank(),
        axis.ticks.x=element_blank()) +
  scale_y_continuous(expand=c(0,0)) +
  scale_x_continuous(expand=c(0.001,0.001)) +
  labs(y="")
Plot after:
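b = hdata %>%
  as.dendrogram %>%
  # pass only as many colours as there are clusters
  color_branches(k = n, col = col_vector[1:n]) %>%
  # theme.text is not defined in this post (presumably a custom theme object
  # from my analysis); drop it if you don't have one
  ggplot() + theme_classic() + theme.text +
  theme(panel.grid.major.y = element_line(),axis.title=element_blank(),
        axis.title.y=element_blank(),axis.text.x=element_blank(),
        axis.ticks.x=element_blank()) +
  scale_y_log10() +
  scale_x_continuous(expand=c(0.001,0.001)) +
  labs(y="")
# Adjust the line thickness of the first layer (the branch segments)
b = ggplot_build(b)
# ggplot2 >= 3.4.0 stores line width as "linewidth"; older versions use "size"
for (w in intersect(c("linewidth", "size"), names(b$data[[1]]))) b$data[[1]][[w]] = 0.2
b = ggplot_gtable(b)
plot(b)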
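If editing the built plot feels too fiddly, a different route (not the method used above) is to set the branch width on the dendrogram itself with dendextend's set() before plotting. The sketch below assumes dendextend is installed and uses simulated stand-in data x; the base plot call is used here to keep the example short.

```r
library(dendextend)

# Hypothetical stand-in data; replace with your own matrix
set.seed(1)
x <- rbind(matrix(rnorm(100 * 4, mean = 0), ncol = 4),
           matrix(rnorm(100 * 4, mean = 5), ncol = 4))

dend <- as.dendrogram(hclust(dist(x), method = "ward.D"))
dend <- color_branches(dend, k = 2)
# Thin branches, stored as an attribute of the tree rather than of the plot
dend <- set(dend, "branches_lwd", 0.3)

# leaflab = "none" suppresses the leaf labels in plot.dendrogram
plot(dend, leaflab = "none")
```

Because the width lives on the tree, it should carry over to other plotting routes that read the branch attributes, though I have only sketched the base plot here.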