
Reading an upper triangular distance matrix and generating a dendrogram in R

My problem (today) is as follows:

I have an upper triangular distance matrix in a text file ("dist.dis"), generated by a third-party program, which I want to read into R to run a cluster analysis and generate a dendrogram:

   0.36364   0.36364   0.27273   0.81818   0.54545   0.63636   0.36364   0.45455
   0.18182   0.63636   0.63636   0.36364   0.63636   0.54545   0.09091
   0.45455   0.63636   0.18182   0.63636   0.54545   0.27273
   0.81818   0.63636   0.81818   0.27273   0.72727
   0.45455   0.18182   0.63636   0.54545
   0.45455   0.54545   0.27273
   0.81818   0.54545
   0.45455

In a separate text file ("dist.nam"), I also have a list of names of the objects among which the distances have been computed:

COOKO-A
COOKO-B
COOKO-C
COOKO-D
COOKO-E
COOKO-F
COOKO-G
COOKO-H
COOKO-I

Here is my R code to read the above matrix and generate a dendrogram:

mat <- matrix(0, 9, 9)
mat[row(mat) >= col(mat)] <- scan("dist.dis")
hc <- hclust(as.dist(mat), method="average")
ppi <- 100
png("clus.png", width=6*ppi, height=6*ppi, res=ppi)
plot(as.dendrogram(hc), xlab="Distance", ylab="", main="UPGMA dendrogram", horiz=TRUE, edgePar=list(col="blue", lwd=3))
dev.off()

This code works, and generates the dendrogram below:

[Image: dendrogram]

However, I want to have the names of the objects (instead of their numbers) at the tips of the dendrogram. To achieve this, I tried the code below:

names <- scan("dist.nam", what="character")
df.dist <- as.dist(mat)
df.dist <- as.matrix(df.dist, labels=TRUE)
colnames(df.dist) <- names
rownames(df.dist) <- names
hc <- hclust(as.dist(mat), method="average")

But then I got a dreadful error: "Error in if (is.na(n) || n > 65536L) stop("size cannot be NA nor exceed 65536"): missing value where TRUE/FALSE needed".

Could someone give me a hand?

My suspicion is that this is related to using hclust with a matrix and not a dist object.
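
One way to check this suspicion is to compare the object before and after as.matrix(). The sketch below uses a small made-up 3 x 3 matrix (not the data from the question), and the variable names are only illustrative. As far as I can tell, hclust() reads the number of objects from the "Size" attribute of a dist object, and a plain matrix does not carry that attribute.

```r
# Toy symmetric distance matrix (made-up values, for illustration only)
m <- matrix(c(0, 1, 2,
              1, 0, 3,
              2, 3, 0), nrow = 3, byrow = TRUE)

d    <- as.dist(m)     # "dist" object: carries a "Size" attribute
back <- as.matrix(d)   # plain matrix again: class "dist" and "Size" are gone

is_dist_d    <- inherits(d, "dist")
is_dist_back <- inherits(back, "dist")
size_d       <- attr(d, "Size")
size_back    <- attr(back, "Size")

# hclust() accepts the dist object; the plain matrix raises an error,
# which is captured here instead of stopping the script
hc_ok  <- hclust(d, method = "average")
hc_bad <- tryCatch(hclust(back, method = "average"), error = function(e) e)
```

If that is what happened, wrapping the labelled matrix in as.dist() before calling hclust() should be enough.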

I would set the column names on the matrix mat itself and then call as.dist (note that you only need to set colnames, not both row and column names). Let me know if this works for you.

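mat <- matrix(0, 9, 9)
# The file shown holds the 36 off-diagonal values (9 * 8 / 2), so fill strictly below the diagonal
mat[row(mat) > col(mat)] <- scan("dist.dis")

names <- scan("dist.nam", what="character")

# as.dist() picks up its labels from the column names
colnames(mat) <- names

df.dist <- as.dist(mat)

# Pass the dist object itself to hclust(), not a matrix built from it
hc <- hclust(df.dist, method="average")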
ppi <- 100
png("clus.png", width=6*ppi, height=6*ppi, res=ppi)
par(mar=c(4,4,4,4))
plot(as.dendrogram(hc), xlab="Distance", ylab="", main="UPGMA dendrogram", horiz=TRUE, edgePar=list(col="blue", lwd=3))
dev.off()

[Image: dendrogram with labels]
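
It can be worth checking on a tiny example that the values land in the right cells and that the labels survive. Here is a minimal sketch, assuming the file stores the strict upper triangle row by row, as the listing in the question suggests. The values and the names A to D are made up for illustration.

```r
# Toy input: strict upper triangle of a 4 x 4 distance matrix, row by row
# (made-up values; 4 * 3 / 2 = 6 numbers)
txt <- "0.1 0.2 0.3
0.4 0.5
0.6"
vals <- scan(text = txt, quiet = TRUE)
labs <- c("A", "B", "C", "D")   # hypothetical object names

n <- length(labs)
# Guard against a mismatch between the number of values and the matrix size
stopifnot(length(vals) == n * (n - 1) / 2)

mat <- matrix(0, n, n)
# R fills column by column, so the lower triangle taken in column order
# lines up with the upper triangle read row by row
mat[row(mat) > col(mat)] <- vals
colnames(mat) <- labs            # as.dist() falls back to colnames for labels

d  <- as.dist(mat)
hc <- hclust(d, method = "average")

full <- as.matrix(d)             # full symmetric matrix, for spot checks
```

Here full["A", "D"] is 0.3 and full["B", "C"] is 0.4, matching the first and second input rows, and hc$labels holds the names. For the 9 objects in the question, the strict mask row(mat) > col(mat) selects 36 cells, which is the number of values in the listing, whereas row(mat) >= col(mat) selects 45, so R would recycle the values with a warning and shift them.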
