Exporting dendrogram as table in R

I would like to export an hclust dendrogram from R into a data.table in order to subsequently import it into another ("home-made") piece of software. str(unclass(fit)) provides a text overview of the dendrogram, but what I'm looking for is really a numeric table. I've looked at the Bioconductor ctc package, but the output it produces looks somewhat cryptic. I would like to have something similar to this table: http://stn.spotfire.com/spotfire_client_help/heat/heat_importing_exporting_dendrograms.htm Is there a way to get this out of an hclust object in R?

In case anyone is also interested in dendrogram export, here is my solution. It is most probably not the best one, as I started using R only recently, but at least it works. Suggestions on how to improve the code are welcome.

So, if hr is my hclust object and df is my data, the first column of which contains a simple index starting from 0, and the row names are the names of the clustered items:

# Retrieve the leaf order (row name and its position within the leaves)
leaf.order <- matrix(data=NA, ncol=2, nrow=nrow(df),
              dimnames=list(c(), c("row.num", "row.name")))
leaf.order[,2] <- hr$labels[hr$order]
for (i in 1:nrow(leaf.order)) {
   leaf.order[which(leaf.order[,2] %in% rownames(df[i,])),1] <- df[i,1]
}
leaf.order <- as.data.frame(leaf.order)

hr.merge <- hr$merge
n <- max(df[,1])

# Re-index all clustered leaves and nodes. First, all leaves are indexed starting from 0.
# Next, all nodes are indexed starting from the maximum leaf index + 1.
for (i in 1:length(hr.merge)) {
  if (hr.merge[i]<0) {hr.merge[i] <- abs(hr.merge[i])-1}
  else { hr.merge[i] <- (hr.merge[i]+n) }
}
node.id <- c(0:length(hr.merge))

# Generate dendrogram matrix with node index in the first column.
dend <- matrix(data=NA, nrow=length(node.id), ncol=6,
           dimnames=list(c(0:(length(node.id)-1)),
              c("node.id", "parent.id", "pruning.level",
              "height", "leaf.order", "row.name")) )
dend[,1] <- c(0:((2*nrow(df))-2))  # Insert a leaf/node index

# Calculate parent ID for each leaf/node:
# 1) For each leaf/node index, find the corresponding row number within the merge-table.
# 2) Add the maximum leaf index to the row number as indexing the nodes starts after indexing all the leaves.
for (i in 1:(nrow(dend)-1)) {
  dend[i,2] <- row(hr.merge)[which(hr.merge %in% dend[i,1])]+n
}

# Generate table with indexing of all leaves (1st column) and inserting the corresponding row names into the 3rd column.
hr.order <- matrix(data=NA,
           nrow=length(hr$labels), ncol=3,
           dimnames=list(c(), c("order.number", "leaf.id", "row.name")))
hr.order[,1] <- c(0:(nrow(hr.order)-1))
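hr.order[,3] <- hr$labels[hr$order]
# Keep strings as characters so that as.numeric() returns the stored values
# rather than factor level codes.
hr.order <- data.frame(hr.order, stringsAsFactors=FALSE)
hr.order[,1] <- as.numeric(hr.order[,1])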

# Assign the row name to each leaf.
dend <- as.data.frame(dend)
for (i in 1:nrow(df)) {
      dend[which(dend[,1] %in% df[i,1]),6] <- rownames(df[i,])
}

# Assign the position on the dendrogram (from left to right, starting from 0) to each leaf.
for (i in 1:nrow(hr.order)) {
      dend[which(dend[,6] %in% hr.order[i,3]),5] <- hr.order[i,1]
}

# Insert height for each node.
dend[c((n+2):nrow(dend)),4] <- hr$height

# All leaves get the highest possible pruning level
dend[which(dend[,1] <= n),3] <- nrow(hr.merge)

# The nodes get a decreasing pruning level, starting from the level of the
# leaves minus 1 and going down to 0; nodes with equal height share a level.

for (i in (n+2):nrow(dend)) {
   if ((dend[i,4] != dend[(i-1),4]) || is.na(dend[(i-1),4])){
        dend[i,3] <- dend[(i-1),3]-1}
      else { dend[i,3] <- dend[(i-1),3] }
}
dend[,3] <- dend[,3]-min(dend[,3])

dend <- dend[order(-node.id),]

# Write the results table ("path" is a placeholder for the output file name).
write.table(dend, file="path", sep=";", row.names=FALSE)
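
To try the script above, hr and df have to match the layout it assumes. The following is a minimal sketch of such inputs, assuming the built-in USArrests data set as a stand-in for real data (the data set, the subset size and the linkage method are illustrative choices, not part of the original answer):

```r
# Illustrative input data: the first 8 rows of a built-in data set (assumption).
x <- USArrests[1:8, ]

# df: first column is a simple index starting from 0, in the same row order
# as the data passed to dist(); row names are the names of the clustered items.
df <- data.frame(index = seq_len(nrow(x)) - 1, x)
rownames(df) <- rownames(x)

# hr: the hclust object; its labels are taken from the row names of x.
hr <- hclust(dist(x), method = "average")
```

With these two objects defined, the script can be run from top to bottom; only the file argument of write.table needs to be set to a real output path.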

There is a package that does exactly the opposite of what you want: Labeltodendro ;-)

But seriously, can't you just manually extract the elements from the hclust object (e.g. $merge, $height, $order) and create a custom table from the extracted elements?
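
As a rough illustration of that suggestion, the sketch below builds a node table directly from $merge, $height and $order. The function name hclust_to_table and the column names are hypothetical, leaves are given height 0 as an assumption, and the pruning level is left out; it is one possible layout, not the format any particular tool requires.

```r
# Hypothetical helper: turn an hclust object into a node table.
# Leaves get ids 0..(n-1); internal nodes get ids n..(2n-2) in merge order.
hclust_to_table <- function(hc) {
  n_leaves <- length(hc$order)
  n_nodes  <- nrow(hc$merge)

  # In $merge, negative entries are leaves and positive entries are earlier
  # merge steps; convert both to the 0-based ids described above.
  child <- ifelse(hc$merge < 0, -hc$merge - 1L, hc$merge + n_leaves - 1L)

  # The parent of every child in merge row k is node k + n_leaves - 1.
  parent <- rep(NA_integer_, n_leaves + n_nodes)
  parent[as.vector(child) + 1L] <- rep(seq_len(n_nodes) + n_leaves - 1L, times = 2)

  # 0-based left-to-right position of each leaf, taken from $order.
  leaf_pos <- integer(n_leaves)
  leaf_pos[hc$order] <- seq_len(n_leaves) - 1L

  labels <- if (is.null(hc$labels)) as.character(seq_len(n_leaves)) else hc$labels

  data.frame(
    node.id    = seq_len(n_leaves + n_nodes) - 1L,
    parent.id  = parent,                              # NA for the root
    height     = c(rep(0, n_leaves), hc$height),      # leaf height 0 is an assumption
    leaf.order = c(leaf_pos, rep(NA_integer_, n_nodes)),
    row.name   = c(labels, rep(NA_character_, n_nodes)),
    stringsAsFactors = FALSE
  )
}

# Usage on a tiny illustrative data set.
x <- matrix(c(0, 1, 5, 6.5), ncol = 1, dimnames = list(c("a", "b", "c", "d"), NULL))
hc <- hclust(dist(x), method = "single")
tab <- hclust_to_table(hc)
```

The resulting data frame can be written out with write.table in the same way as in the longer script.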
