
Applying the pvclust R function to a precomputed dist object

I'm using R to perform hierarchical clustering. As a first approach I used hclust and performed the following steps:

  1. I imported the distance matrix.
  2. I used the as.dist function to convert it into a dist object.
  3. I ran hclust on the dist object.

Here's the R code:

distm <- read.csv("distMatrix.csv")
d <- as.dist(distm)
hclust(d, "ward")

At this point I would like to do something similar with the function pvclust; however, I cannot, because it does not accept a precomputed dist object. How can I proceed, given that the distance I'm using is not among those provided by R's dist function?

I've tested Vincent's suggestion; you can do the following (my data set is a dissimilarity matrix):

# Import your data
distm <- read.csv("distMatrix.csv")
d <- as.dist(distm)

# Compute the eigenvalues
x <- cmdscale(d, 1, eig = TRUE)

# Plot the eigenvalues and choose the number of dimensions
# (keep the leading ones; stop where the eigenvalues get close to 0)
plot(x$eig,
   type="h", lwd=5, las=1,
   xlab="Number of dimensions",
   ylab="Eigenvalues")

# Recover coordinates that give the same distance matrix,
# with nb_dimensions set to the number chosen from the plot
x <- cmdscale(d, nb_dimensions)

# As mentioned by Stéphane, pvclust() clusters columns.
# Its default method.dist is "correlation", so ask for Euclidean
# distances to cluster on the recovered distance matrix.
library(pvclust)
pvclust(t(x), method.dist = "euclidean")
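
If you would rather not read the number of dimensions off the plot, one option is to count the eigenvalues that are clearly positive. This is only a sketch: the toy points stand in for distMatrix.csv, and the tolerance `tol` is an arbitrary choice of mine, not something from the answer above. With a genuinely non-Euclidean dissimilarity some eigenvalues will be negative, and the distances can then only be recovered approximately.

```r
# Toy dissimilarity matrix standing in for distMatrix.csv (assumption)
set.seed(1)
pts <- matrix(rnorm(20 * 3), ncol = 3)   # 20 points in 3 dimensions
d <- dist(pts)

# Classical MDS; with eig = TRUE the result carries all n eigenvalues
mds <- cmdscale(d, k = 1, eig = TRUE)

# Count eigenvalues that are clearly positive relative to the largest one
tol <- 1e-8                              # hypothetical threshold
nb_dimensions <- sum(mds$eig > tol * max(mds$eig))

# Recover coordinates and measure how well they reproduce d
x <- cmdscale(d, k = nb_dimensions)
max_error <- max(abs(dist(x) - d))
print(nb_dimensions)
print(max_error)
```

If `max_error` is not close to 0 on your own data, the embedding is lossy and the pvclust result should be read with that in mind.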

If the dataset is not too large, you can embed your n points in a space of dimension n-1 that gives the same distance matrix.

# Sample distance matrix
n <- 100
k <- 1000
d <- dist( matrix( rnorm(k*n), nc=k ), method="manhattan" )

# Recover some coordinates that give the same distance matrix
x <- cmdscale(d, n-1)
stopifnot( sum(abs(dist(x) - d)) < 1e-6 )

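# You can then use x or d interchangeably
r1 <- hclust(d)
r2 <- hclust(dist(x)) # identical to r1
library(pvclust)
# pvclust() clusters columns and defaults to correlation distance,
# so transpose x and ask for Euclidean distances
r3 <- pvclust(t(x), method.dist = "euclidean")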
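
To convince yourself that clustering on x and on d really gives the same tree, you can compare the two dendrograms through their cophenetic distances. A small sketch, assuming Euclidean toy data (my choice, so that the embedding is exact) and average linkage:

```r
set.seed(42)
pts <- matrix(rnorm(30 * 5), ncol = 5)   # toy data: 30 points in 5 dimensions
d <- dist(pts)                           # Euclidean distances

# Coordinates recovered from the distances alone
x <- cmdscale(d, k = 5)

# Cluster once on the original distances, once on the recovered ones
h1 <- hclust(d, method = "average")
h2 <- hclust(dist(x), method = "average")

# A correlation of (almost) 1 means both trees encode the same structure
coph_cor <- cor(cophenetic(h1), cophenetic(h2))
print(coph_cor)
```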

If the dataset is large, you may have to check how pvclust is implemented.

It's not clear to me whether you only have a distance matrix, or whether you computed it yourself beforehand. In the former case, as already suggested by @Vincent, it would not be too difficult to tweak the R code of pvclust itself (using fix() or whatever; I provided some hints on another question on CrossValidated). In the latter case, the authors of pvclust provide an example of how to use a custom distance function, although that means you will have to install their "unofficial version".
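
For the case where you still have the raw data: as far as I know, pvclust releases from 2.0 onward accept a function for method.dist, provided it takes the data matrix and returns a dist object for its columns. The sketch below assumes such a version is installed. The cosine distance and the toy data are just placeholders for your own measure and data.

```r
# Custom distance: takes the data matrix and returns a "dist" object
# for its COLUMNS, since pvclust() clusters columns.
cosine_dist <- function(x) {
  x <- as.matrix(x)
  cross <- crossprod(x)                  # t(x) %*% x
  norms <- sqrt(diag(cross))
  res <- as.dist(1 - cross / outer(norms, norms))
  attr(res, "method") <- "cosine"
  res
}

set.seed(1)
dat <- matrix(rnorm(50 * 6), ncol = 6,
              dimnames = list(NULL, paste0("v", 1:6)))  # toy data (assumption)
d_custom <- cosine_dist(dat)

# Only run the bootstrap if pvclust is available
if (requireNamespace("pvclust", quietly = TRUE)) {
  fit <- pvclust::pvclust(dat, method.dist = cosine_dist,
                          method.hclust = "average", nboot = 100)
  print(fit)
}
```

This does not help if all you have is the distance matrix, because pvclust needs the data to resample from. In that case the cmdscale route or editing the pvclust source are still the options.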
