The difference between dist functions in R

I want to calculate dissimilarity indices on a binary matrix and have found several functions in R, but I can't get them to agree. As an example, I use the Jaccard coefficient in four functions: vegdist(), sim(), designdist(), and dist(). I'm going to use the result for a cluster analysis.

library(vegan)
library(simba)

#Create random binary matrix
function1 <- function(m, n) {
  matrix(sample(0:1, m * n, replace = TRUE), m, n)
}
test <- function1(30, 20)

#Calculate dissimilarity indices with jaccard coefficient
dist1 <- vegdist(test, method = "jaccard")
dist2 <- sim(test, method = "jaccard")
dist3 <- designdist(test, method = "a/(a+b+c)", abcd = TRUE)
dist4 <- dist(test, method = "binary")

Does anyone know why dist1 and dist4 are different from dist2 and dist3?

I put this as an answer as well. Here are the main comments on the dissimilarities you calculated:

  • dist1: you must set binary = TRUE in vegan::vegdist() (this is documented).

  • dist2: simba::sim() calculates Jaccard similarity, so you must use 1 - dist2. The ?sim documentation gives a wrong formula for Jaccard similarity, although the code uses the correct one. Even so, the documented formula defines a similarity.

  • dist3: your vegan::designdist() formula gives Jaccard similarity, and you should change it to a dissimilarity. There are many ways of doing this, and the code below gives one.

  • dist4: this is done correctly.
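
The similarity versus dissimilarity distinction in these points can be checked by hand on two binary vectors. The following is a minimal base-R sketch, not the internals of any of the four functions. The vectors x and y and the names shared, only_x, and only_y are illustrative; they correspond to the counts a, b, and c as assumed here for the abcd notation (a = presences shared by both, b and c = presences unique to each).

```r
# Two illustrative presence/absence vectors (hypothetical data)
x <- c(1, 1, 0, 1, 0, 0, 1)
y <- c(1, 0, 0, 1, 1, 0, 1)

shared <- sum(x == 1 & y == 1)  # "a": present in both
only_x <- sum(x == 1 & y == 0)  # "b": present only in x
only_y <- sum(x == 0 & y == 1)  # "c": present only in y

# Jaccard similarity: a / (a + b + c)
jaccard_sim <- shared / (shared + only_x + only_y)

# Jaccard dissimilarity: (b + c) / (a + b + c), which equals 1 - similarity
jaccard_dis <- (only_x + only_y) / (shared + only_x + only_y)

# Base R's "binary" distance on the same two vectors
base_dist <- as.numeric(dist(rbind(x, y), method = "binary"))

print(c(similarity = jaccard_sim,
        dissimilarity = jaccard_dis,
        dist_binary = base_dist))
```

Here the similarity is 0.6 and the dissimilarity is 0.4, and dist(method = "binary") returns the dissimilarity. This is why a function that returns the similarity cannot agree with it until the result is converted.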

Replacing your last four lines with these will do the trick and give numerically identical results from all four functions:

#Calculate dissimilarity indices with jaccard coefficient
dist1 <- vegdist(test, method = "jaccard", binary = TRUE)
dist2 <- 1 - sim(test, method = "jaccard")
dist3 <- designdist(test, method = "(b+c)/(a+b+c)", abcd = TRUE)
dist4 <- dist(test, method = "binary")
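
One way to confirm that two results really are numerically identical is to compare the underlying vectors with all.equal(). The sketch below uses only base R so that it runs without vegan or simba: it builds the Jaccard dissimilarity for every pair of rows from the shared and total presence counts and compares it with dist4. The seed and the names shared, totals, either, and jaccard_manual are illustrative assumptions, not part of the original code.

```r
set.seed(1)  # arbitrary seed, used only to make the sketch reproducible
test <- matrix(sample(0:1, 30 * 20, replace = TRUE), 30, 20)

# Shared presences for every pair of rows ("a" in the abcd notation)
shared <- tcrossprod(test)

# Presences in at least one row of each pair: a + b + c
totals <- rowSums(test)
either <- outer(totals, totals, "+") - shared

# Jaccard dissimilarity, 1 - a / (a + b + c), stored as a dist object
jaccard_manual <- as.dist(1 - shared / either)

dist4 <- dist(test, method = "binary")

# Compare the values only; as.vector() drops attributes that differ
# between dist objects created in different ways
print(all.equal(as.vector(jaccard_manual), as.vector(dist4)))
```

Assuming the other functions return dist objects for the same rows in the same order, the same as.vector() comparison could be applied to dist1, dist2, and dist3 once the packages are loaded.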
