The difference between dist functions in R
I want to calculate dissimilarity indices on a binary matrix. I have found several functions in R, but I can't get them to agree. As an example, I use the Jaccard coefficient in four functions: vegdist(), sim(), designdist(), and dist(). I'm going to use the result for a cluster analysis.
library(vegan)
library(simba)
#Create random binary matrix
function1 <- function(m, n) {
matrix(sample(0:1, m * n, replace = TRUE), m, n)
}
test <- function1(30, 20)
#Calculate dissimilarity indices with jaccard coefficient
dist1 <- vegdist(test, method = "jaccard")
dist2 <- sim(test, method = "jaccard")
dist3 <- designdist(test, method = "a/(a+b+c)", abcd = TRUE)
dist4 <- dist(test, method = "binary")
Does anyone know why dist1 and dist4 are different from dist2 and dist3?
I posted this as an answer as well. Here are the main comments on the dissimilarities you calculated:
dist1: you must set binary = TRUE in vegan::vegdist() so that the data are treated as presence/absence (this is documented).
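As a sketch of why the vegan route ends up at Jaccard: ?vegdist describes method = "jaccard" as 2B/(1+B), where B is the Bray-Curtis dissimilarity, and binary = TRUE first reduces the data to presence/absence. The base-R snippet below uses two made-up rows (no vegan needed) and a hand-written bray_curtis() helper, which is illustrative and not vegan's own code, to check that on presence/absence data this formula reduces to the Jaccard dissimilarity (b+c)/(a+b+c).

```r
# Hypothetical helper: Bray-Curtis dissimilarity between two rows
bray_curtis <- function(x, y) sum(abs(x - y)) / sum(x + y)

# Two made-up rows
x <- c(1, 1, 0, 1, 0, 0, 1)
y <- c(1, 0, 1, 1, 0, 1, 0)

# Presence/absence step, in the spirit of binary = TRUE
x <- as.numeric(x > 0)
y <- as.numeric(y > 0)

# Jaccard as documented for vegdist(): 2B / (1 + B)
B <- bray_curtis(x, y)
jaccard_from_bc <- 2 * B / (1 + B)

# Jaccard dissimilarity directly: (b + c) / (a + b + c)
a  <- sum(x == 1 & y == 1)   # shared presences
bc <- sum(x != y)            # presences in exactly one row (b + c)
jaccard_direct <- bc / (a + bc)

print(c(jaccard_from_bc, jaccard_direct))
```

Here a = 2 and b + c = 4, so both routes give 4/6.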
dist2: simba::sim() calculates Jaccard similarity, so you must use 1 - dist2. The ?sim documentation gives a wrong formula for Jaccard similarity, but the code uses the correct one. Either way, the documented formula also defines a similarity, not a dissimilarity.
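Converting a similarity to a dissimilarity is a plain subtraction. The sketch below uses a small hand-made similarity matrix stored as a "dist" object, standing in for the output of sim() (that sim() returns a "dist" object is an assumption here). Because base R arithmetic keeps the attributes of the longer operand, 1 - s is still a "dist" object that hclust() accepts directly.

```r
# Made-up similarities between three sites, stored as a "dist" object
# (stand-in for what sim() is assumed to return)
s <- as.dist(matrix(c(1,    0.25, 0.5,
                      0.25, 1,    0.75,
                      0.5,  0.75, 1), nrow = 3))

# Similarity -> dissimilarity; the "dist" class and Size attribute are kept
d <- 1 - s

# The result can go straight into a cluster analysis
hc <- hclust(d, method = "average")
print(d)
```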
dist3: your vegan::designdist() formula gives Jaccard similarity, and you should change it to a dissimilarity. There are many ways of doing this; the code below gives one.
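With abcd = TRUE, designdist() uses 2x2 contingency-table notation: a is the number of species present in both rows, and b and c are the numbers present in only one of them. The sketch below computes those counts by hand for two made-up rows with a hypothetical helper, abc_counts(), and evaluates both formulas, showing that "a/(a+b+c)" and "(b+c)/(a+b+c)" are complements.

```r
# Hypothetical helper: a/b/c counts for two binary rows,
# following the notation of designdist(..., abcd = TRUE)
abc_counts <- function(x, y) {
  x <- x > 0
  y <- y > 0
  c(a = sum(x & y),    # present in both rows
    b = sum(x & !y),   # present only in x
    c = sum(!x & y))   # present only in y
}

x <- c(1, 1, 0, 1, 0, 0, 1)
y <- c(1, 0, 1, 1, 0, 1, 0)
n <- abc_counts(x, y)
total <- n[["a"]] + n[["b"]] + n[["c"]]

similarity    <- n[["a"]] / total                 # "a/(a+b+c)": Jaccard similarity
dissimilarity <- (n[["b"]] + n[["c"]]) / total    # "(b+c)/(a+b+c)": Jaccard dissimilarity

print(c(similarity = similarity, dissimilarity = dissimilarity))
```

For these rows a = b = c = 2, so the similarity is 1/3 and the dissimilarity is 2/3. Writing the dissimilarity directly, rather than as 1 minus the similarity, is just one of the equivalent options.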
dist4: this is done correctly.
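To see why dist(method = "binary") already matches Jaccard dissimilarity, the sketch below computes pairwise Jaccard similarity for a whole random 0/1 matrix in base R, using tcrossprod() for the shared presences a, converts it to a dissimilarity, and compares the result with dist(). The matrix size mirrors the question; the seed is an arbitrary choice.

```r
set.seed(1)
m <- matrix(sample(0:1, 30 * 20, replace = TRUE), 30, 20)

shared  <- tcrossprod(m)                            # a: presences shared by rows i and j
totals  <- rowSums(m)                               # presences in each row
n_union <- outer(totals, totals, "+") - shared      # a + b + c

jaccard_sim <- shared / n_union                     # a / (a + b + c)
jaccard_dis <- as.dist(1 - jaccard_sim)             # (b + c) / (a + b + c), lower triangle

all.equal(as.vector(jaccard_dis), as.vector(dist(m, method = "binary")))
```

This hand-rolled version assumes no row is all zeros, since such a pair would give 0/0.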
Replacing your last four lines with these will do the trick and give numerically identical results from all four functions:
#Calculate dissimilarity indices with jaccard coefficient
dist1 <- vegdist(test, method = "jaccard", binary = TRUE)
dist2 <- 1 - sim(test, method = "jaccard")
dist3 <- designdist(test, method = "(b+c)/(a+b+c)", abcd = TRUE)
dist4 <- dist(test, method = "binary")