How does dist(x, method="binary") calculate the distance matrix?

I have been trying to figure this out, without much success. I am working with a table of binary data (0s and 1s). I managed to compute a distance matrix from my data using the R function dist(x,method="binary"), but I am not sure exactly how this function computes the distances. Is it using the Jaccard coefficient J = M11/(M10 + M01 + M11)?

This is described in the help page ?dist:

This function computes and returns the distance matrix computed by using the specified distance measure to compute the distances between the rows of a data matrix.

[...]

binary: (aka asymmetric binary): The vectors are regarded as binary bits, so non-zero elements are 'on' and zero elements are 'off'. The distance is the proportion of bits in which only one is on amongst those in which at least one is on.

This is equivalent to the Jaccard distance as described in Wikipedia:

An alternate interpretation of the Jaccard distance is as the ratio of the size of the symmetric difference to the union.

In your notation, the distance is 1 - J = (M01 + M10)/(M01 + M10 + M11). Positions where both vectors are 0 (M00) do not enter the calculation.
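
As a quick check, the formula can be compared with the output of dist() on a small example. This is only a sketch: the two rows a and b are made-up illustrative data, and m11, m10, m01 are hypothetical variable names mirroring the M11, M10, M01 counts in the question.

```r
# Two hypothetical binary rows (illustrative data, not from the question)
x <- rbind(a = c(1, 1, 0, 0, 1),
           b = c(1, 0, 1, 0, 1))

# Count the positions of each type between rows a and b
m11 <- sum(x["a", ] == 1 & x["b", ] == 1)  # both on
m10 <- sum(x["a", ] == 1 & x["b", ] == 0)  # only a on
m01 <- sum(x["a", ] == 0 & x["b", ] == 1)  # only b on

# Jaccard distance computed by hand: 1 - J
manual <- (m01 + m10) / (m01 + m10 + m11)

# The same pair as computed by dist(); as.numeric() extracts the single value
builtin <- as.numeric(dist(x, method = "binary"))

print(manual)                      # 0.5
print(all.equal(manual, builtin))  # TRUE
```

Here m11 = 2, m10 = 1 and m01 = 1, so the distance is 2/4 = 0.5. The fourth position, where both rows are 0, is ignored.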
