
Pairwise distance matrix containing strings

I need to calculate the pairwise distance between the rows of two matrices, where the distance is the number of features (dimensions) in which the two rows differ. I want to do this in MATLAB without using a loop. For example, assume I want to calculate the distance between the instances in A and B:

A = [ 1 2 3 ; 2 3 4]         % (two instances with three features)

B = [ 2 3 4 ; 2 5 6 ; 4 5 6] % (three instances with three features)

I need to calculate C, a 2x3 matrix containing the distances between the instances in A and those in B. Features are compared one by one: an equal pair adds 0 to the distance and an unequal pair adds 1, so the distance between [1 3 3] and [2 3 4], for example, would be 2. So in this case,

C = [3 3 3; 0 2 3].

A and B may contain strings instead of numbers.

You can use bsxfun with @ne (not equal), followed by a sum, to count the number of dissimilar features for each pair of instances:

A = [1 2 3; 2 3 4];
B = [2 3 4; 2 5 6; 4 5 6];
C = squeeze(sum(bsxfun(@ne,A,permute(B,[3 2 1])),2))

C =

     3     3     3
     0     2     3

The above works by using bsxfun(@ne,...) to generate a logical array that tests each feature of each instance pair for inequality. Permuting B to size 1x3x3 puts its instances along the third dimension, so the comparison yields a 2x3x3 array (instances of A, features, instances of B). A sum over dimension 2 then counts the dissimilar features for each pair, and squeeze removes the resulting singleton dimension.
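To make the intermediate array sizes easier to follow, here is the same computation split into separate steps. This is only a sketch of the one-liner above; the variable names Bp, D and S are mine, and the last step uses permute instead of squeeze as a design choice, not because the original is wrong for this input.

```matlab
A = [1 2 3; 2 3 4];            % 2 instances x 3 features
B = [2 3 4; 2 5 6; 4 5 6];     % 3 instances x 3 features

Bp = permute(B, [3 2 1]);      % 1 x 3 x 3: instances of B now run along dim 3
D  = bsxfun(@ne, A, Bp);       % 2 x 3 x 3 logical: true where a feature differs
S  = sum(D, 2);                % 2 x 1 x 3: count of differing features per pair
C  = permute(S, [1 3 2]);      % 2 x 3: move the singleton dimension to the end

disp(C)
```

Using permute(S,[1 3 2]) rather than squeeze keeps C as a 1xn row when A has a single instance; squeeze would return an nx1 column in that case. In MATLAB R2016b and later, implicit expansion means A ~= Bp should give the same result as bsxfun(@ne,A,Bp).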

The function pdist2 (Statistics and Machine Learning Toolbox) with the Hamming distance already does this for you:

pdist2(A,B,'hamming')

This gives the result as the fraction of coordinates that differ. Since you want a count rather than a fraction, multiply by the number of columns:

pdist2(A,B,'hamming')*size(A,2)

ans =

     3     3     3
     0     2     3
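Neither snippet above handles the string case mentioned in the question directly. One possible approach, assuming the strings are stored in cell arrays of character vectors (the question does not say how they are stored), is to map every distinct string to an integer code with unique and then reuse the numeric comparison. The data and the variable names below are hypothetical.

```matlab
% Hypothetical string-valued data: cell arrays of character vectors
A = {'a','b','c'; 'b','c','d'};
B = {'b','c','d'; 'b','e','f'; 'd','e','f'};

nA  = size(A, 1);
all = [A; B];                            % stack both sets, (nA + nB) x features
[~, ~, codes] = unique(all(:));          % one integer code per distinct string
codes = reshape(codes, size(all));       % back to (nA + nB) x features

An = codes(1:nA, :);                     % numeric stand-in for A
Bn = codes(nA+1:end, :);                 % numeric stand-in for B

% Same mismatch count as in the numeric case
C = squeeze(sum(bsxfun(@ne, An, permute(Bn, [3 2 1])), 2))
```

Because equal strings receive equal codes, comparing the codes is equivalent to comparing the strings. The coded matrices An and Bn could presumably also be passed to pdist2(An,Bn,'hamming')*size(An,2) if the toolbox is available.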
