Optimize parameters of a pairwise distance function in Matlab
This question is related to "matlab: find the index of common values at the same entry from two arrays".
Suppose that I have a 1000 by 10000 matrix that contains the values 0, 1, and 2. Each row is treated as a sample. I want to calculate the pairwise distance between those samples according to the formula d = 1-1/(2p)sum(a/c+b/d), where a, b, c, d can be treated as row vectors of length 10000 according to some definition and p=10000. c and d are probabilities such that c+d=1.
An example of how to find the values of a, b, c, d: suppose we want to find d between samples i and j; then I look at rows i and j.
If the kth entries of rows i and j have values 2 and 2, then a=2,b=0,c=1,d=0 (I guess I will assign 0/0=0 in this case).
If the kth entries of rows i and j have values 2 and 1, or vice versa, then a=1,b=0,c=3/4,d=1/4.
Similar assignments apply to the cases 2,0 (a=0,b=0,c=1/2,d=1/2), 1,1 (a=1,b=1,c=1/2,d=1/2), 1,0 (a=0,b=1,c=1/4,d=3/4), and 0,0 (a=0,b=2,c=0,d=1).
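The six cases above appear to follow a closed form in the two entry values x and y: a = min(x,y), b = 2 - max(x,y), c = (x+y)/4, d = 1 - c. The following is a minimal sketch that checks this reading against the listed cases; the variable names (cases, expected, closedForm, matchesTable) are illustrative and not from the original post.

```matlab
% Rows of cases: the [x y] value pairs listed in the question.
% Rows of expected: the corresponding [a b c d] from the question.
cases    = [2 2; 2 1; 2 0; 1 1; 1 0; 0 0];
expected = [2 0 1   0;
            1 0 3/4 1/4;
            0 0 1/2 1/2;
            1 1 1/2 1/2;
            0 1 1/4 3/4;
            0 2 0   1];

x = cases(:,1);
y = cases(:,2);

% Assumed closed form (symmetric in x and y, so "vice versa" is covered).
a = min(x, y);
b = 2 - max(x, y);
c = (x + y) / 4;
d = 1 - c;

closedForm   = [a b c d];
matchesTable = isequal(closedForm, expected);
```

If matchesTable is true, the case-by-case lookup with find can be replaced by these four element-wise expressions.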
The Matlab code I have so far uses for loops over i and j, then finds the cases above using find, then creates two arrays for a/c and b/d. This is extremely slow. Is there a way I can improve the efficiency?
Edit: the distance d is the formula given in this paper on page 13.
Provided those coefficients are fixed, I think I've successfully vectorised the distance function. Figuring out the formulae was fun. I flipped things around a bit to minimise division, and since I wasn't aware of pdist until @horchler's comment, you get it wrapped in loops with the constants factored out:
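% m is the n-by-p data matrix with entries in {0, 1, 2}
[n, p] = size(m);
distance = zeros(n);
for ii = 1:n
    for jj = ii+1:n
        s = m(ii,:) + m(jj,:);
        a = min(m(ii,:), m(jj,:));
        b = 2 - max(m(ii,:), m(jj,:));
        c = 4 ./ s;              % reciprocal of the question's c = s/4
        c(isinf(c)) = 0;         % treat 0/0 as 0 (a is 0 wherever s is 0)
        d = 4 ./ (4 - s);        % reciprocal of the question's d = 1 - s/4
        d(isinf(d)) = 0;         % treat 0/0 as 0 (b is 0 wherever s is 4)
        distance(ii,jj) = sum(a.*c + b.*d);
        % distance(jj,ii) = distance(ii,jj); % optional for the full matrix
    end
end
distance = 1 - (1 / (2 * p)) * distance;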
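Because each entry pair can take only nine value combinations, the double loop over rows could in principle be replaced by a small lookup table combined with matrix products. The following is a sketch of that alternative, not the approach from the answer above. It assumes the question's case table (with 0/0 taken as 0), uses a tiny made-up matrix m in place of the real 1000 by 10000 data, and introduces the names T, U, and total purely for illustration. I have not benchmarked it.

```matlab
% Hypothetical small example; in the question m would be 1000-by-10000.
m = [2 2 1 0;
     2 1 1 0;
     0 0 1 2];
[n, p] = size(m);

% T(u+1, v+1) holds a/c + b/d for an entry pair (u, v), with 0/0 taken as 0.
[u, v] = ndgrid(0:2, 0:2);
s = u + v;
invc = 4 ./ s;        invc(isinf(invc)) = 0;   % 1/c, where c = s/4
invd = 4 ./ (4 - s);  invd(isinf(invd)) = 0;   % 1/d, where d = 1 - s/4
T = min(u, v) .* invc + (2 - max(u, v)) .* invd;

% (m == uu) * (m == vv)' counts, for every pair of rows, the positions
% where the first row holds uu and the second row holds vv.
total = zeros(n);
for uu = 0:2
    U = double(m == uu);
    for vv = 0:2
        total = total + T(uu+1, vv+1) * (U * double(m == vv)');
    end
end

% Full n-by-n matrix, including the diagonal.
distance = 1 - total / (2 * p);
```

The only loops left run over the nine value pairs, so the heavy work is done by nine n-by-p times p-by-n products. Each indicator matrix is stored as a double n-by-p array, which for the sizes in the question is roughly 80 MB per matrix.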