Optimize parameters of a pairwise distance function in Matlab

This question is related to "matlab: find the index of common values at the same entry from two arrays".

Suppose that I have a 1000-by-10000 matrix whose entries take the values 0, 1, and 2. Each row is treated as a sample. I want to calculate the pairwise distance between those samples according to the formula d = 1-1/(2p)sum(a/c+b/d), where a, b, c, d can be treated as row vectors of length 10000 according to some definition, and p=10000. (The d on the left-hand side is the scalar distance; the d inside the sum is one of the vectors.) c and d are probabilities such that c+d=1.

An example of how to find the values of a, b, c, d: suppose we want to find d between sample i and sample j; then I look at rows i and j.

If the k-th entries of rows i and j are 2 and 2, then a=2, b=0, c=1, d=0 (I guess I will assign 0/0=0 in this case).

If the k-th entries of rows i and j are 2 and 1, or vice versa, then a=1, b=0, c=3/4, d=1/4.

Similar assignments apply to the remaining cases: 2,0 (a=0, b=0, c=1/2, d=1/2), 1,1 (a=1, b=1, c=1/2, d=1/2), 1,0 (a=0, b=1, c=1/4, d=3/4), and 0,0 (a=0, b=2, c=0, d=1).
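The six cases above appear to follow simple closed forms: a = min, b = 2 - max, c = (sum of the two entries)/4, and d = 1 - c. That pattern is inferred from the listed cases, not stated in the question, so the following is only a sanity-check sketch; the variable names x, y, and term are made up for illustration.

```matlab
% The six unordered value pairs listed in the question.
x = [2 2 2 1 1 0];
y = [2 1 0 1 0 0];

% Closed forms (an assumption inferred from the six listed cases).
a = min(x, y);
b = 2 - max(x, y);
c = (x + y) / 4;
d = 1 - c;

% Per-entry term a/c + b/d, using the question's convention 0/0 = 0.
% MATLAB evaluates 0/0 as NaN, so those entries are reset to 0.
ac = a ./ c;  ac(isnan(ac)) = 0;
bd = b ./ d;  bd(isnan(bd)) = 0;
term = ac + bd;

% One row per case: x, y, a, b, c, d, a/c + b/d.
disp([x; y; a; b; c; d; term].');
```

Under these assumptions each entry contributes one of only four values to the sum: 2 for (2,2) and (0,0), 4 for (1,1), 4/3 for (2,1) and (1,0), and 0 for (2,0).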

The Matlab code I have so far uses for loops over i and j, then finds the cases above using find, and then creates two arrays for a/c and b/d. This is extremely slow; is there a way to improve the efficiency?

Edit: the distance d is the formula given in this paper on page 13.

Provided those coefficients are fixed, I think I've successfully vectorised the distance function. Figuring out the formulae was fun. I flipped things around a bit to minimise division, and since I wasn't aware of pdist until @horchler's comment, you get it wrapped in loops with the constants factored out:

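% m is the n-by-p data matrix with entries 0, 1, or 2
[n, p] = size(m);
distance = zeros(n);
for ii = 1:n
    for jj = ii+1:n
        s = m(ii,:) + m(jj,:);
        a = min(m(ii,:), m(jj,:));
        b = 2 - max(m(ii,:), m(jj,:));
        c = 4 ./ s;           % reciprocal of c = s/4, so a.*c equals a/c
        c(c == Inf) = 0;      % 0/0 is taken as 0
        d = 4 ./ (4 - s);     % reciprocal of d = 1 - s/4, so b.*d equals b/d
        d(d == Inf) = 0;      % 0/0 is taken as 0

        distance(ii,jj) = sum(a.*c + b.*d);
        % distance(jj,ii) = distance(ii,jj); % optional for the full matrix
    end
end
% Entries with ii >= jj are never accumulated, so only the upper triangle is meaningful here.
distance = 1 - (1 / (2 * p)) * distance;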
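The loops can probably be removed altogether. Since each entry contributes only a value that depends on the pair of values in that column, the sum for every pair of rows can be assembled from matrix products of 0/1 indicator matrices. This is a different technique from the loop above, offered as a minimal sketch: the weights 2, 4, and 4/3 assume the case table from the question with 0/0 = 0, and the names M0, M1, M2, X, and S are made up for illustration. A small random matrix stands in for the real data.

```matlab
% Stand-in data: replace with the real n-by-p matrix of 0/1/2 values.
m = randi([0 2], 6, 50);
[n, p] = size(m);

% Indicator matrices, one per value.
M0 = double(m == 0);
M1 = double(m == 1);
M2 = double(m == 2);

% Per-entry term a/c + b/d for each pair of values (0/0 treated as 0):
%   (0,0) -> 2, (1,1) -> 4, (2,2) -> 2, (1,0) and (2,1) -> 4/3, (2,0) -> 0.
% Mu * Mv.' counts, for every pair of rows (i,j), the columns where
% row i holds u and row j holds v.
X = M1 * (M0 + M2).';   % row i holds 1, row j holds 0 or 2
S = 2 * (M0 * M0.') + 4 * (M1 * M1.') + 2 * (M2 * M2.') + (4/3) * (X + X.');

% Full symmetric n-by-n matrix, diagonal included.
distance = 1 - S / (2 * p);
```

For a 1000-by-10000 input this amounts to a handful of 1000-by-10000 matrix products, which MATLAB typically handles far faster than a double loop over row pairs. Note that the diagonal here is computed from the formula as well, so it is not forced to zero.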
