Optimizing the parameters of a pairwise distance function in Matlab

Question

This question is related to "matlab: find the index of common values at the same entry from two arrays".

Suppose I have a 1000-by-10000 matrix containing the values 0, 1, and 2. Each row is treated as a sample. I want to calculate the pairwise distance between those samples according to the formula d = 1 - 1/(2p) * sum(a/c + b/d), where a, b, c, d can be treated as row vectors of length 10000 according to some definition and p = 10000. c and d are probabilities such that c + d = 1.

An example of how to find the values of a, b, c, d: suppose we want to find d between samples i and j; then I look at rows i and j.

If the k-th entries of rows i and j have values 2 and 2, then a=2, b=0, c=1, d=0 (I guess I will assign 0/0=0 in this case).

If the k-th entries of rows i and j have values 2 and 1, or vice versa, then a=1, b=0, c=3/4, d=1/4.

Similar assignments apply to the cases 2,0 (a=0, b=0, c=1/2, d=1/2), 1,1 (a=1, b=1, c=1/2, d=1/2), 1,0 (a=0, b=1, c=1/4, d=3/4), and 0,0 (a=0, b=2, c=0, d=1).
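Because each entry takes only three values, the six cases above collapse into a symmetric 3-by-3 table of per-position contributions a/c + b/d. The following is a minimal sketch of that idea, taking the case list at face value and using the 0/0 = 0 convention; the names T, x, y, contrib, and dist_xy are hypothetical and the two samples are toy data.

```matlab
% Per-position contribution a/c + b/d, indexed by (value_i + 1, value_j + 1).
% Entries follow the cases listed above, with 0/0 treated as 0.
T = [2    4/3  0;      % value 0 paired with 0, 1, 2
     4/3  4    4/3;    % value 1 paired with 0, 1, 2
     0    4/3  2];     % value 2 paired with 0, 1, 2

% Two toy samples of length p = 4 (illustrative data only).
x = [2 2 1 0];
y = [2 1 0 0];
p = numel(x);

% Look up each position's contribution, then apply d = 1 - 1/(2p) * sum(...).
contrib = T(sub2ind(size(T), x + 1, y + 1));
dist_xy = 1 - sum(contrib) / (2 * p);
```

For these two samples the contributions are 2, 4/3, 4/3, and 2, so the distance comes out as 1 - (20/3)/8 = 1/6.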

The Matlab code I have so far uses for loops over i and j, then finds the cases above using find, then creates two arrays for a/c and b/d. This is extremely slow; is there a way to improve the efficiency?

Edit: the distance d is the formula given in this paper on page 13.

Answer 1

Provided those coefficients are fixed, I think I've successfully vectorised the distance function. Figuring out the formulae was fun. I flipped things around a bit to minimise division, and since I wasn't aware of pdist until @horchler's comment, you get it wrapped in loops with the constants factored out:

% m is the data
[n, p] = size(m); % n samples (rows), p entries per sample (columns)
distance = zeros(n);
for ii=1:n
    for jj=ii+1:n
        a = min(m(ii,:), m(jj,:));
        b = 2 - max(m(ii,:), m(jj,:));
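        % c and d hold the reciprocals of the question's c and d,
        % i.e. 4/(x+y) and 4/(4-x-y); Inf (the 0/0 cases) is set to 0.
        c = 4 ./ (m(ii,:) + m(jj,:));
        c(c == Inf) = 0;
        d = 4 ./ (4 - m(ii,:) - m(jj,:));
        d(d == Inf) = 0;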

        distance(ii,jj) = sum(a.*c + b.*d);
        % distance(jj,ii) = distance(ii,jj); % optional for the full matrix
    end
end
distance = 1 - (1 / (2 * p)) * distance;
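
The remaining loops over sample pairs can also be replaced by matrix products. This is not the method used above but an alternative sketch, assuming the case list in the question is taken at face value (with 0/0 = 0): tabulate the per-position contribution a/c + b/d for each of the nine value pairs, then count how often each pair occurs between every two rows using indicator matrices. The names T, S, Iu, and Iv are hypothetical, and the small matrix m stands in for the real 1000-by-10000 data.

```matlab
% Toy data standing in for the 1000-by-10000 matrix (values 0, 1, 2).
m = [2 2 1 0;
     2 1 0 0;
     0 0 1 2];
[n, p] = size(m);

% Per-position contribution a/c + b/d from the question's case list,
% indexed by (value_i + 1, value_j + 1), with 0/0 treated as 0.
T = [2 4/3 0; 4/3 4 4/3; 0 4/3 2];

% S(i,j) accumulates the summed contributions for samples i and j.
S = zeros(n);
for u = 0:2
    Iu = double(m == u);            % n-by-p indicator of value u
    for v = 0:2
        Iv = double(m == v);        % n-by-p indicator of value v
        % (Iu * Iv')(i,j) counts positions where row i is u and row j is v.
        S = S + T(u + 1, v + 1) * (Iu * Iv');
    end
end
distance = 1 - S / (2 * p);         % full n-by-n matrix, diagonal included
```

Only nine n-by-n matrix products are needed regardless of n, at the cost of holding two n-by-p indicator matrices in memory at a time. Unlike the double loop, this fills the whole matrix rather than just the upper triangle.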
