Pairwise distance calculation for feature similarity (multi-dimensional matrices)

Question

OK, here is the formula in MATLAB:

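function D = dumDistance(X,Y)
% Squared Euclidean distance between every column of X and every column of Y.
n1 = size(X,2);   % number of column vectors in X
n2 = size(Y,2);   % number of column vectors in Y
D = zeros(n1,n2); % D(i,j) holds the distance between X(:,i) and Y(:,j)
for i = 1:n1
    for j = 1:n2
        D(i,j) = sum((X(:,i)-Y(:,j)).^2);
    end
end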

Credits here (I know it's not a fast implementation, but it shows the basic algorithm).

Now here is what I don't understand.

Say that we have a matrix dictionary of size 140x100 and a matrix page of size 140x40. Each column represents a word in the 140-dimensional space.

Now, if I call dumDistance(page,dictionary), it returns a 40x100 matrix of distances.

What I want to achieve is to find how close each word of the page matrix is to the dictionary matrix, in order to represent the page in terms of the dictionary, with a histogram, let's say.

I know that if I take the min of the 40x100 matrix, I'll get a 1x100 matrix with the locations of the minimum values to represent my histogram.

What I really can't understand here is this 40x100 matrix. What data does this matrix actually represent? I can't visualize it.

Answer 1

Minor comment before I start:

You should really use pdist2 instead. It is much faster, and you'll get essentially the same results as dumDistance (see the minor note at the end). In other words, you would call it like this:

D = pdist2(page.', dictionary.');

You need to transpose page and dictionary because pdist2 assumes that each row is an observation and each column is a variable / feature, whereas your data is structured so that each column is an observation. The call returns a 40 x 100 matrix like the one you get from dumDistance. However, pdist2 does not use for loops.
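For readers without access to pdist2, the same loop-free idea can be sketched in base MATLAB. This is only an illustration under assumptions: the random page and dictionary matrices are stand-ins for the real data, and the expansion ||x - y||^2 = ||x||^2 + ||y||^2 - 2*x'*y is a standard identity, not something taken from the original post.

```matlab
% Hypothetical stand-in data: each column is a 140-dimensional word vector.
rng(0);
page = rand(140, 40);
dictionary = rand(140, 100);

% Squared norm of every column.
pageNorms = sum(page.^2, 1);        % 1 x 40
dictNorms = sum(dictionary.^2, 1);  % 1 x 100

% ||x - y||^2 = ||x||^2 + ||y||^2 - 2*x'*y, evaluated for all pairs at once.
D = bsxfun(@plus, pageNorms.', dictNorms) - 2 * (page.' * dictionary);

% Round-off can leave tiny negative values where the true distance is 0.
D = max(D, 0);

disp(size(D));  % 40 rows (page words) by 100 columns (dictionary words)
```

The result holds squared Euclidean distances, so it should match dumDistance(page, dictionary) up to floating-point round-off.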

Now onto your question:

D(i,j) represents the squared Euclidean distance between word i from your page and word j from your dictionary. You have 40 words on your page and 100 words in your dictionary. Each word is represented by a 140-dimensional feature vector, so the rows of D index the words of page while the columns of D index the words of dictionary.
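A toy case may make the row/column layout easier to picture. The sketch below uses made-up 2-dimensional "words" instead of 140-dimensional ones (the numbers are purely illustrative) and runs the same double loop as dumDistance.

```matlab
% Toy data (hypothetical): 2 page words and 3 dictionary words in 2-D.
page       = [0 3;
              0 4];      % columns: [0;0] and [3;4]
dictionary = [0 1 3;
              0 0 4];    % columns: [0;0], [1;0] and [3;4]

n1 = size(page, 2);
n2 = size(dictionary, 2);
D = zeros(n1, n2);
for i = 1:n1
    for j = 1:n2
        % Squared Euclidean distance between page word i and dictionary word j.
        D(i,j) = sum((page(:,i) - dictionary(:,j)).^2);
    end
end

disp(D);
% D is 2 x 3:
%    0    1   25
%   25   20    0
```

Row 1 lists how far the first page word is from each of the three dictionary words, and row 2 does the same for the second page word. The zeros mark the pairs that are identical.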

By "distance" I mean distance in the feature space. Each word from your page and dictionary is represented as a vector of length 140. Each entry (i,j) of D takes the i-th vector from page and the j-th vector from dictionary, subtracts their corresponding components, squares the differences, and sums them up. The result is stored in D(i,j). It gives you the dissimilarity between word i from your page and word j from your dictionary. The higher the value, the more dissimilar the two words are.
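Since smaller values mean more similar words, one possible way to build the histogram mentioned in the question is to pick, for every page word, the dictionary word with the smallest distance and count how often each dictionary word is chosen. The following is a minimal sketch under assumptions: the data is random stand-in data, and the variable names nearest and counts are made up for illustration.

```matlab
% Hypothetical stand-in data: each column is a 140-dimensional word vector.
rng(0);
page = rand(140, 40);
dictionary = rand(140, 100);

% Squared Euclidean distances, 40 x 100 (rows: page words, columns: dictionary words).
D = bsxfun(@plus, sum(page.^2, 1).', sum(dictionary.^2, 1)) ...
    - 2 * (page.' * dictionary);

% Minimum along each row: index of the closest dictionary word per page word.
[~, nearest] = min(D, [], 2);   % 40 x 1, values in 1..100

% Count how many page words were assigned to each dictionary word.
counts = accumarray(nearest, 1, [size(dictionary, 2) 1]);  % 100 x 1

disp(sum(counts));  % equals the number of page words
```

Note that min(D) without a dimension argument works down the columns instead, which gives the closest page word for each dictionary word (a 1 x 100 result). Which direction is wanted depends on how the histogram is defined.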

Minor note: pdist2 computes the Euclidean distance, while dumDistance computes the squared Euclidean distance. If you want the same values as dumDistance, simply square every element of the D returned by pdist2. In other words, compute D.^2.

Hope this helps. Good luck!
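As a rough check of this note, the snippet below squares the pdist2 output and compares one entry against the definition used in dumDistance. It assumes the Statistics and Machine Learning Toolbox is installed (pdist2 lives there) and uses random stand-in data.

```matlab
% Hypothetical stand-in data: each column is a 140-dimensional word vector.
rng(0);
page = rand(140, 40);
dictionary = rand(140, 100);

% pdist2 expects one observation per row, hence the transposes.
D_euclid  = pdist2(page.', dictionary.');   % Euclidean distances, 40 x 100
D_squared = D_euclid.^2;                    % squared, as in dumDistance

% Compare a single entry with the sum-of-squared-differences definition.
ref = sum((page(:,1) - dictionary(:,1)).^2);
fprintf('difference for entry (1,1): %g\n', abs(D_squared(1,1) - ref));

% pdist2 also documents a 'squaredeuclidean' distance option, which should
% give the squared values directly where that option is available.
D_direct = pdist2(page.', dictionary.', 'squaredeuclidean');
```

The printed difference should be at the level of floating-point round-off.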

Answer 1 accepted on 2014-07-10 02:59:22.