Pairwise similarity comparison in MATLAB

Question

I have a matrix A that contains events and the probabilities with which they occur. For example:

A= [1, 0.6; 5, 0.3; 4, 0.1]

Event 1 occurs with probability 60%, event 5 with probability 30%, and event 4 with probability 10%.

I then have a series of similar matrices (events and probabilities):

B = [1,0.5; 3,0.4; 2,0.1]
C = [2,0.9; 4,0.1; 3,0]
D = [1,0.6; 5,0.3; 4,0.1]

I would like to find a vector that shows the similarity between A and each of the other matrices.

SIM = [?,?,1]

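The first two elements hold the similarity between A and B and between A and C. The third element shows the similarity between A and D (it is 1 because they are identical).

Do you have any suggestions on how to implement a function that performs this pairwise comparison between matrices?

Thanks a lot!

Please also consider the case where A = [3,1;5,0;2,0] (which is equivalent to A = [3,1;2,0;1,0], etc.).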

Answer 1

Function to calculate the similarity between `A` and `B`

function SIM = SIMcalc(A,B)

%// Get joint unique events for A and B
unq_events = unique([A(:,1);B(:,1)]).'; %//'

%// Presence of events across joint unique events
event_tagA = bsxfun(@eq,A(:,1),unq_events);
event_tagB = bsxfun(@eq,B(:,1),unq_events);

%// Probabilities corresponding to each joint event
tagged_probA = sum(bsxfun(@times,A(:,2),event_tagA));
tagged_probB = sum(bsxfun(@times,B(:,2),event_tagB));

%// Set not-shared events as NaN
tagged_probA(~any(event_tagA))=nan;
tagged_probB(~any(event_tagB))=nan;

%// Get the similarity factors for each shared event. This is based on the
%// assumption that probabilities far apart must have a low shared
%// similarity factor. This factor is later used to scale the
%// individual probabilities for A and B.
sim_factor = 1-abs(tagged_probA-tagged_probB);
tagged_probA_sim_scaled = tagged_probA.*sim_factor;
tagged_probB_sim_scaled = tagged_probB.*sim_factor;

%// Get a concatenated matrix of scaled probabilities
tagged_probAB_sim_scaled = [tagged_probA_sim_scaled;tagged_probB_sim_scaled];

%// Get a hybrid array of probabilities based on the mean of probabilities
%// across A and B. Notice that for cases with identical probabilities, the
%// hybrid values would stay the same.
hybrid_probAB = mean(tagged_probAB_sim_scaled);

%// Get the sum of hybrid values. Notice that the sum would result in a
%// value of 1 when we have identical probabilities for identical events
SIM = nansum(hybrid_probAB);

return;

Sample inputs to test the similarity calculation

%// Case 1 - First example from the question with D replacing B.
%// The SIM value must be 1 as mentioned in the question
disp('------------- Case 1 -----------------')
A= [1, 0.6; 5, 0.3; 4, 0.1]
B = [1,0.6; 5,0.3; 4,0.1]
SIM = SIMcalc(A,B)

%// Case 2 - Slight change to the first example with event 5 being
%// replaced by event 2 in B
%// The SIM value must be less than 1 as mentioned in the question
disp('------------- Case 2 -----------------')
A= [1, 0.6; 5, 0.3; 4, 0.1]
B = [1,0.6; 2,0.3; 4,0.1]
SIM = SIMcalc(A,B)

%// Case 3 - As presented in the comments by OP, that the SIM value must be 0
disp('------------- Case 3 -----------------')
A =[3,1;2,0;1,0]
B =[2,1;1,0;4,0]
SIM = SIMcalc(A,B)

%// Case 4 - As asked by me and replied by OP that SIM must be 1
disp('------------- Case 4 -----------------')
A =[3,1;2,0;1,0]
B =[3,1;2,0;1,0]
SIM = SIMcalc(A,B)

%// Case 5 - Random case added on my own.
%// As can be seen, event 3 is common between A and B. Apart from event 3,
%// only event 2 is common, but the probabilities are far apart, so the
%// net SIM value must be slightly more than the identical probability of
%// event 3, i.e. slightly more than 0.55
disp('------------- Case 5 -----------------')
A =[3,0.55;2,0.95;1,0]
B =[3,0.55;2,0.05;4,0.4]
SIM = SIMcalc(A,B)

Results

------------- Case 1 -----------------
A =
    1.0000    0.6000
    5.0000    0.3000
    4.0000    0.1000
B =
    1.0000    0.6000
    5.0000    0.3000
    4.0000    0.1000
SIM =
     1
------------- Case 2 -----------------
A =
    1.0000    0.6000
    5.0000    0.3000
    4.0000    0.1000
B =
    1.0000    0.6000
    2.0000    0.3000
    4.0000    0.1000
SIM =
    0.7000
------------- Case 3 -----------------
A =
     3     1
     2     0
     1     0
B =
     2     1
     1     0
     4     0
SIM =
     0
------------- Case 4 -----------------
A =
     3     1
     2     0
     1     0
B =
     3     1
     2     0
     1     0
SIM =
     1
------------- Case 5 -----------------
A =
    3.0000    0.5500
    2.0000    0.9500
    1.0000         0
B =
    3.0000    0.5500
    2.0000    0.0500
    4.0000    0.4000
SIM =
    0.6000

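Explanation

Let us take case 5 to explain in detail the rationale behind the final scalar value that measures the similarity between A and B. It is worth running the code for this case and inspecting the values of the variables.

Inputs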

A =
    3.0000    0.5500
    2.0000    0.9500
    1.0000         0
B =
    3.0000    0.5500
    2.0000    0.0500
    4.0000    0.4000

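Step 1

Tag the probabilities of A and B against the joint list of events, so that an event missing from a matrix is marked as NaN for that matrix. This gives tagged_probA (first row) and tagged_probB (second row), with the values shown below: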

               Event 1  Event 2  Event 3  Event 4
tagged_probA      0      0.95     0.55     NaN
tagged_probB     NaN     0.05     0.55     0.4
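
The tagging in this step can also be reproduced without bsxfun. The following is a minimal sketch on the case 5 inputs that uses ismember instead; the variable names events, tf and loc are illustrative and not part of the original function, and it assumes each event appears at most once per matrix.

```matlab
% Case 5 inputs from the answer
A = [3,0.55; 2,0.95; 1,0];
B = [3,0.55; 2,0.05; 4,0.4];

% Joint list of events as a sorted row vector (here 1 2 3 4)
events = unique([A(:,1); B(:,1)]).';

% Start from all-NaN rows, then fill in the probabilities that exist
tagged_probA = nan(size(events));
tagged_probB = nan(size(events));

% loc gives, for each event in A, its column in the joint event list
[tf, loc] = ismember(A(:,1), events);
tagged_probA(loc(tf)) = A(tf,2);

[tf, loc] = ismember(B(:,1), events);
tagged_probB(loc(tf)) = B(tf,2);

% Rows: tagged_probA, tagged_probB; columns: events 1 to 4
disp([tagged_probA; tagged_probB])
```

Events that a matrix does not list keep their initial NaN, which is what later lets the NaN-aware sum skip them.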

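Step 2

Compute the absolute difference between the probabilities and subtract it from 1, so that values close to 1 indicate similarity. In this example, the result for event 3 is 1. This forms the basis of the similarity criterion between A and B: identical probabilities give 1, while probabilities that are far apart on the [0, 1] range give a smaller value. The result is stored in sim_factor: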

sim_factor =
       NaN    0.1000    1.0000       NaN

Step 3

Scale the tagged probabilities of A and B by sim_factor. This gives tagged probabilities that are scaled according to the similarity between A and B:

tagged_probA_sim_scaled =
       NaN    0.0950    0.5500       NaN
tagged_probB_sim_scaled =
       NaN    0.0050    0.5500       NaN

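Step 4

Since the final value must be a single scalar, we first take the mean of the tagged, scaled probabilities of A and B for each event. For identical probabilities, the result equals the individual probability, as for event 3 in this example. For non-identical probabilities, the value is scaled down according to the difference between the A and B probabilities. This is hybrid_probAB: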

hybrid_probAB =
       NaN    0.0500    0.5500       NaN

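Step 5

Sum the non-NaN elements of hybrid_probAB to get the final scalar similarity value, which is less than 1 for this particular case (0.6). For cases with identical events and identical probabilities, this gives a perfect 1.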
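
Taken together, steps 1 to 5 appear to reduce to a single sum over the events listed in both matrices. The notation below is introduced here for illustration and is not from the original answer: E_A and E_B are the sets of events listed in A and B, and p_A(e), p_B(e) are the probabilities the two matrices assign to event e, assuming each event appears at most once per matrix.

```latex
\mathrm{SIM}(A,B) \;=\; \sum_{e \,\in\, E_A \cap E_B}
  \bigl(1 - \lvert p_A(e) - p_B(e) \rvert\bigr)\,
  \frac{p_A(e) + p_B(e)}{2}

% Case 5: the shared events are 2 and 3
\mathrm{SIM} = (1 - 0.9)\cdot\frac{0.95 + 0.05}{2} + (1 - 0)\cdot\frac{0.55 + 0.55}{2}
             = 0.05 + 0.55 = 0.6
```

The first factor is sim_factor from step 2 and the second is the plain mean of the two probabilities, so their product matches the per-event entries of hybrid_probAB.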

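Closing remarks

Looking at the SIM values, they do follow the expected trends, so hopefully this works out for your other cases too. To calculate the similarity values between A and the other arrays, run the function with each of them as input.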
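
To build the SIM vector asked for in the question, the comparison can be repeated for B, C and D in a loop. The sketch below does not call SIMcalc itself; it uses a condensed restatement based on intersect, which should give the same values as long as each event appears at most once per matrix. The names others, pA and pX are illustrative assumptions.

```matlab
% Matrices from the question
A = [1,0.6; 5,0.3; 4,0.1];
B = [1,0.5; 3,0.4; 2,0.1];
C = [2,0.9; 4,0.1; 3,0];
D = [1,0.6; 5,0.3; 4,0.1];

others = {B, C, D};
SIM = zeros(1, numel(others));

for k = 1:numel(others)
    X = others{k};
    % Rows of A and X that refer to the same (shared) events
    [~, ia, ix] = intersect(A(:,1), X(:,1));
    pA = A(ia,2);
    pX = X(ix,2);
    % Similarity factor times the mean probability, summed over shared events
    SIM(k) = sum((1 - abs(pA - pX)) .* (pA + pX) / 2);
end

disp(SIM)   % approximately 0.495  0.1  1
```

When two matrices share no events, intersect returns empty index vectors and the sum over them is 0, which matches the behaviour shown for case 3.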

Answer 2

OK, so you pick a function that takes two matrices and returns a scalar, and then apply it between A and each of the other matrices:

similarity = @(x,y)(mean(mean(x./y)));
M = cat(3,B,C,D); %//Combine into a single 3D matrix
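%// bsxfun expects an element-wise handle, so the scalar-valued
%// similarity is applied to each 2D slice with arrayfun instead
arrayfun(@(k) similarity(A, M(:,:,k)), 1:size(M,3))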

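Note that the similarity function I used is probably not the best fit for your data, because it returns Inf if the second matrix contains a 0 and it is not symmetric. The point is just to illustrate that it must take 2D matrices and output a scalar.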

Answer 3

Have you considered forming the full probability histogram over these 5 events? For example:

Ah=[0.6 0 0 0.1 0.3];
Bh=[0.5 0.1 0.4 0 0];
Ch=[0 0.9 0 0.1 0];
Dh=[0.6 0 0 0.1 0.3];

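You can then compare them as vectors by concatenating them into a matrix and using pdist, which returns pairwise distances (0 for identical rows) rather than similarities: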

m=[Ah; Bh; Ch; Dh];
sim=squareform(pdist(m,'cityblock'));
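
The histograms above can also be built from the original two-column matrices, and the city-block distance turned into a similarity in [0, 1]. The following is a rough sketch under two assumptions that are not stated in the answer: events are labelled with integers 1 to 5, and each histogram sums to 1, so two histograms are at most 2 apart in city-block distance. The helper name to_hist is hypothetical, and the distance is computed directly so that pdist (Statistics and Machine Learning Toolbox) is not required.

```matlab
% Matrices from the question
A = [1,0.6; 5,0.3; 4,0.1];
B = [1,0.5; 3,0.4; 2,0.1];
C = [2,0.9; 4,0.1; 3,0];
D = [1,0.6; 5,0.3; 4,0.1];

n_events = 5;   % assumption: events are labelled 1..5

% Scatter each probability into the slot of its event (row vector of length 5)
to_hist = @(X) accumarray(X(:,1), X(:,2), [n_events 1]).';

m = [to_hist(A); to_hist(B); to_hist(C); to_hist(D)];

% City-block (L1) distance from A (row 1) to each of the other rows
d = sum(abs(bsxfun(@minus, m(2:end,:), m(1,:))), 2).';

% Rescale so that 1 means identical and 0 means no overlap at all
SIM = 1 - d/2;

disp(SIM)   % approximately 0.5  0.1  1
```

Unlike the accepted answer, this measure gives partial credit only through overlapping probability mass, so the two approaches need not rank matrices identically.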

3 answers

Answer 1: 1, accepted, 2014-07-18 13:33:48
Answer 2: 0, 2014-07-18 12:30:28
Answer 3: 0, 2014-07-18 13:06:13
