Confused about the MajorClust algorithm

Question

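I want to write my own implementation of the MajorClust algorithm in MATLAB. I have cosine similarities for my document pairs. While searching online, I came across this site: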

http://muse-amuse.in/~baali/MajorClustPost.html

In the example on this site (written in Python), the clustering part looks like this:

t = False
indices = np.arange(num_of_samples)
while not t:
    t = True
    for index in np.arange(num_of_samples):
        # aggregating edge weights
        new_index = np.argmax(np.bincount(indices,
                                          weights=cosine_distances[index]))
        if indices[new_index] != indices[index]:
            indices[index] = indices[new_index]
            t = False

When I worked through the example, I got a little confused. Consider the for loop:

for index in np.arange(num_of_samples):

The first index will be 0, and its maximum similarity is with 1. Hence new_index must be 1, and the entry for index 0 will be replaced with 1.

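In the next iteration, the index will be 1, and its maximum weight will come from 0, which now has the same value as in the previous iteration. As a result, the loop must terminate after this point.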
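To make the two steps described above concrete, here is a small trace on a made-up 4-sample matrix. The matrix `sim` and its zero diagonal are assumptions for illustration only (they are not from the linked site); the update rule mirrors the quoted snippet. It suggests that sample 1 indeed keeps its value after sample 0 has changed, but the for loop still has to visit the remaining samples before the pass is over.

```python
import numpy as np

# Hypothetical symmetric similarity matrix (assumption: zero diagonal, so a
# sample never votes for itself).
sim = np.array([
    [0.0, 0.9, 0.1, 0.0],
    [0.9, 0.0, 0.2, 0.1],
    [0.1, 0.2, 0.0, 0.8],
    [0.0, 0.1, 0.8, 0.0],
])

indices = np.arange(4)  # every sample starts in its own cluster

# index 0: np.bincount sums the weights of row 0 per current cluster label.
votes_0 = np.bincount(indices, weights=sim[0])
new_index = np.argmax(votes_0)
indices[0] = indices[new_index]  # sample 0 takes over the value of sample 1

# index 1: sample 0 now carries the same label as sample 1, so that label
# collects the largest weight and nothing changes for sample 1.
votes_1 = np.bincount(indices, weights=sim[1])
unchanged = indices[np.argmax(votes_1)] == indices[1]

print(votes_0, indices, votes_1, unchanged)
```

In this trace, `t` was already set to False when sample 0 changed, so the while loop would run at least one more full pass after samples 2 and 3 have been processed.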

The algorithm is based on this paper (it is given on page 4):

http://www.uni-weimar.de/medien/webis/publications/papers/stein_2002c.pdf

The paper states that the indices must be selected randomly. But in the example, I cannot see any random selection.

What am I missing?

Answer 1

Yes, it would be better to shuffle the indices. You could use:

from random import shuffle
shuffled_indices = np.arange(num_of_samples)
shuffle(shuffled_indices)
for index in shuffled_indices:
    # aggregating edge weights 
    new_index = np.argmax(np.bincount(indices,weights=cosine_distances[index]))
    if indices[new_index] != indices[index]:
        indices[index] = indices[new_index]
        t = False

Sorry for the late reply.
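For reference, the shuffled loop can be put together into a self-contained sketch. This is not the code from the linked site: the function name `majorclust_sketch`, the `seed` and `max_passes` parameters, and the small matrix are all hypothetical, and the sketch assigns the winning label directly rather than looking it up through `indices[new_index]`. It assumes a symmetric similarity matrix with a zero diagonal.

```python
import numpy as np

def majorclust_sketch(sim, seed=0, max_passes=100):
    """Illustrative label-propagation loop with a random visiting order."""
    rng = np.random.default_rng(seed)
    n = sim.shape[0]
    labels = np.arange(n)  # each sample starts as its own cluster
    for _ in range(max_passes):  # guard against running forever
        changed = False
        # Visit the samples in a fresh random order on every pass.
        for i in rng.permutation(n):
            # Sum the edge weights of sample i per current cluster label.
            votes = np.bincount(labels, weights=sim[i], minlength=n)
            best = np.argmax(votes)
            if best != labels[i]:
                labels[i] = best
                changed = True
        if not changed:  # a full pass without changes: stop
            break
    return labels

# Hypothetical similarities: samples 0 and 1 are close, as are 2 and 3.
sim = np.array([
    [0.0, 0.9, 0.1, 0.0],
    [0.9, 0.0, 0.2, 0.1],
    [0.1, 0.2, 0.0, 0.8],
    [0.0, 0.1, 0.8, 0.0],
])
labels = majorclust_sketch(sim)
print(labels)
```

The label values themselves depend on the visiting order, so only the grouping should be compared between runs, not the numbers.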

Solution 1
2 2015-10-22 15:23:19
