Assign descriptors to cluster centers after creating clusters using VLFeat

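I am clustering my data with k-means, but instead of the standard algorithm I use an approximate nearest neighbor (ANN) algorithm to speed up the sample-to-center comparisons. This is easy to do: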

[clusterCenters, trainAssignments] = vl_kmeans(trainDescriptors, clusterCount, 'Algorithm', 'ANN', 'MaxNumComparisons', ceil(clusterCount / 50));

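When I run this code, the variable 'trainDescriptors' is clustered and each descriptor is assigned to one of the 'clusterCenters' using ANN.

I also have another variable, 'testDescriptors', which I want to assign to the same cluster centers. This assignment must be done with the same method used for 'trainDescriptors', but AFAIK the vl_kmeans function does not return the tree it builds for fast assignment.

So my question is: is it possible to assign 'testDescriptors' to 'clusterCenters' in the same way vl_kmeans assigned 'trainDescriptors' to 'clusterCenters', and if so, how can I do it?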

OK, I have figured it out. It can be done as follows:

clusterCount = 1024;
datasetTrain = single(rand(128, 100000)); 

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% 1 - cluster train data and get train assignments
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

[clusterCenters, trainAssignments_actual] = vl_kmeans(datasetTrain, clusterCount, ...
    'Algorithm', 'ANN', ...
    'Distance', 'l2', ...
    'NumRepetitions', 1, ...
    'NumTrees', 3, ...
    'MaxNumComparisons', ceil(clusterCount / 50) ...
);

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% 2 - build a kd-tree forest on the cluster centers and re-assign train data
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

forest = vl_kdtreebuild(clusterCenters, ...
    'Distance', 'l2', ...
    'NumTrees', 3 ...
);

trainAssignments_expected = vl_kdtreequery(forest, clusterCenters, datasetTrain);

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% 3 - validate second assignment
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

validation = isequal(trainAssignments_actual, trainAssignments_expected);

In step 2, I build a new tree from the cluster centers and then assign the data to the centers again. It gives valid results.
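The same idea should carry over to the test descriptors from the question. The following is only a minimal sketch, assuming VLFeat is on the MATLAB path (via `vl_setup`). The names `datasetTest`, `testAssignments`, `exactAssignments`, and `agreement` are hypothetical, and the data sizes are shrunk so it runs quickly. It caps the number of comparisons in the query with the `'MaxComparisons'` option of `vl_kdtreequery` to mirror the `'MaxNumComparisons'` setting passed to `vl_kmeans`. Whether this reproduces exactly what `vl_kmeans` does internally is an assumption, not something the source confirms. Because both the kd-tree construction and the capped search are approximate, the sketch also compares the result against an exact brute-force assignment instead of expecting a perfect match.

```matlab
% Sketch only: assumes VLFeat has been set up with vl_setup.
clusterCount = 64;
datasetTrain = single(rand(128, 5000));
datasetTest  = single(rand(128, 2000));   % hypothetical test descriptors

% Cluster the training data with ANN k-means, as in the answer above.
clusterCenters = vl_kmeans(datasetTrain, clusterCount, ...
    'Algorithm', 'ANN', ...
    'Distance', 'l2', ...
    'NumTrees', 3, ...
    'MaxNumComparisons', ceil(clusterCount / 50) ...
);

% Build a kd-tree forest that indexes the cluster centers.
forest = vl_kdtreebuild(clusterCenters, ...
    'Distance', 'l2', ...
    'NumTrees', 3 ...
);

% Approximate query: limit the comparisons made per test descriptor.
% The query data must have the same class (single) as the indexed data.
[testAssignments, testDistances] = vl_kdtreequery(forest, clusterCenters, datasetTest, ...
    'MaxComparisons', ceil(clusterCount / 50) ...
);

% Exact assignment by brute force, used only as a reference.
[~, exactAssignments] = min(vl_alldist2(clusterCenters, datasetTest), [], 1);

% Fraction of test descriptors where ANN and exact assignments agree.
agreement = mean(double(testAssignments) == exactAssignments);
fprintf('ANN/exact agreement: %.3f\n', agreement);
```

Leaving out `'MaxComparisons'` makes `vl_kdtreequery` perform an exact search, which is slower but does not depend on the approximation settings.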
