
Assign descriptors to cluster centers after creating clusters using VLFeat

I'm clustering my data with k-means, but instead of the standard algorithm I'm using an approximate nearest neighbours (ANN) algorithm to accelerate the sample-to-center comparisons. This can be done easily with the following:

[clusterCenters, trainAssignments] = vl_kmeans(trainDescriptors, clusterCount, 'Algorithm', 'ANN', 'MaxNumComparisons', ceil(clusterCount / 50));

When I run this code, the descriptors in 'trainDescriptors' are clustered and each one is assigned to one of the 'clusterCenters' using ANN.

I also have another variable, 'testDescriptors', and I want to assign those to the cluster centres as well. This assignment must be done with the same approach that was used for 'trainDescriptors', but AFAIK the vl_kmeans function does not return the tree that it builds for fast assignment.

So my question is: is it possible to assign 'testDescriptors' to 'clusterCenters' the same way 'trainDescriptors' are assigned to 'clusterCenters' inside the vl_kmeans function? If yes, how can I do that?

Well, I've figured it out. It can be done as follows:

clusterCount = 1024;
datasetTrain = single(rand(128, 100000));  % random stand-in for 128-D descriptors, one per column

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% 1 - cluster train data and get train assignments
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

[clusterCenters, trainAssignments_actual] = vl_kmeans(datasetTrain, clusterCount, ...
    'Algorithm', 'ANN', ...
    'Distance', 'l2', ...
    'NumRepetitions', 1, ...
    'NumTrees', 3, ...
    'MaxNumComparisons', ceil(clusterCount / 50) ...
);

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% 2 - assign train data to cluster centers
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

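% Build a kd-tree forest over the cluster centers with the same distance and
% tree count that were passed to vl_kmeans.
forest = vl_kdtreebuild(clusterCenters, ...
    'Distance', 'l2', ...
    'NumTrees', 3 ...
);

% For each column of datasetTrain, look up the index of its nearest center.
trainAssignments_expected = vl_kdtreequery(forest, clusterCenters, datasetTrain);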

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% 3 - validate second assignment
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

% true only if the two assignment vectors match element for element
validation = isequal(trainAssignments_actual, trainAssignments_expected);

In step 2 I build a new kd-tree forest from the cluster centres and then assign the data to the centres again. It gives a valid result. The same forest can then be queried with 'testDescriptors'.
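For the original question about 'testDescriptors', the same two calls can be pointed at held-out data. The following is only a minimal sketch: it assumes VLFeat is installed and vl_setup has been run, the data is random placeholder input, and the variable name maxComparisons is made up for illustration. Passing 'MaxNumComparisons' to vl_kdtreequery is my reading of "the same approach" (I believe the query function accepts this option, with older releases spelling it 'MaxComparisons'); leave it out to run the query exactly as in step 2.

```matlab
% Sketch only: assumes VLFeat is installed and vl_setup has been run.
clusterCount     = 64;                          % small placeholder value
trainDescriptors = single(rand(128, 5000));     % placeholder training data
testDescriptors  = single(rand(128, 1000));     % placeholder held-out data
maxComparisons   = ceil(clusterCount / 50);     % same budget formula as in the question

% Learn the centers with ANN-accelerated k-means, as in step 1.
clusterCenters = vl_kmeans(trainDescriptors, clusterCount, ...
    'Algorithm', 'ANN', ...
    'Distance', 'l2', ...
    'NumTrees', 3, ...
    'MaxNumComparisons', maxComparisons);

% Index the learned centers, as in step 2.
forest = vl_kdtreebuild(clusterCenters, ...
    'Distance', 'l2', ...
    'NumTrees', 3);

% Look up a center index for every held-out descriptor.
% Assumption: the comparison budget is passed here to mirror the ANN
% settings used during clustering; omit the option for the default search.
testAssignments = vl_kdtreequery(forest, clusterCenters, testDescriptors, ...
    'MaxNumComparisons', maxComparisons);
```

Each entry of testAssignments is the column index in clusterCenters of the center chosen for the corresponding column of testDescriptors. Since the forest is rebuilt rather than taken from vl_kmeans, it is not guaranteed to be the very same tree structure that was used internally during clustering.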
