I'm clustering my data with k-means, but instead of the standard algorithm I'm using an approximate nearest neighbours (ANN) algorithm to accelerate the sample-to-center comparisons. This can be done easily with the following:
[clusterCenters, trainAssignments] = vl_kmeans(trainDescriptors, clusterCount, 'Algorithm', 'ANN', 'MaxNumComparisons', ceil(clusterCount / 50));
When I run this code, the descriptors in 'trainDescriptors' are clustered and each descriptor is assigned to one of the 'clusterCenters' using ANN.
I also have another variable, 'testDescriptors', which I want to assign to the cluster centres as well. This assignment must be done with the same approach used for 'trainDescriptors', but AFAIK the vl_kmeans function does not return the tree that it builds for fast assignment.
So my question is: is it possible to assign 'testDescriptors' to 'clusterCenters' the same way 'trainDescriptors' are assigned to 'clusterCenters' inside vl_kmeans? If yes, how can I do that?
Well, I've figured it out. It can be done as follows:
clusterCount = 1024;
datasetTrain = single(rand(128, 100000));
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% 1 - cluster train data and get train assignments
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
[clusterCenters, trainAssignments_actual] = vl_kmeans(datasetTrain, clusterCount, ...
'Algorithm', 'ANN', ...
'Distance', 'l2', ...
'NumRepetitions', 1, ...
'NumTrees', 3, ...
'MaxNumComparisons', ceil(clusterCount / 50) ...
);
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% 2 - assign train data to cluster centers
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
forest = vl_kdtreebuild(clusterCenters, ...
'Distance', 'l2', ...
'NumTrees', 3 ...
);
trainAssignments_expected = vl_kdtreequery(forest, clusterCenters, datasetTrain);
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% 3 - validate second assignment
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
validation = isequal(trainAssignments_actual, trainAssignments_expected);
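In step 2 I build a new kd-tree forest from the cluster centres and then assign the data to the centres again with vl_kdtreequery. The check in step 3 gives a valid result, so a forest built this way can also be used to assign other data, such as 'testDescriptors', to the same centres.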
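For completeness, here is a minimal sketch of the last step the question asked about: assigning unseen descriptors to the learned centres. It assumes VLFeat is on the MATLAB path; 'testDescriptors' is filled with random data purely for illustration, and the data sizes are placeholders. Capping the query with 'MaxComparisons' is my assumption about how to mirror the approximate search used during training (that is the spelling used in the VLFeat kd-tree tutorial; check `help vl_kdtreequery` for your version). Leaving the option out makes the query an exact nearest-centre search.

```matlab
% Setup repeated from the answer so the sketch runs on its own (smaller sizes).
clusterCount = 1024;
datasetTrain = rand(128, 20000, 'single');
maxComparisons = ceil(clusterCount / 50);

[clusterCenters, trainAssignments_actual] = vl_kmeans(datasetTrain, clusterCount, ...
    'Algorithm', 'ANN', ...
    'Distance', 'l2', ...
    'NumRepetitions', 1, ...
    'NumTrees', 3, ...
    'MaxNumComparisons', maxComparisons);

% Hypothetical test data: same dimensionality and class as the training data.
testDescriptors = rand(128, 5000, 'single');

% Rebuild a kd-forest over the learned centres, mirroring the vl_kmeans settings.
forest = vl_kdtreebuild(clusterCenters, ...
    'Distance', 'l2', ...
    'NumTrees', 3);

% Approximate assignment of the test data (assumption: capping the number of
% comparisons here imitates the ANN search that vl_kmeans performs).
[testAssignments, testDistances] = vl_kdtreequery(forest, clusterCenters, testDescriptors, ...
    'MaxComparisons', maxComparisons);

% Optional sanity check: fraction of training points for which the rebuilt
% forest reproduces the assignment returned by vl_kmeans.
trainAssignments_ann = vl_kdtreequery(forest, clusterCenters, datasetTrain, ...
    'MaxComparisons', maxComparisons);
agreement = mean(trainAssignments_ann(:) == trainAssignments_actual(:));
fprintf('Agreement with vl_kmeans assignments: %.2f%%\n', 100 * agreement);
```

Since the search is approximate, reporting an agreement ratio is a softer check than isequal: a small number of mismatches would not necessarily mean the assignment is wrong.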