ELKI: How to Specify Feature Columns of CSV for K-Means

Question

I am trying to run K-Means using ELKI MiniGUI. I have a CSV dataset of 15 features (columns) and a label column. I would like to do multiple runs of K-Means with different combinations of the feature columns.

Is there anywhere in the MiniGUI where I can specify the indices of the columns I would like to be used for clustering?

If not, what is the simplest way to achieve this by changing/extending ELKI in Java?

Answer 1

This is obviously easy to achieve with Java code, or simply by preprocessing the data as necessary: generate 10 variants, then launch ELKI via the command line.
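As a rough illustration of the preprocessing route, here is a minimal Python sketch that writes one CSV variant containing only the chosen feature columns plus the label column. The function name `write_variant`, the sample data, and the assumption that the file has no header row are all hypothetical; adapt them to your file.

```python
import csv
import io

def write_variant(src, dst, feature_cols, label_col):
    """Copy the chosen feature columns plus the label column from src to dst.

    src and dst are file-like objects; column indices are 0-based.
    Assumes a plain comma-separated file without a header row.
    """
    reader = csv.reader(src)
    writer = csv.writer(dst, lineterminator="\n")
    for row in reader:
        if not row:
            continue  # skip blank lines
        writer.writerow([row[i] for i in feature_cols] + [row[label_col]])

# Tiny in-memory demo; with real files, pass open(...) handles instead.
sample = io.StringIO("1.0,2.0,3.0,a\n4.0,5.0,6.0,b\n")
out = io.StringIO()
write_variant(sample, out, feature_cols=[0, 2], label_col=3)
variant_text = out.getvalue()
```

Calling this once per column combination gives you one input file per run, each of which can then be passed to ELKI separately.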

But there is also a filter for selecting columns: NumberVectorFeatureSelectionFilter. To use only columns 0, 1, and 2 (indices refer to the numeric part of the data; labels are treated separately at this point, since this is a vector transformation):

-dbc.filter transform.NumberVectorFeatureSelectionFilter
-projectionfilter.selectedattributes 0,1,2
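For multiple runs over different column combinations, the two parameters above can be generated by a small script. The following is only a sketch: the helper names `filter_args` and `all_subsets` are made up for illustration, and it only builds the argument lists. The remaining parameters of your run (input file, algorithm, k) still have to be added before launching ELKI.

```python
from itertools import combinations

def filter_args(columns):
    """Build the two filter parameters shown above for one column selection."""
    return [
        "-dbc.filter", "transform.NumberVectorFeatureSelectionFilter",
        "-projectionfilter.selectedattributes", ",".join(str(c) for c in columns),
    ]

def all_subsets(n_features, size):
    """One argument list per combination of `size` columns out of n_features."""
    return [filter_args(c) for c in combinations(range(n_features), size)]

# Demo with 4 numeric columns, choosing 2 at a time (6 combinations).
runs = all_subsets(4, 2)
for args in runs:
    print(" ".join(args))
```

With 15 features the number of combinations grows quickly, so in practice you would probably restrict `size` or list the combinations of interest by hand.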

The filter could be extended using our newer IntRangeParameter to allow for specifications such as 1..3,5..8, but this has not been implemented yet.
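Until then, a range specification can be expanded outside of ELKI into the plain comma-separated list the filter accepts today. A minimal sketch, assuming that both ends of a range such as 1..3 are inclusive (the answer does not say how IntRangeParameter treats the bounds) and using a hypothetical helper `expand_ranges`:

```python
def expand_ranges(spec):
    """Expand a spec like "1..3,5..8" into a list of integer indices.

    Assumption: both ends of each range are inclusive.
    """
    indices = []
    for part in spec.split(","):
        part = part.strip()
        if ".." in part:
            lo, hi = part.split("..")
            indices.extend(range(int(lo), int(hi) + 1))
        else:
            indices.append(int(part))
    return indices

# Value suitable for -projectionfilter.selectedattributes
selected = ",".join(str(i) for i in expand_ranges("1..3,5..8"))
```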

1 answer

solution1
1 ACCEPTED 2020-03-10 08:16:29