
ELKI: How to Specify Feature Columns of CSV for K-Means

I am trying to run K-Means using ELKI MiniGUI. I have a CSV dataset of 15 features (columns) and a label column. I would like to do multiple runs of K-Means with different combinations of the feature columns.

Is there anywhere in the MiniGUI where I can specify the indices of the columns I would like to be used for clustering?

If not, what is the simplest way to achieve this by changing/extending ELKI in Java?

This is obviously easy to achieve with Java code, or simply by preprocessing the data as necessary: generate, say, 10 variants of the file, then launch ELKI on each one via the command line.
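As a rough sketch of the preprocessing route, the following Python script writes a variant of the CSV that keeps only the chosen feature columns plus the label column. The function names, file names, and the assumption that the file has no header row are all hypothetical and not part of ELKI; adjust them to the actual data.

```python
import csv


def select_columns(rows, feature_idx, label_idx):
    """Keep only the chosen feature columns, followed by the label column."""
    keep = list(feature_idx) + [label_idx]
    return [[row[i] for i in keep] for row in rows]


def write_variant(in_path, out_path, feature_idx, label_idx):
    """Read in_path, project its columns, and write the result to out_path.

    Assumes a plain comma-separated file without a header row.
    """
    with open(in_path, newline="") as src:
        # Skip empty lines so that indexing into a row cannot fail on them.
        rows = [row for row in csv.reader(src) if row]
    with open(out_path, "w", newline="") as dst:
        csv.writer(dst).writerows(select_columns(rows, feature_idx, label_idx))
```

For example, if the label happens to be the last of the 16 columns, write_variant("data.csv", "variant_0_1_2.csv", [0, 1, 2], 15) would produce one variant file; calling it in a loop over the desired column combinations gives one input file per ELKI run.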

But there is also a filter to select columns: NumberVectorFeatureSelectionFilter. To use only columns 0,1,2 (indices refer to the numeric part of the data; labels are treated separately at this point, because this is a vector transformation):

-dbc.filter transform.NumberVectorFeatureSelectionFilter
-projectionfilter.selectedattributes 0,1,2
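To run many column combinations without preprocessing, these two parameters can be generated by a small script. The sketch below only builds the argument lists; the helper names are hypothetical, and the rest of the ELKI command line (jar, input file, algorithm, k) is deliberately left to the caller.

```python
from itertools import combinations

# The filter parameter exactly as given above.
FILTER_ARGS = ["-dbc.filter", "transform.NumberVectorFeatureSelectionFilter"]


def selection_args(columns):
    """Build the filter arguments for one set of column indices."""
    selected = ",".join(str(c) for c in columns)
    return FILTER_ARGS + ["-projectionfilter.selectedattributes", selected]


def all_selection_args(num_features, subset_size):
    """Build one argument list per combination of subset_size columns."""
    return [
        selection_args(combo)
        for combo in combinations(range(num_features), subset_size)
    ]
```

Each returned list can be appended to the base command of a run, for instance with subprocess.run(base_command + args), where base_command is whatever command line already works for a single K-Means run.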

The filter could be extended using our newer IntRangeParameter to allow for specifications such as 1..3,5..8; but this has not been implemented yet.
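Until then, a range specification can be expanded outside of ELKI into the comma-separated list the parameter accepts today. This is a minimal sketch that assumes ranges such as 1..3 are inclusive on both ends; the function name is hypothetical and the syntax is not something ELKI itself parses.

```python
def expand_ranges(spec):
    """Expand a spec like "1..3,5..8" into a list of column indices.

    Assumes both ends of each range are inclusive.
    """
    columns = []
    for part in spec.split(","):
        part = part.strip()
        if ".." in part:
            start, end = part.split("..", 1)
            columns.extend(range(int(start), int(end) + 1))
        else:
            columns.append(int(part))
    return columns


def to_parameter_value(spec):
    """Format the expanded indices as a comma-separated parameter value."""
    return ",".join(str(c) for c in expand_ranges(spec))
```

The result of to_parameter_value("1..3,5..8") can then be passed as the value of -projectionfilter.selectedattributes.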
