
How to use existing data in ELKI

I kept stumbling upon ELKI over the past couple of days while searching for the most suitable density-based clustering tool, and decided to try it. For DBSCAN, I've managed to successfully reproduce the test that clusters the file "3clusters-and-noise-2d.csv", and I've also managed to print the cluster metadata and the points in each cluster, all via the ELKI code from GitHub (latest version) in Java (I'm not really interested in the CLI or UI tools).

Now, I want to use some kind of internal Java structure to create a database, instead of importing via a file, to reduce write and read overhead.

In the example provided I'm able to do this, but only for the first column of the file.

My question, basically, is: how do I create the same database that was created from a file, when the data already exists in Java?

Got it!

So after some tweaking: basically, you use a 2D array of doubles, where each row represents a point and there are as many columns as you have dimensions. To create your database without reading a file, you use an ArrayAdapterDatabaseConnection as follows:

    double[][] data = new double[NUM_OF_POINTS][NUM_OF_DIMENSIONS]; 
    //populate data according to your app
    DatabaseConnection dbc = new ArrayAdapterDatabaseConnection(data);
    Database db = new StaticArrayDatabase(dbc, null);
    db.initialize();

    //dbscan algorithm setup
    ListParameterization params = new ListParameterization();
    params.addParameter(DBSCAN.Parameterizer.EPSILON_ID, 0.04);
    params.addParameter(DBSCAN.Parameterizer.MINPTS_ID, 20);
    DBSCAN<DoubleVector> dbscan = ClassGenericsUtil.parameterizeOrAbort(DBSCAN.class, params);

    //run DBSCAN on database
    Clustering<Model> result = dbscan.run(db);

I've tested this with the "3clusters-and-noise-2d.csv" dataset and can confirm that I get the same results whether I pass the data via a file or via the ArrayAdapterDatabaseConnection.

A complete example can be found in the ELKI sources:

http://elki.dbs.ifi.lmu.de/browser/elki/elki/src/main/java/tutorial/javaapi/PassingDataToELKI.java

It generates random data and runs k-means on it. It also shows how to reliably map DBIDs back to your data points.
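
For reference, here is a minimal sketch of that DBID-to-row mapping applied to the DBSCAN result from the snippet above. It assumes the same ELKI 0.7.x-style API as that snippet (classes such as Relation, TypeUtil, DBIDRange, DBIDIter and Cluster from the de.lmu.ifi.dbs.elki.* packages; newer versions may use different package names), and a StaticArrayDatabase built without filters, so that the relation's DBIDs form a contiguous DBIDRange whose offsets correspond to the rows of the original data array:

    import de.lmu.ifi.dbs.elki.data.Cluster;
    import de.lmu.ifi.dbs.elki.data.NumberVector;
    import de.lmu.ifi.dbs.elki.data.type.TypeUtil;
    import de.lmu.ifi.dbs.elki.database.ids.DBIDIter;
    import de.lmu.ifi.dbs.elki.database.ids.DBIDRange;
    import de.lmu.ifi.dbs.elki.database.relation.Relation;

    // Relation holding the vectors that were built from data[][]
    Relation<NumberVector> rel = db.getRelation(TypeUtil.NUMBER_VECTOR_FIELD);
    // Assumption: StaticArrayDatabase + no filters => the DBIDs are a contiguous range,
    // so the offset of each DBID is the row index into data[][]
    DBIDRange ids = (DBIDRange) rel.getDBIDs();

    for(Cluster<Model> clu : result.getAllClusters()) {
      System.out.println(clu.getNameAutomatic() + ": " + clu.size() + " points"
          + (clu.isNoise() ? " (noise)" : ""));
      for(DBIDIter it = clu.getIDs().iter(); it.valid(); it.advance()) {
        int row = ids.getOffset(it); // index into the original data[][] array
        System.out.println("  row " + row + " -> " + java.util.Arrays.toString(data[row]));
      }
    }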
