Create Dendrogram with Elki

I want to plot a dendrogram for a clustering result. Right now I am using ELKIBuilder from ELKI 0.7.5 for clustering.

In the best case I'd like to directly plot a dendrogram.

If that's not possible, I'd like to extract information (distances) from the clustering to create a dendrogram with another library (e.g. using the Newick format).

Therefore my questions:

  • Is it possible to create dendrograms with ELKI?

  • Is it possible to access the distances which have been calculated during the clustering? (the distances used when two clusters were merged)

Right now I am using the following code for clustering:

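// Imports assume the ELKI 0.7.5 package layout.
import de.lmu.ifi.dbs.elki.algorithm.AbstractAlgorithm;
import de.lmu.ifi.dbs.elki.algorithm.clustering.hierarchical.AGNES;
import de.lmu.ifi.dbs.elki.algorithm.clustering.hierarchical.AnderbergHierarchicalClustering;
import de.lmu.ifi.dbs.elki.algorithm.clustering.hierarchical.extraction.CutDendrogramByNumberOfClusters;
import de.lmu.ifi.dbs.elki.algorithm.clustering.hierarchical.linkage.WardLinkage;
import de.lmu.ifi.dbs.elki.data.Clustering;
import de.lmu.ifi.dbs.elki.database.Database;
import de.lmu.ifi.dbs.elki.database.StaticArrayDatabase;
import de.lmu.ifi.dbs.elki.datasource.ArrayAdapterDatabaseConnection;
import de.lmu.ifi.dbs.elki.datasource.DatabaseConnection;
import de.lmu.ifi.dbs.elki.utilities.ELKIBuilder;

public Clustering<?> createClustering() {
    double[][] distanceMatrix = new double[][]{
            {0.0, 1.0, 3.0},
            {1.0, 0.0, 4.0},
            {3.0, 4.0, 0.0}
    };
    int noOfClusters = 2;
    // Adapter to load data from an existing array.
    DatabaseConnection dbc = new ArrayAdapterDatabaseConnection(distanceMatrix);
    // Create a database (which may contain multiple relations!)
    Database db = new StaticArrayDatabase(dbc, null);
    // Load the data into the database (do NOT forget to initialize...)
    db.initialize();

    // Run hierarchical clustering, then cut the dendrogram into noOfClusters clusters.
    Clustering<?> clustering = new ELKIBuilder<>(CutDendrogramByNumberOfClusters.class) //
            .with(CutDendrogramByNumberOfClusters.Parameterizer.MINCLUSTERS_ID, noOfClusters) //
            .with(AbstractAlgorithm.ALGORITHM_ID, AnderbergHierarchicalClustering.class) //
            .with(AGNES.Parameterizer.LINKAGE_ID, WardLinkage.class) //
            .build().run(db);
    return clustering;
}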

The AGNES class (I recommend using AnderbergHierarchicalClustering instead: it is much faster but gives exactly the same result) returns the clustering in a standard form called a "pointer hierarchy" (PointerHierarchyRepresentationResult). The merge of i and j at height h is represented as a pointer from i to j with height h. Afterwards, j represents the merged cluster. This basic form was introduced by Sibson with the SLINK algorithm in 1973.
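To make the pointer representation concrete, here is a minimal sketch that decodes it into a list of merges. It does not use ELKI at all: the arrays parent and height are hypothetical stand-ins for the pointers and heights, the class and method names are made up for illustration, and the values are illustrative rather than actual ELKI output. It assumes the root object points to itself, and that ties in height can be resolved by following pointers to the current representative.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

class PointerHierarchyMerges {
    /**
     * Decodes a pointer hierarchy into merge steps, lowest height first.
     * parent[i] = j and height[i] = h mean "i merges into j's cluster at height h".
     * Assumption: the root is the object with parent[i] == i.
     */
    static List<String> describeMerges(int[] parent, double[] height) {
        int n = parent.length;
        Integer[] order = new Integer[n];
        List<List<Integer>> members = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            order[i] = i;
            List<Integer> single = new ArrayList<>();
            single.add(i);
            members.add(single); // every object starts as its own cluster
        }
        // Process merges from the lowest height to the highest (stable sort).
        Arrays.sort(order, Comparator.comparingDouble(i -> height[i]));

        boolean[] merged = new boolean[n];
        List<String> lines = new ArrayList<>();
        for (int i : order) {
            if (parent[i] == i) {
                continue; // the root is never merged into anything
            }
            int j = parent[i];
            while (merged[j]) {
                j = parent[j]; // follow pointers to the current representative
            }
            lines.add("height " + height[i] + ": " + members.get(i) + " + "
                    + members.get(j) + " -> represented by " + j);
            members.get(j).addAll(members.get(i));
            merged[i] = true;
        }
        return lines;
    }

    public static void main(String[] args) {
        // Illustrative values: 0 joins 1 at height 1.0, then 1 joins 2 at height 3.0.
        int[] parent = {1, 2, 2};
        double[] height = {1.0, 3.0, Double.POSITIVE_INFINITY};
        for (String line : describeMerges(parent, height)) {
            System.out.println(line);
        }
        // prints:
        // height 1.0: [0] + [1] -> represented by 1
        // height 3.0: [1, 0] + [2] -> represented by 2
    }
}
```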

In particular, this result contains the merge heights, i.e. the y coordinates of the dendrogram (getParentDistanceStore), the merges themselves (given by getParentStore), and it can compute an order in which to arrange the points for visualization (getPositions).
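If you want to draw the dendrogram with another library, the parent pointers and heights are enough to build a Newick string. The following is only a sketch under assumptions: it expects that you have already copied the contents of the parent store and the parent distance store into plain arrays (that copying step is not shown), it assumes the root points to itself, and the class name, method name and labels are hypothetical. Branch lengths are the differences between the merge height and the height of the child node.

```java
import java.util.Arrays;
import java.util.Comparator;

class PointerHierarchyNewick {
    /**
     * Builds a Newick string from a pointer hierarchy given as plain arrays.
     * parent[i] = j and height[i] = h mean "i merges into j's cluster at height h".
     * Assumption: the root is the object with parent[i] == i.
     */
    static String toNewick(int[] parent, double[] height, String[] labels) {
        int n = parent.length;
        Integer[] order = new Integer[n];
        for (int i = 0; i < n; i++) {
            order[i] = i;
        }
        // Process merges from the lowest height to the highest (stable sort).
        Arrays.sort(order, Comparator.comparingDouble(i -> height[i]));

        String[] subtree = labels.clone();  // Newick text of the cluster represented by i
        double[] nodeHeight = new double[n]; // height of that cluster's top node (leaves: 0)
        boolean[] merged = new boolean[n];

        for (int i : order) {
            if (parent[i] == i) {
                continue; // the root is never merged into anything
            }
            int j = parent[i];
            while (merged[j]) {
                j = parent[j]; // follow pointers to the current representative
            }
            double h = height[i];
            subtree[j] = "(" + subtree[i] + ":" + (h - nodeHeight[i]) + ","
                    + subtree[j] + ":" + (h - nodeHeight[j]) + ")";
            nodeHeight[j] = h;
            merged[i] = true;
        }
        // The only object that was never merged represents the whole tree.
        for (int i = 0; i < n; i++) {
            if (!merged[i]) {
                return subtree[i] + ";";
            }
        }
        throw new IllegalArgumentException("no root found");
    }

    public static void main(String[] args) {
        // Illustrative values, not actual ELKI output.
        int[] parent = {1, 2, 2};
        double[] height = {1.0, 3.0, Double.POSITIVE_INFINITY};
        String[] labels = {"a", "b", "c"};
        System.out.println(toNewick(parent, height, labels));
        // prints ((a:1.0,b:1.0):2.0,c:3.0);
    }
}
```

The resulting string can be passed to any tool that reads Newick trees. If the hierarchy is not a single tree, this sketch only returns the first unmerged subtree.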

You may want to have a look at the code of DendrogramVisualization, which is responsible for creating the SVG dendrogram in the GUI.
