How can I cluster data using a distance matrix with the ELKI library?

I have a distance matrix and I want to use that distance matrix when clustering my data.

I've read the ELKI documentation, and it states that I can override the distance method by extending the AbstractNumberVectorDistanceFunction class.

The distance method, however, receives the coordinates of the two objects (the vectors o1 and o2), not their indexes. This is a problem because my distance matrix contains only distance values, and I need the indexes x and y to look up the distance from x to y. Here's the code from the documentation:

public class TutorialDistanceFunction extends AbstractNumberVectorDistanceFunction {
  @Override
  public double distance(NumberVector o1, NumberVector o2) {
    double dx = o1.doubleValue(0) - o2.doubleValue(0);
    double dy = o1.doubleValue(1) - o2.doubleValue(1);
    return dx * dx + Math.abs(dy);
  }
}

My question is how to correctly use the distance matrix when clustering with ELKI.

AbstractNumberVectorDistanceFunction is the appropriate parent class only if your input data are number vectors. If your data consist of abstract object identifiers, subclass AbstractDBIDRangeDistanceFunction instead. You then have to implement

double distance(int i1, int i2);

There are already several implementations of a distance function for precomputed distances, for example DiskCacheBasedDoubleDistanceFunction, which memory-maps a distance matrix stored on disk. We should add a DoubleMatrixDistanceFunction though, for direct use from Java (in the next version, all class names and package names will be shortened, btw).
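As a rough illustration of what such an index-based distance function boils down to, here is a minimal sketch in plain Java with no ELKI dependency. The class name MatrixDistanceSketch and its constructor are hypothetical and not part of ELKI. In an actual ELKI setup, the distance(int, int) method would sit in a subclass of AbstractDBIDRangeDistanceFunction, and any further requirements of that parent class depend on your ELKI version. The sketch also assumes that the indexes are 0-based row and column positions in a square matrix.

```java
import java.util.Objects;

/**
 * Hypothetical sketch (not an ELKI class): looks up precomputed distances
 * by object index instead of computing them from coordinates.
 */
final class MatrixDistanceSketch {
  private final double[][] matrix;

  MatrixDistanceSketch(double[][] matrix) {
    Objects.requireNonNull(matrix, "matrix");
    // A distance matrix needs one row and one column per object.
    for (double[] row : matrix) {
      if (row == null || row.length != matrix.length) {
        throw new IllegalArgumentException("distance matrix must be square");
      }
    }
    this.matrix = matrix;
  }

  /** Same shape as the method to implement: distance between objects i1 and i2. */
  double distance(int i1, int i2) {
    // Assumption: i1 and i2 are 0-based positions in the matrix.
    return matrix[i1][i2];
  }

  /** Number of objects covered by the matrix. */
  int size() {
    return matrix.length;
  }
}
```

With a 3x3 matrix m, new MatrixDistanceSketch(m).distance(0, 2) simply returns m[0][2]; no coordinates are involved at any point.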

See also https://elki-project.github.io/howto/precomputed_distances, in particular the section titled "Using without primary data", which explains how to set up a database with no primary data when you only have a distance matrix.
