How to run DatabaseUtil.precomputedKNNQuery method of LOF class on two different threads

I want to reduce the runtime of the DatabaseUtil.precomputedKNNQuery method by running it on two different threads. Note that KNNQuery is an interface.

    KNNQuery<O> knnq = DatabaseUtil.precomputedKNNQuery(database, relation, getDistanceFunction(), k);

I split this call in the LOF class into two tasks, like this:

    Callable<KNNQuery> task1(Database database, Relation<O> relation) {
        DBIDs idss = relation.getDBIDs();
        ArrayDBIDs aids = (ArrayDBIDs) idss;
        aids = aids.slice(0, (aids.size() / 2));
        ProxyView<O> pv = new ProxyView<>(aids, relation);
        return () -> {
            return DatabaseUtil.precomputedKNNQuery(database, pv, getDistanceFunction(), k);
        };
    }

    Callable<KNNQuery> task2(Database database, Relation<O> relation) {
        DBIDs idss = relation.getDBIDs();
        ArrayDBIDs aids = (ArrayDBIDs) idss;
        // Note: the slice starts at size/2 - 1, so it overlaps the first half by one object.
        aids = aids.slice(((aids.size() / 2) - 1), aids.size());
        ProxyView<O> pv2 = new ProxyView<>(aids, relation);
        return () -> {
            return DatabaseUtil.precomputedKNNQuery(database, pv2, getDistanceFunction(), k);
        };
    }

Then I invoked both tasks on two different threads in the run() method of the LOF class, like this:

    public OutlierResult run(Database database, Relation<O> relation) {
        StepProgress stepprog = LOG.isVerbose() ? new StepProgress("LOF", 3) : null;
        DBIDs ids = relation.getDBIDs();

        LOG.beginStep(stepprog, 1, "Materializing nearest-neighbor sets.");
        ExecutorService executor = Executors.newFixedThreadPool(2);
        List<Callable<KNNQuery>> callables = Arrays.asList(
            task1(database, relation),
            task2(database, relation));
        for (Future<KNNQuery> future : executor.invokeAll(callables)) {
            KNNQuery<O> knnq = future.get();
            // Compute LRDs
            // compute LOF_SCORE of each db object
            // Build result representation
        }
    }

But I get the exception below. I think this happens because each iteration of the loop puts the output of only one future into the knnq variable, not the combined output of both futures. How can I get rid of this exception? An example would be appreciated.

de.lmu.ifi.dbs.elki.datasource.FileBasedDatabaseConnection.load: 505 ms
LOF #1/3: Materializing nearest-neighbor sets.
de.lmu.ifi.dbs.elki.index.preprocessed.knn.MaterializeKNNPreprocessor.k: 4
de.lmu.ifi.dbs.elki.index.preprocessed.knn.MaterializeKNNPreprocessor.k: 4
Materializing k nearest neighbors (k=4): 21751 [100%] 
de.lmu.ifi.dbs.elki.index.preprocessed.knn.MaterializeKNNPreprocessor.precomputation-time: 21470 ms
Materializing k nearest neighbors (k=4): 21750 [100%] 
de.lmu.ifi.dbs.elki.index.preprocessed.knn.MaterializeKNNPreprocessor.precomputation-time: 22355 ms
LOF #2/3: Computing Local Reachability Densities (LRD).
Task failed
de.lmu.ifi.dbs.elki.database.datastore.ObjectNotFoundException: Object 21751 was not found in the database.
at de.lmu.ifi.dbs.elki.database.datastore.memory.ArrayStore.get(ArrayStore.java:69)
at de.lmu.ifi.dbs.elki.index.preprocessed.knn.AbstractMaterializeKNNPreprocessor.get(AbstractMaterializeKNNPreprocessor.java:118)
at de.lmu.ifi.dbs.elki.database.query.knn.PreprocessorKNNQuery.getKNNForDBID(PreprocessorKNNQuery.java:84)
at de.lmu.ifi.dbs.elki.algorithm.outlier.lof.LOF.computeLRD(LOF.java:292)
at de.lmu.ifi.dbs.elki.algorithm.outlier.lof.LOF.computeLRDs(LOF.java:277)
at de.lmu.ifi.dbs.elki.algorithm.outlier.lof.LOF.run(LOF.java:244)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at de.lmu.ifi.dbs.elki.algorithm.AbstractAlgorithm.run(AbstractAlgorithm.java:89)
at de.lmu.ifi.dbs.elki.workflow.AlgorithmStep.runAlgorithms(AlgorithmStep.java:100)
at de.lmu.ifi.dbs.elki.KDDTask.run(KDDTask.java:109)
at de.lmu.ifi.dbs.elki.application.KDDCLIApplication.run(KDDCLIApplication.java:58)
at [...]

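If you split your data set this way, objects in one partition will not see neighbors from the other partition. Each precomputed kNN query also only holds results for the objects of its own partition, so asking it for an object from the other partition fails as in your stack trace.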
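To make this concrete, here is a minimal stand-alone sketch in plain Java. It does not use any ELKI classes; the class name `PartitionedKnnDemo`, the `knn` helper, and the one-dimensional toy data are invented for illustration, and unlike ELKI's kNN the query point itself is excluded. It compares the neighbors found when searching the full data set with those found when searching only one half.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical demo class, not part of ELKI.
class PartitionedKnnDemo {
    /** Indices of the k nearest neighbors of data[q], searching only indices in [from, to). */
    static List<Integer> knn(double[] data, int q, int k, int from, int to) {
        List<Integer> candidates = new ArrayList<>();
        for (int i = from; i < to; i++) {
            if (i != q) { // simplification: the query point is not its own neighbor
                candidates.add(i);
            }
        }
        // Sort candidates by distance to the query point, closest first.
        candidates.sort(Comparator.comparingDouble(i -> Math.abs(data[i] - data[q])));
        return new ArrayList<>(candidates.subList(0, Math.min(k, candidates.size())));
    }

    public static void main(String[] args) {
        double[] data = {0.0, 1.0, 2.0, 3.0, 3.1, 3.2, 9.0, 10.0};
        int half = data.length / 2; // partition A = [0, 4), partition B = [4, 8)
        int q = 3;                  // value 3.0, the last object of partition A
        System.out.println("full data:      " + knn(data, q, 2, 0, data.length)); // [4, 5]
        System.out.println("partition only: " + knn(data, q, 2, 0, half));        // [2, 1]
    }
}
```

The true nearest neighbors of the object at index 3 lie in partition B, so a kNN result computed on partition A alone is simply a different (and for LOF, wrong) neighborhood.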

You seem to want to parallelize LOF. Why don't you just use the existing parallel LOF?

https://elki-project.github.io/releases/current/doc/de/lmu/ifi/dbs/elki/algorithm/outlier/lof/parallel/ParallelLOF.html

You can study the source code to see how we parallelize this with a map-reduce-like framework:

https://github.com/elki-project/elki/blob/9908f56f14ec76912745369edb68c07c4339eae0/elki-outlier/src/main/java/de/lmu/ifi/dbs/elki/algorithm/outlier/lof/parallel/ParallelLOF.java#L114L133
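The general idea of such a "map"-style parallelization can be sketched without ELKI. The following is not the ParallelLOF code and does not use ELKI's framework; it is a simplified illustration using Java parallel streams, with the hypothetical names `ParallelKnnSketch`, `kDistance`, and `allKDistances` and one-dimensional toy data. The point it shows is that the work is split by query object, while every query still searches the full data set.

```java
import java.util.stream.IntStream;

// Hypothetical sketch class, not part of ELKI.
class ParallelKnnSketch {
    /** Distance from data[q] to its k-th nearest other point, searching the FULL data set. */
    static double kDistance(double[] data, int q, int k) {
        return IntStream.range(0, data.length)
            .filter(i -> i != q)                          // skip the query point itself
            .mapToDouble(i -> Math.abs(data[i] - data[q]))
            .sorted()
            .skip(k - 1)                                  // move to the k-th smallest distance
            .findFirst()
            .orElse(Double.NaN);                          // fewer than k other points
    }

    /** "Map" step: each object is processed independently, so the outer loop may run in parallel. */
    static double[] allKDistances(double[] data, int k) {
        return IntStream.range(0, data.length)
            .parallel()
            .mapToDouble(q -> kDistance(data, q, k))
            .toArray();                                   // result order matches the input order
    }
}
```

Because each object only reads the shared data and writes its own result, no result depends on which thread processed it, and the output is identical to the single-threaded computation.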

Alternatively - and closer to what you are doing right now - you can do two partitions, run LOF on both, then "join" the two LOF results by copying. There is no benefit in joining the kNN results before running LOF, as they will remain independent - one object from partition A will not see neighbors from partition B the way you set it up by partitioning the data.
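A rough sketch of the "run both, then join by copying" pattern with an ExecutorService might look as follows. It is plain Java with no ELKI classes: `JoinPartitionResults`, `scorePartition`, and `runAndJoin` are hypothetical names, and the per-partition task only writes a placeholder score where a real implementation would run LOF on that partition.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical sketch class, not part of ELKI.
class JoinPartitionResults {
    /** Stand-in for "run LOF on one partition": returns one score per object id in [from, to). */
    static Callable<Map<Integer, Double>> scorePartition(int from, int to) {
        return () -> {
            Map<Integer, Double> scores = new HashMap<>();
            for (int id = from; id < to; id++) {
                scores.put(id, 1.0); // placeholder, not a real LOF value
            }
            return scores;
        };
    }

    /** Runs both partitions on two threads and copies the partial results into one map. */
    static Map<Integer, Double> runAndJoin(int size) {
        ExecutorService executor = Executors.newFixedThreadPool(2);
        try {
            int half = size / 2;
            List<Callable<Map<Integer, Double>>> tasks = Arrays.asList(
                scorePartition(0, half),     // ids [0, half)
                scorePartition(half, size)); // ids [half, size), no overlap with the first half
            Map<Integer, Double> joined = new HashMap<>();
            for (Future<Map<Integer, Double>> future : executor.invokeAll(tasks)) {
                joined.putAll(future.get()); // copy each partial result into the joined map
            }
            return joined;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException("partition task failed", e);
        } finally {
            executor.shutdown(); // always release the worker threads
        }
    }
}
```

The key difference from the code in the question is that the loop accumulates all partial results into one structure before anything reads from it, instead of working with one future's output at a time.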

Note that DatabaseUtil.precomputedKNNQuery is a convenience method for functionality used in a number of methods, but using it is not required. The ParallelLOF version does not use it, because precomputedKNNQuery is not parallel.

In a future ELKI 0.8, I hope we will have infrastructure in place that can automatically set up indexes (including such precomputed ones) as necessary, maybe with a flag that allows this to be done in parallel (or not: for runtime comparisons, single-threaded algorithms usually yield more meaningful results).
