What's the fastest way in Java to form a collection of objects that match a collection of predicates?

Consider a collection of objects and a collection of predicates. What's the fastest way to form a collection of (object, predicate) pairs, where each pair consists of an object and a predicate that returns true for it?

Also, objects must be unique across pairs, but this does not apply to predicates.

I.e. consider objects A, B, and C, and predicates P1, P2, P3.

(A,P1),(B,P1),(C,P2) is a valid set of pairs; however, (A,P1),(A,P1),(C,P2) is not valid, as there are duplicate objects across pairs.

So once a predicate is matched to an object, it effectively owns it.

I.e. what's the fastest way to implement the method below, given the constraints above:

Collection<Pair<Object, Predicate<Object>>> getAllMatches(Collection<Object> objects, Collection<Predicate<Object>> predicates);

where Pair is:

class Pair<A,B> {
    A a;
    B b;
}

I know I'd need to use multi-threading, but I'm not sure of the best strategy or the best collection implementations to use. I also imagine the uniqueness constraint will introduce contention, due to the need for some sort of locking or ownership mechanism.

Here's my attempt. It seems too basic; surely there must be a faster way:

Collection<Pair<Object, Predicate>> getAllMatches(BlockingQueue<Object> objects, Collection<Predicate> predicates) {
    List<Callable<Pair>> callables = new ArrayList<>();
    for (Object o : objects) {
        Callable<Pair> c = () -> {
            // each task claims one object from the shared queue
            Object polled = objects.take();
            for (Predicate p : predicates) {
                if (p.test(polled)) {
                    return new Pair<Object, Predicate>(polled, p);
                }
            }
            // nothing matched: hand the object back
            objects.put(polled);
            return null;
        };
        callables.add(c);
    }
    List<Future<Pair>> futurePairs = executors.invokeAll(callables);

    // return pairs
}

Your benchmark should absolutely be:

final Collection<Pair<Object, Predicate>> getMatches(
        final Collection<Object> objects, 
        final Collection<Predicate> predicates) {
    final Set<Pair<Object, Predicate>> matches = new HashSet<>();
    for (Object o : objects) {
        for (Predicate p : predicates) {
            if (p.test(o)) {
                matches.add(Pair.with(o, p));
                break;
            }
        }
    }
    return matches;
}

Sequential execution is often fastest. That may seem counter-intuitive (surely running the tests on multiple cores should be faster), but for many operations your bottleneck is actually going to be memory access. Each processor is going to stall, doing nothing, while it verifies that its processor-level cache is consistent with all of the other processors' caches.
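
One rough way to check this on your own data is a crude timing loop like the sketch below. Everything in it is an assumption made for illustration: the class name `MatchTiming`, the two helper methods, the integer test data and the two toy predicates. It only counts matched objects rather than building pairs, and a proper harness such as JMH would give more trustworthy numbers.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

// Hypothetical harness; class and method names are illustrative only.
class MatchTiming {

    // Counts objects matched by at least one predicate, on the calling thread.
    static long countSequential(List<Integer> objects, List<Predicate<Integer>> predicates) {
        long count = 0;
        for (Integer o : objects) {
            for (Predicate<Integer> p : predicates) {
                if (p.test(o)) {
                    count++;
                    break; // first matching predicate is enough
                }
            }
        }
        return count;
    }

    // Same computation on a parallel stream.
    static long countParallel(List<Integer> objects, List<Predicate<Integer>> predicates) {
        return objects.parallelStream()
                .filter(o -> predicates.stream().anyMatch(p -> p.test(o)))
                .count();
    }

    public static void main(String[] args) {
        List<Integer> objects = IntStream.range(0, 1_000_000).boxed().collect(Collectors.toList());
        List<Predicate<Integer>> predicates = new ArrayList<>();
        predicates.add(n -> n % 3 == 0);
        predicates.add(n -> n % 5 == 0);

        // Repeat a few rounds so the JIT has a chance to warm up.
        for (int round = 0; round < 5; round++) {
            long t0 = System.nanoTime();
            long seq = countSequential(objects, predicates);
            long t1 = System.nanoTime();
            long par = countParallel(objects, predicates);
            long t2 = System.nanoTime();
            System.out.printf("round %d: sequential %d ms, parallel %d ms (matches %d / %d)%n",
                    round, (t1 - t0) / 1_000_000, (t2 - t1) / 1_000_000, seq, par);
        }
    }
}
```

With predicates this cheap, the sequential loop may well win; the balance can shift if each predicate does substantial work per object.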

I'd propose testing something like this, if you're confident that multithreading is going to save you some time:

final Collection<Pair<Object, Predicate>> getMatches(
        final Collection<Object> objects,
        final Collection<Predicate> predicates)
        throws InterruptedException, ExecutionException {
    final List<Future<Pair<Object, Predicate>>> futures = new ArrayList<>();
    for (final Object o : objects) {
        // submit() takes a Callable and returns a Future for its result
        futures.add(executorService.submit(() -> {
            for (Predicate p : predicates) {
                if (p.test(o)) {
                    return Pair.with(o, p);
                }
            }
            return null; // no predicate matched this object
        }));
    }
    final Collection<Pair<Object, Predicate>> matches = new ArrayList<>(futures.size());
    for (final Future<Pair<Object, Predicate>> future : futures) {
        final Pair<Object, Predicate> pair = future.get(); // blocks until the task is done
        if (pair != null) {
            matches.add(pair);
        }
    }
    return matches;
}

None of the threads write to shared memory, so there is no lock contention to worry about.
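
For completeness, here is a self-contained sketch of the same idea that also creates and shuts down the thread pool. The names `ExecutorMatcher`, `Match` and the `threads` parameter are made up for this example, `Match` is just a stand-in for the question's `Pair`, and the input list is assumed to contain no duplicate objects. Checked exceptions are wrapped in `IllegalStateException` purely to keep the signature short.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Predicate;

// Hypothetical names: ExecutorMatcher, Match, getMatches.
class ExecutorMatcher {

    // Minimal stand-in for the question's Pair class.
    static final class Match<A, B> {
        final A a;
        final B b;
        Match(A a, B b) { this.a = a; this.b = b; }
    }

    static <T> List<Match<T, Predicate<T>>> getMatches(
            List<T> objects, List<Predicate<T>> predicates, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            // One task per object; each task only reads shared data.
            List<Callable<Match<T, Predicate<T>>>> tasks = new ArrayList<>();
            for (T o : objects) {
                tasks.add(() -> {
                    for (Predicate<T> p : predicates) {
                        if (p.test(o)) {
                            return new Match<T, Predicate<T>>(o, p);
                        }
                    }
                    return null; // no predicate matched
                });
            }
            List<Match<T, Predicate<T>>> matches = new ArrayList<>();
            // invokeAll blocks until all tasks finish; futures come back in task order.
            for (Future<Match<T, Predicate<T>>> f : pool.invokeAll(tasks)) {
                Match<T, Predicate<T>> m = f.get();
                if (m != null) {
                    matches.add(m);
                }
            }
            return matches;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt flag
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        Predicate<String> shortWord = s -> s.length() <= 3;
        Predicate<String> startsWithB = s -> s.startsWith("b");
        List<Match<String, Predicate<String>>> out = getMatches(
                Arrays.asList("cat", "banana", "elephant"),
                Arrays.asList(shortWord, startsWithB), 4);
        for (Match<String, Predicate<String>> m : out) {
            System.out.println(m.a + " -> " + (m.b == shortWord ? "shortWord" : "startsWithB"));
        }
    }
}
```

Creating a pool per call is only for the sake of a runnable example; in real code you would probably reuse a long-lived executor.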

I think you might be overcomplicating this a bit. Stream over the objects and, for each object, find a predicate that matches it:

objects.stream()    // or parallelStream() for multithreaded
    .distinct()     // can omit this if uniqueness of objects is enforced elsewhere
    .flatMap(obj -> predicates.stream()
        .filter(p -> p.test(obj))
        .map(p -> new Pair<>(obj, p))
        .limit(1)   // one predicate per object
    ).collect(toList());
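
If you want to try this as a complete program, a sketch along the following lines should work. It swaps `limit(1)` inside `flatMap` for `findFirst()`, which is an equivalent way of saying "first matching predicate"; the class name `StreamMatcher` and the `Match` type (standing in for `Pair`) are assumptions for the example.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;
import java.util.function.Predicate;
import java.util.stream.Collectors;

// Hypothetical names: StreamMatcher, Match, getMatches.
class StreamMatcher {

    // Minimal stand-in for the question's Pair class.
    static final class Match<A, B> {
        final A a;
        final B b;
        Match(A a, B b) { this.a = a; this.b = b; }
    }

    static <T> List<Match<T, Predicate<T>>> getMatches(
            List<T> objects, List<Predicate<T>> predicates) {
        return objects.stream()          // or parallelStream(), after measuring
                .distinct()              // each object may appear in at most one pair
                .map(obj -> predicates.stream()
                        .filter(p -> p.test(obj))
                        .findFirst()     // first matching predicate owns the object
                        .map(p -> new Match<T, Predicate<T>>(obj, p)))
                .filter(Optional::isPresent) // drop objects that matched nothing
                .map(Optional::get)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        Predicate<Integer> odd = n -> n % 2 != 0;
        Predicate<Integer> greaterThanFive = n -> n > 5;
        List<Match<Integer, Predicate<Integer>>> out = getMatches(
                Arrays.asList(1, 2, 2, 3, 8), Arrays.asList(odd, greaterThanFive));
        for (Match<Integer, Predicate<Integer>> m : out) {
            System.out.println(m.a + " -> " + (m.b == odd ? "odd" : "greaterThanFive"));
        }
    }
}
```

Because every object is handled independently and takes the first predicate in list order, the result is the same whether the outer stream is sequential or parallel.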

A short solution with Java 8 streams using declarative parallelism:

Collection<Pair<Object, Predicate<Object>>> getAllMatches(Set<Object> objects, Set<Predicate<Object>> predicates) {
    List<Pair<Object, Predicate<Object>>> pairs = predicates.parallelStream()
        .map(predicate -> new Pair<>(objects.stream().filter(predicate), predicate))
        .flatMap(pair -> pair.a.map(a -> new Pair<>(a, pair.b)))
        .collect(toList());
    return pairs;
}

Parallel execution is managed by the stream:

    List<Pair<Object, Predicate<Object>>> pairs = predicates.parallelStream()

The next line creates a Pair for each predicate, holding a stream of all objects that match it together with the predicate itself:

        .map(predicate -> new Pair<>(objects.stream().filter(predicate), predicate))

The next line flattens each of those into individual (object, predicate) Pairs:

        .flatMap(pair -> pair.a.map(a -> new Pair<>(a, pair.b)))

The last line creates the final collection:

        .collect(toList());

If objects or predicates are not free of duplicates, add .distinct() after stream() or parallelStream().
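
Note that, as written, this pipeline pairs an object with every predicate it matches, so an object accepted by two predicates shows up in two pairs. If objects have to be unique across pairs, one possible way to add the "ownership" step is to let each predicate claim objects through a concurrent set. The following is only a sketch under assumptions: `ClaimingMatcher` and `Match` are invented names, objects must be non-null (a `ConcurrentHashMap` restriction), and the filter is stateful, which the Stream API documentation generally discourages. Which predicate ends up owning a contested object is not deterministic.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Predicate;
import java.util.stream.Collectors;

// Hypothetical names: ClaimingMatcher, Match, getMatches.
class ClaimingMatcher {

    // Minimal stand-in for the question's Pair class.
    static final class Match<A, B> {
        final A a;
        final B b;
        Match(A a, B b) { this.a = a; this.b = b; }
    }

    static <T> List<Match<T, Predicate<T>>> getMatches(
            List<T> objects, List<Predicate<T>> predicates) {
        // Thread-safe set of objects that already belong to some predicate.
        Set<T> claimed = ConcurrentHashMap.newKeySet();
        return predicates.parallelStream()
                .flatMap(p -> objects.stream()
                        .filter(p)
                        // add() returns true for exactly one caller per object,
                        // so only one predicate can win each object
                        .filter(claimed::add)
                        .map(o -> new Match<T, Predicate<T>>(o, p)))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        Predicate<Integer> even = n -> n % 2 == 0;
        Predicate<Integer> big = n -> n > 5;
        List<Match<Integer, Predicate<Integer>>> out = getMatches(
                Arrays.asList(1, 2, 3, 4, 5, 6, 7, 8, 9, 10), Arrays.asList(even, big));
        for (Match<Integer, Predicate<Integer>> m : out) {
            System.out.println(m.a + " -> " + (m.b == even ? "even" : "big"));
        }
    }
}
```

The set of matched objects is stable from run to run, but objects such as 6, 8 and 10 may be attributed to either predicate, and the order of the resulting list may vary.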

