
Is it important to use Characteristics.UNORDERED in Collectors when possible?

Since I use streams a great deal, some of them dealing with a large amount of data, I thought it would be a good idea to pre-allocate my collection-based collectors with an approximate size to prevent expensive reallocation as the collection grows. So I came up with this, and similar ones for other collection types:

public static <T> Collector<T, ?, Set<T>> toSetSized(int initialCapacity) {
    return Collectors.toCollection(()-> new HashSet<>(initialCapacity));
}

Used like this:

Set<Foo> fooSet = myFooStream.collect(toSetSized(100000));

My concern is that the implementation of Collectors.toSet() sets a Characteristics enum that Collectors.toCollection() does not: Characteristics.UNORDERED. There is no convenient variation of Collectors.toCollection() to set the desired characteristics beyond the default, and I can't copy the implementation of Collectors.toSet() because of visibility issues. So, to set the UNORDERED characteristic I'm forced to do something like this:

static<T> Collector<T,?,Set<T>> toSetSized(int initialCapacity){
    return Collector.of(
            () -> new HashSet<>(initialCapacity),
            Set::add,
            (c1, c2) -> {
                c1.addAll(c2);
                return c1;
            },
            new Collector.Characteristics[]{IDENTITY_FINISH, UNORDERED});
}

So here are my questions:

1. Is this my only option for creating an unordered collector for something as simple as a custom toSet()?
2. If I want this to work ideally, is it necessary to apply the unordered characteristic?

I've read a question on this forum where I learned that the unordered characteristic is no longer back-propagated into the Stream. Does it still serve a purpose?

First of all, the UNORDERED characteristic of a Collector is there to aid performance and nothing else. There is nothing wrong with a Collector that does not depend on the encounter order but does not declare that characteristic either.
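To illustrate that the characteristic is only a hint, here is a minimal sketch that collects the same elements once with Collectors.toSet() and once with Collectors.toCollection(HashSet::new). The class name UnorderedCharacteristicDemo and the helper collectRange are made up for this example. The printed characteristics of toCollection reflect what the running JDK happens to do and are not a documented guarantee.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.stream.Collector;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

class UnorderedCharacteristicDemo {
    /** Hypothetical helper: collects the ints 0..n-1 in parallel with the given collector. */
    public static <R extends Set<Integer>> R collectRange(int n, Collector<Integer, ?, R> collector) {
        return IntStream.range(0, n).boxed().parallel().collect(collector);
    }

    public static void main(String[] args) {
        Collector<Integer, ?, Set<Integer>> builtIn = Collectors.toSet();
        Collector<Integer, ?, HashSet<Integer>> custom = Collectors.toCollection(HashSet::new);

        // toSet() is documented as an unordered collector; what toCollection()
        // reports is an implementation detail, so it is only printed here.
        System.out.println(builtIn.characteristics());
        System.out.println(custom.characteristics());

        // The results are equal either way; the characteristic does not change the outcome.
        System.out.println(collectRange(1000, builtIn).equals(collectRange(1000, custom))); // true
    }
}
```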

Whether this characteristic has an impact depends on the stream operations themselves and on implementation details. While the current implementation may not draw much advantage from it, due to the difficulties with back-propagation, that doesn't imply that future versions won't. Of course, a stream which is already unordered is not affected by the UNORDERED characteristic of the Collector. And not all stream operations have the potential to benefit from it.
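For comparison, the ordering constraint can also be dropped on the stream itself with BaseStream.unordered(), independent of what the collector declares. The sketch below is only an illustration; the class ExplicitUnorderedDemo and the method distinctRemainders are invented names. The Javadoc of distinct() notes that an unordered parallel pipeline may execute it more efficiently, but whether any speedup materializes depends on the JDK and the workload.

```java
import java.util.Set;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

class ExplicitUnorderedDemo {
    /** Hypothetical helper: distinct remainders of 0..n-1 modulo m, computed in parallel. */
    public static Set<Integer> distinctRemainders(int n, int m) {
        return IntStream.range(0, n)
                .boxed()
                .parallel()
                .unordered()      // drop the encounter-order constraint at the stream level
                .map(i -> i % m)
                .distinct()       // no need to preserve stability on an unordered stream
                .collect(Collectors.toSet());
    }

    public static void main(String[] args) {
        System.out.println(distinctRemainders(1_000_000, 10).size()); // prints 10
    }
}
```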

So the more important question is how important it is not to prevent such potential optimizations (perhaps in the future).

Note that there are other unspecified implementation details affecting the potential optimizations when it comes to your second variant. The toCollection(Supplier) collector has unspecified inner workings and only guarantees to provide a final result of the type produced by the Supplier. In contrast, Collector.of(() -> new HashSet<>(initialCapacity), Set::add, (c1, c2) -> { c1.addAll(c2); return c1; }, IDENTITY_FINISH, UNORDERED) defines precisely how the collector ought to work and may also hinder internal optimizations of collection-producing collectors in future versions.

So a way to specify the characteristics without touching the other aspects of a Collector would be the best solution, but as far as I know, the existing API offers no simple way to do that. But it's easy to build such a facility yourself:

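// Returns a collector that behaves like c but also reports the characteristics in ch.
// Needs java.util.Collections, java.util.EnumSet, java.util.Set and java.util.stream.Collector.
public static <T,A,R> Collector<T,A,R> characteristics(
                      Collector<T,A,R> c, Collector.Characteristics... ch) {
    Set<Collector.Characteristics> o = c.characteristics();
    if(!o.isEmpty()) {
        // merge the existing characteristics with the requested ones
        o=EnumSet.copyOf(o);
        Collections.addAll(o, ch);
        // a fresh array avoids a trailing null when ch contains duplicates
        ch=o.toArray(new Collector.Characteristics[0]);
    }
    return Collector.of(c.supplier(), c.accumulator(), c.combiner(), c.finisher(), ch);
}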

With that method, it's easy to say, e.g.

HashSet<String> set=stream
    .collect(characteristics(toCollection(()->new HashSet<>(capacity)), UNORDERED));

or provide your factory method:

public static <T> Collector<T, ?, Set<T>> toSetSized(int initialCapacity) {
    return characteristics(toCollection(()-> new HashSet<>(initialCapacity)), UNORDERED);
}

This limits the effort necessary to provide your characteristics (if it is a recurring problem), so it won't hurt to provide them, even if you don't know how much impact they will have.
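For anyone who wants to try this out, the following is a self-contained sketch of the same idea with imports and a small check. The class name SizedCollectors is made up for the example, and the helper always merges the characteristics into a fresh set rather than special-casing the empty case; apart from that it is meant to mirror the approach above, not to replace it.

```java
import java.util.Collections;
import java.util.EnumSet;
import java.util.HashSet;
import java.util.Set;
import java.util.stream.Collector;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class SizedCollectors {
    /** Returns a collector that behaves like c but additionally reports the characteristics in extra. */
    public static <T, A, R> Collector<T, A, R> characteristics(
            Collector<T, A, R> c, Collector.Characteristics... extra) {
        Set<Collector.Characteristics> all = EnumSet.noneOf(Collector.Characteristics.class);
        all.addAll(c.characteristics());   // keep whatever the wrapped collector declares
        Collections.addAll(all, extra);    // add the requested ones; duplicates collapse
        return Collector.of(c.supplier(), c.accumulator(), c.combiner(), c.finisher(),
                all.toArray(new Collector.Characteristics[0]));
    }

    /** A pre-sized, unordered variant of toSet(), built on toCollection(). */
    public static <T> Collector<T, ?, Set<T>> toSetSized(int initialCapacity) {
        return characteristics(
                Collectors.toCollection(() -> new HashSet<>(initialCapacity)),
                Collector.Characteristics.UNORDERED);
    }

    public static void main(String[] args) {
        Collector<String, ?, Set<String>> collector = toSetSized(16);
        System.out.println(collector.characteristics()
                .contains(Collector.Characteristics.UNORDERED)); // true
        Set<String> result = Stream.of("a", "b", "a").collect(collector);
        System.out.println(result.size()); // 2
    }
}
```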
