简体   繁体   中英

Java 8 findFirst and encounter order

The JavaDocs for findFirst say that if the stream has an encounter order, then the first element will always be returned, but if the stream does not have an encounter order, any element may be returned.

I'm trying to demonstrate how this works on a stream without an encounter order, but I can't get it to return anything other than the actual first element.

I tried adding the elements to a Set , which does not have a defined encounter order:

    Set<String> words = new HashSet<>();
    words.addAll(Arrays.asList("this", "is", "a", "stream", "of", "strings"));
    Optional<String> firstString = words.stream()
            .findFirst();
    System.out.println(firstString);

Every time I run, I get a as the first string. Then I tried doing a Collections.shuffle on the List before adding it to the Set , but that didn't change anything.

    List<String> wordList = Arrays.asList("this", "is", "a", "stream", "of", "strings");
    words = new HashSet<>();
    words.addAll(wordList);
    firstString = words.stream()
            .findFirst();
    System.out.println(firstString);

I still get back the word a every time.

Then I tried using the unordered method from BaseStream , which claims to return a stream without encounter order, but no difference:

    firstString = Stream.of("this", "is", "a", "stream", "of", "strings")
            .unordered()
            .findFirst();
    System.out.println(firstString);

Now I get the word this every time. Am I missing something? Is there some way to demonstrate that findFirst on an unordered stream returns different values?

Well, “any” includes the possibility of “first”. Of course, the Stream implementation does not waste efforts in randomizing the data, so for a lot of cases, especially with sequential execution, it will still be the first element, if we can call it that way (as without an order, there is no distinguished first element).

Your best chances of exhibiting different results for findFirst are with parallel Streams. But even there, not every combination of operations is suitable of exhibiting the unorderedness.

One point is, that in the current implementation , the findFirst() operation does not change it's behavior when the Stream is unordered, ie it does not actively try to be like findAny() . It may still exhibit unpredictable behavior due to the source of the Stream, but if your source is Stream.of("this", "is", "a", "stream", "of", "strings") , ie an immutable sequence of a known size, it has already the best parallel performance possible, so there's simply no way of drawing a benefit of the chained unordered() , hence, the current implementation doesn't change its behavior.

It might surprise, but this even applies to a HashSet to some degree. While it has an unspecified order, there will be an actual order within its backing array at some point of time and as long as you don't modify the Set , there will be no reason to shuffle these entries around, so for a particular HashSet instance, you may repeatedly get the same “first” element, though it's not specified which one and even within a single runtime, another HashSet instance representing the same contents, but having a different history, may have a different order.


One example of an operation that is known to draw a benefit from the unordered characteristics is distinct . While it has to sort out duplicates, it has to keep the first encountered of equal elements, if it makes a notable difference. This can degrade the performance significantly, hence, the implementation will immediately try to get a benefit if the stream is unordered. Eg

List<String> equal=IntStream.range(0, 100)
    .mapToObj(i->new String("test")) // don't do this in normal code
    .collect(Collectors.toList());
Map<String, Integer> map = IntStream.range(0, equal.size())
    .collect(IdentityHashMap::new, (m,i)->m.put(equal.get(i),i), Map::putAll);

equal.parallelStream().distinct().map(map::get)
     .findFirst().ifPresent(System.out::println);

This creates a bunch of equal but distinguishable String instances (which you normally shouldn't do), registers them with their positional number in an IdentityHashMap , so we can find out, which instance distinct has retained. Since the code above uses an ordered stream created by a List , it consistently prints 0 , regardless of how often you execute it.

In contrast,

equal.parallelStream().unordered().distinct().map(map::get)
     .findFirst().ifPresent(System.out::println);

will print arbitrary numbers of the range, as we have released the ordered contract and allow to pick any of the equal strings.


As already noted before, this is all implementation specific . You should never make an assumption about whether an operation can actually draw a benefit and thus, will change its behavior for unordered streams. The explanation above was only meant to illustrate why sometimes the behavior of a particular implementation might not change for unordered stream. Though, it still might in the next version or a different JRE implementation.

Holger has already ably explained the situation. (+1) I'd like to provide a demonstration of HashSet instances that have the same contents but that have different iteration order. First we create a set as before:

    List<String> wordList = Arrays.asList("this", "is", "a", "stream", "of", "strings");
    Set<String> words = new HashSet<>(wordList);

We create another set of words, add a bunch of stuff (doesn't matter exactly what it is), and then remove it:

    Set<String> words2 = new HashSet<>(wordList);
    IntStream.range(0, 50).forEachOrdered(i -> words2.add(String.valueOf(i)));
    words2.retainAll(wordList);

If we inspect the results as follows:

    System.out.println(words.equals(words2));
    System.out.println(words);
    System.out.println(words2);

we can see from the output that the sets are equal but iterate in a different order:

true
[a, strings, stream, of, this, is]
[this, is, strings, stream, of, a]

As noted elsewhere, if you get a stream from these and call findFirst() , the result is the first element in iteration order, which will clearly differ between these sets.

What happened is that by adding and removing a bunch of elements, we've caused the set to increase its internal table size, requiring the elements to be rehashed. The original elements end up in different relative positions in the new table, even after the new elements have been removed.

Although HashSets have no specified iteration order, the order is likely to be repeatable (and even predictable) if the set is initialized with the same contents in the same way every time. We thus say that the stream from a set has no defined encounter order, even though the order is often the same each time.

Note that in JDK 9, the new immutable sets (and maps) are actually randomized, so their iteration orders will change from run to run, even if they are initialized the same way every time.

By marking your stream as unordered, you are not actually making it as such (you have not made the order in your Set any different), but instead you are removing any restriction that otherwise an ordered stream might impose.

The way to prove that this will return different results is to use a parallel stream.

 Set<String> words = new HashSet<>();
    words.addAll(Arrays.asList("this", "is", "a", "stream", "of", "strings"));
    Optional<String> firstString = words.stream().parallel()
            .findFirst();
    System.out.println(firstString);

Running this a few times, shows:

  Optional[strings] and then Optional[this]

Changing your Set to a List and running in parallel will preserver the order:

 List<String> words = new ArrayList<>();
    words.addAll(Arrays.asList("this", "is", "a", "stream", "of", "strings"));
    Optional<String> firstString = words.stream().parallel()
            .findFirst();
    System.out.println(firstString); // always Optional[this]

The absolute must read here is Holger great answer

As @Eugene already mentioned, calling unordered does not necessarily change the actual physical ordering of the elements. Don't forget that unordered is an intermediate operation that does nothing until a terminal operation is invoked.

Therefore I tend to think of it this way:

  1. When creating a Set containing the elements "this", "is", "a", "stream", "of", "strings" , then it happens that the first element in the Set when iterating over it is "a" , so findFirst just returns that value.

  2. When you create a stream using Stream.of("this", "is", "stream", "of", "strings") , it returns a stream with an ordering restriction which will be respected by findFirst . Calling unordered removes that restriction, but the element "this" is still physically the first element because unordered did not necessarily change the ordering in the source array.

A slighly better example might be the following:

Set<String> words = new HashSet<>();
words.addAll(Arrays.asList("this", "is", "stream", "of", "strings"));

Optional<String> firstString1 = words.stream().findFirst();
// Optional[strings]
System.out.println(firstString1);

Optional<String> firstString2 = words.stream()
                                     .sorted().findFirst();
// Optional[is]
System.out.println(firstString2);

Optional<String> firstString3 = Stream.of("this", "is", "stream", "of", "strings")
                                      .findFirst();
// Optional[this]
System.out.println(firstString3);

Optional<String> firstString4 = Stream.of("this", "is", "stream", "of", "strings")
                                      .unordered().findFirst();
// Optional[this]
System.out.println(firstString4);

Note how the sorted() method changes the result because it enforces the ordering restriction, unlike the unordered method which had no effect.

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM