I'm writing an algorithm where I need to use a collection, and the main (and only) operation on it is union.
I'm going to have about 1 million objects, and I need to know which collection has the more efficient union method: the List or the HashSet (or maybe something else?).
Thanks in advance.
I'm guessing that when you say "I will be using distinct with the List", you mean something like this:
List l = ...
Set result = l.stream().distinct().collect(Collectors.toSet()).union(someOtherSet);
compared with this:
HashSet h = ...
Set result = h.union(someOtherSet);
Clearly the second version is more efficient. The first one has to produce an intermediate set from the list each time you run it.
The only thing that the first one saves is some memory (in the long term), since the intermediate set becomes unreachable after use.
And the first version can be written more simply and more efficiently as:
List l = ...
Set result = new HashSet(l).union(someOtherSet);
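Note that the List API has no distinct() method and no union() method. Java's Set has no union() method either; union() in the snippets above is shorthand for copying one set and calling addAll() with the other.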
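In plain Java, a union of a list and a set can be written as a copy into a HashSet followed by addAll(). The following is a minimal sketch, assuming Integer elements; the class and method names (ListUnion, union) are made up for illustration and do not come from the question.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class ListUnion {
    // Hypothetical helper: returns a new set holding every element of both inputs.
    static Set<Integer> union(List<Integer> list, Set<Integer> other) {
        Set<Integer> result = new HashSet<>(list); // copying into a HashSet drops duplicates
        result.addAll(other);                      // one hash insert per element of other
        return result;
    }

    public static void main(String[] args) {
        Set<Integer> result = union(List.of(1, 2, 2, 3), Set.of(3, 4));
        System.out.println(result.size()); // 4 distinct elements: 1, 2, 3, 4
    }
}
```

Neither input is modified, at the cost of allocating a new set on every call.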
If you actually use Collection.contains() to perform the union, then a HashSet will be much faster than any standard List implementation. As @JBNizet states:
HashSet.contains is O(1). List.contains is O(n).
For example:
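Set<Integer> result = new HashSet<>(set1);
for (Integer element : set2) {
    // For a HashSet, add() alone would suffice; the explicit contains() is what a List would need.
    if (!result.contains(element)) {
        result.add(element);
    }
}
// result now contains the union of set1 and set2.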
Almost identical code works for lists, but it is much slower because every contains() call has to scan the list.
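For comparison, here is a sketch of a list-based union built on contains(). It assumes Integer elements and that list1 holds no duplicates of its own; the class name ListContainsUnion is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

class ListContainsUnion {
    // Hypothetical helper: union of two lists using List.contains().
    static List<Integer> union(List<Integer> list1, List<Integer> list2) {
        List<Integer> result = new ArrayList<>(list1);
        for (Integer element : list2) {
            // contains() on an ArrayList is a linear scan of result,
            // so the loop as a whole is quadratic in the worst case.
            if (!result.contains(element)) {
                result.add(element);
            }
        }
        return result;
    }
}
```

With about a million elements on each side, the nested scan makes this approach impractical, whereas the hash-based version needs only one lookup per element.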
You asked:
Ok, yeah. But how about union?
See above. This is about implementing union using contains calls.
What's that? O(?)
See the following articles:
So both of the unions are the same, O(N) (N being the size of the second collection)?
No.
N x O(1) is O(N) (the HashSet case)
N x O(N) is O(N^2) (the List case)
Or to be more precise:
min(M, N) x O(1) is O(min(M, N)) (the HashSet case)
N x O(M) is O(NM) (the List case)
where N and M are the sizes of the two sets / lists. You can tweak the performance of the HashSet case by iterating over the smaller of the two sets, as reflected above.
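The "iterate the smaller set" tweak might look like the sketch below. It assumes both sets are mutable and that the caller accepts the larger one being modified in place, which is what keeps the work proportional to min(M, N); the names SmallerIntoLarger and unionInPlace are hypothetical.

```java
import java.util.Set;

class SmallerIntoLarger {
    // Hypothetical helper: adds the smaller set to the larger one in place
    // and returns the larger set, which then holds the union.
    static Set<Integer> unionInPlace(Set<Integer> a, Set<Integer> b) {
        Set<Integer> larger = a.size() >= b.size() ? a : b;
        Set<Integer> smaller = (larger == a) ? b : a;
        larger.addAll(smaller); // one hash insert per element of the smaller set
        return larger;
    }
}
```

If neither input may be modified, the larger set has to be copied first, and the total cost becomes O(M + N) rather than O(min(M, N)).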
Finally, if the element type is Integer then BitSet could be more efficient than either List or HashSet. And it could use a couple of orders of magnitude less memory, depending on the range of the integers and the density of the sets.
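As a rough sketch of the BitSet approach: each integer is stored as one bit at its own index, and the union is a bitwise OR. This assumes all values are non-negative (BitSet rejects negative indices); the class name BitSetUnion is hypothetical.

```java
import java.util.BitSet;

class BitSetUnion {
    // Hypothetical helper: returns a new BitSet holding the union of a and b.
    static BitSet union(BitSet a, BitSet b) {
        BitSet result = (BitSet) a.clone(); // leave both inputs untouched
        result.or(b);                       // bitwise OR, processed a word at a time
        return result;
    }

    public static void main(String[] args) {
        BitSet a = new BitSet();
        a.set(1);
        a.set(5);
        BitSet b = new BitSet();
        b.set(5);
        b.set(900_000);
        System.out.println(union(a, b)); // {1, 5, 900000}
    }
}
```

Memory use grows with the largest value stored rather than with the number of elements, so this pays off for dense sets over a limited range and can be wasteful for a few widely scattered values.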
That's the Java analysis. I'm not familiar with Scala but the underlying computations and complexity will be the same.