Given a List of numbers: { 4, 5, 7, 3, 5, 4, 2, 4 }
The desired output would be: { 7, 3, 2 }
The solution I am thinking of is to create the following HashMap from the given List:
Map<Integer, Integer> numbersCountMap = new HashMap<>();
where the key is a value from the list and the value is its occurrence count.
Then loop through the HashMap entry set and, wherever a number has a count greater than one, remove that number from the list.
for (Map.Entry<Int, Int> numberCountEntry : numbersCountMap.entrySet()) {
    if (numberCountEntry.getValue() > 1) {
        testList.remove(numberCountEntry.getKey());
    }
}
I am not sure whether this is an efficient solution to this problem, as the remove(Integer) operation on a list can be expensive. Also, I am creating an additional Map data structure, and looping twice: once over the original list to create the Map, and then over the Map to remove the duplicates.
Could you please suggest a better way? Maybe Java 8 has a better way of implementing this. Also, can we do it in a few lines using Streams and other new structures in Java 8?
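For comparison, the counting-map approach above can be made correct without changing its shape. This is a minimal sketch, not the asker's code: it keeps the map but replaces the per-key remove(Object), which deletes only the first occurrence of a value, with List.removeIf, which drops every element matching the predicate in a single linear pass over an ArrayList. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class RemoveDuplicatesInPlace {

    // Removes every value that occurs more than once, keeping the order of the rest.
    public static void removeRepeated(List<Integer> list) {
        // First pass: count the occurrences of each value.
        Map<Integer, Integer> counts = new HashMap<>();
        for (Integer n : list) {
            counts.merge(n, 1, Integer::sum);
        }
        // Second pass: drop *every* occurrence of a value whose count is > 1.
        list.removeIf(n -> counts.get(n) > 1);
    }

    public static void main(String[] args) {
        // Copy into an ArrayList: Arrays.asList returns a fixed-size list that cannot shrink.
        List<Integer> testList = new ArrayList<>(Arrays.asList(4, 5, 7, 3, 5, 4, 2, 4));
        removeRepeated(testList);
        System.out.println(testList); // [7, 3, 2]
    }
}
```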
With streams you can use:
Map<Integer, Long> grouping = integers.stream()
.collect(Collectors.groupingBy(Function.identity(), Collectors.counting()));
grouping.values().removeIf(c -> c > 1);
Set<Integer> result = grouping.keySet();
Or, as @Holger mentioned, all you want to know is whether a number occurs more than once in your list, so just do:
Map<Integer, Boolean> grouping = integers.stream()
.collect(Collectors.toMap(Function.identity(),
x -> false, (a, b) -> true,
HashMap::new));
grouping.values().removeIf(b -> b);
// or
grouping.values().removeAll(Collections.singleton(true));
Set<Integer> result = grouping.keySet();
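If the order of first appearance matters, the same flag-based collector can be given a LinkedHashMap supplier instead of HashMap::new. The sketch below wraps the snippet above in a hypothetical method; everything else is the same toMap/removeIf combination, and it sticks to Java 8 APIs.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.Function;
import java.util.stream.Collectors;

public class SinglesByFlag {

    public static Set<Integer> singles(List<Integer> integers) {
        Map<Integer, Boolean> grouping = integers.stream()
                .collect(Collectors.toMap(
                        Function.identity(),
                        x -> false,            // first occurrence: not (yet) a duplicate
                        (a, b) -> true,        // any merge means the key occurred again
                        LinkedHashMap::new));  // keeps first-encounter order
        grouping.values().removeIf(b -> b);    // drop every key flagged as a duplicate
        return grouping.keySet();
    }

    public static void main(String[] args) {
        System.out.println(singles(Arrays.asList(4, 5, 7, 3, 5, 4, 2, 4))); // [7, 3, 2]
    }
}
```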
While YCF_L's answer does the thing and yields the correct result, I don't think it's a good solution to go with, since it mixes functional and procedural approaches by mutating the intermediary collection.
A functional approach would use either of the following solutions:
Map<Integer, Boolean> map =
integers.stream()
.collect(toMap(identity(), x -> true, (a, b) -> false));
List<Integer> result = map.entrySet()
.stream()
.filter(Entry::getValue)
.map(Entry::getKey)
.collect(toList());
Note that we don't even care about the mutability of the map variable. Thus we can omit the 4th parameter of the toMap collector.
List<Integer> result =
integers.stream()
.collect(toMap(identity(), x -> true, (a, b) -> false))
.entrySet()
.stream()
.filter(Entry::getValue)
.map(Entry::getKey)
.collect(toList());
This code is still safe, but less readable. Chaining two or more pipelines is discouraged.
List<Integer> result =
integers.stream()
.collect(collectingAndThen(
toMap(identity(), x -> true, (a, b) -> false),
map -> map.entrySet()
.stream()
.filter(Entry::getValue)
.map(Entry::getKey)
.collect(toList())));
The intermediary state (the grouped map) does not get exposed to the outside world. So we may be sure nobody will modify it while we're processing the result.
You can count the frequency of each number into a LinkedHashMap to keep insertion order, if that's relevant, then keep only the entries of the entrySet() with a count of one and map them to their keys.
List<Integer> data = Arrays.asList(4, 5, 7, 3, 5, 4, 2, 4);
List<Integer> singles = data.stream()
.collect(Collectors.groupingBy(Function.identity(), LinkedHashMap::new, Collectors.counting()))
.entrySet().stream()
.filter(e -> e.getValue() == 1)
.map(Map.Entry::getKey)
.collect(Collectors.toList());
System.out.println(singles);
Printed output:
[7, 3, 2]
It's overengineered for just this problem. Also, your code is faulty: it's Integer, not Int (a minor niggle). More importantly, the remove call removes only the first matching element, and to make matters considerably worse, remove on lists is overloaded: there's remove(int), which removes an element by index, and remove(Object), which removes an element by looking it up. In a List<Integer>, it is very difficult to know which one you're calling. You want the 'remove by lookup' one.

On complexity:
On modern CPUs, it's not that simple. The CPU works on 'pages' of memory, and because fetching a new page takes on the order of 500 cycles or more, it makes more sense to simplify matters and consider any operation that does NOT require loading a new page of memory to be instantaneous.
That means that if we're talking about a list of, say, 10,000 numbers or fewer? None of it matters. It'll fly by. Any debate about 'efficiency' is meaningless until we get to larger counts.
Assuming that 'efficiency' is still relevant:
arrayList.remove(item) is pricey. An efficient strategy is probably to just commit to copying. A copy operation is O(n) for the whole thing, instead of O(n) per removed item. Sorting is O(n log n). This gives us a trivial algorithm: sort the input, then loop through it once and copy out every element that is equal to neither of its neighbours.

This works on an int[] too; until Java lets you use primitives in collections, an int[] is an order of magnitude more efficient than a List<Integer>. That's O(n log n) + O(n) complexity, which is O(n log n), a lot better than your O(n^2) take.
Use int[], and add a first pass that goes through the data juuust to count how large the output will be (because arrays cannot grow or shrink). Now you have a time complexity of O(n log n) + 2*O(n), which is still O(n log n), and the lowest possible memory complexity, as the sort is in-place and doesn't cost anything extra.
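A minimal sketch of the sort-then-copy idea on an int[], including the extra counting pass. The class and helper names are hypothetical, and, as in the text, it sorts the caller's array in place, so the only allocation is the exactly sized result.

```java
import java.util.Arrays;

public class SortedSingles {

    // After sorting, a value occurs exactly once iff it differs from both neighbours.
    private static boolean isSingle(int[] sorted, int i) {
        boolean differsFromPrev = i == 0 || sorted[i] != sorted[i - 1];
        boolean differsFromNext = i == sorted.length - 1 || sorted[i] != sorted[i + 1];
        return differsFromPrev && differsFromNext;
    }

    // Note: sorts `input` in place (mutates the caller's array).
    public static int[] singles(int[] input) {
        Arrays.sort(input); // O(n log n); equal values become adjacent

        // Pass 1: count the output size, because arrays cannot grow or shrink.
        int count = 0;
        for (int i = 0; i < input.length; i++) {
            if (isSingle(input, i)) count++;
        }

        // Pass 2: copy the singles into the exactly sized result.
        int[] result = new int[count];
        int j = 0;
        for (int i = 0; i < input.length; i++) {
            if (isSingle(input, i)) result[j++] = input[i];
        }
        return result;
    }

    public static void main(String[] args) {
        int[] data = {4, 5, 7, 3, 5, 4, 2, 4};
        System.out.println(Arrays.toString(singles(data))); // [2, 3, 7]
    }
}
```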
If you really want to tweak it, you can get the extra space down to nothing (O(1)) by writing the reduced list into the input itself.
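One way to write the reduced list into the input, sketched under the assumption that the caller is happy to get the result as a prefix of its own array plus a length. The names are hypothetical. Apart from whatever Arrays.sort uses internally, no extra memory is allocated; the write index never overtakes the read index, so no unread value is overwritten.

```java
import java.util.Arrays;

public class InPlaceSingles {

    // Sorts `a`, moves its singles to a[0..k) and returns k.
    public static int compactSingles(int[] a) {
        Arrays.sort(a);
        int write = 0;
        int i = 0;
        while (i < a.length) {
            // Find the end of the run of values equal to a[i].
            int runEnd = i + 1;
            while (runEnd < a.length && a[runEnd] == a[i]) runEnd++;
            // A run of length 1 is a single: keep it (write <= i, so this is safe).
            if (runEnd - i == 1) a[write++] = a[i];
            i = runEnd;
        }
        return write;
    }

    public static void main(String[] args) {
        int[] data = {4, 5, 7, 3, 5, 4, 2, 4};
        int k = compactSingles(data);
        System.out.println(Arrays.toString(Arrays.copyOf(data, k))); // [2, 3, 7]
    }
}
```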
One problem with this strategy is that you mess up the ordering of the input: the algorithm would produce 2, 3, 7. If you want to preserve order, you can combine the hashmap solution with this 'make a copy as you loop' approach.
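A sketch of one reading of that combination, assuming the counting map takes the place of the sort: count occurrences first, then size the output and copy the singles while looping over the input in its original order. The class name is hypothetical, and the map does cost the extra memory the sort-based version avoids.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

public class OrderedSingles {

    public static int[] singles(int[] input) {
        // Count occurrences; no sorting, so the input order survives.
        Map<Integer, Integer> counts = new HashMap<>();
        for (int n : input) counts.merge(n, 1, Integer::sum);

        // Size the output first (arrays cannot grow), then copy in input order.
        int size = 0;
        for (int n : input) if (counts.get(n) == 1) size++;
        int[] result = new int[size];
        int j = 0;
        for (int n : input) if (counts.get(n) == 1) result[j++] = n;
        return result;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(singles(new int[] {4, 5, 7, 3, 5, 4, 2, 4}))); // [7, 3, 2]
    }
}
```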
You can use the 3-argument reduce method and walk through the stream only once, maintaining two sets: the selected values and the rejected values.
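import java.util.LinkedHashSet;
import java.util.Set;
import java.util.function.BiFunction;
import java.util.function.BinaryOperator;
import java.util.stream.Stream;
import org.junit.jupiter.api.Assertions;
final var nums = Stream.of(4, 5, 7, 3, 5, 4, 2, 4);
// fst: values seen exactly once so far; snd: values already rejected as duplicates
final var init = new Tuple<Set<Integer>>(new LinkedHashSet<Integer>(), new LinkedHashSet<Integer>());
// The accumulator mutates the shared identity, so this combiner is only valid for a sequential stream.
final var comb = (BinaryOperator<Tuple<Set<Integer>>>) (a, b) -> a;
final var accum = (BiFunction<Tuple<Set<Integer>>, Integer, Tuple<Set<Integer>>>) (t, elem) -> {
    if (t.fst().contains(elem)) {
        // second occurrence: move the value from selected to rejected
        t.snd().add(elem);
        t.fst().remove(elem);
    } else if (!t.snd().contains(elem)) {
        // first occurrence of a value that was never rejected: select it
        t.fst().add(elem);
    }
    return t;
};
Assertions.assertEquals(Set.of(7, 3, 2), nums.reduce(init, accum, comb).fst());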
In this example, Tuple is defined as record Tuple<T>(T fst, T snd) { } (records require Java 16 or later).
I decided against the sublist method due to its poor performance on large data sets. The following alternative is faster and holds its own against the stream solutions, probably because Set access to an element takes constant time. The downside is that it requires extra data structures. Given an ArrayList list of elements, this seems to work quite well.
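Set<Integer> dups = new HashSet<>(list.size());   // every value seen so far
Set<Integer> result = new HashSet<>(list.size()); // values seen exactly once so far
for (int i : list) {
    if (dups.add(i)) {
        // add() returned true: first occurrence, tentatively a single
        result.add(i);
        continue;
    }
    // seen before: not a single (a no-op from the third occurrence on)
    result.remove(i);
}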
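The same single-pass loop, wrapped in a hypothetical method and with one swap: a LinkedHashSet for the result, which keeps the same constant-time add/remove but also preserves first-encounter order. This is a variation for when order matters, not the author's code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class SetBasedSingles {

    public static List<Integer> singles(List<Integer> list) {
        Set<Integer> seen = new HashSet<>(list.size());
        // LinkedHashSet instead of HashSet: same O(1) add/remove, but ordered.
        Set<Integer> result = new LinkedHashSet<>(list.size());
        for (int i : list) {
            if (seen.add(i)) {
                result.add(i);     // first time we see i: tentatively a single
            } else {
                result.remove(i);  // seen before: not a single
            }
        }
        return new ArrayList<>(result);
    }

    public static void main(String[] args) {
        System.out.println(singles(Arrays.asList(4, 5, 7, 3, 5, 4, 2, 4))); // [7, 3, 2]
    }
}
```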