
Remove all instances of an item in a list if it appears more than once

Given a List of numbers: { 4, 5, 7, 3, 5, 4, 2, 4 }

The desired output would be: { 7, 3, 2 }

The solution I am thinking of is to create the following HashMap from the given List:

Map<Integer, Integer> numbersCountMap = new HashMap<>();

where the key is a value from the list and the value is its occurrence count.

Then loop through the HashMap entry set and, wherever a number has a count greater than one, remove that number from the list.

for (Map.Entry<Int, Int> numberCountEntry : numbersCountMap.entrySet()) {
     if(numberCountEntry.getValue() > 1) {  
        testList.remove(numberCountEntry.getKey());
     }
}

I am not sure whether this is an efficient solution to this problem, as the remove(Integer) operation on a list can be expensive. I am also creating an additional Map data structure and looping twice: once over the original list to build the Map, and then over the map to remove duplicates.

Could you please suggest a better way? Maybe Java 8 has a better way of implementing this. Can we also do it in a few lines using Streams and other new constructs in Java 8?

With streams you can use:

Map<Integer, Long> grouping = integers.stream()
        .collect(Collectors.groupingBy(Function.identity(), Collectors.counting()));
grouping.values().removeIf(c -> c > 1);
Set<Integer> result = grouping.keySet();

Or, as @Holger mentioned, all you need to know is whether a value occurs more than once in your list, so just do:

Map<Integer, Boolean> grouping = integers.stream()
        .collect(Collectors.toMap(Function.identity(),
                x -> false, (a, b) -> true,
                HashMap::new));
grouping.values().removeIf(b -> b);
// or
grouping.values().removeAll(Collections.singleton(true));
Set<Integer> result = grouping.keySet();

While YCF_L's answer does the thing and yields the correct result, I don't think it's a good solution to go with, since it mixes functional and procedural approaches by mutating the intermediary collection.

A functional approach would assume either of the following solutions:

  1. Using intermediary variable:
Map<Integer, Boolean> map =
    integers.stream()
            .collect(toMap(identity(), x -> true, (a, b) -> false));

List<Integer> result = map.entrySet()
                          .stream()
                          .filter(Entry::getValue)
                          .map(Entry::getKey)
                          .collect(toList()); 

Note that we don't even care about the mutability of the map variable, so we can omit the 4th parameter of the toMap collector.

  2. Chaining two pipelines (similar to Alex Rudenko's answer):
List<Integer> result =
    integers.stream()
            .collect(toMap(identity(), x -> true, (a, b) -> false))
            .entrySet()
            .stream()
            .filter(Entry::getValue)
            .map(Entry::getKey)
            .collect(toList());

This code is still safe, but less readable. Chaining two or more pipelines is discouraged.

  3. Pure functional approach:
List<Integer> result =
    integers.stream()
            .collect(collectingAndThen(
                        toMap(identity(), x -> true, (a, b) -> false),
                        map -> map.entrySet()
                                  .stream()
                                  .filter(Entry::getValue)
                                  .map(Entry::getKey)
                                  .collect(toList())));

The intermediary state (the grouped map) does not get exposed to the outside world. So we may be sure nobody will modify it while we're processing the result.

You may count the frequency of each number into a LinkedHashMap to keep insertion order if it's relevant, then keep only the entries with a count of 1 from the entrySet() and collect their keys.

List<Integer> data = Arrays.asList(4, 5, 7, 3, 5, 4, 2, 4);
List<Integer> singles = data.stream()
    .collect(Collectors.groupingBy(Function.identity(), LinkedHashMap::new, Collectors.counting()))
    .entrySet().stream()
    .filter(e -> e.getValue() == 1)
    .map(Map.Entry::getKey)
    .collect(Collectors.toList());

System.out.println(singles);

Printed output:

[7, 3, 2]

It's overengineered for just this problem. Also, your code is faulty:

  1. It's Integer, not Int (minor niggle).
  2. More importantly, a remove call removes only the first matching element, and to make matters considerably worse, remove on lists is overloaded: there's remove(int), which removes an element by index, and remove(Object), which removes an element by looking it up. In a List<Integer>, it is very difficult to know which one you're calling. You want the 'remove by lookup' one.
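To make the overload problem concrete, here is a small sketch. The class and method names are made up for illustration; the only library behaviour relied on is that List.remove(int) removes by index and List.remove(Object) removes the first equal element.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo class, not part of the question's code.
class RemoveOverloadDemo {

    // Calls remove(int): drops whatever sits at position `index`.
    static List<Integer> removeByIndex(List<Integer> source, int index) {
        List<Integer> copy = new ArrayList<>(source);
        copy.remove(index);
        return copy;
    }

    // Calls remove(Object): drops only the FIRST element equal to `value`.
    static List<Integer> removeByValue(List<Integer> source, int value) {
        List<Integer> copy = new ArrayList<>(source);
        copy.remove(Integer.valueOf(value)); // boxing explicitly selects remove(Object)
        return copy;
    }

    public static void main(String[] args) {
        List<Integer> numbers = List.of(4, 5, 7, 3, 5, 4, 2, 4);
        System.out.println(removeByIndex(numbers, 4)); // [4, 5, 7, 3, 4, 2, 4]
        System.out.println(removeByValue(numbers, 4)); // [5, 7, 3, 5, 4, 2, 4]
    }
}
```

Either way, two of the three 4s survive a single call, so one remove per duplicated key cannot produce { 7, 3, 2 }.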

On complexity:

On modern CPUs, it's not that simple. The CPU works on 'pages' of memory, and because fetching a new page takes on the order of 500 cycles or more, it makes more sense to simplify matters and consider any operation that does NOT require a new page of memory to be loaded, to be instantaneous.

That means that if we're talking about a list of, say, 10,000 numbers or fewer? None of it matters. It'll fly by. Any debate about 'efficiency' is meaningless until we get to larger counts.

Assuming that 'efficiency' is still relevant:

  1. Integers don't have hash code collisions (an Integer's hash code is its own value).
  2. Hash maps with few to no key hash collisions are effectively O(1) on all single-element operations such as 'add' and 'get'.
  3. ArrayList's remove(Object) method is O(n). It takes longer the larger the list is. In fact, it is doubly O(n): it takes O(n) time to find the element you want to remove, and then O(n) time again to actually remove it. In big-O terms, twice O(n) is still O(n), but pragmatically speaking, arrayList.remove(item) is pricey.
  4. You're calling remove about O(n) times, making the total complexity O(n^2). Not great, and not the most efficient algorithm, practically or fundamentally.

An efficient strategy is probably to just commit to copying. A copy operation is O(n). For the whole thing, instead of O(n) per item. Sorting is O(n log n). This gives us a trivial algorithm:

  1. Sort the input. Note that you can do this with an int[] too; as long as Java collections cannot hold primitives directly, int[] is an order of magnitude more efficient than a List<Integer>.
  2. Loop through the sorted input. Don't immediately copy, but use an intermediate: for the 0th item in the list, remember only 'the last item was FOO' and 'how many times did I see FOO?'. Then, for any item, check if it is the same as the previous. If yes, increment the count. If not, check the count: if it was 1, add the remembered value to the output, otherwise don't. In either case, update the 'last seen value' to the new value and set the count to 1. At the end, make sure to add the last remembered value if the count is 1, and make sure your code works even for empty lists.

That's O(n log n) + O(n) complexity, which is O(n log n) - a lot better than your O(n^2) take.
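The two steps above can be sketched roughly as follows. This is only one possible reading: the class and method names are invented, the input is copied first so the caller's list is not reordered, and null elements are assumed not to occur.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical helper illustrating the sort-then-scan idea.
class SortedSingles {

    static List<Integer> singles(List<Integer> input) {
        List<Integer> sorted = new ArrayList<>(input); // copy, so the input keeps its order
        Collections.sort(sorted);                      // O(n log n)

        List<Integer> out = new ArrayList<>();
        Integer last = null; // 'the last item was FOO'; null means nothing seen yet
        int count = 0;       // 'how many times did I see FOO?'

        for (Integer current : sorted) {               // O(n)
            if (current.equals(last)) {
                count++;
            } else {
                if (count == 1) {
                    out.add(last); // the previous run had exactly one element
                }
                last = current;
                count = 1;
            }
        }
        if (count == 1) {
            out.add(last); // flush the final run; count stays 0 for empty input
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(singles(List.of(4, 5, 7, 3, 5, 4, 2, 4))); // [2, 3, 7]
    }
}
```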

Use int[], and add another step in which you first go through just to count how large the output would be (because arrays cannot grow or shrink), and now you have a time complexity of O(n log n) + 2*O(n), which is still O(n log n), and the lowest possible memory complexity, as the sort is in-place and doesn't cost any extra.
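A possible int[] version with the extra counting pass might look like the sketch below. The names are invented, and two choices differ from the description: it compares each element with its neighbours instead of tracking a running count, and it clones the input to keep the caller's array intact, which costs the O(n) extra memory a truly in-place sort would avoid.

```java
import java.util.Arrays;

// Hypothetical helper; a variation on the sort-then-scan idea for primitive arrays.
class SortedSinglesArray {

    static int[] singles(int[] input) {
        int[] sorted = input.clone(); // drop the clone to sort the caller's array in place
        Arrays.sort(sorted);

        // Pass 1: count the unique values, because the output array cannot grow.
        int size = 0;
        for (int i = 0; i < sorted.length; i++) {
            if (isSingle(sorted, i)) {
                size++;
            }
        }

        // Pass 2: copy the unique values into an array of exactly that size.
        int[] out = new int[size];
        int next = 0;
        for (int i = 0; i < sorted.length; i++) {
            if (isSingle(sorted, i)) {
                out[next++] = sorted[i];
            }
        }
        return out;
    }

    // In a sorted array, a value occurs once if it differs from both neighbours.
    private static boolean isSingle(int[] sorted, int i) {
        boolean differsFromPrevious = i == 0 || sorted[i - 1] != sorted[i];
        boolean differsFromNext = i == sorted.length - 1 || sorted[i + 1] != sorted[i];
        return differsFromPrevious && differsFromNext;
    }

    public static void main(String[] args) {
        int[] result = singles(new int[] {4, 5, 7, 3, 5, 4, 2, 4});
        System.out.println(Arrays.toString(result)); // [2, 3, 7]
    }
}
```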

If you really want to tweak it, you can get down to zero extra space (you can write the reduced list inside the input).

One problem with this strategy is that you mess up the ordering of the input. The algorithm would produce 2, 3, 7. If you want to preserve order, you can combine the hash map counting solution with the 'make a copy as you loop' idea, instead of sorting.
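One way to read the order-preserving suggestion is: count with a HashMap first, then copy over only the values whose count is 1 while looping through the original list. The sketch below assumes that reading; the class and method names are invented.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: hash map counting plus copy-as-you-loop.
class OrderedSingles {

    static List<Integer> singles(List<Integer> input) {
        // Pass 1: count occurrences; each update is expected O(1).
        Map<Integer, Integer> counts = new HashMap<>();
        for (Integer n : input) {
            counts.merge(n, 1, Integer::sum);
        }

        // Pass 2: copy in original order, skipping anything seen more than once.
        List<Integer> out = new ArrayList<>();
        for (Integer n : input) {
            if (counts.get(n) == 1) {
                out.add(n);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(singles(List.of(4, 5, 7, 3, 5, 4, 2, 4))); // [7, 3, 2]
    }
}
```

This is two O(n) passes with no remove calls and no sort, at the cost of the extra map.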

You can use the 3-argument reduce method and walk through the stream only once, maintaining two sets: one of selected and one of rejected values.

final var nums = Stream.of(4, 5, 7, 3, 5, 4, 2, 4);
final var init = new Tuple<Set<Integer>>(new LinkedHashSet<Integer>(), new LinkedHashSet<Integer>());
final var comb = (BinaryOperator<Tuple<Set<Integer>>>) (a, b) -> a;
final var accum = (BiFunction<Tuple<Set<Integer>>, Integer, Tuple<Set<Integer>>>) (t, elem) -> {
   if (t.fst().contains(elem)) {
      t.snd().add(elem);
      t.fst().remove(elem);
   } else if (!t.snd().contains(elem)) {
      t.fst().add(elem);
   }
   return t;
};
Assertions.assertEquals(nums.reduce(init, accum, comb).fst(), Set.of(7, 3, 2));

In this example, Tuple is defined as record Tuple<T>(T fst, T snd) { }

I decided against the sublist method due to poor performance on large data sets. The following alternative is faster and holds its own against stream solutions, probably because Set access to an element takes constant time. The downside is that it requires extra data structures. Given an ArrayList list of elements, this seems to work quite well.

Set<Integer> dups = new HashSet<>(list.size());
Set<Integer> result = new HashSet<>(list.size());
            
for (int i : list) {
    if (dups.add(i)) {
        result.add(i);
        continue;
    }
    result.remove(i);
}


 