How can I find the largest M numbers from N numbers in Java 8?

IntStream may be the easiest way, but I can only pick out the smallest M numbers, as below:

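import java.util.Arrays;
import java.util.stream.IntStream;

public class Test {
    private static final int[] arr = {5, 3, 4, 2, 9, 1, 7, 8, 6};

    public static void main(String[] args) throws Exception {
        // sorted() is ascending, so limit(5) keeps the 5 smallest values
        System.out.println(Arrays.asList(IntStream.of(arr).sorted().limit(5).boxed().toArray()));
    }
}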

By the way, considering algorithmic complexity and assuming N >> M, a "sorted + limit" approach has a complexity of O(N log(N)).

I think the best achievable complexity is O(N log(M)), but I do not know whether Java 8 has stream methods or collectors for this.

If you must use Streams:

IntStream.of(arr).sorted().skip(N-M)
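When N is not known in advance, a variant that sorts in descending order and then limits avoids the skip arithmetic. It is still O(N log(N)) and yields the values largest-first. A minimal sketch (the class `TopMSorted` and method `largest` are hypothetical names, not from the answer above):

```java
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

public class TopMSorted {
    // Sorts the whole stream in descending order, then keeps the first m values.
    // Still O(N log(N)), but needs no skip count derived from N.
    static List<Integer> largest(IntStream source, int m) {
        return source.boxed()
                .sorted(Comparator.reverseOrder())
                .limit(m)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        int[] arr = {5, 3, 4, 2, 9, 1, 7, 8, 6};
        System.out.println(largest(IntStream.of(arr), 3)); // prints [9, 8, 7]
    }
}
```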

Otherwise, use a PriorityQueue and write yourself an inverting Comparator. Inserting all N elements will be O(N log(N)) and removing the M largest will be O(M log(N)). Not what you asked for, but maybe close enough.
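A minimal sketch of that idea, using `Collections.reverseOrder()` as the inverting Comparator so the PriorityQueue behaves as a max-heap (the class `TopMHeap` and method `largest` are hypothetical names):

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.PriorityQueue;

public class TopMHeap {
    // Inserts all N values into a max-heap (natural order inverted), then
    // polls the M largest: O(N log(N)) to fill, O(M log(N)) to drain.
    static List<Integer> largest(int[] values, int m) {
        PriorityQueue<Integer> maxHeap = new PriorityQueue<>(Collections.reverseOrder());
        for (int v : values) {
            maxHeap.add(v);
        }
        List<Integer> result = new ArrayList<>();
        for (int i = 0; i < m && !maxHeap.isEmpty(); i++) {
            result.add(maxHeap.poll()); // poll() removes the current maximum
        }
        return result;
    }

    public static void main(String[] args) {
        int[] arr = {5, 3, 4, 2, 9, 1, 7, 8, 6};
        System.out.println(largest(arr, 3)); // prints [9, 8, 7]
    }
}
```

The results come out largest-first because each `poll()` removes the head of the inverted heap.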

EJP has it right. I tested it, and it yields 8 and 9 when given an input of 2.

import java.util.stream.IntStream;
public class Test {
    private static final int[] arr = {5, 3, 4, 2, 9, 1, 7, 8, 6};

    public static void main(String[] args) throws Exception { 
        int n = Integer.parseInt(args[0]);
        System.out.println("Finding "+n+" largest numbers in arr");
        IntStream.of(arr).sorted().skip(arr.length - n).forEach(System.out::println);
    }
}

If you are already using Google Guava in your project, you can take advantage of MinMaxPriorityQueue:

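// A bounded MinMaxPriorityQueue evicts its greatest element, so this keeps the 5 smallest;
// for the 5 largest, use MinMaxPriorityQueue.orderedBy(Comparator.<T>reverseOrder()).maximumSize(5)
Collection<..> min5 = stream.collect(
    toCollection(MinMaxPriorityQueue.maximumSize(5)::create)
);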

It's possible to create a custom collector using the JDK PriorityQueue to solve your task:

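import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;
import java.util.function.BiConsumer;
import java.util.stream.Collector;

// Keeps the `limit` greatest elements (per `comparator`) in a bounded min-heap
public static <T> Collector<T, ?, List<T>> maxN(Comparator<? super T> comparator,
                                                int limit) {
    BiConsumer<PriorityQueue<T>, T> accumulator = (queue, t) -> {
        queue.add(t);
        if (queue.size() > limit)
            queue.poll(); // evict the smallest retained element
    };
    return Collector.of(() -> new PriorityQueue<>(limit + 1, comparator),
            accumulator, (q1, q2) -> {
                for (T t : q2) {
                    accumulator.accept(q1, t);
                }
                return q1;
            }, queue -> new ArrayList<>(queue)); // heap order, not sorted
}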

Usage:

int[] arr = {5, 3, 4, 2, 9, 1, 7, 8, 6};
System.out.println(IntStream.of(arr).boxed().collect(maxN(Comparator.naturalOrder(), 2)));
// [8, 9]
System.out.println(IntStream.of(arr).boxed().collect(maxN(Comparator.reverseOrder(), 3)));
// [3, 1, 2]

It might be faster for big data sets and small limits, as it never sorts the whole input: each element costs O(log(limit)) to insert. If you want a sorted result, you can add the sorting step to the finisher.
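A sketch of the collector with the sorting step moved into the finisher, assuming largest-first order is wanted (the names `MaxNSorted` and `maxNSorted` are hypothetical). The accumulator and combiner are the same as in `maxN`; only the finisher changes, sorting at most `limit` elements:

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;
import java.util.function.BiConsumer;
import java.util.stream.Collector;
import java.util.stream.IntStream;

public class MaxNSorted {
    // Bounded min-heap as in maxN; the finisher sorts the retained elements
    // largest-first: O(N log(limit)) to collect plus O(limit log(limit)) to sort.
    static <T> Collector<T, ?, List<T>> maxNSorted(Comparator<? super T> comparator, int limit) {
        BiConsumer<PriorityQueue<T>, T> accumulator = (queue, t) -> {
            queue.add(t);
            if (queue.size() > limit) {
                queue.poll(); // drop the smallest element per the comparator
            }
        };
        return Collector.of(
                () -> new PriorityQueue<>(limit + 1, comparator),
                accumulator,
                (q1, q2) -> {
                    for (T t : q2) {
                        accumulator.accept(q1, t);
                    }
                    return q1;
                },
                queue -> {
                    List<T> result = new ArrayList<>(queue);
                    result.sort(comparator.reversed()); // largest-first per the comparator
                    return result;
                });
    }

    public static void main(String[] args) {
        int[] arr = {5, 3, 4, 2, 9, 1, 7, 8, 6};
        System.out.println(IntStream.of(arr).boxed()
                .collect(maxNSorted(Comparator.naturalOrder(), 3))); // prints [9, 8, 7]
    }
}
```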

You can achieve your complexity goal by creating a histogram of the values:

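import java.util.TreeMap;
import java.util.function.IntConsumer;
import java.util.stream.IntStream;

public static IntStream maxValues(IntStream source, int limit) {
    TreeMap<Integer,Integer> m=new TreeMap<>(); // value -> number of occurrences
    source.forEachOrdered(new IntConsumer() {
        int size, min=Integer.MIN_VALUE; // retained count, last evicted key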
        public void accept(int value) {
            if(value<min) return;
            m.merge(value, 1, Integer::sum);
            if(size<limit) size++;
            else m.compute(min=m.firstKey(), (k,count)->count==1? null: count-1);
        }
    });
    if(m.size()==limit)// no duplicates
        return m.keySet().stream().mapToInt(Integer::intValue);
    return m.entrySet().stream().flatMapToInt(e->{
        int value = e.getKey(), count = e.getValue();
        return count==1? IntStream.of(value): IntStream.range(0, count).map(i->value);
    });
}

It creates a map from int values to their number of occurrences, but limits its contents to the desired number of values. Hence each update has O(log(M)) complexity (worst case, if there are no duplicates) and, since it is performed once for each value, the overall complexity is O(N×log(M)), as you wished.

You may test it with your original array as

int[] arr = {5, 3, 4, 2, 9, 1, 7, 8, 6};
maxValues(Arrays.stream(arr), 3).forEach(System.out::println);

but to test some corner cases, you may use an array containing duplicates like

int[] arr = {8, 5, 3, 4, 2, 2, 9, 1, 7, 9, 8, 6};
// note that the stream of three max elements contains one of the two eights

If you strive for maximum performance, replacing the boxing TreeMap with an adequate data structure using primitive data types may be feasible, but that would be a minor optimization, as this solution already solves the complexity problem.
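One possible primitive-type structure is a bounded binary min-heap stored in an `int[]`. The sketch below swaps the histogram for such a heap, which is an assumption, not the answer's method; the class `MaxValuesPrimitive` and its helpers are hypothetical. It keeps the O(N×log(M)) bound without boxing, but returns an ascending `int[]` rather than an `IntStream`:

```java
import java.util.Arrays;
import java.util.stream.IntStream;

public class MaxValuesPrimitive {
    // Keeps the `limit` largest values seen so far in an int[] min-heap,
    // so no boxing occurs; each accepted value costs O(log(limit)).
    static int[] maxValues(IntStream source, int limit) {
        int[] heap = new int[limit];
        int[] size = {0}; // mutable counter usable from the lambda
        source.forEachOrdered(value -> {
            if (size[0] < limit) {
                // heap not full yet: append and sift up
                int i = size[0]++;
                heap[i] = value;
                while (i > 0) {
                    int parent = (i - 1) / 2;
                    if (heap[parent] <= heap[i]) break;
                    swap(heap, parent, i);
                    i = parent;
                }
            } else if (limit > 0 && value > heap[0]) {
                // replace the current minimum and sift down
                heap[0] = value;
                int i = 0;
                while (true) {
                    int left = 2 * i + 1, right = left + 1, smallest = i;
                    if (left < limit && heap[left] < heap[smallest]) smallest = left;
                    if (right < limit && heap[right] < heap[smallest]) smallest = right;
                    if (smallest == i) break;
                    swap(heap, i, smallest);
                    i = smallest;
                }
            }
        });
        int[] result = Arrays.copyOf(heap, size[0]);
        Arrays.sort(result); // ascending, like the TreeMap-based version
        return result;
    }

    private static void swap(int[] a, int i, int j) {
        int tmp = a[i];
        a[i] = a[j];
        a[j] = tmp;
    }

    public static void main(String[] args) {
        int[] arr = {8, 5, 3, 4, 2, 2, 9, 1, 7, 9, 8, 6};
        System.out.println(Arrays.toString(maxValues(Arrays.stream(arr), 3))); // prints [8, 9, 9]
    }
}
```

Like the TreeMap version, it uses `forEachOrdered` because the consumer mutates shared state and is not thread-safe.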

By the way, this solution works for arbitrary streams, i.e. it doesn't need to know the value of N.

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM