I am trying to find the fastest and most efficient way to get the top K items from an ArrayList of objects, based on a custom comparator.
During my research, some people suggested that I should use a max/min heap, which Java provides as PriorityQueue. However, I don't know how to apply that to an ArrayList of objects.
Here is my class:
import java.math.BigDecimal;
import java.time.LocalDate;

public class PropertyRecord {
    private long id;
    private String address, firstName, lastName, email, ownerAddress;
    private LocalDate dateSold;
    private BigDecimal price;

    public PropertyRecord(long id, String address, String firstName, String lastName,
                          String email, String ownerAddress, LocalDate dateSold, BigDecimal price) {
        this.id = id;
        this.address = address;
        this.firstName = firstName;
        this.lastName = lastName;
        this.email = email;
        this.ownerAddress = ownerAddress;
        this.dateSold = dateSold;
        this.price = price;
    }

    public BigDecimal getPrice() {
        return price;
    }

    // other getters and setters...
}
I want to get the top K items based on price. I have written a method (see below) that takes the ArrayList and K (the number of items to return) and uses the Stream API, but I know it is not the most efficient approach, because it sorts the whole list even though I only need the first K items. So instead of paying O(n log n) for a full sort, I would like to pay only for the K items I need (for example O(n log k) with a heap of size K).
// Returns the top n properties by sale price, highest first.
public List<PropertyRecord> getTopProperties(List<PropertyRecord> properties, int n) {
    // Stream API: sorted() sorts the entire list before limit(n) keeps the first n.
    return properties.stream()
            .sorted((p1, p2) -> p2.getPrice().compareTo(p1.getPrice()))
            .limit(n)
            .collect(Collectors.toList());
}
Any help would be appreciated.
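A minimal sketch of the bounded-heap idea from the question, using only the JDK. It assumes a hypothetical, trimmed-down Property class (id and price only) standing in for PropertyRecord. A min-heap ordered by price holds at most k elements; each of the n records is compared with the cheapest one kept and replaces it only if it is more expensive, so the total cost is O(n log k) rather than the O(n log n) of a full sort.

```java
import java.math.BigDecimal;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

public class TopKByPrice {

    // Hypothetical, trimmed-down stand-in for PropertyRecord (id and price only).
    static final class Property {
        final long id;
        final BigDecimal price;

        Property(long id, BigDecimal price) {
            this.id = id;
            this.price = price;
        }

        BigDecimal getPrice() {
            return price;
        }
    }

    // Returns the k most expensive properties, highest price first.
    // Keeps a min-heap of at most k elements: O(n log k) time, O(k) extra space.
    static List<Property> getTopProperties(List<Property> properties, int k) {
        if (k <= 0) {
            return new ArrayList<>();
        }
        Comparator<Property> byPrice = Comparator.comparing(Property::getPrice);
        // The head of this min-heap is the cheapest of the k best seen so far.
        PriorityQueue<Property> heap = new PriorityQueue<>(byPrice);
        for (Property p : properties) {
            if (heap.size() < k) {
                heap.offer(p);
            } else if (byPrice.compare(p, heap.peek()) > 0) {
                heap.poll(); // evict the cheapest kept property
                heap.offer(p);
            }
        }
        // Iterating a PriorityQueue is not in sorted order, so sort the k survivors.
        List<Property> result = new ArrayList<>(heap);
        result.sort(byPrice.reversed());
        return result;
    }

    public static void main(String[] args) {
        List<Property> props = Arrays.asList(
                new Property(1, new BigDecimal("250000")),
                new Property(2, new BigDecimal("900000")),
                new Property(3, new BigDecimal("120000")),
                new Property(4, new BigDecimal("640000")));
        for (Property p : getTopProperties(props, 2)) {
            System.out.println(p.id + " " + p.price); // prints "2 900000" then "4 640000"
        }
    }
}
```

With the real class, Property would be replaced by PropertyRecord and getPrice() by its existing getter. When k is close to n, the advantage over a full sort disappears, since log k approaches log n.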
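Guava contains a TopKSelector class that does exactly this. The class itself is package-private, so you use it through public APIs such as Ordering.leastOf() and Ordering.greatestOf(). In newer Guava versions, the same functionality is also exposed as Comparators.greatest(), which returns a Collector for use with streams.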
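However, if you're not locked into using an ArrayList for storage, you're probably better off using a PriorityQueue, which always keeps the highest-priority element at its head. Note that iterating over a PriorityQueue does not return the elements in priority order; only repeated poll() calls do.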
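If the records can be stored in a PriorityQueue instead of an ArrayList, a minimal sketch might look like this. PriceHeapStore and the reduced Property class are hypothetical names used only for illustration. The queue is a max-heap on price, so the most expensive record is always at the head; because iteration order is not priority order, top(k) polls k records and then re-inserts them, which costs O(k log n) per query on top of the O(log n) insertions.

```java
import java.math.BigDecimal;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

public class PriceHeapStore {

    // Hypothetical stand-in for PropertyRecord, reduced to id and price.
    static final class Property {
        final long id;
        final BigDecimal price;

        Property(long id, BigDecimal price) {
            this.id = id;
            this.price = price;
        }
    }

    // Max-heap on price: the most expensive property is always at the head.
    private final PriorityQueue<Property> heap =
            new PriorityQueue<>(Comparator.comparing((Property p) -> p.price).reversed());

    void add(Property p) {
        heap.offer(p); // O(log n)
    }

    int size() {
        return heap.size();
    }

    // Returns the k most expensive properties, highest first, and leaves the store unchanged:
    // k polls plus k re-inserts, i.e. O(k log n).
    List<Property> top(int k) {
        List<Property> result = new ArrayList<>();
        while (result.size() < k && !heap.isEmpty()) {
            result.add(heap.poll());
        }
        heap.addAll(result); // put the polled elements back
        return result;
    }

    public static void main(String[] args) {
        PriceHeapStore store = new PriceHeapStore();
        store.add(new Property(1, new BigDecimal("250000")));
        store.add(new Property(2, new BigDecimal("900000")));
        store.add(new Property(3, new BigDecimal("120000")));
        store.add(new Property(4, new BigDecimal("640000")));
        for (Property p : store.top(3)) {
            System.out.println(p.id + " " + p.price); // "2 900000", "4 640000", "1 250000"
        }
    }
}
```

If the caller does not need the store afterwards, the re-insert step can be dropped.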
There are several ways to compute the top K in Java, so which one is the most efficient? Here is a quick benchmark comparing four of them:
package com.example;

import com.google.common.collect.Ordering;
import java.util.*;
import java.util.stream.Collectors;

// All four methods compute the topK *smallest* values in ascending order;
// for the largest values, use a reversed comparator.
public class TopKBenchmark {
    public static void main(String[] args) {
        int inputListSize = 500000;
        int topK = 1000;
        int runCount = 100;
        List<Integer> inputList = new ArrayList<>(inputListSize);
        Random rand = new Random();
        rand.setSeed(System.currentTimeMillis());
        for (int i = 0; i < inputListSize; i++) {
            inputList.add(rand.nextInt(100000));
        }
        List<Integer> result1 = null, result2 = null, result3 = null, result4 = null;
        // method 1: stream and limit (sorts all elements, then takes the first topK)
        for (int i = 0; i < runCount; i++) {
            result1 = inputList.stream().sorted().limit(topK).collect(Collectors.toList());
        }
        // method 2: sort all
        // Caution: Collections.sort sorts inputList in place, so iterations after the first,
        // and methods 3 and 4 below, run on already-sorted input. Sort a copy
        // (new ArrayList<>(inputList)) to avoid this.
        for (int i = 0; i < runCount; i++) {
            Collections.sort(inputList);
            result2 = inputList.subList(0, topK);
        }
        // method 3: guava: TopKSelector (via Ordering.leastOf)
        Ordering<Integer> ordering = Ordering.natural();
        for (int i = 0; i < runCount; i++) {
            result3 = ordering.leastOf(inputList, topK);
        }
        // method 4: PQ (max-heap bounded to topK elements)
        for (int i = 0; i < runCount; i++) {
            PriorityQueue<Integer> priorityQueue = new PriorityQueue<>(Collections.reverseOrder());
            for (Integer val : inputList) {
                // keep val if the heap is not full yet or val is smaller than the largest kept value
                if (priorityQueue.size() < topK || val < priorityQueue.peek()) {
                    priorityQueue.offer(val);
                }
                // evict the largest value so at most topK remain
                if (priorityQueue.size() > topK) {
                    priorityQueue.poll();
                }
            }
            result4 = new ArrayList<Integer>(priorityQueue);
            Collections.sort(result4);
        }
        // sanity check: all four methods must return the same values in the same order
        if (result1.size() != result2.size() ||
                result2.size() != result3.size() ||
                result3.size() != result4.size()) {
            throw new RuntimeException();
        }
        for (int i = 0; i < result1.size(); i++) {
            if (!result1.get(i).equals(result2.get(i)) ||
                    !result2.get(i).equals(result3.get(i)) ||
                    !result3.get(i).equals(result4.get(i))) {
                throw new RuntimeException();
            }
        }
    }
}
I tried the following inputListSize and topK combinations:
Here are the benchmark results (smaller is better), measured with Spot Profiler for Java and Kotlin:
Note: 1.4s, 23ms, 83ms, 719ms and 8.8s refer to the runs with the first, second, ... combinations, respectively.
As Peter mentioned in the comments, this is not a strict benchmark. It would be best to run the benchmark case by case.
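A minimal sketch of running the cases one at a time, assuming only the JDK (the Guava method is left out to avoid the dependency). The hypothetical TopKCase class runs a single method per JVM launch, chosen on the command line, after an untimed warm-up, and it sorts a copy so the shared input stays unsorted for every case. This is still not a rigorous benchmark; a harness such as JMH handles warm-up, forking and dead-code elimination more reliably.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.PriorityQueue;
import java.util.Random;
import java.util.stream.Collectors;

public class TopKCase {

    // Method 1: full sort via the Stream API, then take the first k.
    static List<Integer> streamSort(List<Integer> in, int k) {
        return in.stream().sorted().limit(k).collect(Collectors.toList());
    }

    // Method 2: sort a copy, so the shared input list is never mutated.
    static List<Integer> sortCopy(List<Integer> in, int k) {
        List<Integer> copy = new ArrayList<>(in);
        Collections.sort(copy);
        return new ArrayList<>(copy.subList(0, Math.min(k, copy.size())));
    }

    // Method 4: max-heap bounded to k elements; returns the k smallest in ascending order.
    static List<Integer> boundedHeap(List<Integer> in, int k) {
        List<Integer> out = new ArrayList<>();
        if (k <= 0) {
            return out;
        }
        PriorityQueue<Integer> pq = new PriorityQueue<>(Collections.reverseOrder());
        for (Integer v : in) {
            if (pq.size() < k) {
                pq.offer(v);
            } else if (v < pq.peek()) {
                pq.poll(); // drop the largest kept value
                pq.offer(v);
            }
        }
        out.addAll(pq);
        Collections.sort(out);
        return out;
    }

    static List<Integer> run(String method, List<Integer> in, int k) {
        switch (method) {
            case "stream": return streamSort(in, k);
            case "sort": return sortCopy(in, k);
            case "heap": return boundedHeap(in, k);
            default: throw new IllegalArgumentException("unknown method: " + method);
        }
    }

    // Usage: java TopKCase <stream|sort|heap> [n] [k]; launch one JVM per method.
    public static void main(String[] args) {
        String method = args.length > 0 ? args[0] : "heap";
        int n = args.length > 1 ? Integer.parseInt(args[1]) : 500_000;
        int k = args.length > 2 ? Integer.parseInt(args[2]) : 1_000;
        Random rand = new Random(42); // fixed seed so every run sees the same input
        List<Integer> input = new ArrayList<>(n);
        for (int i = 0; i < n; i++) {
            input.add(rand.nextInt(100_000));
        }
        long sink = 0; // consume results so the JIT is less likely to discard the work
        for (int i = 0; i < 20; i++) { // untimed warm-up
            sink += run(method, input, k).size();
        }
        int runs = 100;
        long start = System.nanoTime();
        for (int i = 0; i < runs; i++) {
            sink += run(method, input, k).size();
        }
        long avgMicros = (System.nanoTime() - start) / runs / 1_000;
        System.out.println(method + ": " + avgMicros + " us/run (sink=" + sink + ")");
    }
}
```

For example, run `java TopKCase heap 500000 1000`, then repeat with `stream` and `sort` in separate JVM launches so that JIT state from one case does not carry over to the next.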