Most efficient way to get the top K items from an ArrayList in Java

Question

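I am trying to find the fastest and most efficient way to get the top K items from an ArrayList of objects, ordered by a custom Comparable implementation.

During my research, it was suggested that I use a max/min heap, which Java provides as PriorityQueue. The problem is that I don't know how to apply it to an ArrayList of objects.

Here is my object class: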

public class PropertyRecord {

    private long id;
    private String address, firstName, lastName, email, ownerAddress;
    private LocalDate dateSold;
    private BigDecimal price;


    public PropertyRecord(long id, String address, String firstName, String lastName, String email, String ownerAddress, LocalDate dateSold, BigDecimal price) {

        this.id = id;
        this.address = address;
        this.firstName = firstName;
        this.lastName = lastName;
        this.email = email;
        this.ownerAddress = ownerAddress;
        this.dateSold = dateSold;
        this.price = price;

    }
 //getters and setters...
}

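I want the top K items by price. I have written a method (see below) that takes the ArrayList and K (called n in the code) and uses the Stream API, but I know this is not the most efficient approach, because it sorts the entire list even though I only want the first K items. So I want something like O(n log k) instead of the O(n log n) of a full sort.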

//return the top n properties based on sale price.
    public List<PropertyRecord> getTopProperties(List<PropertyRecord> properties, int n){

       //using StreamAPI
       return properties.stream()
               .sorted((p1, p2) -> p2.getPrice().compareTo(p1.getPrice()))
               .limit(n)
               .collect(Collectors.toList());

    }

Any help?

Answer 1

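Guava contains a TopKSelector class that does exactly this.

In recent Guava versions, this functionality is exposed as Comparators.greatest().

However, if you are not locked into using an ArrayList for storage, you would be better off with a PriorityQueue, since it always keeps the highest-priority element at its head.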
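If the data has to stay in an ArrayList, a PriorityQueue can still be used as a temporary bounded heap. The following is only a minimal sketch of that idea, not the Guava implementation: the class name TopKSketch and the method topK are made up for illustration. It keeps at most k elements in a min-heap, so the cost is roughly O(n log k) time and O(k) extra space.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical helper class, named here only for illustration.
public class TopKSketch {

    // Returns the k greatest elements according to cmp, greatest first.
    public static <T> List<T> topK(List<T> items, int k, Comparator<? super T> cmp) {
        if (k <= 0) {
            return new ArrayList<>();
        }
        // Min-heap: the head is the smallest of the best elements kept so far.
        PriorityQueue<T> heap = new PriorityQueue<>(cmp);
        for (T item : items) {
            if (heap.size() < k) {
                heap.offer(item);
            } else if (cmp.compare(item, heap.peek()) > 0) {
                // The new item beats the weakest kept element, so swap it in.
                heap.poll();
                heap.offer(item);
            }
        }
        // The heap's iteration order is not sorted, so sort the survivors.
        List<T> result = new ArrayList<>(heap);
        result.sort(cmp.reversed());
        return result;
    }
}
```

For the question's class this could be called as topK(properties, n, Comparator.comparing(PropertyRecord::getPrice)), assuming the getPrice getter that the question's own stream code already uses.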

Answer 2

There are several possible ways to compute the top K in Java, so which one is the most efficient?

package com.example;

import com.google.common.collect.Ordering;

import java.util.*;
import java.util.stream.Collectors;

public class TopKBenchmark {
    public static void main(String[] args) {
        int inputListSize = 500000;
        int topK = 1000;
        int runCount = 100;
        List<Integer> inputList = new ArrayList<>(inputListSize);
        Random rand = new Random();
        rand.setSeed(System.currentTimeMillis());
        for (int i = 0; i < inputListSize; i++) {
            inputList.add(rand.nextInt(100000));
        }

        List<Integer> result1 = null, result2 = null, result3 = null, result4 = null;

        // method 1: stream and limit
        for (int i = 0; i < runCount; i++) {
            result1 = inputList.stream().sorted().limit(topK).collect(Collectors.toList());
        }

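        // method 2: sort all (on a copy, so the input stays unsorted for every run and every method)
        for (int i = 0; i < runCount; i++) {
            List<Integer> sorted = new ArrayList<>(inputList);
            Collections.sort(sorted);
            result2 = sorted.subList(0, topK);
        }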

        // method 3: Guava Ordering.leastOf (TopKSelector)
        Ordering<Integer> ordering = Ordering.natural();
        for (int i = 0; i < runCount; i++) {
            result3 = ordering.leastOf(inputList, topK);
        }

        // method 4: PriorityQueue used as a max-heap capped at topK elements
        for (int i = 0; i < runCount; i++) {
            PriorityQueue<Integer> priorityQueue = new PriorityQueue<>(Collections.reverseOrder());
            for (Integer val: inputList) {
                if (priorityQueue.size() < topK || val < priorityQueue.peek()) {
                    priorityQueue.offer(val);
                }
                if (priorityQueue.size() > topK) {
                    priorityQueue.poll();
                }
            }

            result4 = new ArrayList<Integer>(priorityQueue);
            Collections.sort(result4);
        }

        if (result1.size() != result2.size() ||
                result2.size() != result3.size() ||
                result3.size() != result4.size()) {
            throw new RuntimeException();
        }
        for (int i = 0; i < result1.size(); i++) {
            if (!result1.get(i).equals(result2.get(i)) ||
                    !result2.get(i).equals(result3.get(i)) ||
                    !result3.get(i).equals(result4.get(i))) {
                throw new RuntimeException();
            }
        }
    }
}

I tried the following combinations of inputListSize and topK:

inputListSize=100000, topK=5000
1000, 1000
5000, 1000
50000, 1000
500000, 1000

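Here are the benchmark results (lower is better):

Measured with Spot Profiler for Java and Kotlin.

Note: 1.4s, 23ms, 83ms, 719ms and 8.8s are the times for the first, second, ... combinations, in the order listed above.

As Peter mentioned in the comments, this is not a rigorous benchmark. It is better to run the benchmark for each method separately.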
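To run each method on its own, one option is a small timing helper with a warm-up phase, sketched below. This is an assumption-laden illustration rather than the setup used for the numbers above: the class IsolatedTiming and the method averageNanos are hypothetical names, and wall-clock timing with System.nanoTime is still much cruder than a dedicated harness such as JMH.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;
import java.util.function.Supplier;

// Hypothetical helper class, named here only for illustration.
public class IsolatedTiming {

    // Average wall-clock nanoseconds per run, measured after the warm-up runs.
    public static <R> long averageNanos(Supplier<R> task, int warmupRuns, int measuredRuns) {
        if (measuredRuns <= 0) {
            throw new IllegalArgumentException("measuredRuns must be positive");
        }
        Object sink = null; // holds the last result so the work is not optimized away
        for (int i = 0; i < warmupRuns; i++) {
            sink = task.get();
        }
        long start = System.nanoTime();
        for (int i = 0; i < measuredRuns; i++) {
            sink = task.get();
        }
        long elapsed = System.nanoTime() - start;
        if (sink == null) {
            throw new IllegalStateException("task returned null");
        }
        return elapsed / measuredRuns;
    }

    public static void main(String[] args) {
        int inputListSize = 500000;
        int topK = 1000;
        Random rand = new Random(42L); // fixed seed so repeated runs see the same data
        List<Integer> inputList = new ArrayList<>(inputListSize);
        for (int i = 0; i < inputListSize; i++) {
            inputList.add(rand.nextInt(100000));
        }

        // Time only the "sort all" method; each run sorts a fresh copy of the input.
        long sortAll = averageNanos(() -> {
            List<Integer> copy = new ArrayList<>(inputList);
            Collections.sort(copy);
            return copy.subList(0, topK);
        }, 5, 20);
        System.out.println("sort all: " + sortAll / 1_000_000.0 + " ms per run");
    }
}
```

Each of the other methods would get its own call to averageNanos, ideally in a separate JVM run, so that one method's work does not warm up or disturb the next.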


Answer 1
1 2018-04-09 23:49:10

Answer 2
0 2022-12-20 15:38:28
