Efficient way to find top K products given two lists

Given two lists of equal length N, I want to find the K largest products that can be made by multiplying an element from each list. For example, if

> A = [7, 9, 4, 1, 6]
> B = [8, 1, 3, 10, 7]
> K = 3

the result is [90, 72, 70] or [9*10, 9*8, 7*10], found by

> sorted([x*y for x in A for y in B], reverse=True)[:K]
[90, 72, 70]

Is there a more efficient algorithm that doesn't involve multiplying out all N^2 pairs?

As already noted, the first step is to sort both lists A and B in descending order (or just the K largest of both lists). Then, assuming the values are non-negative, all the max K products will sit in a roughly triangular area in the top-left corner of the grid of products, the max product being A[0]*B[0]. In other words, if A[i]*B[j] is in the top K, then so must be both A[i-1]*B[j] and A[i]*B[j-1] (assuming i, j > 0).

Thus, you can start in the top-left corner and then use a heap to expand both the "lower" and the "right" neighbor of the current element and put those onto the heap, too, until you have all the K elements you need. Or start with all the K largest elements of A, each paired with the largest element of B, already on the heap and only expand in one direction.

Example in Python, using the heapq module, but the same will work in almost any other language. Note that we are adding negative products to the heap, as the heap is ordered smallest-first.

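import heapq

def top_k_prod(A, B, k):
    # Only the k largest elements of each list can contribute (returned in descending order).
    A = heapq.nlargest(k, A)
    B = heapq.nlargest(k, B)
    result = []
    # Seed the heap with every A[i] paired with the largest element of B.
    # Products are negated because heapq implements a min-heap.
    heap = [(-A[i] * B[0], i, 0) for i in range(len(A))]
    heapq.heapify(heap)
    while heap and len(result) < k:
        p, a, b = heapq.heappop(heap)
        result.append(-p)
        # Expand in one direction only: pair A[a] with the next-largest element of B.
        if b < len(B)-1:
            heapq.heappush(heap, (-A[a] * B[b+1], a, b+1))
    return result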
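
The code above implements the second variant (seed one column, expand in one direction). The first variant described earlier, starting from the top-left corner and expanding both the "lower" and the "right" neighbor, could be sketched roughly as follows. The function name top_k_prod_frontier and the seen set are illustrative choices, not part of the original answer, and the sketch assumes non-negative values.

```python
import heapq

def top_k_prod_frontier(A, B, k):
    # Hypothetical helper; assumes non-negative values so that products
    # never increase when moving down or right in the sorted grid.
    if k <= 0 or not A or not B:
        return []
    A = sorted(A, reverse=True)[:k]
    B = sorted(B, reverse=True)[:k]
    heap = [(-A[0] * B[0], 0, 0)]   # start in the top-left corner
    seen = {(0, 0)}                 # cells already pushed onto the heap
    result = []
    while heap and len(result) < k:
        p, i, j = heapq.heappop(heap)
        result.append(-p)
        # Push the "lower" and the "right" neighbor, each at most once.
        for ni, nj in ((i + 1, j), (i, j + 1)):
            if ni < len(A) and nj < len(B) and (ni, nj) not in seen:
                seen.add((ni, nj))
                heapq.heappush(heap, (-A[ni] * B[nj], ni, nj))
    return result
```

The seen set is needed here because a cell can be reached from both its upper and its left neighbor; the one-direction variant avoids that bookkeeping.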

Example:

import random
A = [random.randint(0, 100) for _ in range(100)]
B = [random.randint(0, 100) for _ in range(100)]
K = 20
result = top_k_prod(A, B, K)
test = sorted([x*y for x in A for y in B], reverse=True)[:K]
print(result)
# [9900, 9702, 9603, 9600, 9504, 9408, 9405, 9405, 9400, 9400, 9312, 9306, 9300, 9216, 9212, 9212, 9207, 9200, 9120, 9120]
print(result == test)
# True

The complexity should be about O(NlogN + KlogK): O(NlogN) for sorting A and B, and then about K iterations with heap operations in the loop. Each cell in the triangular "target" region will only be expanded once from its left neighbor, and cells added to the heap but not used are also limited to K (one in each "row"), giving a maximum of 2*K elements inspected.

Practical solution:

Find the K largest elements of list A and the K largest elements of list B by using partial_sort (this is a well-known modification of quick sort, and I am sure Python has the same in its library). The largest products formed from these new lists are also the largest products of the original lists. Then use a max-heap (priority queue) to find the K largest products from the new lists.
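
As a rough illustration of the reduction step, here is a minimal sketch that uses heapq.nlargest from Python's standard library in the role the answer gives to partial_sort. The function name top_k_products_reduced is made up for this example, the values are assumed to be non-negative, and the final step simply scans the at most K*K candidate products instead of walking a heap.

```python
import heapq

def top_k_products_reduced(A, B, k):
    # Hypothetical helper; assumes non-negative values.
    # Step 1: keep only the k largest elements of each list.
    top_a = heapq.nlargest(k, A)
    top_b = heapq.nlargest(k, B)
    # Step 2: the k largest products must come from these candidates,
    # so at most k*k products are examined instead of N*N.
    return heapq.nlargest(k, (x * y for x in top_a for y in top_b))
```

For the lists in the question, top_k_products_reduced([7, 9, 4, 1, 6], [8, 1, 3, 10, 7], 3) gives [90, 72, 70].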

If we find the K max values of each list, the max K products can be formed from those values alone.

I would suggest two approaches to find the K max values:

  1. If K <<< N (K in the tens and N in the millions)
    Here you have a couple of options.
    • You can run a selection algorithm K times on each list. That would take O(N*K).
    • K iterations of either Selection Sort or Bubble Sort. You would have the K max values at either the beginning or the end of the array, depending on the implementation. That would also be O(N*K).

Note that because K <<< N, you can say that O(N*K) is almost O(N).

  2. K can be as large as N
    • In this case, the best bet would be to just sort both lists using Merge Sort or Quick Sort. That would be O(N*lgN).
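
The "K iterations of Selection Sort" option from the first case might look something like the sketch below. The name k_largest_by_selection is hypothetical; each pass does one O(N) scan, so K passes cost O(N*K).

```python
def k_largest_by_selection(values, k):
    # Hypothetical helper: run only the first k passes of selection sort.
    items = list(values)                 # work on a copy
    k = max(0, min(k, len(items)))
    for i in range(k):
        # Find the largest remaining element in items[i:].
        best = i
        for j in range(i + 1, len(items)):
            if items[j] > items[best]:
                best = j
        # Move it to the front of the unsorted part.
        items[i], items[best] = items[best], items[i]
    return items[:k]                     # k largest, in descending order
```

When K is close to N, the second case applies and a plain sorted(values, reverse=True)[:k] is the simpler choice.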
