heapq with custom compare predicate

I am trying to build a heap with a custom sort predicate. Since the values going into it are of a "user-defined" type, I cannot modify their built-in comparison predicate.

Is there a way to do something like:

h = heapq.heapify([...], key=my_lt_pred)
h = heapq.heappush(h, key=my_lt_pred)

Or even better, I could wrap the heapq functions in my own container so I don't need to keep passing the predicate.

According to the heapq documentation, the way to customize the heap order is to make each element on the heap a tuple whose first element accepts normal Python comparisons.
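
As a minimal sketch of that tuple approach, assume a hypothetical list of words that should come off the heap shortest first. The `words` data and the use of `len` as the key are illustrative and not part of the question:

```python
import heapq

# Hypothetical data: order words by length rather than alphabetically.
words = ["pear", "fig", "banana", "kiwi"]

# Each heap entry is (key, item); heapq compares the tuples element by element.
heap = [(len(w), w) for w in words]
heapq.heapify(heap)

# Pop everything to see the resulting order.
ordered = [heapq.heappop(heap)[1] for _ in range(len(words))]
print(ordered)  # ['fig', 'kiwi', 'pear', 'banana']
```

Note that "kiwi" and "pear" have the same key, so the comparison falls through to the strings themselves. That only works here because strings are comparable.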

The functions in the heapq module are a bit cumbersome (since they are not object-oriented), and always require our heap object (a heapified list) to be passed explicitly as the first parameter. We can kill two birds with one stone by creating a very simple wrapper class that lets us specify a key function and presents the heap as an object.

The class below keeps an internal list in which each element is a tuple. The first member of the tuple is a key, calculated at insertion time using the key parameter passed at Heap instantiation:

# -*- coding: utf-8 -*-
import heapq

class MyHeap(object):
   def __init__(self, initial=None, key=lambda x:x):
       self.key = key
       self.index = 0
       if initial:
           self._data = [(key(item), i, item) for i, item in enumerate(initial)]
           self.index = len(self._data)
           heapq.heapify(self._data)
       else:
           self._data = []

   def push(self, item):
       heapq.heappush(self._data, (self.key(item), self.index, item))
       self.index += 1

   def pop(self):
       return heapq.heappop(self._data)[2]

(The extra self.index part is to avoid clashes when the evaluated key values are tied and the stored values are not directly comparable. Without it, heapq could fail with TypeError.)
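
The failure that the index guards against can be reproduced with a short sketch. The `Task` class below is a hypothetical user-defined type with no ordering methods, chosen only for illustration:

```python
import heapq

class Task:
    """Hypothetical user-defined type with no ordering methods."""
    def __init__(self, name, priority):
        self.name = name
        self.priority = priority

a, b = Task("a", 1), Task("b", 1)  # equal keys: a tie

# Without a tie-breaker, the tuple comparison falls through to the Task objects.
plain = []
heapq.heappush(plain, (a.priority, a))
try:
    heapq.heappush(plain, (b.priority, b))
    failed = False
except TypeError:
    failed = True

# With an insertion index in the middle, the comparison is settled
# before the Task objects are ever compared.
indexed = []
for i, task in enumerate((a, b)):
    heapq.heappush(indexed, (task.priority, i, task))
first = heapq.heappop(indexed)[2]

print(failed, first.name)  # True a
```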

Define a class that overrides the __lt__() method. See the example below (works in Python 3.7):

import heapq

class Node(object):
    def __init__(self, val: int):
        self.val = val

    def __repr__(self):
        return f'Node value: {self.val}'

    def __lt__(self, other):
        return self.val < other.val

heap = [Node(2), Node(0), Node(1), Node(4), Node(2)]
heapq.heapify(heap)
print(heap)  # output: [Node value: 0, Node value: 2, Node value: 1, Node value: 4, Node value: 2]

heapq.heappop(heap)
print(heap)  # output: [Node value: 1, Node value: 2, Node value: 2, Node value: 4]

The heapq documentation suggests that heap elements could be tuples in which the first element is the priority and defines the sort order.

More pertinent to your question, however, is that the documentation includes a discussion, with sample code, of how one could implement their own heapq wrapper functions to deal with the problems of sort stability and elements with equal priority (among other issues).

In a nutshell, their solution is to have each element in the heapq be a triple with the priority, an entry count and the element to be inserted. The entry count ensures that elements with the same priority are sorted in the order they were added to the heapq.
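
A minimal sketch of that triple layout, assuming `itertools.count` as the source of the entry count. The helper names `add_task` and `pop_task` and the task data are illustrative:

```python
import heapq
import itertools

counter = itertools.count()  # monotonically increasing entry count
pq = []

def add_task(task, priority):
    # Entry layout follows the description above: (priority, count, task).
    heapq.heappush(pq, (priority, next(counter), task))

def pop_task():
    priority, count, task = heapq.heappop(pq)
    return task

for name, prio in [("write", 2), ("test", 1), ("review", 2), ("deploy", 1)]:
    add_task(name, prio)

order = [pop_task() for _ in range(4)]
print(order)  # ['test', 'deploy', 'write', 'review']
```

Within each priority level the tasks come out in insertion order, because the count decides the comparison before the task itself is looked at.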

setattr(ListNode, "__lt__", lambda self, other: self.val <= other.val)

Use this to compare the values of objects in heapq.
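
For context, here is a small runnable sketch of the same idea. The `ListNode` definition is a hypothetical stand-in for a class you cannot edit, and this sketch uses a strict `<`, which is the conventional meaning of `__lt__`:

```python
import heapq

class ListNode:
    """Stand-in for a class you cannot edit (hypothetical definition)."""
    def __init__(self, val):
        self.val = val

# Attach a strict less-than to the existing class at runtime.
ListNode.__lt__ = lambda self, other: self.val < other.val

heap = [ListNode(3), ListNode(1), ListNode(2)]
heapq.heapify(heap)
vals = [heapq.heappop(heap).val for _ in range(3)]
print(vals)  # [1, 2, 3]
```

Keep in mind that this changes the comparison for every user of the class, not only for the heap.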

The limitation with both answers is that they don't allow ties to be treated as ties. In the first, ties are broken by comparing items; in the second, by comparing input order. It is faster to just let ties be ties, and if there are a lot of them it could make a big difference. Based on the above and on the docs, it is not clear whether this can be achieved in heapq. It does seem strange that heapq does not accept a key, while functions derived from it in the same module do.
PS: If you follow the link in the first comment ("possible duplicate...") there is another suggestion of defining __le__, which seems like a solution.
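
One way to let ties stay ties might be a wrapper that compares only a precomputed key. The sketch below is an assumption about how that could look, not something taken from the answers above; `KeyedEntry` and the `records` data are hypothetical. It also shows `heapq.nsmallest`, one of the functions in the same module that does accept a key:

```python
import heapq

class KeyedEntry:
    """Hypothetical wrapper: compares by a precomputed key only."""
    __slots__ = ("key", "item")

    def __init__(self, key, item):
        self.key = key
        self.item = item

    def __lt__(self, other):
        # Equal keys are "not less than" in both directions,
        # so a tie stays a tie and the items are never compared.
        return self.key < other.key

# Dicts do not support "<", so any fall-through comparison would fail.
records = [{"id": "a", "cost": 5}, {"id": "b", "cost": 2}, {"id": "c", "cost": 5}]
heap = [KeyedEntry(r["cost"], r) for r in records]
heapq.heapify(heap)
cheapest = heapq.heappop(heap).item

# nsmallest, nlargest and merge in the same module take a key argument.
two_cheapest = heapq.nsmallest(2, records, key=lambda r: r["cost"])

print(cheapest["id"])  # b
```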

In Python 3, you can use cmp_to_key from the functools module (see the CPython source code).

Suppose you need a priority queue of triplets, with the priority given by the last element.

from heapq import *
from functools import cmp_to_key
def mycmp(triplet_left, triplet_right):
    key_l, key_r = triplet_left[2], triplet_right[2]
    if key_l > key_r:
        return -1  # larger first
    elif key_l == key_r:
        return 0  # equal
    else:
        return 1


WrapperCls = cmp_to_key(mycmp)
pq = []
myobj = (1, 2, "anystring")  # tuple(1, 2, "anystring") would raise TypeError
# to push an object myobj into pq
heappush(pq, WrapperCls(myobj))
# to get the heap top use the `obj` attribute
inner = pq[0].obj
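
To see the resulting order end to end, here is a short sketch with a few sample triplets. The data and the name `by_last_desc` are illustrative; the `.obj` attribute is the same one used in the snippet above:

```python
from functools import cmp_to_key
from heapq import heappush, heappop

def by_last_desc(left, right):
    # Old-style comparison: a negative result means `left` sorts first.
    # Larger last field first, so the heap acts as a max-heap on that field.
    return (right[2] > left[2]) - (right[2] < left[2])

Wrapped = cmp_to_key(by_last_desc)

pq = []
for triplet in [(1, 2, 10), (3, 4, 30), (5, 6, 20)]:
    heappush(pq, Wrapped(triplet))

popped = []
while pq:
    popped.append(heappop(pq).obj)  # .obj holds the original triplet

print(popped)  # [(3, 4, 30), (5, 6, 20), (1, 2, 10)]
```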

Performance Test:

Environment

Python 3.10.2

Code

from functools import cmp_to_key
from timeit import default_timer as time
from random import randint
from heapq import *

class WrapperCls1:
    __slots__ = 'obj'
    def __init__(self, obj):
        self.obj = obj
    def __lt__(self, other):
        kl, kr = self.obj[2], other.obj[2]
        return kl > kr

def cmp_class2(obj1, obj2):
    kl, kr = obj1[2], obj2[2]
    return -1 if kl > kr else 0 if kl == kr else 1

WrapperCls2 = cmp_to_key(cmp_class2)

triplets = [[randint(-1000000, 1000000) for _ in range(3)] for _ in range(100000)]
# tuple_triplets = [tuple(randint(-1000000, 1000000) for _ in range(3)) for _ in range(100000)]

def test_cls1():
    pq = []
    for triplet in triplets:
        heappush(pq, WrapperCls1(triplet))
        
def test_cls2():
    pq = []
    for triplet in triplets:
        heappush(pq, WrapperCls2(triplet))

def test_cls3():
    pq = []
    for triplet in triplets:
        heappush(pq, (-triplet[2], triplet))

start = time()
for _ in range(10):
    test_cls1()
    # test_cls2()
    # test_cls3()
print("total running time (seconds): ", -start+(start:=time()))

Results

Using list instead of tuple, per function:

  • WrapperCls1: 16.2ms
  • WrapperCls1 with __slots__: 9.8ms
  • WrapperCls2: 8.6ms
  • Moving the priority attribute into the first position (does not support a custom predicate): 6.0ms

Therefore, the cmp_to_key method is slightly faster than using a custom class with an overridden __lt__() function and the __slots__ attribute.

Simple and Recent

A simple solution is to store entries as a list of tuples, with each tuple listing the priority fields in the order you want them compared. If a field within the tuple needs descending order, store its negative.
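
A minimal sketch of mixing ascending and descending fields this way, assuming hypothetical `(name, score, age)` records that should be ranked by highest score first and then by youngest first:

```python
import heapq

# Hypothetical records: (name, score, age).
people = [("ann", 90, 31), ("bob", 85, 25), ("cy", 90, 28)]

# Highest score first (negated), then youngest first (kept as is).
heap = [(-score, age, name) for name, score, age in people]
heapq.heapify(heap)

ranking = [heapq.heappop(heap)[2] for _ in range(len(people))]
print(ranking)  # ['cy', 'ann', 'bob']
```

Negation only works for numeric fields; a field such as a string would need a different approach, for example a wrapper class with its own `__lt__`.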

See the official heapq Python documentation on this topic: Priority Queue Implementation Notes.
