Extending Python Queue.PriorityQueue (worker priority, work package types)

I would like to extend the Queue.PriorityQueue described here: http://docs.python.org/library/queue.html#Queue.PriorityQueue

The queue will hold work packages with a priority. Workers will get work packages and process them. I want to make the following additions:

  1. Workers have a priority too. When multiple workers are idle, the one with the highest priority should process an incoming work package.

  2. Not every worker can process every work package, so a mechanism is needed that checks whether the work package type matches the worker's capabilities.
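The two requirements above can be sketched in isolation, without any threading. This is only an illustration: `WorkPackage`, `Worker`, `can_process` and `pick_worker` are hypothetical names, and the sketch assumes the `PriorityQueue` convention that a lower number means a higher priority.

```python
from collections import namedtuple

# Hypothetical data model, for illustration only.
WorkPackage = namedtuple("WorkPackage", ["type", "payload"])
Worker = namedtuple("Worker", ["name", "capabilities"])


def can_process(worker, package):
    """Return True if the package type is among the worker's capabilities."""
    return package.type in worker.capabilities


def pick_worker(idle_workers, package):
    """Pick the highest-priority idle worker able to handle the package.

    idle_workers is a list of (priority, Worker) tuples; a lower number
    means a higher priority (assumed convention). Returns None if no idle
    worker matches.
    """
    for _priority, worker in sorted(idle_workers, key=lambda entry: entry[0]):
        if can_process(worker, package):
            return worker
    return None
```

With an idle list such as `[(2, Worker("slow", {"render", "encode"})), (1, Worker("fast", {"encode"}))]`, an "encode" package would go to "fast", while a "render" package would fall through to "slow".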

I am looking for hints on how this is best implemented (starting from scratch, extending PriorityQueue or Queue, ...).

edit

Here is my first (untested) try. The basic idea is that all waiting threads will be notified. Then they all try to get a work item through _choose_worker(self, worker). (Made it community wiki)
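The notify-everyone idea can be looked at on its own with a plain `threading.Condition`. The following is a minimal sketch written for Python 3, not the code from this question: `Mailbox`, `demo` and the "kind" strings are hypothetical, and worker priorities are left out so that only the wake-up and re-check pattern is visible.

```python
import queue
import threading


class Mailbox:
    """Hypothetical stand-in for the queue; holds (kind, payload) items."""

    def __init__(self):
        self.cond = threading.Condition()
        self.items = []

    def put(self, kind, payload):
        with self.cond:
            self.items.append((kind, payload))
            # Wake every waiter: only the consumers know whether they match.
            self.cond.notify_all()

    def get(self, accepted_kinds, timeout=5.0):
        # Note: timeout applies to each individual wait, not the whole call.
        with self.cond:
            while True:
                # Re-check after every wake-up; another thread may have won.
                for item in self.items:
                    if item[0] in accepted_kinds:
                        self.items.remove(item)
                        return item
                if not self.cond.wait(timeout):
                    raise queue.Empty  # nothing suitable arrived in time


def demo():
    """Two consumers with different capabilities each receive their item."""
    box = Mailbox()
    results = {}

    def consume(name, kinds):
        results[name] = box.get(kinds)

    threads = [
        threading.Thread(target=consume, args=("cpu", {"a"})),
        threading.Thread(target=consume, args=("gpu", {"b"})),
    ]
    for t in threads:
        t.start()
    box.put("b", 1)
    box.put("a", 2)
    for t in threads:
        t.join()
    return results
```

Waking all waiters costs some extra context switches, but it avoids the situation where a single `notify()` wakes a thread that cannot handle the new item while a capable thread keeps sleeping.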

edit

Works for some simple tests now...

edit Added a custom BaseManager and a local copy of the worker list in the _choose_worker function.

edit bug fix

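try:                        # Python 2
    import Queue
    from Queue import Empty, Full
except ImportError:         # Python 3 renamed the module to "queue"
    import queue as Queue
    from queue import Empty, Full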
from time import time as _time
import heapq

class AdvancedQueue(Queue.PriorityQueue):

    # Initialize the queue representation
    def _init(self, _maxsize):
        self.queue = []
        self.worker = []

    def put(self, item, block=True, timeout=None):
        '''
        Put an item into the queue.

        If optional args 'block' is true and 'timeout' is None (the default),
        block if necessary until a free slot is available. If 'timeout' is
        a positive number, it blocks at most 'timeout' seconds and raises
        the Full exception if no free slot was available within that time.
        Otherwise ('block' is false), put an item on the queue if a free slot
        is immediately available, else raise the Full exception ('timeout'
        is ignored in that case).
        '''
        self.not_full.acquire()
        try:
            if self.maxsize > 0:
                if not block:
                    if self._qsize() == self.maxsize:
                        raise Full
                elif timeout is None:
                    while self._qsize() == self.maxsize:
                        self.not_full.wait()
                elif timeout < 0:
                    raise ValueError("'timeout' must be a positive number")
                else:
                    endtime = _time() + timeout
                    while self._qsize() == self.maxsize:
                        remaining = endtime - _time()
                        if remaining <= 0.0:
                            raise Full
                        self.not_full.wait(remaining)
            self._put(item)
            self.unfinished_tasks += 1
            self.not_empty.notify_all()  # only change from Queue.put(): wake every waiting worker
        finally:
            self.not_full.release()

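    def get(self, worker, block=True, timeout=None):
        '''
        Remove and return a work item that 'worker' may process.

        'worker' is a (priority, worker_object) tuple whose worker_object
        exposes a 'capabilities' collection; queued items are
        (priority, package) tuples whose package exposes a 'type'.
        The worker is registered as idle for the duration of the call.
        'block' and 'timeout' behave as in Queue.get(); Empty is also
        raised when nothing in the queue is available to this worker
        (non-blocking call, or timeout expired).
        '''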
        self.not_empty.acquire()
        try:
            self._put_worker(worker)

            if not block:
                if not self._qsize():
                    raise Empty
                else:
                    return self._choose_worker(worker)
            elif timeout is None:
                while True:
                    while not self._qsize():
                        self.not_empty.wait()
                    try:
                        return self._choose_worker(worker)
                    except Empty:
                        self.not_empty.wait()

            elif timeout < 0:
                raise ValueError("'timeout' must be a positive number")
            else:
                endtime = _time() + timeout
                def wait(endtime):
                    remaining = endtime - _time()
                    if remaining <= 0.0:
                        raise Empty
                    self.not_empty.wait(remaining)

                while True:
                    while not self._qsize():
                        wait(endtime)

                    try:
                        return self._choose_worker(worker)
                    except Empty:
                        wait(endtime)
        finally:
            self._remove_worker(worker)
            self.not_empty.release()

    # Put a new worker in the worker queue
    def _put_worker(self, worker, heappush=heapq.heappush):
        heappush(self.worker, worker)

    # Remove a worker from the worker queue
    def _remove_worker(self, worker):
        self.worker.remove(worker)
        heapq.heapify(self.worker)  # list.remove() can break the heap invariant

    # Choose a matching worker with highest priority
    def _choose_worker(self, worker):
        # A heap list is not fully sorted, so sort explicitly to visit entries
        # in priority order. The sorted copy also lets us drop assigned workers.
        by_priority = lambda entry: entry[0]
        worker_copy = sorted(self.worker, key=by_priority)
        for item in sorted(self.queue, key=by_priority):
            for enqueued_worker in worker_copy:
                if item[1].type in enqueued_worker[1].capabilities:
                    if enqueued_worker == worker:
                        self.queue.remove(item)
                        heapq.heapify(self.queue)  # restore the heap invariant
                        self.not_full.notify()
                        return item
                    # item will be taken by enqueued_worker (which has higher priority),
                    # so enqueued_worker is busy and can be removed
                    worker_copy.remove(enqueued_worker)
                    break  # go on to the next item; this one is spoken for
        raise Empty

I think you are describing a situation where you have two "priority queues": one for the jobs and one for the workers. The naive approach is to take the top priority job and the top priority worker and try to pair them. But of course this fails when the worker is unable to execute the job.

To fix this I'd suggest first taking the top priority job and then iterating over all the workers in order of descending priority until you find one that can process that job. If none of the workers can process the job then take the second highest priority job, and so on. So effectively you have nested loops, something like this:

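def getNextWorkerAndJobPair(jobs, workers, priority):
    # priority is a key function returning the priority of a job or worker;
    # a larger value means a higher priority here (hence reverse=True).
    for job in sorted(jobs, key=priority, reverse=True):
        for worker in sorted(workers, key=priority, reverse=True):
            if worker.can_process(job):
                return (worker, job)
    return None  # no worker can process any of the pending jobs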

The above example sorts the data unnecessarily many times though. To avoid this it would be best to store the data already in sorted order. As for what data structures to use, I'm not really sure what the best is. Ideally you would want O(log n) inserts and removals and to be able to iterate over the collection in sorted order in O(n) time. I think PriorityQueue meets the first of those requirements but not the second. I imagine that sortedlist from the blist package would work, but I haven't tried it myself and the webpage isn't specific about the performance guarantees that this class offers.
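The difference between the two requirements can be illustrated with the standard library alone. The sketch below swaps in `bisect.insort` as a stand-in for a sorted container; it is not `blist.sortedlist`, and its inserts are O(n) because of element shifting, so it only meets the "iterate in sorted order" half of the wish list. The name `drain_in_order` is made up for this example.

```python
import bisect
import heapq

priorities = [5, 1, 4, 2, 3]

# A heap only guarantees that heap[0] is the smallest element;
# the rest of the list is generally not in sorted order.
heap = []
for p in priorities:
    heapq.heappush(heap, p)          # O(log n) per insert


def drain_in_order(heap_list):
    """Visit a heap in priority order by popping from a copy: O(n log n)."""
    scratch = list(heap_list)        # a copy of a valid heap is a valid heap
    return [heapq.heappop(scratch) for _ in range(len(scratch))]


# A list kept sorted on insert can be iterated directly in O(n).
ordered = []
for p in priorities:
    bisect.insort(ordered, p)        # O(log n) search, but O(n) shifting
```

For a handful of workers either structure is likely fine; the trade-off only starts to matter when the collections grow large.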

The way I have suggested, iterating over the jobs first and then over the workers in the inner loop, is not the only approach you could take. You could also reverse the order of the loops so that you choose the highest priority worker first and then try to find a job for it. Or you could find the valid (job, worker) pair that has the maximum value of f(priority_job, priority_worker) for some function f (for example just add the priorities).
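The last variant might look roughly like this. It is a sketch under stated assumptions: `best_pair` is a hypothetical name, jobs and workers are given as `(priority, object)` tuples, a larger number means a higher priority (matching the `reverse=True` sorting above), and `can_process` is supplied by the caller.

```python
from itertools import product


def best_pair(jobs, workers, can_process, f=lambda pj, pw: pj + pw):
    """Return the valid (worker, job) pair maximising f, or None.

    jobs and workers are lists of (priority, obj) tuples. Every
    combination is examined, so the cost is O(len(jobs) * len(workers)).
    """
    candidates = [
        (f(pj, pw), (worker, job))
        for (pj, job), (pw, worker) in product(jobs, workers)
        if can_process(worker, job)
    ]
    if not candidates:
        return None
    # Compare on the score only, so the objects never need to be orderable.
    return max(candidates, key=lambda candidate: candidate[0])[1]
```

Unlike the nested loops, this can pair a medium-priority job with a high-priority worker ahead of the top job if f scores that combination higher, so the choice of f decides which side dominates.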

The only answer was useful but not detailed enough, so I will accept my own answer for now. See the code in the question.
