
Celery - bulk queue tasks

I have some code that queues up a large number (thousands) of Celery tasks. For example, let's say it's something like this:

for x in xrange(2000):
    example_task.delay(x)

Is there a better/more efficient way of queuing up a large number of tasks at once? They all have different arguments.

We ran into this too, when we wanted to process millions of PDFs with Celery. Our solution was to write something we call a CeleryThrottle. Basically, you configure the throttle with the Celery queue you care about and the number of tasks you want kept in it, and then you create your tasks in a loop. As the tasks are created, the throttle monitors the length of the actual queue. If the queue is being drained too quickly, it speeds up the loop so more tasks get added. If the queue grows too large, it slows the loop down and lets some of the tasks complete.

Here's the code:

import time
from collections import deque
from datetime import datetime

# now() only needs to return the current time as a datetime; in a Django
# project, django.utils.timezone.now would work just as well.
now = datetime.now


class CeleryThrottle(object):
    """A class for throttling celery."""

    def __init__(self, min_items=100, queue_name='celery'):
        """Create a throttle to prevent celery run aways.

        :param min_items: The minimum number of items that should be enqueued. 
        A maximum of 2× this number may be created. This minimum value is not 
        guaranteed and so a number slightly higher than your max concurrency 
        should be used. Note that this number includes all tasks unless you use
        a specific queue for your processing.
        """
        self.min = min_items
        self.max = self.min * 2

        # Variables used to track the queue and wait-rate
        self.last_processed_count = 0
        self.count_to_do = self.max
        self.last_measurement = None
        self.first_run = True

        # Use a fixed-length queue to hold last N rates
        self.rates = deque(maxlen=15)
        self.avg_rate = self._calculate_avg()

        # For inspections
        self.queue_name = queue_name

    def _calculate_avg(self):
        return float(sum(self.rates)) / (len(self.rates) or 1)

    def _add_latest_rate(self):
        """Calculate the rate that the queue is processing items."""
        right_now = now()
        elapsed_seconds = (right_now - self.last_measurement).total_seconds()
        self.rates.append(self.last_processed_count / elapsed_seconds)
        self.last_measurement = right_now
        self.last_processed_count = 0
        self.avg_rate = self._calculate_avg()

    def maybe_wait(self):
        """Stall the calling function or let it proceed, depending on the queue.

        The idea here is to check the length of the queue as infrequently as 
        possible while keeping the number of items in the queue as closely 
        between self.min and self.max as possible.

        We do this by immediately enqueueing self.max items. After that, we 
        monitor the queue to determine how quickly it is processing items. Using 
        that rate we wait an appropriate amount of time or immediately press on.
        """
        self.last_processed_count += 1
        if self.count_to_do > 0:
            # Do not wait. Allow process to continue.
            if self.first_run:
                self.first_run = False
                self.last_measurement = now()
            self.count_to_do -= 1
            return

        self._add_latest_rate()
        # get_queue_length() is not part of Celery; it should return the number
        # of tasks currently waiting in the named queue (see the sketch below).
        task_count = get_queue_length(self.queue_name)
        if task_count > self.min:
            # Estimate how long the surplus will take to complete and wait that
            # long + 5% to ensure we're below self.min on next iteration.
            surplus_task_count = task_count - self.min
            wait_time = (surplus_task_count / self.avg_rate) * 1.05
            time.sleep(wait_time)

            # Assume we're below self.min due to waiting; max out the queue.
            if task_count < self.max:
                self.count_to_do = self.max - self.min
            return

        elif task_count <= self.min:
            # Add more items.
            self.count_to_do = self.max - task_count
            return

We use it like this:

throttle = CeleryThrottle(min_items=30, queue_name=queue)
for item in items:
    throttle.maybe_wait()
    do_something.delay(item)

So it's pretty simple to use, and it does a good job of keeping the queue at a happy length, not too long and not too short. It keeps a rolling average of the rate at which the queue is being drained and adjusts its own timing accordingly.
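Note that the throttle above relies on a get_queue_length() helper that is not part of Celery itself; how you measure queue depth depends on your broker. A minimal sketch, assuming a Redis broker where each Celery queue is stored as a Redis list keyed by the queue name:

import redis

def get_queue_length(queue_name='celery', host='localhost', port=6379, db=0):
    """Return the number of tasks currently waiting in a Celery queue.

    Assumes a Redis broker, which keeps each queue as a Redis list named
    after the queue. With RabbitMQ you would instead query the management
    API or use channel.queue_declare(passive=True).
    """
    r = redis.StrictRedis(host=host, port=port, db=db)
    return r.llen(queue_name)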

Firing off a huge number of individual tasks can also be hard on your Celery workers. Likewise, if you plan to collect the results of the tasks you call, this code is not optimal.

You can process the tasks in batches instead. Have a look at the example described in the link below:

http://docs.celeryproject.org/en/latest/userguide/canvas.html#chunks
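For example, here is a minimal sketch of the chunks approach applied to the loop from the question. The app name and broker URL are placeholders; splitting the 2000 calls into chunks of 100 sends 20 messages to the broker instead of 2000:

from celery import Celery

# Placeholder app and broker URL, purely for illustration.
app = Celery('tasks', broker='redis://localhost:6379/0')

@app.task
def example_task(x):
    return x * 2

# Instead of 2000 individual example_task.delay(x) calls, send 20 chunk
# messages containing 100 argument tuples each.
example_task.chunks(list(zip(range(2000))), 100).apply_async()

# If you also need the results, call the chunks signature directly and
# join the underlying group:
# result = example_task.chunks(list(zip(range(2000))), 100)()
# all_values = result.get()   # a list of 20 lists with 100 results each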

