Celery: rate limiting tasks with the same parameters

Question

I am looking for a way to rate-limit calls to a function, but with a separate limit for each distinct input parameter, that is:

@app.task(rate_limit="60/s")
def api_call(user):
    do_the_api_call()

for i in range(100):
    # .delay() sends the task to a worker, which is where rate_limit applies.
    api_call.delay("antoine")
    api_call.delay("oscar")

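So I would like api_call("antoine") to be called at most 60 times per second, and api_call("oscar") at most 60 times per second as well.

Any help on how I can do that?

--EDIT 27/04/2015 I have tried calling a subtask with rate_limit from within a task, but that does not work either: the rate_limit is always applied to all instantiated subtasks or tasks (which is logical).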

@app.task(rate_limit="60/s")
def sub_api_call(user):
    do_the_api_call()

@app.task
def api_call(user):
    sub_api_call.delay(user)

for i in range(100):
    api_call.delay("antoine")
    api_call.delay("oscar")

Best!

Answer 1

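Update

See the comments section for a link to a much better approach that incorporates most of what's here, but fixes a ping-pong problem that the version here has. The version here retries tasks naively. That is, it just tries them again later, with some jitter. If you have 1,000 tasks that are all queued, this creates chaos as they all vie for the next available spot. They all just ping-pong into and out of the task worker, getting tried hundreds of times before finally getting a chance to run.

Instead of that naive approach, the next thing I tried was an exponential backoff, in which each time a task is throttled, it backs off for a little longer than the time before. The concept can work, but it requires that you store the number of retries each task has had, which is annoying and has to be centralized. It is not optimal either, because you can have long stretches of inactivity while you wait for a scheduled task to run. (Imagine a task that is throttled for the 50th time and has to wait an hour, while the throttle timer expires a few seconds after the task is rescheduled. In that case, the worker would sit idle for an hour waiting for that task to run.)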
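As a rough illustration of the exponential backoff described above (a sketch, not the answerer's code): the names `backoff_ceiling` and `backoff_countdown`, the base of 2 seconds, and the one-hour cap are all assumptions made for the example. The `retries` value is the per-task count that, as noted, has to be stored somewhere central.

```python
import random


def backoff_ceiling(retries: int, base: float = 2.0, cap: float = 3600.0) -> float:
    """Longest wait (in seconds) allowed after `retries` throttled attempts."""
    # The wait doubles with every throttled attempt, up to the cap.
    return min(cap, base * (2 ** retries))


def backoff_countdown(retries: int, base: float = 2.0, cap: float = 3600.0) -> float:
    """Pick a random wait below the ceiling so tasks do not retry in lockstep."""
    return random.uniform(0, backoff_ceiling(retries, base, cap))
```

With these assumed defaults, a task throttled for the 50th time has a ceiling of a full hour, even if the throttle window it is waiting on expires seconds later. That is the idle-worker problem described above.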

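The better way to attempt this, instead of a naive retry or an exponential backoff, is with a scheduler. The updated version linked in the comments section maintains a basic scheduler that knows when to retry the task. It keeps track of the order in which tasks are throttled, and knows when the next window for a task to run will occur. So, imagine a throttle of 1 task/minute, with the following timeline: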

00:00:00 - Task 1 is attempted and begins running
00:00:01 - Task 2 is attempted. Oh no! It gets throttled. The current
           throttle expires at 00:01:00, so it is rescheduled then.
00:00:02 - Task 3 is attempted. Oh no! It gets throttled. The current
           throttle expires at 00:01:00, but something is already  
           scheduled then, so it is rescheduled for 00:02:00.
00:01:00 - Task 2 attempts to run again. All clear! It runs.
00:02:00 - Task 3 attempts to run again. All clear! It runs.

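In other words, depending on the length of the backlog, it reschedules the task for after the current throttle has expired and all the other rescheduled, throttled tasks have had their chance to run. (This took weeks to figure out.)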
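The timeline above can be modeled with a few lines of Python. This is only a sketch of the scheduling idea, not the updated version linked in the comments: `next_slot` is a hypothetical name, the rate is assumed to be one task per period, and the schedule is kept in an in-memory list, whereas a multi-worker setup would need to keep it in a shared store.

```python
from typing import List


def next_slot(window_expires: int, period: int, scheduled: List[int]) -> int:
    """Return the time (in seconds) at which a throttled task should be retried.

    The first candidate is the moment the current throttle window expires.
    Every slot that is already taken pushes the candidate back by one period.
    """
    slot = window_expires
    while slot in scheduled:
        slot += period
    # Reserve the slot so the next throttled task lands one period later.
    scheduled.append(slot)
    return slot


# Replaying the timeline: the window opened at 00:00:00 and lasts 60 seconds.
schedule: List[int] = []
task_2_retry = next_slot(window_expires=60, period=60, scheduled=schedule)
task_3_retry = next_slot(window_expires=60, period=60, scheduled=schedule)
```

Here `task_2_retry` is 60 (00:01:00) and `task_3_retry` is 120 (00:02:00), matching the timeline.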

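Original answer

I spent some time on this today and came up with a nice solution. All the other solutions to this have one of the following problems:

They require tasks to have infinite retries, thereby making celery's retry mechanism useless.
They don't throttle based on parameters.
They fail with multiple workers or queues.
They're clunky, etc.

Basically, you wrap your task like this: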

@app.task(bind=True, max_retries=10)
@throttle_task("2/s", key="domain", jitter=(2, 15))
def scrape_domain(self, domain):
    do_stuff()

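The result is that the task is throttled to 2 runs per second per domain parameter, with a random retry jitter of between 2 and 15 seconds. The key parameter is optional, but must correspond to a parameter of your task. If no key is given, the task as a whole is throttled to the given rate. If it is provided, the throttle applies to the (task, key) dyad.

The other way to look at this is without the decorator. This gives a little more flexibility, but counts on you to do the retrying yourself. Instead of the above, you could do: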

import random

@app.task(bind=True, max_retries=10)
def scrape_domain(self, domain):
    proceed = is_rate_okay(self, "2/s", key=domain)
    if proceed:
        do_stuff()
    else:
        # Don't count this retry against max_retries.
        self.request.retries = self.request.retries - 1
        return self.retry(countdown=random.uniform(2, 15))

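I think that's identical to the first example. It is a bit longer and has more branches, but it shows how things work a bit more clearly. I'm hoping to always use the decorator, myself.

This all works by keeping a tally in redis. The implementation is very simple. You create a key in redis for your task (plus the key parameter, if given), and you expire the redis key according to the rate provided. If the user sets a rate of 10/m, you create a redis key that lives for 60 seconds, and you increment it every time a task with the correct name is attempted. If the counter gets too high, retry the task. Otherwise, run it.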

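import functools
import inspect
import logging
import random
from typing import Any, Callable, Tuple

from celery import Task

logger = logging.getLogger(__name__)


def parse_rate(rate: str) -> Tuple[int, int]: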
    """

    Given the request rate string, return a two tuple of:
    <allowed number of requests>, <period of time in seconds>

    (Stolen from Django Rest Framework.)
    """
    num, period = rate.split("/")
    num_requests = int(num)
    if len(period) > 1:
        # It takes the form of a 5d, or 10s, or whatever
        duration_multiplier = int(period[0:-1])
        duration_unit = period[-1]
    else:
        duration_multiplier = 1
        duration_unit = period[-1]
    duration_base = {"s": 1, "m": 60, "h": 3600, "d": 86400}[duration_unit]
    duration = duration_base * duration_multiplier
    return num_requests, duration


def throttle_task(
    rate: str,
    jitter: Tuple[float, float] = (1, 10),
    key: Any = None,
) -> Callable:
    """A decorator for throttling tasks to a given rate.

    :param rate: The maximum rate that you want your task to run. Takes the
    form of '1/m', or '10/2h' or similar.
    :param jitter: A tuple of the range of backoff times you want for throttled
    tasks. If the task is throttled, it will wait a random amount of time
    between these values before being tried again.
    :param key: An argument name whose value should be used as part of the
    throttle key in redis. This allows you to create per-argument throttles by
    simply passing the name of the argument you wish to key on.
    :return: The decorated function
    """

    def decorator_func(func: Callable) -> Callable:
        @functools.wraps(func)
        def wrapper(*args, **kwargs) -> Any:
            # Inspect the decorated function's parameters to get the task
            # itself and the value of the parameter referenced by key.
            sig = inspect.signature(func)
            bound_args = sig.bind(*args, **kwargs)
            task = bound_args.arguments["self"]
            key_value = None
            if key:
                try:
                    key_value = bound_args.arguments[key]
                except KeyError:
                    raise KeyError(
                        f"Unknown parameter '{key}' in throttle_task "
                        f"decorator of function {task.name}. "
                        f"`key` parameter must match a parameter "
                        f"name from function signature: '{sig}'"
                    )
            proceed = is_rate_okay(task, rate, key=key_value)
            if not proceed:
                logger.info(
                    "Throttling task %s (%s) via decorator.",
                    task.name,
                    task.request.id,
                )
                # Decrement the number of times the task has retried. If you
                # fail to do this, it gets auto-incremented, and you'll expend
                # retries during the backoff.
                task.request.retries = task.request.retries - 1
                return task.retry(countdown=random.uniform(*jitter))
            else:
                # All set. Run the task.
                return func(*args, **kwargs)

        return wrapper

    return decorator_func


def is_rate_okay(task: Task, rate: str = "1/s", key=None) -> bool:
    """Keep a global throttle for tasks

    Can be used via the `throttle_task` decorator above.

    This implements the timestamp-based algorithm detailed here:

        https://www.figma.com/blog/an-alternative-approach-to-rate-limiting/

    Basically, you keep track of the number of requests and use the key
    expiration as a reset of the counter.

    So you have a rate of 5/m, and your first task comes in. You create a key:

        celery_throttle:task_name = 1
        celery_throttle:task_name.expires = 60

    Another task comes in a few seconds later:

        celery_throttle:task_name = 2
        Do not update the ttl, it now has 58s remaining

    And so forth, until:

        celery_throttle:task_name = 6
        (10s remaining)

    We're over the threshold. Re-queue the task for later. 10s later:

        Key expires b/c no more ttl.

    Another task comes in:

        celery_throttle:task_name = 1
        celery_throttle:task_name.expires = 60

    And so forth.

    :param task: The task that is being checked
    :param rate: How many times the task can be run during the time period.
    Something like, 1/s, 2/h or similar.
    :param key: If given, add this to the key placed in Redis for the item.
    Typically, this will correspond to the value of an argument passed to the
    throttled task.
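    :return: True if the task may run now, False if it should be throttled.
    """
    key = f"celery_throttle:{task.name}{':' + str(key) if key else ''}"

    # make_redis_interface is a project-specific helper that is not shown
    # here; it is assumed to return a redis-py client.
    r = make_redis_interface("CACHE")

    num_tasks, duration = parse_rate(rate)

    # Count this attempt. INCR is atomic and creates the key with a value of
    # 1 if it does not exist, so two workers cannot both see "no key".
    count = r.incr(key)
    if count == 1:
        # First attempt in this window: start the ttl that resets the counter.
        r.expire(key, duration)
    # Run the task only while the count is within the limit. Comparing after
    # the increment keeps the limit at exactly num_tasks per window.
    return count <= num_tasks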
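To see how the counter behaves per key without a Redis server, here is a small in-memory stand-in. It is a sketch for illustration only: `FixedWindowCounter` and `allow` are hypothetical names, the dictionary replaces Redis, and the `now` argument exists only so the clock can be controlled in examples. It follows the same rule as the docstring above: count every attempt, never extend the expiry, and start over once the key has expired.

```python
import time
from typing import Dict, Optional, Tuple


class FixedWindowCounter:
    """In-memory stand-in for the Redis counter, for illustration only."""

    def __init__(self) -> None:
        # Maps key -> (attempt count, time at which the window expires).
        self._store: Dict[str, Tuple[int, float]] = {}

    def allow(
        self, key: str, limit: int, period: float, now: Optional[float] = None
    ) -> bool:
        now = time.monotonic() if now is None else now
        count, expires = self._store.get(key, (0, 0.0))
        if count == 0 or now >= expires:
            # No live key: start a new window with a count of 1.
            self._store[key] = (1, now + period)
            return True
        # Live key: count the attempt but leave the expiry untouched.
        self._store[key] = (count + 1, expires)
        return count + 1 <= limit


counter = FixedWindowCounter()
antoine = "celery_throttle:api_call:antoine"
oscar = "celery_throttle:api_call:oscar"
first = counter.allow(antoine, limit=2, period=1.0, now=0.0)
second = counter.allow(antoine, limit=2, period=1.0, now=0.0)
third = counter.allow(antoine, limit=2, period=1.0, now=0.0)
other = counter.allow(oscar, limit=2, period=1.0, now=0.0)
```

With a limit of 2 per second, the third attempt for "antoine" is refused while the first attempt for "oscar" still goes through, because each argument value has its own key.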

Answer 2

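I don't think it's possible to achieve this with Celery's built-in task rate limiter.

Assuming you are using some kind of cache for your API, the best solution could be to create a hash of the task name and arguments, and use that as the key for a cache-based throttler.

If you are using Redis, you could either set a lock with a 60-second timeout, or use an incrementing counter to count the number of calls per minute.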
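A minimal sketch of the hashing step, assuming the arguments can be serialized to JSON (anything else falls back to its string form). The function name `throttle_key` and the `throttle:` prefix are made up for this example.

```python
import hashlib
import json
from typing import Any


def throttle_key(task_name: str, *args: Any, **kwargs: Any) -> str:
    """Build a stable cache key from a task name and its arguments."""
    # sort_keys makes the key independent of keyword-argument order;
    # default=str is a fallback for values json cannot serialize directly.
    payload = json.dumps([task_name, args, kwargs], sort_keys=True, default=str)
    digest = hashlib.sha256(payload.encode("utf-8")).hexdigest()
    return f"throttle:{digest}"


antoine_key = throttle_key("api_call", "antoine")
oscar_key = throttle_key("api_call", "oscar")
```

The two keys differ, so each user gets a separate throttle. With redis-py, such a key could then be used with `set(key, 1, nx=True, ex=60)` for the 60-second lock, or with `incr` and `expire` for the per-minute counter.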

This post might give you some pointers on doing distributed throttling of Celery tasks using Redis:

https://callhub.io/distributed-rate-limiting-with-redis-and-celery/

Solution 1
2 2021-02-11 19:47:25

Solution 2
0 2015-04-26 08:34:04
