
External API RabbitMQ and Celery rate limit

I'm using an external REST API which limits my requests to 1 call per second (CPS).

This is the architecture:

(architecture diagram omitted)

Versions:

  • Flask
  • RabbitMQ 3.6.4
  • amqp 1.4.9
  • kombu 3.0.35
  • Celery 3.1.23
  • Python 2.7

The API client sends web requests to the internal API, which processes them and controls the rate at which they are sent to RabbitMQ. These tasks can take anywhere from 5 to 120 seconds, and there are situations in which tasks queue up and get sent to the external API at a higher rate than the one defined, resulting in numerous failed requests (around 5% of requests fail).

Possible solutions:

  • Increase the external API limit
  • Add more workers
  • Keep track of failed tasks and retry them later

Although those solutions may work, they don't exactly solve the implementation of my rate limiter or control the real rate at which my workers can process the API requests, which is what I ultimately need to control.

I believe that if I can control the rate at which RabbitMQ delivers messages to the workers, this could be a better option. I found the RabbitMQ prefetch option, but can anyone recommend other ways to control the rate at which messages are sent to consumers?
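For reference, in Celery 3.1 (the version listed above) prefetch is tuned through configuration rather than through RabbitMQ directly. A minimal sketch, assuming a standard `celeryconfig.py`; note that prefetch limits how many unacknowledged messages each worker holds, i.e. concurrency, not requests per second:

```python
# celeryconfig.py -- sketch for Celery 3.x setting names.
# Prefetch controls how many unacknowledged messages RabbitMQ pushes
# to each worker at once; it does NOT throttle execution rate, so on
# its own it cannot enforce a 1 CPS limit.

CELERYD_PREFETCH_MULTIPLIER = 1  # fetch one message per worker process at a time
CELERY_ACKS_LATE = True          # acknowledge only after the task finishes,
                                 # so a worker never hoards queued messages
```

With these two settings each worker process works on exactly one message at a time, which smooths bursts but still lets N workers make N concurrent calls.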


You will need to create your own rate limiter, as Celery's rate_limit only works per worker and "does not work as you expect it to".
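To see why the per-worker limit falls short, a quick illustration (plain Python, no Celery needed): each worker enforces `rate_limit` independently, so the effective cluster-wide rate is the per-worker rate multiplied by the worker count.

```python
def effective_rate(per_worker_rate, workers):
    # Celery's rate_limit (e.g. rate_limit='1/s') is enforced inside each
    # worker separately, so the outbound rates simply add up.
    return per_worker_rate * workers

print(effective_rate(1, 1))  # 1 -> a single worker stays within 1 CPS
print(effective_rate(1, 4))  # 4 -> four workers exceed the external limit
```

This is exactly the failure mode described in the question: adding workers to clear the backlog multiplies the rate seen by the external API.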

I have personally found that it completely breaks when trying to add new tasks from another task.

I think the requirement spectrum for rate limiting is too wide and depends on the application itself, so Celery's implementation is intentionally simple.

Here is an example I've created using Celery + Django + Redis. Basically, it adds a convenience method to your App.Task class which keeps track of your task execution rate in Redis. If the rate is too high, the task will retry at a later time.

This example sends an SMTP message, but it can easily be replaced with API calls.

The algorithm is inspired by Figma: https://www.figma.com/blog/an-alternative-approach-to-rate-limiting/

https://gist.github.com/Vigrond/2bbea9be6413415e5479998e79a1b11a

# Rate limiting with Celery + Django + Redis
# Multiple Fixed Windows Algorithm inspired by Figma https://www.figma.com/blog/an-alternative-approach-to-rate-limiting/
#   and Celery's sometimes ambiguous, vague, and one-paragraph documentation
#
# Celery's Task is subclassed and the is_rate_okay function is added


# celery.py or however your App is implemented in Django
import os
import math
import time

from celery import Celery, Task
from django_redis import get_redis_connection
from django.conf import settings
from django.utils import timezone


app = Celery('your_app')

# Get Redis connection from our Django 'default' cache setting
redis_conn = get_redis_connection("default")

# We subclass the Celery Task
class YourAppTask(Task):
  def is_rate_okay(self, times=30, per=60):
    """
      Checks to see if this task is hitting our defined rate limit too much.
      This example sets a rate limit of 30/minute.

      times (int): The "30" in "30 times per 60 seconds".
      per (int):  The "60" in "30 times per 60 seconds".

      The Redis structure we create is a Hash of timestamp keys with counter values
      {
        '1560649027.515933': '2',  // unlikely to have more than 1
        '1560649352.462433': '1',
      }

      The Redis key is expired after the amount of 'per' has elapsed.
      The algorithm totals the counters and checks against 'limit'.

      This algorithm currently does not implement the "leniency" described
      at the bottom of the Figma article referenced at the top of this code.
      This is left up to you and depends on the application.

      Returns True if under the limit, otherwise False.
    """

    # Get a timestamp accurate to the microsecond
    timestamp = timezone.now().timestamp()

    # Set our Redis key to our task name
    key = f"rate:{self.name}"

    # Create a pipeline to execute redis code atomically
    pipe = redis_conn.pipeline()

    # Increment our current task hit in the Redis hash
    pipe.hincrby(key, timestamp)

    # Grab the current expiration of our task key
    pipe.ttl(key)

    # Grab all of our task hits in our current frame (of 60 seconds)
    pipe.hvals(key)

    # This returns a list of our command results.  [current task hits, expiration, list of all task hits,]
    result = pipe.execute()

    # If our expiration is not set, set it.  This is not part of the atomicity of the pipeline above.
    if result[1] < 0:
        redis_conn.expire(key, per)

    # We must convert bytes to ints before adding up the counters and comparing to our limit
    if sum([int(count) for count in result[2]]) <= times:
        return True
    else:
        return False


app.Task = YourAppTask
app.config_from_object('django.conf:settings', namespace='CELERY')
app.autodiscover_tasks()

...

# SMTP Example
import random
from YourApp.celery import app
from django.core.mail import EmailMessage

# We set infinite max_retries so backlogged email tasks do not disappear
@app.task(name='smtp.send-email', max_retries=None, bind=True)
def send_email(self, to_address):

    if not self.is_rate_okay():
        # We implement a random countdown between 30 and 60 seconds 
        #   so tasks don't come flooding back at the same time
        raise self.retry(countdown=random.randint(30, 60))

    message = EmailMessage(
        'Hello',
        'Body goes here',
        'from@yourdomain.com',
        [to_address],
    )
    message.send()
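For clarity, the windowed-counter idea behind `is_rate_okay` can also be sketched without Redis. The following is a simplified, in-memory, sliding-window variant for illustration only; it is not the gist's code, and it only works inside a single process, which is exactly why the gist stores its counters in Redis so multiple workers share one limit:

```python
import time

class WindowLimiter:
    """In-memory sketch of a windowed rate check (illustration only).

    Records a timestamp per hit and allows a new call only while fewer
    than `times` hits fall inside the last `per` seconds.
    """
    def __init__(self, times=30, per=60):
        self.times = times
        self.per = per
        self.hits = []  # timestamps of allowed calls

    def is_rate_okay(self, now=None):
        now = time.time() if now is None else now
        # Drop hits that have aged out of the window (the gist relies on
        # Redis key expiry for this instead).
        self.hits = [t for t in self.hits if now - t < self.per]
        if len(self.hits) >= self.times:
            return False
        self.hits.append(now)
        return True

limiter = WindowLimiter(times=1, per=1)  # the question's 1 CPS limit
print(limiter.is_rate_okay(now=100.0))   # True  - first call allowed
print(limiter.is_rate_okay(now=100.5))   # False - still inside the same second
print(limiter.is_rate_okay(now=101.1))   # True  - the window has elapsed
```

For the question's 1 CPS external API, the gist's method would be called the same way, as `self.is_rate_okay(times=1, per=1)` inside the task.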
