
Slow Celery Task Times

I'm using Django, Celery and RabbitMQ. I have a simple task that sends emails. This task works, but it's very slow.

For example, when I send 5,000 emails, all 5,000 go straight to RabbitMQ as normal, but once in the message broker it then takes around 30 minutes to complete and clear all the tasks.

Without Celery, these same 5,000 tasks would take just a few minutes to process.

Have I misconfigured something? It would be very helpful if someone could spot my speed issue.

task.py

class SendMessage(Task):
    name = "Sending SMS"
    max_retries = 10
    default_retry_delay = 3

    def run(self, message_id, gateway_id=None, **kwargs):
        logging.debug("About to send a message.")


        try:
            message = Message.objects.get(pk=message_id)
        except Exception as exc:
            raise self.retry(exc=exc)

        if not gateway_id:
            if hasattr(message.billee, 'sms_gateway'):
                gateway = message.billee.sms_gateway
            else:
                gateway = Gateway.objects.all()[0]
        else:
            gateway = Gateway.objects.get(pk=gateway_id)

        account = Account.objects.get(user=message.sender)
        if account._balance() >= message.length:
            response = gateway._send(message)

            if response.status == 'Sent':
                # Take credit from the user's account.
                transaction = Transaction(
                    account=account,
                    amount=-message.charge,
                )
                transaction.save()
                message.billed = True
                message.save()
        else:
            # Insufficient balance; the message is not sent.
            pass

settings.py

# Celery
BROKER_URL = 'amqp://admin:xxxxxx@xx.xxx.xxx.xxx:5672//'
CELERY_SEND_TASK_ERROR_EMAILS = True

Apache config

<VirtualHost *:80>
ServerName www.domain.com

DocumentRoot /srv/project/domain


WSGIDaemonProcess domain.com processes=2 threads=15 display-name=%{GROUP}
WSGIProcessGroup domain.com

WSGIScriptAlias / /srv/project/domain/apache/django.wsgi
ErrorLog /srv/project/logs/error.log
</VirtualHost>

conf

# Name of nodes to start, here we have a single node
#CELERYD_NODES="w1"
# or we could have three nodes:
CELERYD_NODES="w1 w2 w3"

# Where to chdir at start.
CELERYD_CHDIR="/srv/project/domain"

# How to call "manage.py celeryd_multi"
CELERYD_MULTI="$CELERYD_CHDIR/manage.py celeryd_multi"

# How to call "manage.py celeryctl"
CELERYCTL="$CELERYD_CHDIR/manage.py celeryctl"

# Extra arguments to celeryd
CELERYD_OPTS="--time-limit=900 --concurrency=8"

# %n will be replaced with the nodename.
CELERYD_LOG_FILE="/srv/project/logs/celery/%n.log"
CELERYD_PID_FILE="/srv/project/celery/%n.pid"

# Workers should run as an unprivileged user.
CELERYD_USER="root"
CELERYD_GROUP="root"

# Name of the projects settings module.
export DJANGO_SETTINGS_MODULE="domain.settings"

# Celery Beat Settings.

# Where to chdir at start.
CELERYBEAT_CHDIR="/srv/project/domain"

# Path to celerybeat
CELERYBEAT="$CELERYBEAT_CHDIR/manage.py celerybeat"

You are processing ~2.78 tasks/second (5,000 tasks in 30 minutes), which I agree isn't that high. You have 3 nodes, each running with a concurrency of 8, so you should be able to process 24 tasks in parallel.
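A quick back-of-the-envelope check makes the gap concrete. The 500 ms per-task figure below is a hypothetical (5 queries at ~100 ms each, as discussed in the DB point):

```python
# Compare the observed throughput with what 24 parallel workers
# could achieve if each task took an assumed 500 ms.
observed_rate = 5000 / (30 * 60)          # ~2.78 tasks/second
workers = 3 * 8                           # 3 nodes x concurrency of 8
per_task_seconds = 0.5                    # assumed task duration
ideal_rate = workers / per_task_seconds   # ~48 tasks/second

print(f"observed ~{observed_rate:.2f}/s, ideal ~{ideal_rate:.0f}/s")
```

Even under that pessimistic per-task estimate, the cluster should be well above 2.78 tasks/second, which points to something other than raw worker capacity.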

Things to check:

CELERYD_PREFETCH_MULTIPLIER - This is set to 4 by default, but if you have lots of short tasks it can be worthwhile to increase it. It reduces the impact of the time taken to fetch messages from the broker, at the cost that tasks will not be as evenly distributed across workers.
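As a sketch for settings.py (the value 10 here is an illustrative assumption, not a recommendation; tune it for your workload):

```python
# settings.py -- sketch only: each worker process prefetches more
# messages from the broker at once, which helps when tasks are
# short and uniform in duration.
CELERYD_PREFETCH_MULTIPLIER = 10
```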

DB connection/queries - I count 5+ DB queries being executed in the successful case. If you are using the default result backend for django-celery, there are additional queries for storing the task result in the DB. django-celery will also close and reopen the DB connection after each task, which adds some overhead. If you have 5 queries and each one takes 100 ms, then your task will take at least 500 ms with or without Celery. Running the queries by themselves is one thing, but you also need to ensure that nothing in your task is locking tables/rows and preventing other tasks from running efficiently in parallel.
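If nothing in the app ever reads task results, disabling the result backend avoids those extra per-task writes. A sketch for settings.py, assuming results really are unused:

```python
# settings.py -- sketch: skip storing task results entirely, which
# removes django-celery's per-task result-backend DB writes.
# Only safe if no code inspects AsyncResult for these tasks.
CELERY_IGNORE_RESULT = True
```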

Gateway response times - Your task appears to call a remote service, which I'm assuming is an SMS gateway. If that server is slow to respond, then your task will be slow. The response times may also differ between a single call and peak load. In the US, long-code SMS can only be sent at a rate of 1 per second, and depending on where the gateway applies that rate limiting, it might be slowing down your task.
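If the gateway enforces a per-second limit, Celery's declarative rate_limit attribute can make the workers match it instead of queuing up on a throttled service. A minimal sketch, using a stand-in base class so the snippet runs without Celery installed (in real code you would subclass celery.task.Task, as the question's code does):

```python
class Task:
    """Stand-in for celery.task.Task so this sketch is self-contained."""
    rate_limit = None

class SendMessage(Task):
    name = "Sending SMS"
    rate_limit = "1/s"  # each worker sends at most 1 task per second
```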
