[英]How to run a celery worker with Django app scalable by AWS Elastic Beanstalk?
如何将Django与AWS Elastic Beanstalk一起使用,它也只能在主节点上通过芹菜运行任务?
This is how I set up celery with django on elastic beanstalk with scalability working fine. 这就是我在弹性豆茎上用django设置芹菜的方法,可扩展性很好。
Please keep in mind that 'leader_only' option for container_commands works only on environment rebuild or deployment of the App. 请记住,对于container_commands“leader_only”选项仅适用于环境重建或App的部署 。 If service works long enough, leader node may be removed by Elastic Beanstalk.
如果服务工作时间足够长,则可以通过Elastic Beanstalk删除领导节点。 To deal with that, you may have to apply instance protection for your leader node.
要解决此问题,您可能必须为领导节点应用实例保护。 Check: http://docs.aws.amazon.com/autoscaling/latest/userguide/as-instance-termination.html#instance-protection-instance
检查: http : //docs.aws.amazon.com/autoscaling/latest/userguide/as-instance-termination.html#instance-protection-instance
Add bash script for celery worker and beat configuration. 为芹菜工人添加bash脚本并节拍配置。
Add file root_folder/.ebextensions/files/celery_configuration.txt : 添加文件root_folder / .ebextensions / files / celery_configuration.txt :
#!/usr/bin/env bash
# Get django environment variables
celeryenv=`cat /opt/python/current/env | tr '\n' ',' | sed 's/export //g' | sed 's/$PATH/%(ENV_PATH)s/g' | sed 's/$PYTHONPATH//g' | sed 's/$LD_LIBRARY_PATH//g' | sed 's/%/%%/g'`
celeryenv=${celeryenv%?}
# Create celery configuraiton script
celeryconf="[program:celeryd-worker]
; Set full path to celery program if using virtualenv
command=/opt/python/run/venv/bin/celery worker -A django_app --loglevel=INFO
directory=/opt/python/current/app
user=nobody
numprocs=1
stdout_logfile=/var/log/celery-worker.log
stderr_logfile=/var/log/celery-worker.log
autostart=true
autorestart=true
startsecs=10
; Need to wait for currently executing tasks to finish at shutdown.
; Increase this if you have very long running tasks.
stopwaitsecs = 600
; When resorting to send SIGKILL to the program to terminate it
; send SIGKILL to its whole process group instead,
; taking care of its children as well.
killasgroup=true
; if rabbitmq is supervised, set its priority higher
; so it starts first
priority=998
environment=$celeryenv
[program:celeryd-beat]
; Set full path to celery program if using virtualenv
command=/opt/python/run/venv/bin/celery beat -A django_app --loglevel=INFO --workdir=/tmp -S django --pidfile /tmp/celerybeat.pid
directory=/opt/python/current/app
user=nobody
numprocs=1
stdout_logfile=/var/log/celery-beat.log
stderr_logfile=/var/log/celery-beat.log
autostart=true
autorestart=true
startsecs=10
; Need to wait for currently executing tasks to finish at shutdown.
; Increase this if you have very long running tasks.
stopwaitsecs = 600
; When resorting to send SIGKILL to the program to terminate it
; send SIGKILL to its whole process group instead,
; taking care of its children as well.
killasgroup=true
; if rabbitmq is supervised, set its priority higher
; so it starts first
priority=998
environment=$celeryenv"
# Create the celery supervisord conf script
echo "$celeryconf" | tee /opt/python/etc/celery.conf
# Add configuration script to supervisord conf (if not there already)
if ! grep -Fxq "[include]" /opt/python/etc/supervisord.conf
then
echo "[include]" | tee -a /opt/python/etc/supervisord.conf
echo "files: celery.conf" | tee -a /opt/python/etc/supervisord.conf
fi
# Reread the supervisord config
supervisorctl -c /opt/python/etc/supervisord.conf reread
# Update supervisord in cache without restarting all services
supervisorctl -c /opt/python/etc/supervisord.conf update
# Start/Restart celeryd through supervisord
supervisorctl -c /opt/python/etc/supervisord.conf restart celeryd-beat
supervisorctl -c /opt/python/etc/supervisord.conf restart celeryd-worker
Take care about script execution during deployment, but only on main node (leader_only: true). 在部署期间注意脚本执行,但仅限于主节点(leader_only:true)。 Add file root_folder/.ebextensions/02-python.config :
添加文件root_folder / .ebextensions / 02-python.config :
container_commands:
04_celery_tasks:
command: "cat .ebextensions/files/celery_configuration.txt > /opt/elasticbeanstalk/hooks/appdeploy/post/run_supervised_celeryd.sh && chmod 744 /opt/elasticbeanstalk/hooks/appdeploy/post/run_supervised_celeryd.sh"
leader_only: true
05_celery_tasks_run:
command: "/opt/elasticbeanstalk/hooks/appdeploy/post/run_supervised_celeryd.sh"
leader_only: true
File requirements.txt 文件requirements.txt
celery==4.0.0
django_celery_beat==1.0.1
django_celery_results==1.0.1
pycurl==7.43.0 --global-option="--with-nss"
Configure celery for Amazon SQS broker (Get your desired endpoint from list: http://docs.aws.amazon.com/general/latest/gr/rande.html ) root_folder/django_app/settings.py : 为Amazon SQS代理配置celery(从列表中获取所需的端点: http : //docs.aws.amazon.com/general/latest/gr/rande.html ) root_folder / django_app / settings.py :
...
CELERY_RESULT_BACKEND = 'django-db'
CELERY_BROKER_URL = 'sqs://%s:%s@' % (aws_access_key_id, aws_secret_access_key)
# Due to error on lib region N Virginia is used temporarily. please set it on Ireland "eu-west-1" after fix.
CELERY_BROKER_TRANSPORT_OPTIONS = {
"region": "eu-west-1",
'queue_name_prefix': 'django_app-%s-' % os.environ.get('APP_ENV', 'dev'),
'visibility_timeout': 360,
'polling_interval': 1
}
...
Celery configuration for django django_app app django django_app应用程序的芹菜配置
Add file root_folder/django_app/celery.py : 添加文件root_folder / django_app / celery.py :
from __future__ import absolute_import, unicode_literals
import os
from celery import Celery
# set the default Django settings module for the 'celery' program.
os.environ.setdefault('DJANGO_SETTINGS_MODULE', 'django_app.settings')
app = Celery('django_app')
# Using a string here means the worker don't have to serialize
# the configuration object to child processes.
# - namespace='CELERY' means all celery-related configuration keys
# should have a `CELERY_` prefix.
app.config_from_object('django.conf:settings', namespace='CELERY')
# Load task modules from all registered Django app configs.
app.autodiscover_tasks()
Modify file root_folder/django_app/__init__.py : 修改文件root_folder / django_app / __ init__.py :
from __future__ import absolute_import, unicode_literals
# This will make sure the app is always imported when
# Django starts so that shared_task will use this app.
from django_app.celery import app as celery_app
__all__ = ['celery_app']
Check also: 检查还:
This is how I extended the answer by @smentek to allow for multiple worker instances and a single beat instance - same thing applies where you have to protect your leader. 这就是我通过@smentek扩展答案以允许多个工作者实例和单个节拍实例的方式 - 同样的事情适用于您必须保护您的领导者的地方。 (I still don't have an automated solution for that yet).
(我还没有自动解决方案)。
Please note that envvar updates to EB via the EB cli or the web interface are not relflected by celery beat or workers until app server restart has taken place. 请注意,在应用程序服务器重新启动之前,芹菜节拍或工作人员不会通过EB cli或Web界面更新EB的envvar更新。 This caught me off guard once.
这让我措手不及。
A single celery_configuration.sh file outputs two scripts for supervisord, note that celery-beat has autostart=false
, otherwise you end up with many beats after an instance restart: 单个celery_configuration.sh文件为supervisord输出两个脚本,请注意celery-beat具有
autostart=false
,否则在实例重启后最终会出现许多节拍:
# get django environment variables
celeryenv=`cat /opt/python/current/env | tr '\n' ',' | sed 's/export //g' | sed 's/$PATH/%(ENV_PATH)s/g' | sed 's/$PYTHONPATH//g' | sed 's/$LD_LIBRARY_PATH//g' | sed 's/%/%%/g'`
celeryenv=${celeryenv%?}
# create celery beat config script
celerybeatconf="[program:celeryd-beat]
; Set full path to celery program if using virtualenv
command=/opt/python/run/venv/bin/celery beat -A lexvoco --loglevel=INFO --workdir=/tmp -S django --pidfile /tmp/celerybeat.pid
directory=/opt/python/current/app
user=nobody
numprocs=1
stdout_logfile=/var/log/celery-beat.log
stderr_logfile=/var/log/celery-beat.log
autostart=false
autorestart=true
startsecs=10
; Need to wait for currently executing tasks to finish at shutdown.
; Increase this if you have very long running tasks.
stopwaitsecs = 10
; When resorting to send SIGKILL to the program to terminate it
; send SIGKILL to its whole process group instead,
; taking care of its children as well.
killasgroup=true
; if rabbitmq is supervised, set its priority higher
; so it starts first
priority=998
environment=$celeryenv"
# create celery worker config script
celeryworkerconf="[program:celeryd-worker]
; Set full path to celery program if using virtualenv
command=/opt/python/run/venv/bin/celery worker -A lexvoco --loglevel=INFO
directory=/opt/python/current/app
user=nobody
numprocs=1
stdout_logfile=/var/log/celery-worker.log
stderr_logfile=/var/log/celery-worker.log
autostart=true
autorestart=true
startsecs=10
; Need to wait for currently executing tasks to finish at shutdown.
; Increase this if you have very long running tasks.
stopwaitsecs = 600
; When resorting to send SIGKILL to the program to terminate it
; send SIGKILL to its whole process group instead,
; taking care of its children as well.
killasgroup=true
; if rabbitmq is supervised, set its priority higher
; so it starts first
priority=999
environment=$celeryenv"
# create files for the scripts
echo "$celerybeatconf" | tee /opt/python/etc/celerybeat.conf
echo "$celeryworkerconf" | tee /opt/python/etc/celeryworker.conf
# add configuration script to supervisord conf (if not there already)
if ! grep -Fxq "[include]" /opt/python/etc/supervisord.conf
then
echo "[include]" | tee -a /opt/python/etc/supervisord.conf
echo "files: celerybeat.conf celeryworker.conf" | tee -a /opt/python/etc/supervisord.conf
fi
# reread the supervisord config
/usr/local/bin/supervisorctl -c /opt/python/etc/supervisord.conf reread
# update supervisord in cache without restarting all services
/usr/local/bin/supervisorctl -c /opt/python/etc/supervisord.conf update
Then in container_commands we only restart beat on leader: 然后在container_commands中我们只重新启动领导者:
container_commands:
# create the celery configuration file
01_create_celery_beat_configuration_file:
command: "cat .ebextensions/files/celery_configuration.sh > /opt/elasticbeanstalk/hooks/appdeploy/post/run_supervised_celeryd.sh && chmod 744 /opt/elasticbeanstalk/hooks/appdeploy/post/run_supervised_celeryd.sh && sed -i 's/\r$//' /opt/elasticbeanstalk/hooks/appdeploy/post/run_supervised_celeryd.sh"
# restart celery beat if leader
02_start_celery_beat:
command: "/usr/local/bin/supervisorctl -c /opt/python/etc/supervisord.conf restart celeryd-beat"
leader_only: true
# restart celery worker
03_start_celery_worker:
command: "/usr/local/bin/supervisorctl -c /opt/python/etc/supervisord.conf restart celeryd-worker"
If someone is following smentek's answer and getting the error: 如果有人关注smentek的回答并收到错误:
05_celery_tasks_run: /usr/bin/env bash does not exist.
know that, if you are using Windows, your problem might be that the "celery_configuration.txt" file has WINDOWS EOL when it should have UNIX EOL. 要知道,如果你使用的是Windows,你的问题可能是“celery_configuration.txt”文件在应该有UNIX EOL时有WINDOWS EOL。 If using Notepad++, open the file and click on "Edit > EOL Conversion > Unix (LF)".
如果使用Notepad ++,请打开文件并单击“编辑> EOL转换> Unix(LF)”。 Save, redeploy, and error is no longer there.
保存,重新部署和错误不再存在。
Also, a couple of warnings for really-amateur people like me: 还有像我这样的真正业余爱好者的几个警告:
Be sure to include "django_celery_beat" and "django_celery_results" in your "INSTALLED_APPS" in settings.py file. 请务必在settings.py文件中的“INSTALLED_APPS”中包含“django_celery_beat”和“django_celery_results”。
To check celery errors, connect to your instance with "eb ssh" and then "tail -n 40 /var/log/celery-worker.log" and "tail -n 40 /var/log/celery-beat.log" (where "40" refers to the number of lines you want to read from the file, starting from the end). 要检查芹菜错误,请使用“eb ssh”连接到您的实例,然后“tail -n 40 /var/log/celery-worker.log”和“tail -n 40 /var/log/celery-beat.log”(其中“40”指的是您想要从文件中读取的行数,从结尾开始)。
Hope this helps someone, it would've saved me some hours! 希望这对某人有所帮助,它会为我节省一些时间!
声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.