I have a Celery task running every 20 seconds across 3 instances, all connected to one database. The problem is that handlers sometimes fire twice when the tasks overlap: the filtered queryset doesn't reflect status changes made by a concurrently running task:
@periodic_task(run_every=timedelta(seconds=20))
def process_webhook_transactions():
    """Process webhook transactions"""
    transactions = WebhookTransaction.objects.filter(status=WebhookTransaction.UNPROCESSED)
    for transaction in transactions:
        data = transaction.body
        event = data.get('event_category')
        if event is None:
            transaction.status = WebhookTransaction.ERROR
            transaction.save()
            continue
        handler = WEBHOOK_HANDLERS.get(event, default_handler)
        success = handler(data)
        if success:
            transaction.status = WebhookTransaction.PROCESSED
        else:
            transaction.status = WebhookTransaction.ERROR
        transaction.save()
What is the best way to avoid this?
You could use select_for_update with skip_locked=True to prevent duplicate processing when the 3 workers run the task at the same time: each worker locks the rows it selects, and the others skip the locked rows instead of waiting on them. Note that select_for_update only takes effect inside a transaction, so wrap both the query and the processing loop in transaction.atomic(). Like so:
with transaction.atomic():
    transactions = WebhookTransaction.objects.filter(status=WebhookTransaction.UNPROCESSED)
    transactions = transactions.select_for_update(skip_locked=True, of=("self",))
    # ... iterate over transactions and call the handlers inside this block
But this approach can make one worker work much harder than the others: the first task to run locks all the unprocessed transactions, leaving little for the rest. You could instead create a dispatcher task that also runs every 20 seconds, splits the unprocessed transactions into smaller chunks (10-20 maybe?), and then triggers process_webhook_transactions with those chunks.
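The split itself can be a plain helper; here is a minimal sketch of the chunking logic (chunk_ids is a hypothetical name, not part of the question's code):

```python
def chunk_ids(ids, size):
    """Split a list of primary keys into lists of at most `size` items."""
    return [ids[i:i + size] for i in range(0, len(ids), size)]
```

The dispatcher task would collect the UNPROCESSED ids with values_list("id", flat=True), call chunk_ids on them, and hand each chunk to process_webhook_transactions.delay(chunk); the worker task then filters on id__in=chunk so each worker only touches the rows it was given.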
If handler = WEBHOOK_HANDLERS.get(event, default_handler) resolves to an asynchronous handler, I think the chunking approach is also good because you could run the handlers within a chunk concurrently to speed up the task.
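A minimal sketch of that concurrent execution, assuming the handlers are async callables (dummy_handler here is a stand-in for the real WEBHOOK_HANDLERS entries, which the question does not show):

```python
import asyncio

async def dummy_handler(data):
    # Placeholder for real I/O (an HTTP call, a DB write, etc.).
    await asyncio.sleep(0)
    return data.get("event_category") is not None

async def run_chunk(handlers_and_data):
    # gather() runs all handler coroutines concurrently and returns their
    # results in order, so results[i] matches the i-th transaction in the chunk.
    return await asyncio.gather(*(h(d) for h, d in handlers_and_data))

results = asyncio.run(run_chunk([
    (dummy_handler, {"event_category": "payment"}),
    (dummy_handler, {}),
]))
```

Each result can then be mapped back to its transaction to set PROCESSED or ERROR, just as the original loop does.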