
Running multiple tasks in Python

I have multiple servers, which are used by 10,000 clients around the world. Each client can initiate a task that takes around 5 minutes to run on a server, and if the server is fully occupied, the task needs to be queued.

The question here is: what is the right architecture and set of libraries to support this? Specifically, the objectives are:

  1. Monitoring and running several tasks in parallel at the same time
  2. Monitoring the resources, and only taking tasks from the queue when there are enough resources

Well, this looks like a use case for Celery, with monitoring via Flower.

Celery is a simple, flexible, and reliable distributed system to process vast amounts of messages, while providing operations with the tools required to maintain such a system. It's a task queue with focus on real-time processing, while also supporting task scheduling.

Celery meets both requirements. It may take a bit more work, but it is possible to scale workers and reduce both CPU and RAM usage when the system is idle or when you need to consume less memory.
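
For example, here is a rough sketch of the worker-side settings that map to both points (the module name tasks, the broker URL and the numbers are placeholders, not part of the original answer):

from celery import Celery

# Sketch only: a worker-side configuration that maps to both requirements
app = Celery('tasks', broker='pyamqp://guest@localhost//')

# Run at most 4 tasks in parallel on this worker (requirement 1)
app.conf.worker_concurrency = 4

# Reserve only one message per pool process and acknowledge it after it
# finishes, so the worker never takes more tasks from the queue than it
# currently has resources to run (requirement 2)
app.conf.worker_prefetch_multiplier = 1
app.conf.task_acks_late = True

The same worker can also be started with celery -A tasks worker --autoscale=10,1 so the process pool grows under load and shrinks back to a single process when the system is idle.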

I will point out some links about this topic:

Celery - minimize memory consumption

https://www.vinta.com.br/blog/2018/dealing-resource-consuming-tasks-celery/

https://medium.com/@alaminopu.me/solving-rabbitmq-high-cpu-memory-usages-problem-with-celery-d4172ba1c6b3
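
The short version of the memory advice in those links is to let Celery recycle its worker processes. Continuing the sketch above (the limits are placeholders, tune them to your workload):

# Recycle a pool process after it has run 50 tasks...
app.conf.worker_max_tasks_per_child = 50
# ...or after it exceeds roughly 200 MB of resident memory (value is in KiB)
app.conf.worker_max_memory_per_child = 200_000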

Also, if you are looking to integrate it with Apache Kafka, check this Stack Overflow question.

You have to set up a message broker like RabbitMQ or Redis. If the data in the queues should be persistent, I recommend RabbitMQ. To send and receive tasks you can use Celery, which lets you both send tasks to a queue and run them in Celery workers. For queue monitoring there is Flower. A very good practice these days is to implement the whole architecture with Docker. Below is an example docker-compose.yml that sets up 3 containers: RabbitMQ, Flower and a Celery worker; the only thing you have to do is run docker-compose up -d :

version: '3.3'
services:
  # Message broker that holds the task queue
  rabbit:
    image: rabbitmq:3-management-alpine
    restart: always
    environment:
      RABBITMQ_ERLANG_COOKIE: cookie
      RABBITMQ_DEFAULT_USER: Admin
      RABBITMQ_DEFAULT_PASS: 1234
    ports:
      - 5672:5672
      - 15672:15672

  # Celery worker built from the local Dockerfile; it runs the queued tasks
  celery:
    build:
      context: .
    volumes:
      - ./src:/app
    command: celery -A appendapp worker --loglevel=debug
    environment:
      RABBITMQ_HOST: rabbit

  # Flower web UI for monitoring the queue and workers (http://localhost:5555)
  flower:
    image: mher/flower
    restart: always
    ports:
      - 5555:5555
    command:
      - "--broker=amqp://Admin:1234@rabbit//"

The code in ./src/appendapp.py reads a list from a JSON file, adds an item, and saves the updated list back to the file. The code looks like this:

from celery import Celery
import os, json, time

# JSON file where the shared list is stored
BASEDIR = "./hola.json"

RABBIT_HOST = os.getenv("RABBITMQ_HOST") or "localhost"

# Create the app and set the broker location (RabbitMQ)
app = Celery('appendapp',
             backend='rpc://',
             broker=f'pyamqp://Admin:1234@{RABBIT_HOST}//')

@app.task
def add_item(item):
    # time.sleep(2.5)  # uncomment to simulate a longer-running task
    with open(BASEDIR) as rfile:
        data = json.load(rfile)
    data.append(item)
    with open(BASEDIR, "w") as wfile:
        json.dump(data, wfile)
    return f"Added {item}"
