
Using Flask with CPU-bound requests that need to be parallelized onto multiple cores

A computational scientist where I work wrote a program that scores inputs using a machine learning model built with scikit-learn. My task is to make this ML scorer available as a microservice.

So I wrote a few lines of code using Flask to accomplish this. Mission achieved!
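For reference, a minimal sketch of what such a Flask scorer might look like (the score_sequences function here is just a hypothetical stand-in for the real scikit-learn model):

from flask import Flask, jsonify, request

app = Flask(__name__)

def score_sequences(seqs):
    # Hypothetical stand-in for the real model's predict() call.
    return [len(seq) for seq in seqs]

@app.route("/scores", methods=["POST"])
def scores():
    seqs = request.get_json()  # expects a JSON list of strings
    return jsonify(score_sequences(seqs))

if __name__ == "__main__":
    app.run(port=5000)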

Well, not quite. Because this service is going to be hit pretty heavily at times, it needs to be able to crunch on several requests in parallel. (I.e., on multiple cores; we have about 20 on our server.) A solution I could get working with about ten minutes of effort is to just spin up ten or twenty of these little REST servers on different ports and round-robin to them using nginx as a reverse proxy.
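The nginx side of that brute-force setup might look roughly like this (the port numbers and instance count are just placeholders):

upstream scorers {
    # one entry per server instance, each listening on its own port
    server 127.0.0.1:5001;
    server 127.0.0.1:5002;
    # ... and so on, for however many instances are running
}

server {
    listen 80;
    location / {
        proxy_pass http://scorers;   # round-robin is nginx's default balancing method
    }
}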

Although this will work fine, I am sure, I think it would be more elegant to have a single Python server handling all the requests rather than twenty separate Python servers. So I started reading up on WSGI and uWSGI and a bunch of other things, but all I have accomplished with all that reading and web surfing is to end up very confused.

So I'll ask here instead of trying to unravel this on my own: Should I just stick with the brute force approach I described above? Or is there something better that I might be doing?

But if doing something "better" is going to require days of effort wading through incomprehensible documentation, doing frustrating experimentation, and pulling out all of my hair, then I'd rather just stick with the dumb brute force approach that I already understand and that I know for sure will work.

Thanks.

I'd suggest migrating to FastAPI for this. It is significantly faster, really easy to use (especially if you're migrating from Flask), and is used by a lot of people for ML inference.

FastAPI uses the newer async functionality in Python, which allows it to handle significantly more requests with the same amount of resources.

You can also use existing Docker containers for either Flask or FastAPI rather than configuring everything yourself.

As suggested by tedivm, I used FastAPI and uvicorn to implement a working solution.

Here's a little sample server program named test_fast_api.py. It responds to both GET and POST requests (the POST requests should be in JSON), and the responses are in JSON:

from typing import List
from fastapi import FastAPI

app = FastAPI()

# GET /score/{seq}: score a single sequence
@app.get("/score/{seq}")
async def score(seq: str):
    return len(seq)

# GET /scores/{seqs}: score several comma-separated sequences
@app.get("/scores/{seqs}")
async def scores(seqs: str):
    return [len(seq) for seq in seqs.split(",")]

# POST /scores: score a JSON list of sequences sent in the request body
@app.post("/scores")
async def scores_post(seqs: List[str]):
    return [len(seq) for seq in seqs]

This service can then be served by 10 processes like so:

$ uvicorn --workers 10 --port 12345 test_fast_api:app

If this service were actually CPU-bound, running using 10 processes would allow it to make good use of 10 CPU cores.
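As a quick sanity check, the endpoints can be exercised with a small client like this (assuming the server is listening on port 12345 as above; the example sequences are arbitrary):

import requests

base = "http://localhost:12345"

# single sequence via GET
print(requests.get(f"{base}/score/ACDEFG").json())                   # 6

# several sequences via GET, comma-separated in the path
print(requests.get(f"{base}/scores/AC,DEFG").json())                 # [2, 4]

# several sequences via POST, as a JSON list in the body
print(requests.post(f"{base}/scores", json=["AC", "DEFG"]).json())   # [2, 4]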
