Parallel queries in Python, with a file and several pods in Kubernetes

I'm developing a Python project that will be hosted on Kubernetes, on Google Cloud. The idea is to read a file with millions of rows, where each row is the input key for a query to an API:

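import json
import requests

# `url` (the API endpoint) is not shown here; it must be defined before this runs.
def getEndpoint(line):
    # json.dumps escapes quotes and other special characters in the key
    payload = json.dumps({"keyQuery": line})
    headers = {'Content-Type': 'application/json'}
    # verify=False disables TLS certificate verification
    response = requests.request("POST", url, headers=headers, data=payload, verify=False)
    return response.text.encode('utf8')

with open('file.txt', 'r') as fileOpen:
    for line in fileOpen:
        # Strip the trailing newline so it is not sent as part of the key
        getEndpoint(line.rstrip('\n'))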

I want to run my application on several Kubernetes pods for scalability, that is, to have multiple queries running at the same time. However, with this code structure, each pod will iterate over the file from the beginning and re-read lines that other pods have already queried, which is not what I want. Two ideas came up:

  1. Split the files, and each split would be distributed among the pods. (Example: for a 100 line file with 10 pods, each pod would read 10 lines)

  2. Before running the application, create a consumption queue with the lines of the file, so that all pods would read the queue and not the file directly.
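Option 1 can be sketched without physically splitting the file: each pod reads the same file but keeps only every N-th line. This is a minimal sketch, not a definitive implementation. The names `lines_for_pod`, `pod_index` and `num_pods` are hypothetical, and it assumes each pod is somehow told its own index and the total pod count (for example through an environment variable set per pod).

```python
from itertools import islice

def lines_for_pod(lines, pod_index, num_pods):
    """Yield every num_pods-th line, starting at offset pod_index.

    `lines` can be any iterable, including an open file object.
    """
    return islice(lines, pod_index, None, num_pods)

# Example from the question: 100 lines shared by 10 pods.
all_lines = [f"key-{i}" for i in range(100)]
shards = [list(lines_for_pod(all_lines, i, 10)) for i in range(10)]
```

Every line ends up in exactly one shard, so no key is queried twice. The drawback is that the split is fixed up front: if one pod dies or is slower than the others, nothing redistributes its share, which is where option 2 has the advantage.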

Option 2 seems to me to be more scalable and faster. But I would like suggestions for the best way to make a query using a file as a reference. I may want to run, for example, 1 million queries in 24 hours.
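To put the target in numbers, the rough calculation below converts 1 million queries in 24 hours into a required rate and an approximate number of concurrent requests. The latency figure is purely an assumption for illustration (the real API latency is not stated here), and `workers_needed` is a hypothetical helper.

```python
import math

TOTAL_QUERIES = 1_000_000
WINDOW_SECONDS = 24 * 60 * 60  # 86,400 seconds in 24 hours

# Average rate needed to finish inside the window (about 11.6 queries/second).
required_rate = TOTAL_QUERIES / WINDOW_SECONDS

def workers_needed(rate_per_second, latency_seconds):
    """Concurrency needed = arrival rate * time each request is in flight."""
    return math.ceil(rate_per_second * latency_seconds)

ASSUMED_LATENCY_SECONDS = 0.5  # hypothetical value, measure the real API
concurrent_workers = workers_needed(required_rate, ASSUMED_LATENCY_SECONDS)
```

Under that assumed latency, a handful of concurrent requests would already be enough on average, so the number of pods is more likely to be driven by fault tolerance and API rate limits than by raw throughput.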

I think you should use an asynchronous task worker such as Celery or Dramatiq.

The concept is to send the tasks (your query lines) to a message broker (Redis or RabbitMQ) and then let the workers consume them: each worker receives a query line and makes the request to the API.
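The producer/consumer pattern itself can be illustrated with the standard library alone. In the sketch below, `queue.Queue` and threads are an in-process stand-in for the broker and the worker pods; it is not Celery or Dramatiq code. The names `process_line`, `worker` and `run` are hypothetical, and `process_line` is a placeholder for the real API call in `getEndpoint`.

```python
import queue
import threading

def process_line(line):
    """Hypothetical stand-in for the API request made per query line."""
    return line.upper()

def worker(tasks, results):
    """Consume lines until a None sentinel arrives."""
    while True:
        line = tasks.get()
        if line is None:  # sentinel: no more work for this worker
            break
        results.put(process_line(line))

def run(lines, num_workers=4):
    tasks, results = queue.Queue(), queue.Queue()
    threads = [threading.Thread(target=worker, args=(tasks, results))
               for _ in range(num_workers)]
    for t in threads:
        t.start()
    for line in lines:      # producer: enqueue each line exactly once
        tasks.put(line)
    for _ in threads:       # one sentinel per worker so all of them stop
        tasks.put(None)
    for t in threads:
        t.join()
    out = []
    while not results.empty():
        out.append(results.get())
    return out

processed = run(["a", "b", "c", "d", "e"], num_workers=3)
```

Because each item is removed from the queue by exactly one consumer, no line is processed twice, and adding workers needs no change to how the file is read. With a real broker the same property holds across pods rather than threads.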

In addition, both Dramatiq and Celery can retry a task if it fails.
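To show what retrying means in practice, here is a hand-rolled sketch of retry with exponential backoff. It is not the Celery or Dramatiq API, which provide their own retry configuration. `call_with_retry` and its parameters are hypothetical names chosen for this illustration.

```python
import time

def call_with_retry(func, arg, max_retries=3, base_delay=1.0, sleep=time.sleep):
    """Call func(arg); on an exception, wait and try again.

    The wait doubles after each failure (base_delay, 2x, 4x, ...).
    After max_retries failed retries the last exception is re-raised.
    """
    for attempt in range(max_retries + 1):
        try:
            return func(arg)
        except Exception:
            if attempt == max_retries:
                raise
            sleep(base_delay * 2 ** attempt)
```

A worker could wrap the API call as `call_with_retry(getEndpoint, line)`, so a transient network error does not lose that line. The `sleep` parameter is injectable only to make the function easy to test without real delays.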

I'm not sure whether this is exactly what you want, but you can explore this repo: https://github.com/matiaslindgren/celery-kubernetes-example
