
How to invoke the same Cloud Run service from Cloud Run to run requests in parallel?

I am running an ETL process on Cloud Run.

I have 2000 files, but only 1200 of them get preprocessed and loaded into BigQuery before the Cloud Run request times out. So I thought of dividing the load.

I am dividing the 2000 files into 4 sets of 500 each, authenticating, and using requests.post to call the same Cloud Run service. However, it executes one set after another on the same Cloud Run instance, and it times out again.

How can I make the sets run in parallel?

Current settings: max instances: 20, concurrency: 1, CPU: 2, memory: 8 GB.

Well, I have worked on something like this. I am not sure it will help, since you haven't shared a single block of code. Here's a sample based on downloading 2k JSON files.

You have 2000 files, and 1200 of them get processed/loaded into GBQ before Cloud Run times out. What you can do is:

    total_files = len(file_list) // 1000   # with a file list of 2000, total_files will be 2

    # divide the files into sets of 1000 and loop over them one by one
    for file_set in range(1, total_files + 1):
        auth_and_trigger(file_list[(file_set - 1) * 1000:file_set * 1000])

    # finally trigger any files left over after the full sets of 1000
    auth_and_trigger(file_list[total_files * 1000:])
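The slicing above can be sanity-checked with a quick sketch (a hypothetical `file_list` of 2300 names is used here just to exercise the leftover branch):

```python
file_list = [f"file_{i}.json" for i in range(2300)]  # 2300 files to leave a remainder
total_files = len(file_list) // 1000                 # two full sets of 1000

# same slice arithmetic as above, collected instead of triggered
batches = [file_list[(s - 1) * 1000:s * 1000] for s in range(1, total_files + 1)]
batches.append(file_list[total_files * 1000:])       # the leftover files

assert sum(len(b) for b in batches) == len(file_list)  # nothing lost, nothing duplicated
print([len(b) for b in batches])                       # -> [1000, 1000, 300]
```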

Now, here is how you can call Cloud Run with an auth-and-trigger function for every 1000 files.

    import logging
    import threading

    import requests

    def auth_and_trigger(rest_of_files):
        # your Cloud Run url
        receiving_service_url = 'https://cloudrun-url-uc.a.run.app/download'

        # Set up metadata server request
        # See https://cloud.google.com/compute/docs/instances/verifying-instance-identity#request_signature
        metadata_server_token_url = 'http://metadata/computeMetadata/v1/instance/service-accounts/default/identity?audience='

        token_request_url = metadata_server_token_url + receiving_service_url
        token_request_headers = {'Metadata-Flavor': 'Google'}

        # Fetch the identity token from the metadata server
        token_response = requests.get(token_request_url, headers=token_request_headers)
        jwt = token_response.content.decode("utf-8")

        # Provide the token in the request to the receiving service
        receiving_service_headers = {'Authorization': f'Bearer {jwt}'}

        try:
            threading.Thread(target=trigger_ingest,
                             args=(receiving_service_url,
                                   {"files": rest_of_files},
                                   receiving_service_headers)).start()
        except Exception as error:
            logging.error(error)

Each thread calls a function trigger_ingest that invokes the Cloud Run service. Its code is below:

    def trigger_ingest(url, json, headers=None):
        service_response = requests.post(url=url,
                                         json=json,
                                         headers=headers)
        logging.info(service_response.content)

Now, since you want parallel execution, make sure the dispatching code is not repeated inside the thread target, as you have it in the Cloud Run trigger itself; otherwise each invoked instance would fan out again.
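One caveat worth adding (a hedged sketch, not part of the original answer): the threads above are fire-and-forget, and on Cloud Run the CPU can be throttled once the dispatching request returns, so pending POSTs may be cut off. Keeping the thread handles and joining them before returning avoids that. Here a generic `trigger` callable stands in for `trigger_ingest`:

```python
import threading

def dispatch_batches(file_list, trigger, batch_size=1000):
    """Start one thread per batch of files and wait for all of them
    to finish before the dispatching request returns."""
    threads = []
    for start in range(0, len(file_list), batch_size):
        t = threading.Thread(target=trigger,
                             args=(file_list[start:start + batch_size],))
        t.start()
        threads.append(t)
    for t in threads:          # join so no POST is cut off mid-flight
        t.join()
    return len(threads)        # number of batches dispatched
```

With 2000 files and the default batch size, this starts two threads and blocks only until both trigger calls have completed, rather than returning immediately.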
