Form Recognizer Heavy Workload

My use case is the following:
Once a day I upload 1,000 single-page PDFs to Azure Storage and process them with Form Recognizer through the latest Python client library (azure-ai-formrecognizer).

So far I'm using the async version of the client and I send all 1,000 coroutines concurrently:

import asyncio

# analyze_async(doc) is my own coroutine that calls Form Recognizer for one document.
async def process_all(documents):
    tasks = {asyncio.create_task(analyze_async(doc)): doc for doc in documents}
    pending = set(tasks)

    # Handle retries
    while pending:
        # Wait for all pending calls to complete
        finished, pending = await asyncio.wait(
            pending, return_when=asyncio.ALL_COMPLETED
        )

        # Check whether each task raised and register its document for a new run
        failed = [tasks[task] for task in finished if task.exception()]

        if failed:
            # Back off in case of 429, without blocking the event loop
            await asyncio.sleep(1)
            for doc in failed:
                new_task = asyncio.create_task(analyze_async(doc))
                tasks[new_task] = doc
                pending.add(new_task)

Now I'm not really comfortable with this setup. The main reason is that the service's state is unpredictable within a single iteration: it can respond normally, then throw 429 (Too Many Requests), then respond normally again, which is not deterministic enough for me. I was wondering whether another approach is possible. Do you think I should instead increase the transaction rate progressively, starting with 15 (the default TPS), then 50 … 100, until the queue is empty? Or is there another option? Thanks.
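
One way to make the behaviour more predictable is to cap the number of in-flight requests with an asyncio.Semaphore and retry each throttled document on its own with exponential backoff and jitter, instead of re-waiting the whole batch. The following is a minimal sketch under assumptions: RateLimited and fake_analyze are hypothetical stand-ins for a 429 error and for the asker's analyze_async (with the real SDK you would likely catch azure.core.exceptions.HttpResponseError and check its status_code), and the limit of 15 simply mirrors the default TPS mentioned above. Note that a semaphore caps concurrent requests, not transactions per second, so it only bounds the rate indirectly.

```python
import asyncio
import random


class RateLimited(Exception):
    """Hypothetical stand-in for an HTTP 429 (Too Many Requests) error."""


async def fake_analyze(doc: str) -> str:
    """Hypothetical analyzer that is throttled about 30% of the time."""
    await asyncio.sleep(0.01)
    if random.random() < 0.3:
        raise RateLimited(doc)
    return f"result:{doc}"


async def analyze_with_retry(doc, analyze, limiter, max_retries=6, base_delay=0.05):
    """Analyze one document, retrying throttled calls with exponential backoff and jitter."""
    for attempt in range(max_retries + 1):
        # Hold a concurrency slot only while the request is in flight.
        async with limiter:
            try:
                return await analyze(doc)
            except RateLimited:
                if attempt == max_retries:
                    raise  # give up on this document after the last retry
        # Back off outside the semaphore so other documents can use the slot.
        delay = base_delay * 2 ** attempt
        await asyncio.sleep(delay + random.uniform(0, delay))


async def process_all(documents, analyze, max_concurrency=15):
    """Analyze all documents with at most max_concurrency calls in flight."""
    limiter = asyncio.Semaphore(max_concurrency)
    # return_exceptions=True reports exhausted documents in the results list
    # (in input order) instead of raising the first failure.
    return await asyncio.gather(
        *(analyze_with_retry(doc, analyze, limiter) for doc in documents),
        return_exceptions=True,
    )


if __name__ == "__main__":
    docs = [f"doc-{i}.pdf" for i in range(100)]
    results = asyncio.run(process_all(docs, fake_analyze))
    failed = [r for r in results if isinstance(r, Exception)]
    print(f"{len(results) - len(failed)} succeeded, {len(failed)} failed")
```

Because each document backs off independently, a burst of 429s slows down only the throttled calls, and max_concurrency becomes a single knob: stepping it from 15 to 50 to 100, as suggested in the question, could be tried by rerunning with different values and watching how many retries occur.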

You need to enable CORS on the storage account and configure it so that Form Recognizer can access the documents for the heavy workload.

Follow the procedure below to set up the heavy workload in Form Recognizer.

When creating the storage account, choose the Premium performance tier with the page blobs account type for the best performance.

Redundancy is also required: set it to zone-redundant storage (ZRS) for better resilience.

Create the storage account that the files will be uploaded to.

In the storage account, go to the Resource sharing (CORS) settings and add the required origin URL.

Set Allowed origins to https://formrecognizer.appliedai.azure.com, the Form Recognizer Studio URL.
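
The same CORS rule can also be applied from Python instead of the portal. This is a sketch, assuming the azure-storage-blob package and a connection string for the account; apart from the Studio origin, the values (every CORS method the Blob service accepts, * for headers, a 120-second max age) are assumptions modeled on what the Studio setup typically asks for, so adjust them as needed. studio_cors_settings and apply_studio_cors are hypothetical helper names.

```python
# Origin of Form Recognizer Studio, as given in the step above.
STUDIO_ORIGIN = "https://formrecognizer.appliedai.azure.com"

# Assumption: allow every method the Blob service accepts in a CORS rule.
ALL_METHODS = ["DELETE", "GET", "HEAD", "MERGE", "POST", "OPTIONS", "PUT", "PATCH"]


def studio_cors_settings(max_age_in_seconds: int = 120) -> dict:
    """Return keyword arguments for one Blob service CorsRule."""
    return {
        "allowed_origins": [STUDIO_ORIGIN],
        "allowed_methods": list(ALL_METHODS),
        "allowed_headers": ["*"],
        "exposed_headers": ["*"],
        "max_age_in_seconds": max_age_in_seconds,
    }


def apply_studio_cors(connection_string: str) -> None:
    """Set the storage account's Blob CORS rules to the single Studio rule."""
    # Imported here so studio_cors_settings() works without the SDK installed.
    from azure.storage.blob import BlobServiceClient, CorsRule

    service = BlobServiceClient.from_connection_string(connection_string)
    # Passing cors=[...] replaces any existing CORS rules; other service
    # properties left as None are not changed.
    service.set_service_properties(cors=[CorsRule(**studio_cors_settings())])
```

Keeping the rule values in a plain dictionary makes them easy to review or reuse before calling apply_studio_cors against a real account.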

Go to Containers and upload the documents into a container.
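
For a daily batch of 1,000 files, the upload step could be scripted rather than done by hand. A minimal sketch, assuming the azure-storage-blob package and an existing container; the helper names and folder layout are hypothetical. It uploads with the SDK's default block blob type, so it assumes the account accepts block blobs.

```python
from pathlib import Path


def upload_pdfs(container_client, folder: str) -> list:
    """Upload every *.pdf file in folder as a blob named after the file.

    container_client is expected to be an azure.storage.blob.ContainerClient
    or any object with a compatible upload_blob(name=..., data=..., overwrite=...).
    """
    uploaded = []
    for path in sorted(Path(folder).glob("*.pdf")):
        with path.open("rb") as data:
            # overwrite=True lets a daily run replace blobs with the same name.
            container_client.upload_blob(name=path.name, data=data, overwrite=True)
        uploaded.append(path.name)
    return uploaded


def upload_from_connection_string(connection_string: str, container_name: str, folder: str) -> list:
    """Connect to an existing container and upload the folder's PDFs."""
    # Imported here so upload_pdfs() can be reused or tested without the SDK.
    from azure.storage.blob import ContainerClient

    with ContainerClient.from_connection_string(connection_string, container_name) as client:
        return upload_pdfs(client, folder)
```

Taking the container client as a parameter keeps the file-walking logic independent of how the client is authenticated.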

Upload the documents, then use the container and blob information as the input for Form Recognizer. If you work from Form Recognizer Studio, limits apply both to the total size of the documents and to the number of characters, so it is recommended to use Python code with the newly created container as the input folder.
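
Using the container as the input from Python could look roughly like the sketch below. It assumes the azure-ai-formrecognizer (3.2 or later), azure-storage-blob, and aiohttp packages; the account name and key, the Form Recognizer endpoint and key, the prebuilt-document model, the 2-hour SAS expiry, and the concurrency limit of 15 are all placeholder assumptions, and retries are omitted for brevity. It lists the blobs, gives Form Recognizer a read-only SAS URL for each one, and returns (blob name, result) pairs or the exception for documents that failed.

```python
import asyncio
from datetime import datetime, timedelta, timezone
from urllib.parse import quote


def blob_sas_url(account: str, container: str, blob_name: str, sas_token: str) -> str:
    """Build a read URL for one blob; the name is percent-encoded for the URL path."""
    return f"https://{account}.blob.core.windows.net/{container}/{quote(blob_name)}?{sas_token}"


async def analyze_container(account, account_key, container, endpoint, fr_key,
                            model_id="prebuilt-document", max_concurrency=15):
    """Analyze every blob in a container by URL with bounded concurrency."""
    # Imported here so blob_sas_url() can be used without the SDKs installed.
    from azure.core.credentials import AzureKeyCredential
    from azure.ai.formrecognizer.aio import DocumentAnalysisClient
    from azure.storage.blob import ContainerSasPermissions, generate_container_sas
    from azure.storage.blob.aio import ContainerClient

    # Short-lived, read-only SAS so Form Recognizer can fetch the blobs.
    sas = generate_container_sas(
        account, container, account_key=account_key,
        permission=ContainerSasPermissions(read=True),
        expiry=datetime.now(timezone.utc) + timedelta(hours=2),
    )
    limiter = asyncio.Semaphore(max_concurrency)

    async with ContainerClient(
        f"https://{account}.blob.core.windows.net", container, credential=account_key
    ) as blobs, DocumentAnalysisClient(endpoint, AzureKeyCredential(fr_key)) as client:
        names = [b.name async for b in blobs.list_blobs()]

        async def analyze_one(name):
            # At most max_concurrency analyze calls run at the same time.
            async with limiter:
                poller = await client.begin_analyze_document_from_url(
                    model_id, blob_sas_url(account, container, name, sas)
                )
                return name, await poller.result()

        return await asyncio.gather(*(analyze_one(n) for n in names), return_exceptions=True)
```

Passing URLs means Form Recognizer reads the PDFs directly from storage, so the script never downloads or re-uploads the document bytes itself.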
