
Having difficulties using the QUERIES option in Textract start_document_analysis with boto3 (Python)

My problem is with the Textract asynchronous method start_document_analysis. It lets you choose the type of analysis to run, but when I try to use the "QUERIES" feature =>

FeatureTypes=[
        'TABLES'|'FORMS'|'QUERIES',
    ], 

you have to pass another parameter containing the list of queries =>

QueriesConfig={
        'Queries': [
            {
                'Text': 'string',
                'Alias': 'string',
                'Pages': [
                    'string',
                ]
            },
        ]
    }

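As soon as I pass this parameter, boto3 throws an exception saying that QueriesConfig is not recognized as one of the accepted parameters. Has anyone used this feature from Python before?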
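
An "unknown parameter" error like this is raised locally by botocore when its bundled service model does not list the parameter. One possible cause, which is an assumption the thread does not confirm, is a boto3/botocore install that predates QueriesConfig. Below is a minimal sketch for checking what the local install accepts. The two helper names are hypothetical, and the lookup only reads the model shipped with botocore, without calling AWS:

```python
def unsupported_parameters(accepted, requested):
    """Return the requested parameter names missing from the accepted set."""
    return sorted(set(requested) - set(accepted))


def accepted_parameters(operation="StartDocumentAnalysis"):
    """Read the parameter names the locally installed botocore model accepts."""
    import botocore.session  # lazy import: only needed for the local lookup

    # Load the Textract service model bundled with botocore (no network call).
    model = botocore.session.get_session().get_service_model("textract")
    return set(model.operation_model(operation).input_shape.members)
```

Calling unsupported_parameters(accepted_parameters(), ["DocumentLocation", "FeatureTypes", "QueriesConfig"]) lists anything the install does not know about. If QueriesConfig shows up there, upgrading boto3 and botocore would be the first thing to try.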

You can use it this way:

import time

import boto3


def getJobResults(jobId):
    # Collect every page of results for a finished analysis job.
    pages = []
    client = boto3.client('textract')
    response = client.get_document_analysis(JobId=jobId)
    pages.append(response)
    print("Resultset page received: {}".format(len(pages)))
    # Results are paginated; keep following NextToken until it is absent.
    nextToken = response.get('NextToken')
    while nextToken:
        response = client.get_document_analysis(JobId=jobId, NextToken=nextToken)
        pages.append(response)
        print("Resultset page received: {}".format(len(pages)))
        nextToken = response.get('NextToken')
    return pages


def get_kv_map(s3BucketName, documentName):

    client = boto3.client('textract')
    response = client.start_document_analysis(
        DocumentLocation={
            'S3Object': {
                'Bucket': s3BucketName,
                'Name': documentName
            }
        },
        FeatureTypes=['QUERIES'],
        QueriesConfig={
            'Queries': [
                {
                    "Text": "is 1. A. checkbox selected"
                }
                
            ]
        }
    )
    
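    job_id = response['JobId']
    response = client.get_document_analysis(JobId=job_id)
    status = response["JobStatus"]

    # Poll until the asynchronous job leaves the IN_PROGRESS state.
    while status == "IN_PROGRESS":
        time.sleep(3)
        response = client.get_document_analysis(JobId=job_id)
        status = response["JobStatus"]
        print("Job status: {}".format(status))

    # Stop early if the job failed instead of returning empty results.
    if status == "FAILED":
        raise RuntimeError("Textract job failed: {}".format(response.get("StatusMessage")))

    # Fetch all result pages now that the job has finished.
    response = getJobResults(job_id)
    return response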


def query_extraction():

    s3BucketName = "bucket-name"
    documentName = "xyz.pdf"

    data = get_kv_map(s3BucketName, documentName)
    
    return data

data = query_extraction()
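
The pages returned above still have to be searched for the answers. As a rough sketch, assuming the usual Textract response layout (QUERY blocks that point to QUERY_RESULT blocks through an ANSWER relationship), something like the following could pull them out. The function name and the sample data are made up for illustration and are not real Textract output:

```python
def extract_query_answers(pages):
    """Map each query (alias if set, else its text) to [(answer, confidence), ...]."""
    # Flatten the blocks of all result pages and index them by Id.
    blocks = [block for page in pages for block in page.get("Blocks", [])]
    by_id = {block["Id"]: block for block in blocks}

    answers = {}
    for block in blocks:
        if block.get("BlockType") != "QUERY":
            continue
        query = block.get("Query", {})
        key = query.get("Alias") or query.get("Text", "")
        found = answers.setdefault(key, [])
        # A query with no answer has no ANSWER relationship at all.
        for rel in block.get("Relationships") or []:
            if rel.get("Type") != "ANSWER":
                continue
            for result_id in rel.get("Ids", []):
                result = by_id.get(result_id)
                if result and result.get("BlockType") == "QUERY_RESULT":
                    found.append((result.get("Text", ""), result.get("Confidence")))
    return answers


# Hand-written sample in the assumed shape (NOT real Textract output).
sample_pages = [
    {"Blocks": [
        {"Id": "q1", "BlockType": "QUERY",
         "Query": {"Text": "is 1. A. checkbox selected"},
         "Relationships": [{"Type": "ANSWER", "Ids": ["r1"]}]},
        {"Id": "q2", "BlockType": "QUERY",
         "Query": {"Text": "unanswered question", "Alias": "NO_ANSWER"}},
    ]},
    {"Blocks": [
        {"Id": "r1", "BlockType": "QUERY_RESULT", "Text": "YES", "Confidence": 95.0},
    ]},
]

print(extract_query_answers(sample_pages))
```

With the code above, the call would be extract_query_answers(data). Blocks are indexed across all pages first, so a query and its result do not need to sit on the same result page.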

Hope this solves your problem.

