AWS start document analysis using Textract not working

I am doing a project for school where I am supposed to run document analysis on a form using Textract and pass the output to A2I, where the algorithm will determine whether the form is approved, rejected, or needs review. This Textract Lambda function should be triggered whenever a document is uploaded to S3. However, I am getting syntax errors when I follow this documentation: https://docs.aws.amazon.com/textract/latest/dg/API_StartDocumentAnalysis.html

My code is:

import urllib.parse
import boto3

print('Loading function')

##Clients
s3 = boto3.client('s3')
textract = boto3.client('textract')

def analyzedata(bucketName,documentKey):
    print("Loading")
    AnalyzedData= textract.StartDocumentAnalysis("DocumentLocation": { 
      "S3Object": { 
         "Bucket": "bucketName",
         "Name": "documentKey",
      })
    detectedText = ''

    # Print detected text
    for item in AnalyzedData['Blocks']:
        if item['BlockType'] == 'LINE':
            detectedText += item['Text'] + '\n'
            
    return detectedText
      
def writeTextractToS3File(textractData, bucketName, createdS3Document):
    print('Loading writeTextractToS3File')
    generateFilePath = os.path.splitext(createdS3Document)[0] + '.csv'
    s3.put_object(Body=textractData, Bucket=bucketName, Key=generateFilePath)
    print('Generated ' + generateFilePath)





def lambda_handler(event, context):
    #print("Received event: " + json.dumps(event, indent=2))

    # Get the object from the event and show its content type
    bucket = event['Records'][0]['s3']['bucket']['name']
    key = urllib.parse.unquote_plus(event['Records'][0]['s3']['object']['key'], encoding='utf-8')
    try:
        detectedText = analyzedata(bucket, key)
        writeTextractToS3File(detectedText, bucket, key)
        
        return 'Processing Done!'
        
        
        
    except Exception as e:
        print(e)
        print('Error getting object {} from bucket {}. Make sure they exist and your bucket is in the same region as this function.'.format(key, bucket))
        raise e

The code is not yet complete, but I am already getting syntax errors:

{
  "errorMessage": "Syntax error in module 'lambda_function': invalid syntax (lambda_function.py, line 13)",
  "errorType": "Runtime.UserCodeSyntaxError",
  "stackTrace": [
    "  File \"/var/task/lambda_function.py\" Line 13\n        AnalyzedData= textract.Start_Document_Analysis(\"DocumentLocation\": { \n"
  ]
}

According to the boto3 docs, your syntax should be more like:

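AnalyzedData = textract.start_document_analysis(
    DocumentLocation={
        "S3Object": {
            "Bucket": bucketName,  # the variable, not the string "bucketName"
            "Name": documentKey,   # the variable, not the string "documentKey"
        }
    },
    FeatureTypes=["FORMS"],  # required: "TABLES" and/or "FORMS"
)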

Also note that the FeatureTypes parameter is listed as required.
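Once the syntax is fixed, there is a second problem in the question's code: start_document_analysis is asynchronous. Its response contains a JobId rather than Blocks, so AnalyzedData['Blocks'] will fail. The blocks come from get_document_analysis, which reports a JobStatus and pages its results with NextToken. The sketch below polls for the result and joins the LINE text, the way the question's loop does. It assumes FORMS analysis is what you want; the helper name analyze_document_lines and its poll_seconds parameter are hypothetical, and the client is passed in so it can be boto3.client('textract') or a test stub.

```python
import time


def analyze_document_lines(textract, bucket_name, document_key,
                           feature_types=("FORMS",), poll_seconds=5):
    """Start an async Textract analysis job and return its LINE text.

    Hypothetical helper: `textract` is assumed to be boto3.client("textract").
    """
    # start_document_analysis only starts the job; the response holds a JobId.
    response = textract.start_document_analysis(
        DocumentLocation={"S3Object": {"Bucket": bucket_name, "Name": document_key}},
        FeatureTypes=list(feature_types),
    )
    job_id = response["JobId"]

    # Poll until the job is no longer IN_PROGRESS.
    while True:
        result = textract.get_document_analysis(JobId=job_id)
        status = result["JobStatus"]
        if status != "IN_PROGRESS":
            break
        time.sleep(poll_seconds)

    if status == "FAILED":
        raise RuntimeError(f"Textract job {job_id} failed: {result.get('StatusMessage')}")

    # Collect LINE blocks from every page of results, following NextToken.
    lines = []
    while True:
        lines.extend(b["Text"] for b in result.get("Blocks", []) if b["BlockType"] == "LINE")
        next_token = result.get("NextToken")
        if not next_token:
            break
        result = textract.get_document_analysis(JobId=job_id, NextToken=next_token)
    return "\n".join(lines)
```

In the handler this would be called as analyze_document_lines(textract, bucket, key). Polling keeps the Lambda running while the job executes, so its timeout has to cover the whole job; for larger documents the NotificationChannel parameter of start_document_analysis (an SNS topic notified when the job finishes) may be a better fit than a loop, which is used here only to keep the example short.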

You should try to pip install awscli:

pip install awscli

or pip3 if that works better.

Then import it and try running the code.

I think you are missing an opening curly bracket here:

AnalyzedData= textract.StartDocumentAnalysis("DocumentLocation": { # missing { in this line
  "S3Object": { 
     "Bucket": "bucketName",
     "Name": "documentKey",
  })
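The colon is what triggers the SyntaxError: inside a call's parentheses, arguments are written name=value, and name: value pairs are only valid inside a dict literal. One way to check candidate fixes with just the standard library is to compile them without running them (is_valid_python is a hypothetical helper for illustration; f is never called, so it need not exist):

```python
def is_valid_python(source: str) -> bool:
    """Return True if `source` compiles as Python, False on a SyntaxError."""
    try:
        compile(source, "<snippet>", "exec")  # parses only; nothing is executed
    except SyntaxError:
        return False
    return True


# name: value inside a call's parentheses, as in the question.
broken = 'f("DocumentLocation": {"S3Object": {"Bucket": "b", "Name": "k"}})'
# name=value keyword argument.
fixed = 'f(DocumentLocation={"S3Object": {"Bucket": "b", "Name": "k"}})'
# Adding the missing "{" turns the arguments into one dict passed positionally.
as_dict = 'f({"DocumentLocation": {"S3Object": {"Bucket": "b", "Name": "k"}}})'

for label, snippet in [("broken", broken), ("fixed", fixed), ("as_dict", as_dict)]:
    print(label, is_valid_python(snippet))
```

This prints False for broken and True for the other two. Both the keyword form and the added-brace form parse, but only the keyword form works with boto3: client methods accept keyword arguments only and raise a TypeError when given a positional dict. The method name also has to be the snake_case start_document_analysis; StartDocumentAnalysis is the API operation name, not a boto3 client method.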
