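AWS start document analysis using Textract not working
I'm working on a project for my school in which I'm supposed to run a document analysis on a form with Textract and pass the output to A2I, where an algorithm decides whether the form is approved, rejected, or needs review. This Textract Lambda function should be triggered once a document is uploaded to S3. However, I get a syntax error when I follow this documentation: https://docs.aws.amazon.com/textract/latest/dg/API_StartDocumentAnalysis.html
My code is: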
import urllib.parse
import boto3

print('Loading function')

##Clients
s3 = boto3.client('s3')
textract = boto3.client('textract')

def analyzedata(bucketName,documentKey):
    print("Loading")
    AnalyzedData= textract.StartDocumentAnalysis("DocumentLocation": {
        "S3Object": {
            "Bucket": "bucketName",
            "Name": "documentKey",
        })
    detectedText = ''
    # Print detected text
    for item in AnalyzedData['Blocks']:
        if item['BlockType'] == 'LINE':
            detectedText += item['Text'] + '\n'
    return detectedText

def writeTextractToS3File(textractData, bucketName, createdS3Document):
    print('Loading writeTextractToS3File')
    generateFilePath = os.path.splitext(createdS3Document)[0] + '.csv'
    s3.put_object(Body=textractData, Bucket=bucketName, Key=generateFilePath)
    print('Generated ' + generateFilePath)

def lambda_handler(event, context):
    #print("Received event: " + json.dumps(event, indent=2))
    # Get the object from the event and show its content type
    bucket = event['Records'][0]['s3']['bucket']['name']
    key = urllib.parse.unquote_plus(event['Records'][0]['s3']['object']['key'], encoding='utf-8')
    try:
        detectedText = analyzedata(bucket, key)
        writeTextractToS3File(detectedText, bucket, key)
        return 'Processing Done!'
    except Exception as e:
        print(e)
        print('Error getting object {} from bucket {}. Make sure they exist and your bucket is in the same region as this function.'.format(key, bucket))
        raise e
The code isn't finished yet, but I'm already getting a syntax error:
{
    "errorMessage": "Syntax error in module 'lambda_function': invalid syntax (lambda_function.py, line 13)",
    "errorType": "Runtime.UserCodeSyntaxError",
    "stackTrace": [
        " File \"/var/task/lambda_function.py\" Line 13\n AnalyzedData= textract.Start_Document_Analysis(\"DocumentLocation\": { \n"
    ]
}
According to the boto3 docs, your syntax should look more like this:
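AnalyzedData = textract.start_document_analysis(
    DocumentLocation={
        "S3Object": {
            "Bucket": bucketName,   # pass the variables, not string literals
            "Name": documentKey,
        }
    },
    FeatureTypes=["FORMS"],  # required parameter; "TABLES" is also accepted
)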
Also note that the FeatureTypes parameter is listed as required.
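One more thing to watch for once the syntax error is gone: start_document_analysis is asynchronous, so its response carries a JobId rather than a 'Blocks' list, and the loop in the question would fail with a KeyError. Below is a minimal sketch of how the job could be started and its results collected with get_document_analysis. The helper names (start_analysis, collect_lines), the polling interval, and the choice of FeatureTypes=["FORMS"] are assumptions for illustration, not anything from the question. The Textract client is passed in as an argument so the helpers can be tried without AWS access.

```python
import time


def start_analysis(textract, bucket_name, document_key):
    """Start an asynchronous Textract analysis job and return its JobId."""
    response = textract.start_document_analysis(
        DocumentLocation={
            "S3Object": {"Bucket": bucket_name, "Name": document_key}
        },
        FeatureTypes=["FORMS"],  # assumption: form key-value pairs are wanted
    )
    return response["JobId"]


def collect_lines(textract, job_id, poll_seconds=5, max_polls=60):
    """Poll until the job leaves IN_PROGRESS, then join the text of LINE blocks."""
    for _ in range(max_polls):
        response = textract.get_document_analysis(JobId=job_id)
        status = response["JobStatus"]
        if status != "IN_PROGRESS":
            break
        time.sleep(poll_seconds)
    else:
        raise TimeoutError(f"Textract job {job_id} is still running")

    # Anything other than SUCCEEDED (FAILED, PARTIAL_SUCCESS) is treated as an error here.
    if status != "SUCCEEDED":
        raise RuntimeError(f"Textract job {job_id} ended with status {status}")

    lines = []
    while True:
        lines += [b["Text"] for b in response["Blocks"] if b["BlockType"] == "LINE"]
        token = response.get("NextToken")
        if not token:
            break
        # Results are paginated; follow NextToken until it is absent.
        response = textract.get_document_analysis(JobId=job_id, NextToken=token)
    return "\n".join(lines)
```

In the Lambda from the question this could be called as collect_lines(textract, start_analysis(textract, bucket, key)), with textract = boto3.client('textract'). Polling inside a Lambda is only reasonable for small documents; for longer jobs, the NotificationChannel parameter of start_document_analysis lets Textract publish completion to an SNS topic instead.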
You should try installing awscli with pip:
pip install awscli
or use pip3 if that works better for you.
Then import it and try running the code again.
I think you are missing an opening curly brace here.
AnalyzedData= textract.StartDocumentAnalysis("DocumentLocation": { # missing { in this line
    "S3Object": {
        "Bucket": "bucketName",
        "Name": "documentKey",
    })
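To make the cause of the syntax error concrete: a "key": value pair is only valid inside a dict literal, so it cannot appear directly in a call's argument list. The sketch below uses a hypothetical stand-in function, start_job, in place of the real client method so it runs without AWS. Note that wrapping the arguments in braces would fix the parse error but pass a single positional dict, which boto3 client methods do not accept; they take keyword arguments only, so the dict has to be unpacked with ** or written in keyword form.

```python
def start_job(**kwargs):
    """Hypothetical stand-in for a boto3-style method that takes keyword arguments only."""
    return kwargs


# The form used in the question does not parse at all.
broken = 'start_job("DocumentLocation": {"S3Object": {}})'
try:
    compile(broken, "<example>", "eval")
    broken_is_valid = True
except SyntaxError:
    broken_is_valid = False

# Passing one positional dict parses, but a keyword-only method rejects it.
try:
    start_job({"DocumentLocation": {"S3Object": {}}})
    positional_dict_accepted = True
except TypeError:
    positional_dict_accepted = False

# Either of these two forms works.
by_keyword = start_job(DocumentLocation={"S3Object": {}})
params = {"DocumentLocation": {"S3Object": {}}}
by_unpacking = start_job(**params)
```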