Tag [amazon-textract] - Stack Overflow

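I do not want to write and read the same document in Python

I have PDF files and only want to extract information from the first page. My current solution: read the file from S3 with PyPDF2 and save only the first page, then read back that same one-page PDF, convert it to base64, and analyze it with AWS Textract. It works, but I don't like this solution. Why should I have to save a file and then read the exact same file back? Couldn't I, at runtime ...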

ClientError when calling CreateFlowDefinition operation in 'Intelligent Document Processing with AWS AI Services'

In the notebook Amazon Augmented AI (A2I) and Textract Analyze Document, running the following script fails with this error: ClientError: An error occurred (ValidationException) when calling the CreateFlowDefinition operation: Detect ...

AWS Textract: UnsupportedDocumentException for PNG and JPG images. Error occurs only in production and not locally

When I deploy my FastAPI application, which uses the AWS Textract service, to AWS Lambda, I get the following error. Strangely, it works fine in my local development environment but throws this error once deployed. Error: Here is my code: The images I tried are PNG and JPG images, not PDFs. ...

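How to remove multiple headers

I have a spreadsheet inside a PDF. I extract its values with AWS Textract from Python and convert them to a .csv. However, the extracted values contain several header rows, and I want to keep only the first header. Note that the same .csv file contains 3 headers, one of which has only a few values, but since I want to remove it ...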
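One way to approach this, sketched with the standard library only: treat the first row as the header and drop any later row that matches it once cell whitespace is stripped. The function name `drop_repeated_headers` and the sample data are made up for illustration, and this sketch does not handle the partially filled header mentioned in the question, which would need a looser matching rule.

```python
import csv
import io

def drop_repeated_headers(rows):
    """Keep the first row as the header and drop later rows identical to it."""
    rows = iter(rows)
    try:
        header = next(rows)
    except StopIteration:
        return []
    normalized_header = [cell.strip() for cell in header]
    cleaned = [header]
    for row in rows:
        # Compare after stripping whitespace, since extracted cells may be padded.
        if [cell.strip() for cell in row] == normalized_header:
            continue
        cleaned.append(row)
    return cleaned

# Hypothetical CSV content with the header repeated twice.
raw = "name,qty\na,1\nname,qty\nb,2\nname, qty \nc,3\n"
rows = list(csv.reader(io.StringIO(raw)))
cleaned = drop_repeated_headers(rows)
```

The cleaned rows can then be written back out with `csv.writer`.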

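AWS: numpy ndarray to 'Bytes' conversion

I am trying to use Amazon Textract through the Python (boto3) interface. Everything works when uploading a file from the local drive: My question is how to modify the client.detect_document_text() call to handle an image that is already stored in a variable as a numpy ndarray. From the AWS documentation I know: ...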

AWS Textract (OCR) not detecting some cells

I am using AWS Textract to read tables from a PDF and parse them into CSV. Great, AWS has documentation for this! https://docs.aws.amazon.com/textract/latest/dg/examples-export-table-csv.html I have set up the asynchronous method as they suggest, ...

Having difficulties using the QUERY option in "textract" start document analysis with boto3 python

My problem is with the Textract asynchronous method start_document_analysis, which lets you choose the type of analysis to perform. When I try to use the "Queries" feature, I have to pass another parameter containing the list of queries, and as soon as I pass that parameter boto3 throws an exception saying the Queries config is not recognized ...
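For reference, a minimal sketch of the request shape that `start_document_analysis` expects when the QUERIES feature is used. The helper name `build_query_request`, the bucket, the key, and the question text are all hypothetical. An "unrecognized parameter" error for `QueriesConfig` is often a sign that the installed boto3/botocore predates the Queries feature, but that is an assumption about this case, not something the excerpt confirms.

```python
def build_query_request(bucket, key, questions):
    """Build keyword arguments for start_document_analysis with QUERIES.

    `questions` maps an alias to the natural-language query text.
    """
    return {
        "DocumentLocation": {"S3Object": {"Bucket": bucket, "Name": key}},
        "FeatureTypes": ["QUERIES"],
        "QueriesConfig": {
            "Queries": [
                {"Text": text, "Alias": alias}
                for alias, text in questions.items()
            ]
        },
    }

# Hypothetical bucket, key, and query, for illustration only.
request = build_query_request(
    "my-bucket", "docs/form.pdf", {"PATIENT_NAME": "What is the patient name?"}
)

# With boto3 installed and credentials configured, the call would be:
# import boto3
# job = boto3.client("textract").start_document_analysis(**request)
```

Building the arguments separately makes it easy to print and inspect the request before sending it.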

I am using aws textract StartDocumentTextDetectionCommand and GetDocumentTextDetectionCommand. I want only lines to be returned, not the single words

I am building an internal OCR tool with AWS Textract and Node.js to detect text in scanned PDFs, specifically using StartDocumentTextDetectionCommand and GetDocumentTextDetectionCommand. Currently the result comes back as a list of Block objects that contains the lines first and then ...
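As far as I know, text detection returns PAGE, LINE, and WORD blocks together, so the usual approach is to filter on `BlockType` after the response arrives. Below is a small sketch in Python (the question uses Node.js, where the same filter applies to the `Blocks` array). The function name `line_texts` and the sample response are invented for illustration.

```python
def line_texts(response):
    """Return the text of every LINE block in a Textract response dict."""
    return [
        block["Text"]
        for block in response.get("Blocks", [])
        if block.get("BlockType") == "LINE"
    ]

# Hand-written sample shaped like a text detection response.
sample = {
    "Blocks": [
        {"BlockType": "PAGE"},
        {"BlockType": "LINE", "Text": "Hello world"},
        {"BlockType": "WORD", "Text": "Hello"},
        {"BlockType": "WORD", "Text": "world"},
    ]
}
lines = line_texts(sample)
```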

Deploying Amazon Textract application via Sagemaker

I am trying to build an application with Amazon Textract that extracts text from images and validates it. I am looking for a way to deploy the application through Sagemaker but cannot find one. Models built on TensorFlow, PyTorch, Sklearn, etc. can, via Sagemak ...

Why is a variable being calculated twice in an async promise?

I currently have the following function in AWS that uses the Textract SDK's asynchronous functions: async function textParse(config) { const AWS = require("aws-sdk"); let textract = new AWS.Textract(); ...

Create_Failed S3BatchProcessor, AWS Lambda

I run cdk deploy in my Textract pipeline folder for large document processing. However, when I run this program I get this error: | CREATE_FAILED | AWS::Lambda::Function | S3BatchProcessor6C619AEA Re ...

QueriesConfig not getting identified by AWS

I am running this code: This is the error I get: If I run it without QueriesConfig and with "QUERIES" as the FeatureTypes, I get an error like this: How can I fix this? ...

AWS Textract with Angular TS. I am trying to connect aws textract with angular through aws-sdk/client-textract npm package. But I get Credentialerror

This is my app.component.ts. I imported aws-sdk/client-textract into app.component.ts, along with my text area, and I don't know where to provide my access_key and secret_key or which parameters to pass for the text. If anyone can solve this, please help me ...

How to calculate rotation angle of a rectangle with 4 points (x1,y1,...,x4,y4) in clockwise direction, to make it straight or 0 degree

Given a polygon (a tilted rectangle) that represents a word region, with its 4 points listed in clockwise order, how can the rotation angle be determined so that the region sits at 0 degrees in view and the text is deskewed? ...
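A minimal sketch of one common approach: take the angle of the top edge with `atan2`, then rotate the points by the negative of that angle. This assumes the first point is the top-left corner and the second is the top-right corner (the excerpt only says the points are clockwise), and the function names and sample box are made up for illustration.

```python
import math

def skew_angle_degrees(points):
    """Angle of the edge from the first point to the second, in degrees."""
    (x1, y1), (x2, y2) = points[0], points[1]
    return math.degrees(math.atan2(y2 - y1, x2 - x1))

def rotate_points(points, degrees):
    """Rotate points about their centroid by the given angle."""
    theta = math.radians(degrees)
    cx = sum(x for x, _ in points) / len(points)
    cy = sum(y for _, y in points) / len(points)
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    return [
        (
            cx + (x - cx) * cos_t - (y - cy) * sin_t,
            cy + (x - cx) * sin_t + (y - cy) * cos_t,
        )
        for x, y in points
    ]

# Hypothetical word box whose top edge runs from (0, 0) to (3, 3).
box = [(0, 0), (3, 3), (2, 4), (-1, 1)]
angle = skew_angle_degrees(box)
# Rotating by the negative angle makes the top edge horizontal.
straight = rotate_points(box, -angle)
```

The same angle, negated, can be passed to an image library's rotate function to deskew the crop itself. Check the sign convention of that library first, since image coordinates have the y axis pointing down.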

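Moving away from simple regex extraction to NER?

We have a relatively "simple" business project: digitize scanned contracts (PDF files) with OCR and extract entities from the text. An entity can be something simple, such as a specific price located in a certain subsection of the contract, or a generic definition of a process found somewhere around section 5. For the same entity, different phrasings and languages are used interchangeably at different places in the contracts. We have a limited number of examples ( ...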

Python-Textract-Boto3 - Trying to pass result of a method call as an argument to the same method, and loop

I have a multi-page PDF on AWS S3 and am using Textract to extract all of its text. I can get the response in batches: the first response gives me a "NextToken", which I need to pass as an argument to the get_document_analysis method. How can I avoid manually pasting the NextToken received from the previous run each time ...
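The usual pattern is a loop that feeds each response's `NextToken` back into the next call until no token is returned. Here is a sketch, assuming the job has already finished successfully. `get_all_blocks` is a hypothetical helper, and `FakeTextract` is a stand-in used only so the loop can be run without AWS access; with boto3 you would pass `boto3.client("textract")` instead.

```python
def get_all_blocks(client, job_id):
    """Collect Blocks from every page of get_document_analysis results."""
    blocks = []
    kwargs = {"JobId": job_id}
    while True:
        response = client.get_document_analysis(**kwargs)
        blocks.extend(response.get("Blocks", []))
        token = response.get("NextToken")
        if not token:
            return blocks
        # Feed the token from this response into the next request.
        kwargs["NextToken"] = token

class FakeTextract:
    """Stand-in client for a dry run; returns two batches of results."""

    def __init__(self):
        self.pages = {
            None: {"Blocks": [{"Id": "1"}], "NextToken": "t1"},
            "t1": {"Blocks": [{"Id": "2"}]},
        }

    def get_document_analysis(self, JobId, NextToken=None):
        return self.pages[NextToken]

blocks = get_all_blocks(FakeTextract(), "job-123")
```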

AWS Textract form design best practices

I am currently redesigning documents and forms to make extraction with AWS Textract easier. Do you have experience and best practices to share? Regards ...

Is it possible to call a Lambda function at the end of Textract processing

Is it possible to invoke a Lambda function when some AWS Textract processing finishes? ...
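For the asynchronous operations, the start call accepts a `NotificationChannel` (an SNS topic ARN plus a role ARN), and Textract publishes the job's completion status to that topic, so a Lambda subscribed to the topic runs when processing ends. Below is a sketch of such a handler. The sample event is hand-written for illustration, and what to do with each finished job is left open.

```python
import json

def handler(event, context):
    """Lambda entry point for SNS notifications published by Textract."""
    finished = []
    for record in event["Records"]:
        # The SNS message body is a JSON string describing the finished job.
        message = json.loads(record["Sns"]["Message"])
        finished.append((message["JobId"], message["Status"]))
    return finished

# Hand-written sample event shaped like an SNS-triggered Lambda invocation.
sample_event = {
    "Records": [
        {
            "Sns": {
                "Message": json.dumps(
                    {
                        "JobId": "abc",
                        "Status": "SUCCEEDED",
                        "API": "StartDocumentAnalysis",
                    }
                )
            }
        }
    ]
}
completed = handler(sample_event, None)
```

From here the handler would typically fetch the results for each `JobId` whose status is `SUCCEEDED`.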

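OCR - Issue with reading checkbox and radiobuttons from documents

I have a use case where I need to parse an image or PDF of a survey form, read all values such as name, age, and address as key-value pairs, and load the data into the corresponding table columns. We currently use AWS Textract and get all the information as expected, except that the key-value pairs for checkboxes and radio buttons are not captured correctly. For example, the question is "Are you from India?" with two radio buttons, Yes and No. If checked ...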

AWS Textract - start expense analysis

While implementing the AWS Textract asynchronous expense analysis API with boto3 for Python, I get the error 'Textract' object has no attribute 'start_expense_analysis'. On the other hand, start_document_text_detectio ...