Google Document AI does not return textStyle and font information for any document
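I am using the Document AI service to OCR scanned and machine-generated PDF documents. I tested 10 different documents, but none of them returned the textStyle attribute (it is always empty).
I just want to confirm whether this feature is actually supported and working, or whether it is mentioned in the documentation only for show.
textStyle information is very important for our business use case, so an early response would be much appreciated.
I am using the default Google Python sample code: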
from google.api_core.client_options import ClientOptions
from google.cloud import documentai_v1 as documentai

# TODO(developer): Uncomment these variables before running the sample.
# project_id = 'YOUR_PROJECT_ID'
# location = 'YOUR_PROCESSOR_LOCATION' # Format is 'us' or 'eu'
# processor_id = 'YOUR_PROCESSOR_ID' # Create processor in Cloud Console
# file_path = '/path/to/local/pdf'
# mime_type = 'application/pdf' # Refer to https://cloud.google.com/document-ai/docs/processors-list for supported file types


def quickstart(
    project_id: str, location: str, processor_id: str, file_path: str, mime_type: str
):
    # You must set the api_endpoint if you use a location other than 'us', e.g.:
    opts = ClientOptions(api_endpoint=f"{location}-documentai.googleapis.com")

    client = documentai.DocumentProcessorServiceClient(client_options=opts)

    # The full resource name of the processor, e.g.:
    # projects/project_id/locations/location/processor/processor_id
    # You must create new processors in the Cloud Console first
    name = client.processor_path(project_id, location, processor_id)

    # Read the file into memory
    with open(file_path, "rb") as image:
        image_content = image.read()

    # Load Binary Data into Document AI RawDocument Object
    raw_document = documentai.RawDocument(content=image_content, mime_type=mime_type)

    # Configure the process request
    request = documentai.ProcessRequest(name=name, raw_document=raw_document)

    result = client.process_document(request=request)

    # For a full list of Document object attributes, please reference this page:
    # https://cloud.google.com/python/docs/reference/documentai/latest/google.cloud.documentai_v1.types.Document
    document = result.document

    # Read the text recognition output from the processor
    print("The document contains the following text:")
    print(document.text)
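Currently, the textStyles attribute is listed as a "placeholder" in the documentation, which means it may be populated by processors in the future, or it can be used for end-user data storage.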
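To check whether a given response actually carries any style entries, the field can be inspected directly. The following is only a minimal sketch: it assumes the Python client exposes the attribute as `text_styles` on the returned document, with entries carrying `font_family`, `font_weight`, and `text_style`. The helper names `summarize_text_styles` and `has_text_styles` are hypothetical, and stand-in objects replace a real API response so the snippet runs without credentials.

```python
from types import SimpleNamespace


def summarize_text_styles(document):
    """Collect basic font details from document.text_styles, if any exist.

    Works on any object with a `text_styles` attribute (assumed field name);
    a missing or empty field yields an empty list.
    """
    styles = getattr(document, "text_styles", None) or []
    return [
        {
            "font_family": getattr(style, "font_family", ""),
            "font_weight": getattr(style, "font_weight", ""),
            "text_style": getattr(style, "text_style", ""),
        }
        for style in styles
    ]


def has_text_styles(document):
    """Return True only when at least one style entry is present."""
    return len(summarize_text_styles(document)) > 0


if __name__ == "__main__":
    # Stand-in for a response where the placeholder field is left empty,
    # which is the behaviour described in the question.
    empty_doc = SimpleNamespace(text="Sample text", text_styles=[])
    print("Styles present:", has_text_styles(empty_doc))

    # Stand-in for a hypothetical response with one populated style entry.
    styled_doc = SimpleNamespace(
        text="Sample text",
        text_styles=[
            SimpleNamespace(
                font_family="Arial", font_weight="bold", text_style="italic"
            )
        ],
    )
    for entry in summarize_text_styles(styled_doc):
        print(entry)
```

In the quickstart sample, the same check could be applied to `result.document` right after `process_document` returns.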
You mentioned that "textStyle information is very important for our business use case." Could you provide some context about your use case?