Google Document AI does not return textStyle or font information for any document

Question

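I am using the Document AI service to OCR scanned and machine-generated PDF documents. I tested 10 different documents, but none of them returned the textStyle attribute (it is always empty).

I just want to confirm whether this feature is actually supported and working, or whether it is mentioned in the documentation only for show.

The textStyle information is very important for our business use case, so an early response would be greatly appreciated.

I am using the default Google Python sample code: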

from google.api_core.client_options import ClientOptions
from google.cloud import documentai_v1 as documentai

# TODO(developer): Uncomment these variables before running the sample.
# project_id = 'YOUR_PROJECT_ID'
# location = 'YOUR_PROCESSOR_LOCATION' # Format is 'us' or 'eu'
# processor_id = 'YOUR_PROCESSOR_ID' #  Create processor in Cloud Console
# file_path = '/path/to/local/pdf'
# mime_type = 'application/pdf' # Refer to https://cloud.google.com/document-ai/docs/processors-list for supported file types


def quickstart(
    project_id: str, location: str, processor_id: str, file_path: str, mime_type: str
):
    # You must set the api_endpoint if you use a location other than 'us', e.g.:
    opts = ClientOptions(api_endpoint=f"{location}-documentai.googleapis.com")

    client = documentai.DocumentProcessorServiceClient(client_options=opts)

    # The full resource name of the processor, e.g.:
    # projects/project_id/locations/location/processors/processor_id
    # You must create new processors in the Cloud Console first
    name = client.processor_path(project_id, location, processor_id)

    # Read the file into memory
    with open(file_path, "rb") as image:
        image_content = image.read()

    # Load Binary Data into Document AI RawDocument Object
    raw_document = documentai.RawDocument(content=image_content, mime_type=mime_type)

    # Configure the process request
    request = documentai.ProcessRequest(name=name, raw_document=raw_document)

    result = client.process_document(request=request)

    # For a full list of Document object attributes, please reference this page:
    # https://cloud.google.com/python/docs/reference/documentai/latest/google.cloud.documentai_v1.types.Document
    document = result.document

    # Read the text recognition output from the processor
    print("The document contains the following text:")
    print(document.text)

Answer 1

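Currently, the textStyles attribute is listed in the documentation as a "placeholder", which means it may be populated by processors in the future, or it can be used for end-user data storage.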
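To confirm the behaviour on your own documents, a small check like the one below may help. It is only a sketch: `count_text_styles` and `report_text_styles` are hypothetical helper names, and it assumes the Python client exposes the textStyles field as `document.text_styles`. The stand-in object at the end lets the sketch run without calling the API; in practice you would pass `result.document` from the sample in the question.

```python
from types import SimpleNamespace


def count_text_styles(document) -> int:
    """Return how many entries the document's text_styles field holds.

    Hypothetical helper: assumes the textStyles field is exposed in
    Python as `text_styles`. A missing or empty field counts as zero.
    """
    styles = getattr(document, "text_styles", None) or []
    return len(styles)


def report_text_styles(document) -> str:
    """Describe whether any style information came back."""
    count = count_text_styles(document)
    if count == 0:
        return "text_styles is empty: no style information was returned"
    return f"text_styles contains {count} entries"


# Stand-in for a processed document, so the sketch runs offline.
empty_doc = SimpleNamespace(text="sample text", text_styles=[])
print(report_text_styles(empty_doc))
```

If the field is empty for every document, that matches the "placeholder" status described above rather than pointing to a problem in the request.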

You mentioned:

The textStyle information is very important for our business use case.

Could you provide some context about your use case?

Solution 1
0 2022-08-05 17:10:54
