Convert .pdf to .docx on Adobe PDF Services API (using Python)

I'm trying to write a Python program that converts ".pdf" files to ".docx" files using the Adobe PDF Services API (free trial).

I've found documentation showing how to turn any ".pdf" file into a ".zip" archive containing ".txt" files (the extracted text) and Excel files (the tabular data).

import logging
import os.path

from adobe.pdfservices.operation.auth.credentials import Credentials
from adobe.pdfservices.operation.exception.exceptions import ServiceApiException, ServiceUsageException, SdkException
from adobe.pdfservices.operation.pdfops.options.extractpdf.extract_pdf_options import ExtractPDFOptions
from adobe.pdfservices.operation.pdfops.options.extractpdf.extract_element_type import ExtractElementType
from adobe.pdfservices.operation.execution_context import ExecutionContext
from adobe.pdfservices.operation.io.file_ref import FileRef
from adobe.pdfservices.operation.pdfops.extract_pdf_operation import ExtractPDFOperation


logging.basicConfig(level=os.environ.get("LOGLEVEL", "INFO"))

try:
    # Get the base path (three directory levels above the notebook).
    base_path = os.path.dirname(os.path.dirname(os.path.dirname(os.path.abspath("C:/..link.../extractpdf/extract_txt_from_pdf.ipynb"))))

    # Initial setup, create credentials instance.
    credentials = Credentials.service_account_credentials_builder() \
        .from_file(os.path.join(base_path, "pdfservices-api-credentials.json")) \
        .build()

    # Create an ExecutionContext using credentials and create a new operation instance.
    execution_context = ExecutionContext.create(credentials)
    extract_pdf_operation = ExtractPDFOperation.create_new()

    # Set operation input from a source file.
    source = FileRef.create_from_local_file(base_path + "/resources/trs_pdf_file.pdf")
    extract_pdf_operation.set_input(source)

    # Build ExtractPDF options and set them into the operation
    extract_pdf_options: ExtractPDFOptions = ExtractPDFOptions.builder() \
        .with_element_to_extract(ExtractElementType.TEXT) \
        .with_element_to_extract(ExtractElementType.TABLES) \
        .build()
    extract_pdf_operation.set_options(extract_pdf_options)

    # Execute the operation.
    result: FileRef = extract_pdf_operation.execute(execution_context)

    # Save the result to the specified location.
    result.save_as(base_path + "/output/Extract_TextTableau_From_trs_pdf_file.zip")
except (ServiceApiException, ServiceUsageException, SdkException):
    logging.exception("Exception encountered while executing operation")

But I still can't get the conversion to a ".docx" file to work, even after renaming the extracted file to name.docx.
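
A quick way to see why renaming does not help: a ".docx" file is itself a ZIP archive, but one that must contain specific parts such as `[Content_Types].xml` and `word/document.xml`. The Extract result is a ZIP with different contents, so Word cannot open it whatever the extension. The sketch below uses only the standard library; `looks_like_docx` is a hypothetical helper name, and the check is a rough heuristic, not a full validation of the Office Open XML format.

```python
import zipfile


def looks_like_docx(path):
    """Return True if `path` is a ZIP archive holding the core Word parts."""
    if not zipfile.is_zipfile(path):
        return False
    with zipfile.ZipFile(path) as archive:
        names = set(archive.namelist())
    # A Word document needs at least the content-type manifest
    # and the main document part.
    return "[Content_Types].xml" in names and "word/document.xml" in names


def list_entries(path):
    """Return the names of the files stored in a ZIP archive."""
    with zipfile.ZipFile(path) as archive:
        return archive.namelist()
```

Calling `list_entries` on the renamed Extract output shows what the archive really holds, and `looks_like_docx` should return `False` for it.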

I read the documentation for adobe.pdfservices.operation.pdfops.options.extractpdf.extract_pdf_options.ExtractPDFOptions() but didn't find a way to tune the extraction and change the output from ".zip" to ".docx". What can I try next?

Unfortunately, right now the Python SDK only supports the Extract portion of our PDF services. As an alternative, you could use the services via the REST APIs ( https://documentcloud.adobe.com/document-services/index.html#how-to-get-started- ).
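
A rough sketch of what the REST route might look like from Python, using only the standard library. Everything specific here is an assumption to verify against the current REST documentation: the host `https://pdf-services.adobe.io`, the `/token`, `/assets` and `/operation/exportpdf` paths, the JSON field names, and the use of OAuth client ID and secret credentials (which differ from the service account JSON file used in the question). The function names are hypothetical.

```python
import json
import time
import urllib.parse
import urllib.request

BASE_URL = "https://pdf-services.adobe.io"  # assumed host, check the docs


def export_request_body(asset_id, target_format="docx"):
    """Build the (assumed) JSON body for an Export PDF job."""
    return {"assetID": asset_id, "targetFormat": target_format}


def _call(url, method="GET", headers=None, data=None):
    """Send one HTTP request and return (response headers, raw body)."""
    request = urllib.request.Request(
        url, data=data, headers=headers or {}, method=method
    )
    with urllib.request.urlopen(request) as response:
        return response.headers, response.read()


def pdf_to_docx(pdf_path, docx_path, client_id, client_secret):
    # 1. Exchange the client credentials for an access token (assumed endpoint).
    form = urllib.parse.urlencode(
        {"client_id": client_id, "client_secret": client_secret}
    ).encode()
    _, body = _call(
        f"{BASE_URL}/token", "POST",
        {"Content-Type": "application/x-www-form-urlencoded"}, form,
    )
    auth = {
        "Authorization": f"Bearer {json.loads(body)['access_token']}",
        "X-API-Key": client_id,
    }
    json_headers = {**auth, "Content-Type": "application/json"}

    # 2. Ask for an upload location, then upload the PDF bytes to it.
    _, body = _call(
        f"{BASE_URL}/assets", "POST", json_headers,
        json.dumps({"mediaType": "application/pdf"}).encode(),
    )
    asset = json.loads(body)
    with open(pdf_path, "rb") as pdf_file:
        _call(asset["uploadUri"], "PUT",
              {"Content-Type": "application/pdf"}, pdf_file.read())

    # 3. Start the export job; the status URL is assumed to come back
    #    in the "location" response header.
    headers, _ = _call(
        f"{BASE_URL}/operation/exportpdf", "POST", json_headers,
        json.dumps(export_request_body(asset["assetID"])).encode(),
    )
    status_url = headers["location"]

    # 4. Poll until the job finishes.
    while True:
        _, body = _call(status_url, "GET", auth)
        job = json.loads(body)
        if job["status"] == "done":
            break
        if job["status"] == "failed":
            raise RuntimeError(f"Export job failed: {job}")
        time.sleep(2)

    # 5. Download the converted document.
    _, content = _call(job["asset"]["downloadUri"])
    with open(docx_path, "wb") as docx_file:
        docx_file.write(content)
```

Usage would be along the lines of `pdf_to_docx("trs_pdf_file.pdf", "trs_pdf_file.docx", client_id, client_secret)`, with the two credential values taken from your own Adobe developer project.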

