Convert a PDF to DOCX using Adobe PDF Services via REST API (with Python)
I am trying to query the Adobe PDF Services API to generate (export) DOCX files from PDF documents.
I just wrote some Python code to generate a Bearer token so that I can be identified by Adobe PDF Services (see the question here: https://stackoverflow.com/questions/68351955/tunning-a-post-request-to-reach-adobe-pdf-services-using-python-and-a-rest-api ). Then I wrote the following piece of code, trying to follow the instructions on this page concerning the EXPORT option of Adobe PDF Services (here: https://documentcloud.adobe.com/document-services/index.html#post-exportPDF ).
Here is the piece of code:
import requests
import json
from requests.structures import CaseInsensitiveDict
URL = "https://cpf-ue1.adobe.io/ops/:create?respondWith=%257B%2522reltype%2522%253A%2520%2522http%253A%252F%252Fns.adobe.com%252Frel%252Fprimary%2522%257D"
headers = CaseInsensitiveDict()
headers["x-api-key"] = "client_id"
headers["Authorization"] = "Bearer MYREALLYLONGTOKENIGOT"
headers["Content-Type"] = "application/json"
myfile = {"file":open("absolute_path_to_the_pdf_file/input.pdf", "rb")}
j="""
{
"cpf:engine": {
"repo:assetId": "urn:aaid:cpf:Service-26c7fda2890b44ad9a82714682e35888"
},
"cpf:inputs": {
"params": {
"cpf:inline": {
"targetFormat": "docx"
}
},
"documentIn": {
"dc:format": "application/pdf",
"cpf:location": "C:/Users/a-bensghir/Downloads/P_D_F/trs_pdf_file_copy.pdf"
}
},
"cpf:outputs": {
"documentOut": {
"dc:format": "application/vnd.openxmlformats-officedocument.wordprocessingml.document",
"cpf:location": "C:/Users/a-bensghir/Downloads/P_D_F/output.docx"
}
}
}"""
resp = requests.post(url=URL, headers=headers, json=json.dumps(j), files=myfile)
print(resp.text)
print(resp.status_code)
The status code is 400. I am properly authenticated by the server, but I get the following as the output of print(resp.text):
{"requestId":"the_request_id","type":"Bad Request","title":"Not a multipart request. Aborting.","status":400,"report":"{\"error_code\":\"INVALID_MULTIPART_REQUEST\"}"}
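The report field in that response is itself a JSON-encoded string, so reading the machine-readable error code takes a second json.loads. A minimal sketch using the response text shown above; error_code_from is a hypothetical helper name, not part of any Adobe SDK:

```python
import json


def error_code_from(response_text: str) -> str:
    """Return the error_code nested inside the 'report' field of an error body."""
    outer = json.loads(response_text)
    # 'report' holds a JSON string, not an object, so it must be decoded again
    report = json.loads(outer.get("report", "{}"))
    return report.get("error_code", "")


# The error body from the question, copied verbatim (raw string keeps the \" escapes)
body = r'{"requestId":"the_request_id","type":"Bad Request","title":"Not a multipart request. Aborting.","status":400,"report":"{\"error_code\":\"INVALID_MULTIPART_REQUEST\"}"}'
print(error_code_from(body))  # INVALID_MULTIPART_REQUEST
```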
I think I'm having trouble understanding the "form parameters" in Adobe's guide for the POST method of the API's EXPORT job ( https://documentcloud.adobe.com/document-services/index.html ).
Do you have any ideas for improvement? Thank you!
Make your variable j a Python dict first, then create a JSON string from it. What's also not super clear from Adobe's documentation is that the value of documentIn.cpf:location needs to be the same as the key used for your file. I've corrected this to InputFile0 in your script. I'm also guessing you want to save your file, so I've added that too.
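Both points can be checked offline. If j is already a JSON string, json.dumps(j) wraps it in quotes a second time, so the receiver decodes a string rather than an object. Building j as a dict and serializing it once yields a proper JSON object, and passing the file-part name into the builder keeps documentIn's cpf:location in step with the key used in files. A minimal sketch; FILE_KEY and build_request_json are illustrative names, not part of Adobe's API:

```python
import json

FILE_KEY = "InputFile0"  # hypothetical constant: the multipart field name of the PDF


def build_request_json(file_key: str) -> str:
    """Serialize the export request once, pointing documentIn at the file part."""
    request = {
        "cpf:engine": {
            "repo:assetId": "urn:aaid:cpf:Service-26c7fda2890b44ad9a82714682e35888"
        },
        "cpf:inputs": {
            "params": {"cpf:inline": {"targetFormat": "docx"}},
            # cpf:location must equal the key of the PDF in the files= mapping
            "documentIn": {"dc:format": "application/pdf", "cpf:location": file_key},
        },
        # cpf:outputs omitted here for brevity
    }
    return json.dumps(request)


already_json = '{"targetFormat": "docx"}'
double_encoded = json.dumps(already_json)
print(type(json.loads(double_encoded)).__name__)  # str

payload = build_request_json(FILE_KEY)
print(json.loads(payload)["cpf:inputs"]["documentIn"]["cpf:location"])  # InputFile0
```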
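import requests
import json
import time
# Placeholders: use the bearer token and API key (client ID) from your authentication step
token = "MYREALLYLONGTOKENIGOT"
client_id = "client_id"
URL = "https://cpf-ue1.adobe.io/ops/:create?respondWith=%257B%2522reltype%2522%253A%2520%2522http%253A%252F%252Fns.adobe.com%252Frel%252Fprimary%2522%257D"
# No Content-Type header: requests sets multipart/form-data (with its boundary) because files= is used
headers = {
    'Authorization': f'Bearer {token}',
    'Accept': 'application/json, text/plain, */*',
    'x-api-key': client_id,
    'Prefer': "respond-async,wait=0",
}
# The form-field name of the PDF ("InputFile0") must match documentIn's cpf:location below
myfile = {"InputFile0": open("absolute_path_to_the_pdf_file/input.pdf", "rb")}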
j = {
    "cpf:engine": {
        "repo:assetId": "urn:aaid:cpf:Service-26c7fda2890b44ad9a82714682e35888"
    },
    "cpf:inputs": {
        "params": {
            "cpf:inline": {
                "targetFormat": "docx"
            }
        },
        "documentIn": {
            "dc:format": "application/pdf",
            "cpf:location": "InputFile0"
        }
    },
    "cpf:outputs": {
        "documentOut": {
            "dc:format": "application/vnd.openxmlformats-officedocument.wordprocessingml.document",
            "cpf:location": "C:/Users/a-bensghir/Downloads/P_D_F/output.docx"
        }
    }
}
# The request JSON travels as a regular form field, the PDF as a file part
body = {"contentAnalyzerRequests": json.dumps(j)}
resp = requests.post(url=URL, headers=headers, data=body, files=myfile)
print(resp.text)
print(resp.status_code)
# Poll the URL from the Location header until the converted document is ready
poll = True
while poll:
    new_request = requests.get(resp.headers['location'], headers=headers)
    if new_request.status_code == 200:
        open('test.docx', 'wb').write(new_request.content)
        poll = False
    else:
        time.sleep(5)
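Why the original request was rejected can be checked without calling the service, by preparing both requests locally and comparing the Content-Type that requests would send. When files= is passed, requests encodes a multipart/form-data body and generates the boundary itself, but it only sets Content-Type if the caller hasn't. The hand-set application/json header from the question therefore survives, which is consistent with the "Not a multipart request" error; the json= argument is also discarded once files= is given. A sketch with dummy PDF bytes and placeholder credentials (nothing is sent over the network):

```python
import json

import requests

URL = "https://cpf-ue1.adobe.io/ops/:create"  # query string omitted; request is only prepared
pdf_part = ("input.pdf", b"%PDF-1.4 dummy bytes", "application/pdf")  # fake PDF for the demo
request_json = json.dumps({"cpf:inputs": {"documentIn": {"cpf:location": "InputFile0"}}})

# Question's approach: Content-Type forced to application/json, JSON passed via json=
bad = requests.Request(
    "POST", URL,
    headers={"Content-Type": "application/json", "x-api-key": "client_id"},
    json=request_json,
    files={"file": pdf_part},
).prepare()

# Answer's approach: no manual Content-Type, JSON sent as a form field
good = requests.Request(
    "POST", URL,
    headers={"x-api-key": "client_id"},
    data={"contentAnalyzerRequests": request_json},
    files={"InputFile0": pdf_part},
).prepare()

print(bad.headers["Content-Type"])   # application/json
print(good.headers["Content-Type"])  # begins with multipart/form-data; boundary=
print(b'name="contentAnalyzerRequests"' in good.body)  # True
```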
I don't know why the DOCX file (which is created fine, by the way) doesn't open; a popup says the content is unreadable. Maybe it's due to the 'wb' mode used to open the file?
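A DOCX file is a ZIP archive whose main body lives in word/document.xml, so one way to narrow down an "unreadable content" error is to check whether the downloaded bytes are a ZIP at all, or something else such as an error message. A small stdlib-only sketch; looks_like_docx is a hypothetical helper name:

```python
import io
import zipfile


def looks_like_docx(data: bytes) -> bool:
    """Heuristic: True if data is a ZIP archive containing word/document.xml."""
    buffer = io.BytesIO(data)
    if not zipfile.is_zipfile(buffer):
        return False
    with zipfile.ZipFile(buffer) as archive:
        return "word/document.xml" in archive.namelist()


# Build a tiny stand-in archive in memory to exercise the check
fake = io.BytesIO()
with zipfile.ZipFile(fake, "w") as archive:
    archive.writestr("word/document.xml", "<w:document/>")

print(looks_like_docx(fake.getvalue()))    # True
print(looks_like_docx(b'{"status": 400}'))  # False
```

Calling looks_like_docx(new_request.content) before writing the file would show whether the bytes returned by the service are themselves the problem; new_request.headers.get("Content-Type") can also hint at what was actually returned.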
I had the same issue. Casting the response content to bytes solved it.
poll = True
while poll:
    new_request = requests.get(resp.headers['location'], headers=headers)
    if new_request.status_code == 200:
        # The with block closes (and flushes) the file as soon as the write finishes
        with open('test.docx', 'wb') as f:
            f.write(bytes(new_request.content))
        poll = False
    else:
        time.sleep(5)
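In requests, response.content is already a bytes object, so bytes(new_request.content) returns equal data; the other difference from the earlier loop is the with block, which closes the file deterministically. Both loops also poll forever if the job never returns 200. A sketch of a bounded variant; poll_for_result, max_attempts and interval are hypothetical names, get is any zero-argument callable (for example functools.partial(requests.get, resp.headers['location'], headers=headers)), and treating any status of 400 or above as a hard failure is an assumption about the service:

```python
import time
from types import SimpleNamespace
from typing import Callable, Protocol


class ResponseLike(Protocol):
    status_code: int
    content: bytes


def poll_for_result(get: Callable[[], ResponseLike],
                    max_attempts: int = 60,
                    interval: float = 5.0) -> bytes:
    """Call get() until it returns HTTP 200, then return the body bytes."""
    for attempt in range(max_attempts):
        response = get()
        if response.status_code == 200:
            return response.content  # already bytes; no conversion needed
        if response.status_code >= 400:
            # Assumption: 4xx/5xx means the job failed, so stop polling
            raise RuntimeError(f"polling failed with HTTP {response.status_code}")
        if attempt < max_attempts - 1:
            time.sleep(interval)
    raise TimeoutError(f"no result after {max_attempts} attempts")


# Offline demo: one "still processing" reply, then the finished document
replies = iter([SimpleNamespace(status_code=202, content=b""),
                SimpleNamespace(status_code=200, content=b"PK\x03\x04...")])
data = poll_for_result(lambda: next(replies), interval=0)
print(data[:2])  # b'PK'
```

With the answer's names, saving then becomes a single with open('test.docx', 'wb') as f: f.write(poll_for_result(get)).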
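Statement: the technical posts on this site are published under the CC BY-SA 4.0 license. If you republish them, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.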