Convert a PDF to DOCX using Adobe PDF Services via REST API (with Python)
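I am trying to query the Adobe PDF Services API to generate (export) a DOCX from a PDF document.
I have just written Python code that generates a bearer token to authenticate against Adobe PDF Services (see my earlier question: https://stackoverflow.com/questions/68351955/tunning-a-post-request-to-reach-adobe-pdf-services-using-python-and-a-rest-api). I then wrote the following snippet, trying to follow the instructions for the EXPORT option of Adobe PDF Services on this page: https://documentcloud.adobe.com/document-services/index.html#post-exportPDF
Here is the code: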
import requests
import json
from requests.structures import CaseInsensitiveDict
URL = "https://cpf-ue1.adobe.io/ops/:create?respondWith=%257B%2522reltype%2522%253A%2520%2522http%253A%252F%252Fns.adobe.com%252Frel%252Fprimary%2522%257D"
headers = CaseInsensitiveDict()
headers["x-api-key"] = "client_id"
headers["Authorization"] = "Bearer MYREALLYLONGTOKENIGOT"
headers["Content-Type"] = "application/json"
myfile = {"file":open("absolute_path_to_the_pdf_file/input.pdf", "rb")}
j = """
{
  "cpf:engine": {
    "repo:assetId": "urn:aaid:cpf:Service-26c7fda2890b44ad9a82714682e35888"
  },
  "cpf:inputs": {
    "params": {
      "cpf:inline": {
        "targetFormat": "docx"
      }
    },
    "documentIn": {
      "dc:format": "application/pdf",
      "cpf:location": "C:/Users/a-bensghir/Downloads/P_D_F/trs_pdf_file_copy.pdf"
    }
  },
  "cpf:outputs": {
    "documentOut": {
      "dc:format": "application/vnd.openxmlformats-officedocument.wordprocessingml.document",
      "cpf:location": "C:/Users/a-bensghir/Downloads/P_D_F/output.docx"
    }
  }
}"""
resp = requests.post(url=URL, headers=headers, json=json.dumps(j), files=myfile)
print(resp.text)
print(resp.status_code)
The status code is 400. I am authenticated by the server, but print(resp.text) gives the following result:
{"requestId":"the_request_id","type":"Bad Request","title":"Not a multipart request. Aborting.","status":400,"report":"{\"error_code\":\"INVALID_MULTIPART_REQUEST\"}"}
I think I am misunderstanding the "form parameters" of the POST method for the EXPORT job in Adobe's API guide (https://documentcloud.adobe.com/document-services/index.html).
Do you have any ideas for improvement? Thank you!
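First make your variable j a Python dict, then create a JSON string from it. What is also not very clear in Adobe's documentation is that the value of documentIn.cpf:location needs to be the same as the key used for your file in the multipart form. I have corrected this to InputFile0 in your script. I also guessed that you want to save the resulting file, so I added that too.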
import requests
import json
import time
URL = "https://cpf-ue1.adobe.io/ops/:create?respondWith=%257B%2522reltype%2522%253A%2520%2522http%253A%252F%252Fns.adobe.com%252Frel%252Fprimary%2522%257D"
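# token and client_id are assumed to be defined already
# (the bearer token and the API key from the question).
headers = {
    'Authorization': f'Bearer {token}',
    'Accept': 'application/json, text/plain, */*',
    'x-api-key': client_id,
    'Prefer': "respond-async,wait=0",
}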
myfile = {"InputFile0":open("absolute_path_to_the_pdf_file/input.pdf", "rb")}
j = {
    "cpf:engine": {
        "repo:assetId": "urn:aaid:cpf:Service-26c7fda2890b44ad9a82714682e35888"
    },
    "cpf:inputs": {
        "params": {
            "cpf:inline": {
                "targetFormat": "docx"
            }
        },
        "documentIn": {
            "dc:format": "application/pdf",
            "cpf:location": "InputFile0"
        }
    },
    "cpf:outputs": {
        "documentOut": {
            "dc:format": "application/vnd.openxmlformats-officedocument.wordprocessingml.document",
            "cpf:location": "C:/Users/a-bensghir/Downloads/P_D_F/output.docx"
        }
    }
}
body = {"contentAnalyzerRequests": json.dumps(j)}
resp = requests.post(url=URL, headers=headers, data=body, files=myfile)
print(resp.text)
print(resp.status_code)
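# Poll the URL returned in the 'location' header until the job is done.
poll = True
while poll:
    new_request = requests.get(resp.headers['location'], headers=headers)
    if new_request.status_code == 200:
        # Save the response body to disk in binary mode.
        open('test.docx', 'wb').write(new_request.content)
        poll = False
    else:
        time.sleep(5)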
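The two fixes in the script above (build j as a dict before serializing it, and use the same name for the file key and cpf:location) can be checked offline. The following is only a minimal sketch using the standard library: build_export_request and INPUT_KEY are hypothetical names introduced here for illustration, and the field values are copied from the script above rather than taken from Adobe's documentation.

```python
import json

# Hypothetical constant: the multipart form-field name used for the PDF.
# The answer above says cpf:location must carry the same name.
INPUT_KEY = "InputFile0"


def build_export_request(input_key: str, target_format: str = "docx") -> dict:
    """Return the job description as a Python dict (values copied from the script above)."""
    return {
        "cpf:engine": {
            "repo:assetId": "urn:aaid:cpf:Service-26c7fda2890b44ad9a82714682e35888"
        },
        "cpf:inputs": {
            "params": {"cpf:inline": {"targetFormat": target_format}},
            "documentIn": {
                "dc:format": "application/pdf",
                "cpf:location": input_key,
            },
        },
        "cpf:outputs": {
            "documentOut": {
                "dc:format": "application/vnd.openxmlformats-officedocument.wordprocessingml.document",
                "cpf:location": "C:/Users/a-bensghir/Downloads/P_D_F/output.docx",
            }
        },
    }


job = build_export_request(INPUT_KEY)

# dict -> JSON string: decoding returns the original object.
encoded_once = json.dumps(job)
decoded = json.loads(encoded_once)

# str -> JSON string (what json.dumps(j) did in the question, where j was
# already a string): decoding returns a str, not an object.
encoded_twice = json.dumps(encoded_once)
decoded_twice = json.loads(encoded_twice)

print(type(decoded).__name__)        # dict
print(type(decoded_twice).__name__)  # str
print(decoded["cpf:inputs"]["documentIn"]["cpf:location"] == INPUT_KEY)  # True
```

In the real request, encoded_once would be the value sent as contentAnalyzerRequests, and INPUT_KEY would be the key of the files dict, as in the script above.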
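I don't know why the docx file (which is created fine, by the way) will not open; a popup says the content is unreadable. Maybe it is due to the 'wb' write mode.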
I had the same problem. Converting the request content to bytes solved it.
poll = True
while poll:
    new_request = requests.get(resp.headers['location'], headers=headers)
    if new_request.status_code == 200:
        with open('test.docx', 'wb') as f:
            f.write(bytes(new_request.content))
        poll = False
    else:
        time.sleep(5)
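Both polling loops above run forever if the job never returns 200, and neither checks what was actually downloaded. As a rough sketch only, the helper below bounds the number of attempts and then tests whether the bytes look like a .docx, relying on the fact that a .docx file is a ZIP archive. poll_for_result, looks_like_docx and the fetch callable are hypothetical names introduced here; the simulated replies (including the 202 status) stand in for requests.get and are not taken from Adobe's API.

```python
import io
import time
import zipfile
from typing import Callable, Tuple


def poll_for_result(fetch: Callable[[], Tuple[int, bytes]],
                    max_attempts: int = 12,
                    delay: float = 5.0) -> bytes:
    """Call fetch() until it reports status 200, or give up after max_attempts."""
    for attempt in range(max_attempts):
        status, content = fetch()
        if status == 200:
            return content
        if attempt < max_attempts - 1:
            time.sleep(delay)
    raise TimeoutError(f"no result after {max_attempts} attempts")


def looks_like_docx(content: bytes) -> bool:
    """Heuristic: a .docx is a ZIP archive that contains [Content_Types].xml."""
    if not content.startswith(b"PK\x03\x04"):
        return False
    try:
        with zipfile.ZipFile(io.BytesIO(content)) as archive:
            return "[Content_Types].xml" in archive.namelist()
    except zipfile.BadZipFile:
        return False


# Simulated replies: two "not ready" answers, then a tiny ZIP archive
# that stands in for the converted document.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("[Content_Types].xml", "<Types/>")
replies = iter([(202, b""), (202, b""), (200, buffer.getvalue())])

result = poll_for_result(lambda: next(replies), max_attempts=5, delay=0)
print(looks_like_docx(result))
print(looks_like_docx(b"not a zip archive"))
```

With the script above, fetch would be a small function that calls requests.get(resp.headers['location'], headers=headers) and returns its status_code and content. If looks_like_docx returns False for the downloaded bytes, the problem lies in the response body itself rather than in the 'wb' mode used to write it.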