How to convert a PDF from a URL to images using pdf2image in Python?
I am able to convert a PDF file on my drive to images using pdf2image's convert_from_path, but when I try the same for the PDF at 'https://example.com/abc.pdf', I end up with multiple errors.
Code:
from urllib.request import urlopen
import pdf2image

url = 'https://example.com/abc.pdf'
scrape = urlopen(url)  # for external files
pil_images = pdf2image.convert_from_bytes(
    scrape.read(), dpi=200,
    output_folder=None, first_page=None, last_page=None,
    thread_count=1, userpw=None, use_cropbox=False, strict=False,
    poppler_path=r"C:\poppler-0.68.0_x86\poppler-0.68.0\bin",
)
Error:
Unable to get page count. Syntax Error: Document stream is empty
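I also followed the link below, but had no luck:
Python 3: download a PDF to memory and convert the first page to an image
Screenshot of the authentication prompt: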
First download the PDF from the URL, following the instructions in this blog post: https://dzone.com/articles/simple-examples-of-downloading-files-using-python
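A minimal sketch of the download-then-convert step, assuming the URL serves the PDF without a login. `looks_like_pdf` and `pdf_url_to_images` are hypothetical helper names, not part of pdf2image. The header check is included because an empty or non-PDF response (for example an HTML login page) is one plausible cause of Poppler's "Document stream is empty" error.

```python
from urllib.request import urlopen


def looks_like_pdf(data: bytes) -> bool:
    """Return True if the payload carries a PDF header near its start."""
    # PDF files start with "%PDF-"; a few files have stray bytes first,
    # so search the first 1024 bytes rather than only offset 0.
    return b"%PDF-" in data[:1024]


def pdf_url_to_images(url: str, dpi: int = 200, poppler_path=None):
    """Download a PDF into memory and convert every page to a PIL image."""
    # Imported here so the header check above works without pdf2image/Poppler.
    from pdf2image import convert_from_bytes

    with urlopen(url) as response:
        data = response.read()
    if not looks_like_pdf(data):
        raise ValueError(f"{url} returned {len(data)} bytes that are not a PDF")
    return convert_from_bytes(data, dpi=dpi, poppler_path=poppler_path)
```

On Windows, `poppler_path` can point at the Poppler `bin` directory, as in the question. If the `ValueError` is raised, inspecting the first bytes of the response usually shows whether the server returned an error page or a login form instead of the document.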
If the PDF has multiple pages, use the following to convert it to images (or to any other per-page output format).
import ghostscript

def pdf2jpeg(pdf_input_path, jpeg_output_path):
    """Render a PDF to JPEG at 144 DPI using Ghostscript."""
    args = ["pdf2jpeg",  # actual value doesn't matter
            "-dNOPAUSE",
            "-sDEVICE=jpeg",
            "-r144",
            "-sOutputFile=" + jpeg_output_path,
            pdf_input_path]
    ghostscript.Ghostscript(*args)
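For a multi-page PDF, Ghostscript writes one file per page when the output name contains a `%d`-style placeholder. The sketch below illustrates that under a few assumptions: `build_gs_args` and `pdf_to_jpegs` are hypothetical names, `-dBATCH` is added so Ghostscript exits after the last page, and the installed Python binding is assumed to accept `str` arguments (some versions may expect byte strings instead).

```python
def build_gs_args(pdf_input_path: str, output_pattern: str, dpi: int = 144) -> list:
    """Build Ghostscript arguments that render each page to its own JPEG."""
    return [
        "pdf2jpeg",                        # program-name placeholder; value is ignored
        "-dNOPAUSE",                       # do not pause after each page
        "-dBATCH",                         # exit once the input has been processed
        "-sDEVICE=jpeg",                   # JPEG output device
        f"-r{dpi}",                        # rendering resolution in DPI
        "-sOutputFile=" + output_pattern,  # e.g. "page-%03d.jpg", one file per page
        pdf_input_path,
    ]


def pdf_to_jpegs(pdf_input_path: str, output_pattern: str = "page-%03d.jpg") -> None:
    """Run Ghostscript over the PDF, writing page-001.jpg, page-002.jpg, ..."""
    # Imported lazily so the argument builder can be checked without Ghostscript.
    import ghostscript

    ghostscript.Ghostscript(*build_gs_args(pdf_input_path, output_pattern))
```

Keeping the argument list in its own function makes it easy to print and compare against an equivalent `gs` command line when something goes wrong.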
For authentication, try this.
import os
import requests
from urllib.parse import urlparse  # Python 3; in Python 2 this was `from urlparse import urlparse`

username = 'foo'
password = 'sekret'
url = 'http://example.com/blueberry/download/somefile.jpg'
filename = os.path.basename(urlparse(url).path)  # 'somefile.jpg'

r = requests.get(url, auth=(username, password))  # HTTP Basic authentication
if r.status_code == 200:
    with open(filename, 'wb') as out:
        # Write the response body to disk in chunks.
        for bits in r.iter_content(chunk_size=8192):
            out.write(bits)
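The authenticated download can also be kept in memory and handed straight to pdf2image. This is only a sketch: it assumes the server uses HTTP Basic authentication (a form-based login would need a different approach), and `page_filename` and `authenticated_pdf_to_images` are hypothetical helper names.

```python
import os
from urllib.parse import urlparse


def page_filename(url: str, page_number: int, ext: str = "png") -> str:
    """Derive an output name such as 'somefile-001.png' from the PDF's URL."""
    stem = os.path.splitext(os.path.basename(urlparse(url).path))[0]
    return f"{stem}-{page_number:03d}.{ext}"


def authenticated_pdf_to_images(url, username, password, out_dir=".", dpi=200):
    """Fetch a password-protected PDF and save each page as a PNG file."""
    # Third-party imports are local so page_filename() has no dependencies.
    import requests
    from pdf2image import convert_from_bytes

    response = requests.get(url, auth=(username, password), timeout=30)
    # Raise on 401/403/404 instead of passing an error page to Poppler.
    response.raise_for_status()

    pages = convert_from_bytes(response.content, dpi=dpi)
    paths = []
    for number, page in enumerate(pages, start=1):
        path = os.path.join(out_dir, page_filename(url, number))
        page.save(path)  # Pillow infers the format from the file extension
        paths.append(path)
    return paths
```

Calling `raise_for_status()` before converting means a failed login surfaces as an HTTP error rather than as a confusing Poppler message.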