python pdf2image from a link: "Unable to get page count"

Question

I have a link to a PDF that I want to convert to an image, so I ran this:

import requests
import pdf2image
x = "https://www.criticallink.com/wp-content/uploads/ISO-9001-2015-Certificate.pdf"
pdf = requests.get(x,stream=True,timeout=30)
images = pdf2image.convert_from_bytes(pdf.raw.read())

but I get this error:

PDFPageCountError: Unable to get page count.
Syntax Warning: May not be a PDF file (continuing anyway)
Syntax Error (19): Illegal character '>'
Syntax Error (46): Illegal character ')'
Syntax Error: Couldn't find trailer dictionary
Syntax Error: Couldn't find trailer dictionary
Syntax Error: Couldn't read xref table

What should I do?

UPDATE:

pdf.raw.read()[:100]
b'\x1f\x8b\x08\x00\x00\x00\x00\x00\x00\x03\x9c\xfdy@Rk\xfc>\x8a\xbe\x80\x8a\x9aC\x03\x15\x9aSY\n\xb5SI\xcaY\xd1\xb6\x13N\x80\xed\xdc\x91\x99i)X\x9a8U\x98\x8a\xd9\xb4\xd9\x84\x9a\x94F\x0e\x14\xa0\xb5\xcb\xac\x1d\xa6V\xa6\rH\xb5\xb7\xa2hZ6\x99\x9aJdj\xe2\x909\xdc\xe5\xfe}\xcf\xf9\xdd{\xcf\xf9\xe3\x9e\xbb\xfa#'

Answer 1

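So you're getting a GZIP-encoded response: the leading bytes \x1f\x8b in your output are the gzip magic number, and response.raw hands you the body without decompressing it. Try the following.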

import gzip
import requests
import pdf2image

url = "https://www.criticallink.com/wp-content/uploads/ISO-9001-2015-Certificate.pdf"
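# stream=True leaves the body unread, so response.raw still holds the compressed bytes
response = requests.get(url, stream=True, timeout=30)
# Wrap the raw stream in a gzip reader to decompress it
pdf = gzip.open(response.raw)
images = pdf2image.convert_from_bytes(pdf.read())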
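
The snippet above assumes the body is always gzip-compressed. If the server ever sends the file uncompressed, the gzip reader would fail instead. A minimal sketch of a more defensive variant, which checks for the gzip magic number first, might look like this (maybe_gunzip and GZIP_MAGIC are hypothetical names, not part of requests or pdf2image):

```python
import gzip

GZIP_MAGIC = b"\x1f\x8b"  # the first two bytes of a gzip stream


def maybe_gunzip(data: bytes) -> bytes:
    """Return data decompressed if it looks gzip-compressed, else unchanged.

    Hypothetical helper for illustration only.
    """
    if data.startswith(GZIP_MAGIC):
        return gzip.decompress(data)
    return data


# Usage with the names from the snippet above (needs network access):
# response = requests.get(url, stream=True, timeout=30)
# images = pdf2image.convert_from_bytes(maybe_gunzip(response.raw.read()))
```

A real PDF starts with %PDF-, so it cannot be mistaken for gzip data by this check.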

Alternatively, you can use

import requests
import pdf2image

url = "https://www.criticallink.com/wp-content/uploads/ISO-9001-2015-Certificate.pdf"
response = requests.get(url, timeout=30)
# response.content is the body after requests has undone the gzip Content-Encoding
images = pdf2image.convert_from_bytes(response.content)

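and let requests do the decoding for you: response.content is decoded according to the Content-Encoding header, whereas response.raw.read() returns the bytes as they came over the wire.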
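
Whichever variant you use, it may help to check the leading bytes before handing them to pdf2image, so that a compressed or otherwise non-PDF body fails with a readable message rather than a list of poppler syntax errors. A small sketch, assuming the file starts with the usual %PDF- header (ensure_pdf is a hypothetical helper, not part of pdf2image):

```python
def ensure_pdf(data: bytes) -> bytes:
    """Return data unchanged if it starts with the PDF header, else raise.

    Hypothetical helper; it only checks the leading "%PDF-" marker and
    does not validate the rest of the file.
    """
    if not data.startswith(b"%PDF-"):
        raise ValueError(f"not a PDF, first bytes are {data[:8]!r}")
    return data


# Usage with the names from the answer (needs network access):
# images = pdf2image.convert_from_bytes(ensure_pdf(response.content))
```

With the bytes from the question's update, this would raise a ValueError showing the \x1f\x8b prefix, which points straight at the gzip encoding.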

Answer 1: accepted, 2020-12-07 13:23:57
