Convert an image-based PDF to image files (PNG/JPG) in Python

Question

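I want to convert an image-based PDF to image files (.png/.jpg) in Python so that I can then use those images to extract table data. I do not want to run the code from the command line.

I am currently using Python 3.7.1 and the PyCharm IDE.

I have already tried the code from the Stack Overflow question "Extract images from PDF using Python". It runs, but it does not extract any images from the image-based PDF files.

I also tried the code from dzone.com, but that did not work either: https://dzone.com/articles/exporting-data-from-pdfs-with-python

Here are links to the image-based PDF files: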

Link 1: https://www.molex.com/pdm_docs/sd/190390001_sd.pdf

Link 2: https://www.te.com/commerce/DocumentDelivery/DDEController?Action=showdoc&DocId=Customer+Drawing%7FDT04-12PX-C015%7F-%7Fpdf%7FEnglish%7FENG_CD_DT04-12PX-C015_-.pdf%7FDT04- 12PA-C015

Please suggest a solution.

Answer 1

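The pdf2image library converts PDF pages to images. Looking at your PDFs, each page is nothing more than an image, so you can render the pages directly to image files.

Installation

pip install pdf2image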
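
Note that pdf2image does not render PDFs itself: it calls the poppler command-line tools (pdftoppm, pdfinfo), which have to be installed separately and be reachable on PATH. The snippet below is a small sketch for checking this from inside Python rather than from a shell. The helper name find_poppler_tools is made up for illustration and is not part of pdf2image.

```python
import shutil


def find_poppler_tools(tools=("pdftoppm", "pdfinfo")):
    """Map each tool name to its full path, or None if it is not on PATH.

    find_poppler_tools is a hypothetical helper, not a pdf2image function.
    """
    return {tool: shutil.which(tool) for tool in tools}


if __name__ == "__main__":
    for tool, path in find_poppler_tools().items():
        print(f"{tool}: {path or 'not found on PATH'}")
```

If the tools are installed but not on PATH (common on Windows), convert_from_path also accepts a poppler_path argument pointing at the folder that contains them.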

Once it is installed, you can use the following code to get the images.

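from pdf2image import convert_from_path

# Render every page of the PDF at 500 DPI; returns a list of PIL images
pages = convert_from_path('pdf_file', 500)

# Save each page as a JPEG, numbering the files so that
# later pages do not overwrite earlier ones
for i, page in enumerate(pages, start=1):
    page.save(f'out_{i}.jpg', 'JPEG')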
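
Since the goal is to feed the images into a later table-extraction step, it may help to wrap the conversion in a function that writes one file per page and returns the saved paths. The following is only a sketch under a few assumptions: the function names page_filename and pdf_to_images, the output naming scheme, and the default of 300 DPI are illustrative choices, not something pdf2image prescribes.

```python
from pathlib import Path


def page_filename(stem, index, total, ext):
    """Build a zero-padded name such as 'drawing_page_03.png' (hypothetical scheme)."""
    width = len(str(total))
    return f"{stem}_page_{index:0{width}d}.{ext}"


def pdf_to_images(pdf_path, out_dir, dpi=300, fmt="png"):
    """Render each page of pdf_path into out_dir and return the saved paths."""
    # Imported here so the naming helper above works without pdf2image installed.
    from pdf2image import convert_from_path

    pdf_path = Path(pdf_path)
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)

    pages = convert_from_path(str(pdf_path), dpi=dpi)
    # Pillow expects the format name 'JPEG' for .jpg/.jpeg files.
    pil_format = "JPEG" if fmt.lower() in ("jpg", "jpeg") else "PNG"

    saved = []
    for index, page in enumerate(pages, start=1):
        target = out_dir / page_filename(pdf_path.stem, index, len(pages), fmt)
        page.save(str(target), pil_format)
        saved.append(target)
    return saved
```

Assuming the first linked PDF has been downloaded as 190390001_sd.pdf, calling pdf_to_images("190390001_sd.pdf", "pages") would write the PNG files into a pages folder. PNG is lossless, which tends to suit line drawings and tables better than JPEG, while a higher dpi gives sharper text at the cost of larger files.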
