
Converting a PDF file to Excel with images in the table using Python

Original post: 2021-04-10 10:46:25 · Tags: python, tabula

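The PDF files I need to convert contain images inside their tables. I want to convert both the text and the images in these PDF tables to Excel. Please recommend a suitable library.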

2 answers

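You need to write a script that reads the page and detects both the characters and the table cells. You could do this with OpenCV and a built-in character recognition tool, but I don't know how easy that would be. Another approach I can think of is to build an ML model that recognizes tables in images. That is really hard, though, and takes a lot of experience.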
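The cell-detection part of the OpenCV approach mostly comes down to finding the table's ruling lines and reading off the cells between them. The sketch below illustrates that idea without OpenCV, using projection profiles on a toy image. It assumes the page has already been binarized into a 2-D grid of 0/1 pixels (for example with OpenCV's `cv2.threshold`) and that the table has full-width, axis-aligned borders; `find_rules` and `detect_cells` are hypothetical helper names, not part of any library. Rows and columns whose dark-pixel count nearly spans the image are treated as rules, and each gap between neighbouring rules becomes a cell.

```python
def find_rules(profile, length, min_fill=0.9):
    """Return (start, end) runs of indices whose dark-pixel count is >= min_fill * length.

    Consecutive qualifying indices are merged, so a 2-pixel-thick line is one rule.
    """
    runs = []
    start = None
    for i, count in enumerate(profile):
        is_rule = count >= min_fill * length
        if is_rule and start is None:
            start = i
        elif not is_rule and start is not None:
            runs.append((start, i - 1))
            start = None
    if start is not None:
        runs.append((start, len(profile) - 1))
    return runs


def detect_cells(img, min_fill=0.9):
    """Return cell boxes (top, left, bottom, right) in a binary image (1 = dark pixel)."""
    height, width = len(img), len(img[0])
    # Horizontal profile: dark pixels per row; vertical profile: dark pixels per column
    row_profile = [sum(row) for row in img]
    col_profile = [sum(img[y][x] for y in range(height)) for x in range(width)]
    h_rules = find_rules(row_profile, width, min_fill)
    v_rules = find_rules(col_profile, height, min_fill)
    cells = []
    # The interior of each cell lies strictly between two neighbouring rules
    for (_, top_end), (bottom_start, _) in zip(h_rules, h_rules[1:]):
        for (_, left_end), (right_start, _) in zip(v_rules, v_rules[1:]):
            cells.append((top_end + 1, left_end + 1, bottom_start - 1, right_start - 1))
    return cells
```

Each box can then be cropped out and passed to OCR, or compared with the positions of the extracted images to decide which cell an image belongs to. On real scans, text pixels and broken lines make raw profiles noisy; OpenCV's `cv2.morphologyEx` with long, thin kernels is a common way to isolate the lines before taking the profiles.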

You can use pikepdf to extract the images from the PDF:

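from pikepdf import Pdf, PdfImage

filename = "sample.pdf"

# Open the PDF; the with-block closes the file when done
with Pdf.open(filename) as example:
    for i, page in enumerate(example.pages):
        # page.images maps each image's resource name to its raw image object
        for j, (name, raw_image) in enumerate(page.images.items()):
            image = PdfImage(raw_image)
            # extract_to() chooses the file extension (e.g. .png or .jpg) and returns the path it wrote
            out = image.extract_to(fileprefix=f"{filename}-page{i:03}-img{j:03}")
            print(f"page {i}, image {name} -> {out}")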
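To put the extracted images into an Excel sheet alongside the table text, one option is openpyxl; this is an assumption, since the question does not name an Excel library. The sketch below uses a hypothetical helper, `write_table_with_images`, which takes a 2-D list of rows: cells given as `pathlib.Path` objects are embedded as pictures, and everything else is written as a cell value. openpyxl's `Image` class requires Pillow to be installed.

```python
from pathlib import Path

from openpyxl import Workbook
from openpyxl.drawing.image import Image as XLImage  # requires Pillow
from openpyxl.utils import get_column_letter


def write_table_with_images(rows, out_path, row_height=60, col_width=20):
    """Write rows to an .xlsx file; Path cells become anchored images, others become values.

    Hypothetical helper: row_height is in points, col_width in Excel character units.
    """
    wb = Workbook()
    ws = wb.active
    for r, row in enumerate(rows, start=1):
        ws.row_dimensions[r].height = row_height
        for c, value in enumerate(row, start=1):
            col = get_column_letter(c)
            ws.column_dimensions[col].width = col_width
            if isinstance(value, Path):
                # Pictures float over the sheet; the anchor pins their top-left corner to this cell
                ws.add_image(XLImage(str(value)), f"{col}{r}")
            else:
                ws.cell(row=r, column=c, value=value)
    wb.save(out_path)
```

Excel anchors a picture to a cell's top-left corner rather than storing it as the cell's value, and this sketch does not resize images, so large images will overflow their cells. For example, `write_table_with_images([["Item", "Photo"], ["Widget", Path("sample.pdf-page000-img000.png")]], "table.xlsx")` would write a header row and one data row whose second column shows the first image extracted from page 0 (the file name assumes the extractor chose PNG).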

Once the images are extracted, you can run OCR on them to turn them into tables.
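OCR engines return individual words with bounding boxes, not table rows, so a grouping step is still needed. The following is a minimal sketch under stated assumptions: the input is assumed to have the shape of pytesseract's `image_to_data(..., output_type=pytesseract.Output.DICT)` result (parallel lists under `text`, `left`, `top`, `width`, `height`), and `ocr_words_to_rows`, `row_tol`, and `col_gap` are hypothetical names with pixel thresholds you would tune per document. Words whose vertical centres are close are put in the same row, and within a row a horizontal gap wider than `col_gap` starts a new cell.

```python
def ocr_words_to_rows(data, row_tol=10, col_gap=40):
    """Group OCR word boxes into table rows (lists of cell strings)."""
    words = [
        # (vertical centre, left edge, width, text)
        (data["top"][i] + data["height"][i] / 2, data["left"][i], data["width"][i], text.strip())
        for i, text in enumerate(data["text"])
        if text.strip()  # Tesseract reports empty text for block/line-level entries
    ]
    words.sort()  # top-to-bottom, then left-to-right
    rows = []  # each entry: [vertical centre of first word, [(left, width, text), ...]]
    for cy, left, width, text in words:
        if rows and abs(rows[-1][0] - cy) <= row_tol:
            rows[-1][1].append((left, width, text))
        else:
            rows.append([cy, [(left, width, text)]])
    table = []
    for _, row_words in rows:
        row_words.sort()  # left-to-right within the row
        cells = []
        prev_right = None
        for left, width, text in row_words:
            if prev_right is not None and left - prev_right <= col_gap:
                cells[-1] += " " + text  # small gap: same cell, e.g. a multi-word name
            else:
                cells.append(text)  # large gap: next column
            prev_right = left + width
        table.append(cells)
    return table
```

Each returned row can be written to a sheet with openpyxl's `ws.append(row)`. Because columns are split by gaps rather than by the table's ruling lines, a row with an empty cell will shift its remaining values left; combining this with detected cell boxes avoids that.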

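Statement: the technical posts on this site are licensed under CC BY-SA 4.0. If you repost them, please credit this site's URL or the original post's address. For any questions, contact: yoyou2525@163.com.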


Guangdong ICP filing No. 18138465 © 2020-2024 STACKOOM.COM