
Is there a way to convert .pdf to .csv using Python?

I'm currently experimenting with tabula-py, but every documentation sample I tried for extracting PDF data failed with the following error: returned non-zero exit status 1.

So I'm just curious whether there are other ways to convert the data in a PDF's tables to a CSV file using Python.

The answer for tabula-py is already available on StackOverflow and other resources. As an alternative, try Camelot:

pip install camelot-py[cv]


import camelot
tables = camelot.read_pdf('X.pdf')
tables.export('X.csv', f='csv', compress=True)  # compress=True bundles the output into a ZIP archive; f also accepts other formats such as 'json', 'excel' or 'html'

Refer to this link for more.

If you are looking to export tables from PDF to CSV files using Python, the best way is to use libraries like Tabula and Camelot.

First we'll need to extract the tables from individual pages, and then use a library like pandas to export them to CSV or other required formats.
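As a rough sketch of those two steps, assuming Camelot is installed and the PDF is text-based: read the tables from every page, then write each one out through its pandas DataFrame. The helper names csv_path and pdf_tables_to_csv and the X-table-N.csv naming scheme are made up for illustration, they are not part of Camelot.

```python
from pathlib import Path


def csv_path(out_dir, stem, index):
    """Build the output path for the index-th table (hypothetical naming scheme)."""
    return Path(out_dir) / f"{stem}-table-{index}.csv"


def pdf_tables_to_csv(pdf_path, out_dir="."):
    """Write every table Camelot detects in pdf_path to its own CSV file."""
    # Imported lazily so the helper above works without Camelot installed.
    import camelot

    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)

    # pages="all" scans the whole document; Camelot's default is page 1 only.
    tables = camelot.read_pdf(str(pdf_path), pages="all")

    written = []
    for index, table in enumerate(tables, start=1):
        target = csv_path(out_dir, Path(pdf_path).stem, index)
        # table.df is a pandas DataFrame with integer column labels,
        # so the header and index are left out of the CSV.
        table.df.to_csv(target, index=False, header=False)
        written.append(target)
    return written
```

Calling pdf_tables_to_csv('X.pdf', 'out') would then return the list of CSV paths it wrote, one per detected table. From there you can post-process any of them with pandas as needed.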

However, if the documents are non-electronic (for example, scanned pages), we'll have to use OCR or ML techniques to extract the tables.

Here's a blog post which has a few examples: https://nanonets.com/blog/pdf-table-to-csv/#pdf-table-extraction-to-csv-with-python
