
EasyOCR - Table extraction

I use EasyOCR to extract tables from photos or scanned PDFs, but I have trouble arranging the extracted text into a proper table. I tried building a searchable PDF from the extracted coordinates, but when I convert it to CSV, the rows and columns are not aligned. I would appreciate any guidance on this.
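A common way to get aligned rows from EasyOCR output is to group the detected boxes by vertical position and then sort each row by horizontal position. The sketch below assumes the default `reader.readtext(...)` result format (a list of `(bbox, text, confidence)` tuples, where `bbox` holds four `[x, y]` corner points). `group_into_rows`, `write_csv`, the `y_tol` default, and the sample data are hypothetical and only for illustration.

```python
import csv
from typing import List, Sequence, Tuple

# One EasyOCR detection: (four [x, y] corner points, recognised text, confidence).
Detection = Tuple[Sequence[Sequence[float]], str, float]


def group_into_rows(results: List[Detection], y_tol: float = 10.0) -> List[List[str]]:
    """Group detections into rows by vertical centre, then order each row left to right.

    y_tol (pixels) is a hypothetical default; tune it to the text height in your image.
    """
    boxes = []
    for bbox, text, _conf in results:
        xs = [p[0] for p in bbox]
        ys = [p[1] for p in bbox]
        # (vertical centre, left edge, text); the centre tolerates slightly skewed boxes.
        boxes.append(((min(ys) + max(ys)) / 2, min(xs), text))

    boxes.sort(key=lambda b: b[0])  # top to bottom
    rows: List[List[Tuple[float, float, str]]] = []
    for box in boxes:
        if rows:
            row_centre = sum(b[0] for b in rows[-1]) / len(rows[-1])
            if abs(box[0] - row_centre) <= y_tol:
                rows[-1].append(box)  # close enough to the current row's centre
                continue
        rows.append([box])  # otherwise start a new row

    # Within each row, order cells by their left edge.
    return [[text for _, _, text in sorted(row, key=lambda b: b[1])] for row in rows]


def write_csv(rows: List[List[str]], path: str) -> None:
    """Write the grouped rows to a CSV file."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        csv.writer(f).writerows(rows)


# Hypothetical sample shaped like easyocr.Reader(["en"]).readtext("table.png") output.
sample = [
    ([[110, 12], [180, 12], [180, 30], [110, 30]], "Price", 0.98),
    ([[10, 10], [80, 10], [80, 28], [10, 28]], "Item", 0.99),
    ([[10, 50], [80, 50], [80, 68], [10, 68]], "Apple", 0.97),
    ([[110, 52], [150, 52], [150, 70], [110, 70]], "1.20", 0.95),
]
print(group_into_rows(sample))  # prints [['Item', 'Price'], ['Apple', '1.20']]
```

This only fixes row order; cells are not snapped to columns, so a row with an empty cell will shift left. Assigning each cell to a column by its x-range would be the next step.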

You can use my package for this: ocr-nanonets-wrapper. It works for both images and PDFs.

Install the package using pip: pip install ocr-nanonets-wrapper

Get an API key. This key is free and gives you unlimited use of the package.

  • Go to nanonets.com and sign up
  • In your Nanonets account, go to My Account -> API Keys
  • Copy your API key

Here is the code to get tables as CSV:

from nanonets import NANONETSOCR

nanonets = NANONETSOCR()

# Authenticate with the API key from your Nanonets account
nanonets.set_token('YOUR_API_KEY')

# For an image (photo or scan):
nanonets.image_to_csv('INPUT_FILE_PATH', filename='OUTPUT_FILE_NAME.csv')
# For a PDF:
nanonets.pdf_to_csv('INPUT_FILE_PATH', filename='OUTPUT_FILE_NAME.csv')

You can also leave out filename; the wrapper will then use your input file name with ".csv" appended. The .csv output file is created in your current working directory.
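If you process a mix of photos and PDFs, a small dispatch helper can pick between the two calls shown above. This sketch uses only `NANONETSOCR`, `set_token`, `image_to_csv` and `pdf_to_csv` as they appear in the answer; the helper names and the `NANONETS_API_KEY` environment variable are hypothetical. Reading the key from the environment simply keeps it out of your source code.

```python
import os
from pathlib import Path


def converter_name(input_path: str) -> str:
    """Choose the wrapper method for a file: pdf_to_csv for PDFs, image_to_csv otherwise."""
    return "pdf_to_csv" if Path(input_path).suffix.lower() == ".pdf" else "image_to_csv"


def table_to_csv(ocr, input_path: str, output_name: str) -> None:
    """Call the matching converter on an already-authenticated NANONETSOCR client."""
    convert = getattr(ocr, converter_name(input_path))
    convert(input_path, filename=output_name)


def make_client(env_var: str = "NANONETS_API_KEY"):
    """Build a client whose token comes from an environment variable.

    NANONETS_API_KEY is a hypothetical variable name chosen for this sketch.
    """
    from nanonets import NANONETSOCR  # imported lazily so the helpers above work without it

    ocr = NANONETSOCR()
    ocr.set_token(os.environ[env_var])
    return ocr


# Example (needs the package, a valid key, and network access):
# table_to_csv(make_client(), "scanned_table.pdf", "scanned_table.csv")
```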

Hope this helps :)

As far as I know, EasyOCR does not currently support table recognition. In my experience, the best option for table recognition is PaddleOCR's PP-Structure model. It is what I use now and the results are very good, so you may want to try it.

Link: https://github.com/PaddlePaddle/PaddleOCR/blob/dygraph/ppstructure/README.md
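In the PaddleOCR 2.x releases, PP-Structure returns each detected table as an HTML string, so one way to get a CSV is to flatten that HTML. The sketch below assumes the 2.x `PPStructure` API shown in the linked README (each result region is a dict with a `type` key and, for tables, `res["html"]`); newer PaddleOCR releases may expose a different interface. The helper names are hypothetical, and the parser ignores `rowspan`/`colspan`, so merged cells are not expanded.

```python
import csv
import io
from html.parser import HTMLParser
from typing import List


class _TableParser(HTMLParser):
    """Collect the text of <td>/<th> cells, one list per <tr>. Ignores rowspan/colspan."""

    def __init__(self) -> None:
        super().__init__()
        self.rows: List[List[str]] = []
        self._cell: List[str] = []
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.rows.append([])
        elif tag in ("td", "th"):
            if not self.rows:  # tolerate a cell that appears before any <tr>
                self.rows.append([])
            self._in_cell = True
            self._cell = []

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._in_cell:
            self.rows[-1].append("".join(self._cell).strip())
            self._in_cell = False

    def handle_data(self, data):
        if self._in_cell:
            self._cell.append(data)


def html_table_to_csv(html: str) -> str:
    """Convert one HTML table string into CSV text."""
    parser = _TableParser()
    parser.feed(html)
    buf = io.StringIO()
    csv.writer(buf).writerows(parser.rows)
    return buf.getvalue()


def pp_structure_table_html(image_path: str) -> List[str]:
    """Run PP-Structure on an image and return the HTML of every detected table.

    Assumes the PaddleOCR 2.x PPStructure API; imported lazily so the helpers above
    work without PaddleOCR or OpenCV installed.
    """
    import cv2
    from paddleocr import PPStructure

    engine = PPStructure(show_log=False)
    regions = engine(cv2.imread(image_path))
    return [r["res"]["html"] for r in regions if r["type"] == "table"]


# Example (needs paddleocr and opencv-python installed):
# for html in pp_structure_table_html("table.png"):
#     print(html_table_to_csv(html))
```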
