
Convert a table in a JPG image to Excel using Python

I have a table in a JPG image, as in the link below, and would like to know how I could convert it to an Excel file.

https://www.mining.com/wp-content/uploads/2021/04/TOP-50-Value-mining-companies-jumps-600-billion-from-covid-lows-.jpg

What library or open-source software could be used along with Python?

Thanks

I think you need to perform OCR (optical character recognition). That can be done with OpenCV for image preprocessing and Tesseract for the text recognition. The output is usually structured text (Tesseract can emit plain text, TSV, or hOCR, for example) that can be loaded back into Python.
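As a rough illustration of the OpenCV plus Tesseract route, here is a minimal sketch. It assumes the `opencv-python` and `pytesseract` packages and the Tesseract binary are installed; the function names, the Otsu threshold, and the `--psm 6` page segmentation mode are my own choices for illustration and may need tuning for this particular image.

```python
from collections import defaultdict


def words_to_lines(data, min_conf=0.0):
    """Group Tesseract word records into text lines, in Tesseract's reading order.

    `data` is the dict returned by pytesseract.image_to_data(..., output_type=Output.DICT).
    """
    lines = defaultdict(list)
    for i, text in enumerate(data["text"]):
        # Skip empty records and low-confidence words (non-word records have conf -1).
        if not text.strip() or float(data["conf"][i]) < min_conf:
            continue
        key = (data["block_num"][i], data["par_num"][i], data["line_num"][i])
        lines[key].append((data["left"][i], text))
    # Order lines by their Tesseract keys and the words in each line left to right.
    return [[t for _, t in sorted(words)] for _, words in sorted(lines.items())]


def ocr_image(path):
    """Binarize the image with OpenCV and return the recognized words per line."""
    import cv2          # third-party: opencv-python
    import pytesseract  # third-party: needs the Tesseract binary on PATH

    img = cv2.imread(path)
    if img is None:
        raise FileNotFoundError(path)
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    # Otsu thresholding is an assumption; coloured backgrounds may need something else.
    _, binary = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    data = pytesseract.image_to_data(
        binary, config="--psm 6", output_type=pytesseract.Output.DICT
    )
    return words_to_lines(data)
```

Calling `ocr_image("table.jpg")` (a hypothetical local copy of the image) would return one list of words per text line. Splitting those words into table columns is a separate step, since Tesseract does not know where the column boundaries are.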

What you are trying to do is not simple; it is called OCR (optical character recognition).

I strongly suggest finding a different way to represent your data; an easy and common way is to use a format like JSON or CSV. If you must work from the image, you can try Tesseract to extract the text, but it will require some pre- and post-processing.
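To give an idea of the post-processing side, here is a small sketch that cleans OCR'd cell strings and writes them out as CSV or Excel. It assumes you already have the table as a list of rows of strings, with the header first and every row the same length; the helper names are hypothetical, and `write_excel` assumes `pandas` and `openpyxl` are installed.

```python
import csv
import io
import re


def clean_cell(text):
    """Turn strings such as '$1,234.5' or '12%' into floats; leave other text as is."""
    stripped = text.strip()
    candidate = re.sub(r"[,$%\s]", "", stripped)
    if re.fullmatch(r"-?\d+(?:\.\d+)?", candidate):
        return float(candidate)
    return stripped


def rows_to_csv_text(rows):
    """Return the cleaned rows as CSV text (write it to a file to open it in Excel)."""
    buffer = io.StringIO()
    csv.writer(buffer).writerows([clean_cell(cell) for cell in row] for row in rows)
    return buffer.getvalue()


def write_excel(rows, path):
    """Write the rows to an .xlsx file, treating the first row as the header."""
    import pandas as pd  # third-party; .xlsx output also needs openpyxl

    header, *body = rows
    frame = pd.DataFrame([[clean_cell(c) for c in row] for row in body], columns=header)
    frame.to_excel(path, index=False)
```

OCR output often contains misread characters (for example `O` instead of `0`), so expect to extend `clean_cell` with corrections specific to your table and to check the result by hand.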

First, I would recommend cropping the image so that only the table is visible. Second, you can use OpenCV to detect the table cells with contour and corner detection and then write the result to a CSV file. You can use this link as a starting point.
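A sketch of the contour approach could look like the following. It only applies if the cropped table has visible ruling lines between cells, which is an assumption about the image; the kernel sizes, `min_area`, and `y_tolerance` are illustrative guesses, and the function names are hypothetical. It needs the `opencv-python` package.

```python
def group_boxes_into_rows(boxes, y_tolerance=10):
    """Sort (x, y, w, h) boxes into rows: top to bottom, then left to right."""
    rows = []
    for box in sorted(boxes, key=lambda b: b[1]):
        # A box joins the current row if its top edge is close to the row's first box.
        if rows and abs(box[1] - rows[-1][0][1]) <= y_tolerance:
            rows[-1].append(box)
        else:
            rows.append([box])
    return [sorted(row, key=lambda b: b[0]) for row in rows]


def find_cell_boxes(path, min_area=400):
    """Detect table cells from ruling lines and return their bounding boxes by row."""
    import cv2  # third-party: opencv-python

    img = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
    if img is None:
        raise FileNotFoundError(path)
    # Inverted adaptive threshold: lines and text become white on black.
    binary = cv2.adaptiveThreshold(
        img, 255, cv2.ADAPTIVE_THRESH_MEAN_C, cv2.THRESH_BINARY_INV, 15, 10
    )
    # Morphological opening with long thin kernels keeps only the ruling lines.
    h_kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (40, 1))
    v_kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (1, 40))
    grid = cv2.add(
        cv2.morphologyEx(binary, cv2.MORPH_OPEN, h_kernel),
        cv2.morphologyEx(binary, cv2.MORPH_OPEN, v_kernel),
    )
    # [-2] picks the contour list in both OpenCV 3 and OpenCV 4.
    contours = cv2.findContours(grid, cv2.RETR_TREE, cv2.CHAIN_APPROX_SIMPLE)[-2]
    max_area = 0.9 * img.shape[0] * img.shape[1]
    boxes = [cv2.boundingRect(c) for c in contours]
    # Drop specks and the contour that surrounds the whole table.
    cells = [b for b in boxes if min_area <= b[2] * b[3] <= max_area]
    return group_boxes_into_rows(cells)
```

Each returned box can then be cropped from the image and passed to Tesseract on its own, which gives one string per cell and therefore the row and column structure needed for CSV.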

