
How to extract tabular data from images?

I have some sample images. How can I extract tabular data from the images and store it in JSON format?

Image 1

Use pytesseract. The code will be something like this. You can try different modifications. My code may not solve the whole problem; it is only an example. It will work for text in black, but for blue or any other colour you will have to create a mask accordingly and then extract that data.

import pytesseract
from PIL import Image, ImageEnhance, ImageFilter

im = Image.open("temp.jpg")

# thumbnail() shrinks the image in place and returns None,
# so its result must not be assigned back to im.
maxsize = (2024, 2024)
im.thumbnail(maxsize, Image.LANCZOS)

# Remove speckle noise, then raise the contrast before binarizing.
im = im.filter(ImageFilter.MedianFilter())
enhancer = ImageEnhance.Contrast(im)

im = enhancer.enhance(2)
im = im.convert('1')  # 1-bit black and white

im.save('mod_file.jpg')
text = pytesseract.image_to_string(Image.open('mod_file.jpg'))
print(text)

For example, for red colour detection you can refer to this post. After getting the red text, binarize the image and then run:

text = pytesseract.image_to_string(Image.open('red_text_file.jpg'))
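One possible way to isolate the red text before that call is a simple per-channel threshold. This is only a rough sketch using Pillow: the function name `red_text_mask` and the threshold values `min_red` and `max_other` are assumptions made up for illustration, not taken from the referenced post, and they will almost certainly need tuning for real scans.

```python
from PIL import Image, ImageChops


def red_text_mask(im, min_red=150, max_other=100):
    """Return a 1-bit image in which red-ish pixels are black on white.

    min_red and max_other are illustrative thresholds (assumptions).
    """
    r, g, b = im.convert("RGB").split()
    # Per-channel tests: 255 where the channel passes, 0 where it fails.
    r_ok = r.point(lambda v: 255 if v >= min_red else 0)
    g_ok = g.point(lambda v: 255 if v <= max_other else 0)
    b_ok = b.point(lambda v: 255 if v <= max_other else 0)
    # darker() takes the pixel-wise minimum, so a pixel stays 255
    # only when all three tests passed.
    mask = ImageChops.darker(ImageChops.darker(r_ok, g_ok), b_ok)
    # Invert so the selected text is dark on a light background.
    return ImageChops.invert(mask).convert("1")
```

Calling `red_text_mask(Image.open("temp.jpg")).save("red_text_file.jpg")` would produce the file read by the call above. Changing which channel must be high and which must be low gives the equivalent mask for another colour.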

Similarly, you will have to repeat the same process for blue and so on. I believe you can easily do it yourself; just play around with some values.
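To get from the OCR result to JSON, one option is to ask pytesseract for word positions with `image_to_data` instead of plain text, and then group the words into rows. The sketch below is a rough illustration, not a complete table parser: the helper names `words_to_rows` and `image_to_json` and the `row_tolerance` value are assumptions, and every word is treated as its own cell, so cells containing several words would still need to be merged (for example by looking at horizontal gaps).

```python
import json


def words_to_rows(data, row_tolerance=10):
    """Group OCR words into rows by vertical position.

    `data` is assumed to look like the dict returned by
    pytesseract.image_to_data(..., output_type=pytesseract.Output.DICT),
    i.e. parallel lists under "text", "left" and "top".
    row_tolerance (pixels) is an illustrative value.
    """
    words = [
        (top, left, text.strip())
        for text, left, top in zip(data["text"], data["left"], data["top"])
        if text.strip()  # skip the empty entries Tesseract emits for layout
    ]
    words.sort()  # top to bottom, then left to right

    rows = []  # each entry: (top of first word in the row, [(left, text), ...])
    for top, left, text in words:
        if rows and abs(top - rows[-1][0]) <= row_tolerance:
            rows[-1][1].append((left, text))
        else:
            rows.append((top, [(left, text)]))
    # Order the cells of each row from left to right and drop the coordinates.
    return [[text for _, text in sorted(cells)] for _, cells in rows]


def image_to_json(path):
    """Run OCR on an image file and return the detected rows as a JSON string."""
    # Imported here so words_to_rows() can be used without Tesseract installed.
    import pytesseract
    from PIL import Image

    data = pytesseract.image_to_data(
        Image.open(path), output_type=pytesseract.Output.DICT
    )
    return json.dumps(words_to_rows(data), indent=2)
```

With the preprocessed file from the example, `print(image_to_json('mod_file.jpg'))` would print a list of rows, each row being a list of the words found on that line of the table.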

