How to extract tabular data from images?
I have some sample images. How can I extract tabular data from them and store it in JSON format?
Use pytesseract. The code will look something like the following, and you can try different modifications. It is only an example and may not solve the whole problem: it works for black text, but for blue or any other colour you will have to create a mask for that colour first and then extract the data.
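import pytesseract
from PIL import Image, ImageEnhance, ImageFilter

im = Image.open("temp.jpg")
maxsize = (2024, 2024)
# thumbnail() resizes in place and returns None, so its result must not be reassigned.
# Image.LANCZOS is the current name for ANTIALIAS, which was removed in Pillow 10.
im.thumbnail(maxsize, Image.LANCZOS)
im = im.filter(ImageFilter.MedianFilter())  # reduce speckle noise
enhancer = ImageEnhance.Contrast(im)
im = enhancer.enhance(2)  # double the contrast
im = im.convert('1')  # binarize to pure black and white
im.save('mod_file.jpg')
text = pytesseract.image_to_string(Image.open('mod_file.jpg'))
print(text)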
For example, for red colour detection you can refer to this post. After extracting the red text, binarize the image and then run:
text = pytesseract.image_to_string(Image.open('red_text_file.jpg'))
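The colour mask itself could be built roughly like this. This is only a minimal sketch using Pillow alone: the function name isolate_colour and the margin value of 60 are my own assumptions, not something from the linked post, and the threshold will need tuning for real scans. It keeps the pixels where one channel clearly dominates the other two and turns them into black text on a white background.

```python
from PIL import Image, ImageChops


def isolate_colour(im, channel="R", margin=60):
    """Return a mode '1' image: pixels dominated by `channel` become black, the rest white.

    `isolate_colour` and `margin` are illustrative names/values, not part of any library.
    """
    bands = dict(zip("RGB", im.convert("RGB").split()))
    target = bands.pop(channel)
    other_a, other_b = bands.values()
    # How far the chosen channel exceeds the brighter of the other two (clipped at 0).
    dominance = ImageChops.subtract(target, ImageChops.lighter(other_a, other_b))
    # Threshold: strongly dominant pixels -> 0 (black), everything else -> 255 (white).
    mask = dominance.point(lambda v: 0 if v >= margin else 255)
    return mask.convert("1")


# Possible usage (file names are placeholders):
# isolate_colour(Image.open("temp.jpg"), "R").save("red_text_file.jpg")
```

Passing "B" instead of "R" gives the same treatment for blue text. A plain channel comparison like this will also pick up reddish or bluish backgrounds, so treat it as a starting point only.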
Similarly, you will have to repeat the same process for blue and so on. I believe you can easily do it yourself; just play around with some values.
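For the JSON part of the question, one possible way to turn the OCR text into records is sketched below. It assumes that each table row comes out as one line of text and that columns are separated by a tab or by two or more spaces, which Tesseract does not guarantee (the config option preserve_interword_spaces=1 may help keep the gaps). The helper name ocr_text_to_json is made up for this example.

```python
import json
import re


def ocr_text_to_json(text, has_header=True):
    """Split OCR text into rows and cells and return a JSON string.

    Assumes one table row per line and columns separated by a tab
    or by two or more whitespace characters (a hypothetical layout).
    """
    rows = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines between rows
        rows.append(re.split(r"\s{2,}|\t", line))

    if has_header and len(rows) > 1:
        header, body = rows[0], rows[1:]
        # zip() silently drops extra cells if a row is longer than the header.
        records = [dict(zip(header, row)) for row in body]
    else:
        records = rows
    return json.dumps(records, ensure_ascii=False, indent=2)


# Possible usage with the text returned by pytesseract.image_to_string():
# print(ocr_text_to_json(text))
```

If the columns do not separate cleanly this way, pytesseract.image_to_data returns word positions that can be grouped into columns instead.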